News

Mount Sinai researchers reveal AI models often fail in medical ethics dilemmas, urging human oversight in clinical decision-making.
While one model, Google's Gemini 2.5 Pro, achieved a higher average score of 10.1 out of 42 points (~24 percent), the results otherwise showed a massive performance drop compared to AIME-level ...
According to a new study, AI models hold opposing views on topics like LGBTQ+ rights depending on how they're trained -- and who's training them.
Overthinking it: New Apple study challenges whether AI models truly “reason” through problems. Puzzle-based experiments reveal limitations of simulated reasoning, but others dispute findings.
In a new study released last week, researchers at Stanford University asked 24 major AI models, from companies like OpenAI, Anthropic, and Google, what they thought of 30 current issues.
Published in the December 2024 issue of the medical journal The BMJ, the study examines five major large language models (LLMs): ChatGPT 4, GPT-4o, Claude 3.5 "Sonnet," and Gemini 1 and 1.5.
NEW YORK, July 22, 2025 (GLOBE NEWSWIRE) — ToltIQ, the leading AI-powered platform for private markets due diligence, today released findings from a comprehensive study evaluating the performance of ...
A new study showing that machine learning models are study-specific and difficult to generalise provides a "cautionary tale" about using AI in medicine, experts say. There is hope that artificial ...
The findings, which raise important questions about how and when to rely on large language models (LLMs), such as ChatGPT, in health care settings, were reported in the July 22 online issue of NPJ ...