Eloquent and convincing answers
They deliberately designed the questions to encourage the chatbots to provide misleading answers. This standard stress-testing technique in AI security research is known as red teaming.
The study also tested the free versions of each chatbot model available in February 2025, although paid versions and newer releases may perform better.
In practice, most people use the free versions, and most of the health questions they ask are not carefully worded. The conditions in this study therefore reflect how people actually use AI chatbots.
These findings are not isolated. They emerge amid a growing body of evidence that points to a consistent picture.
For example, a February 2026 study in the journal Nature Medicine reported a surprising result.
Chatbots can provide correct medical answers almost 95% of the time when tested on their own. However, when real people used them, the rate of correct answers dropped drastically to below 35%, a figure no better than that of people who did not use chatbots at all.
Simply put, the question is not just whether the chatbot provides the right answer, but rather: "Can a lay user understand and use the answer correctly?"
Confirmed by other studies
Additionally, a recent study published in the journal JAMA Network Open tested 21 leading AI models. The researchers asked the models to generate a range of possible medical diagnoses.
When the models were given only basic details (such as the patient's age, gender, and symptoms), they struggled, failing to suggest the correct set of conditions in more than 80% of trials.
However, after researchers included physical examination findings and laboratory results, the accuracy rate jumped to above 90%.