The Life-Threatening Consequences of Overhyping AI

Artificial intelligence will profoundly change the health care industry. But there are still many open questions about how AI can best serve our public health needs.
[Photo: Bronchiole asthma light micrograph. Steve Gschmeissner/Science Source]

On February 11, The New York Times published a story with the headline “AI Shows Promise Assisting Physicians.” While the article focused on a scientific paper showing how an artificial intelligence system could help doctors diagnose certain conditions, it missed a key part of the AI story: Accuracy does not equal impact.

As the Times wrote, the AI software “was more than 90 percent accurate at diagnosing asthma; the accuracy of physicians in the study ranged from 80 to 94 percent. In diagnosing gastrointestinal disease, the system was 87 percent accurate, compared with the physicians’ accuracy of 82 to 90 percent.” The Times essentially sourced those numbers from the first and third rows of a key table in the Nature article it was reporting on. Why not the row in the middle, the one that dealt with potentially life-threatening encephalitis? There, the AI was just 83.7 percent accurate, while the physicians’ accuracies were all above 95 percent. In other words, the human doctors beat the AI system at correctly diagnosing the more serious illness. The reporter doesn’t address this point, but it’s a vital detail to consider. [Editor’s note: Cade Metz, the Times reporter, is a former WIRED staff writer.]

The Nature article also points out that the scientists tested the AI against five sets of physicians with different levels of experience. It does not claim the AI performed better than experienced doctors, and in fact says, "Our model achieved an average F1 score [accuracy measure] higher than the two junior physician groups but lower than the three senior physician groups. The result suggests that this AI model may potentially assist junior physicians in diagnoses but may not necessarily outperform experienced physicians."
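For readers unfamiliar with the metric: F1 is not raw accuracy but the harmonic mean of precision (how many flagged cases were real) and recall (how many real cases were caught). A minimal sketch, with hypothetical counts that are not from the study:

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """F1 is the harmonic mean of precision and recall."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)

# Hypothetical example: a model that catches 90 of 100 real cases
# (missing 10) while raising 20 false alarms.
print(f1_score(true_positives=90, false_positives=20, false_negatives=10))  # ~0.857
```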

This is still impressive, right? That the AI did better than some groups of junior physicians? All doctors strive for accuracy, but “in health care, disease diagnoses are often subjective,” says Sreekanth Chalasani, an associate professor at the Salk Institute for Biological Studies. “For example, it is more important to diagnose life-threatening conditions correctly, even if that leads to less severe conditions being missed.”

Misdiagnosis can have far-reaching negative impacts, which is why doctors are also guided by the Hippocratic oath. They might be extra careful in diagnosing a potentially life-threatening condition and prescribe additional tests out of an abundance of caution. At the same time, they may decide not to prescribe potentially unnecessary medication to a patient exhibiting mild asthma unless the condition worsens. No doctor honoring the Hippocratic oath would ever put a patient’s life at risk to increase their accuracy score, and we as a society should never impose such a metric on doctors, or turn life-or-death decisions over to a machine incapable of understanding the value of a human life.

I asked Gregory La Blanc, a distinguished teaching fellow at UC Berkeley’s Haas School of Business, what he thought of the article. He wrote, “I am a huge optimist about the application of AI in medicine, but we need to look beyond accuracy measures as presented in this article. The most accurate Ebola test ever invented is the one that always says no. The nature of the error matters.”
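His Ebola example is easy to check with back-of-the-envelope arithmetic. Assuming a hypothetical prevalence of one case per 100,000 people screened:

```python
# Hypothetical prevalence: 1 true case per 100,000 people screened.
population = 100_000
sick = 1
healthy = population - sick

# A "test" that always says no is right on every healthy person
# and wrong on the one sick person.
accuracy = healthy / population
print(f"Accuracy of always saying no: {accuracy:.3%}")  # 99.999%

# Yet its recall -- the fraction of actual cases it catches -- is zero.
recall = 0 / sick
print(f"Recall: {recall:.0%}")  # 0%
```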

Also, the AI may be cheating. AI is famous for “cheating the test,” like how it beats certain games by exploiting bugs. In the case of the research published in Nature, the AI and the human doctors were reviewing the same write-up of symptoms by a group of trained physicians. Now, humans have quirks. Physicians may write much shorter notes when they know a patient doesn't have a disease but write more extensively when they suspect a disease. They may use shorter phrases or make more typos if they are really paying attention to the patient. Sometimes they update their notes if the patient had a disease but not if the patient was healthy. The AI, especially a deep-learning AI, would be really good at picking up such clues from the diagnosis notes, and that would give it a competitive advantage.
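Machine-learning researchers call this failure mode “shortcut learning.” As a toy illustration (my own sketch, not the Nature study’s setup, with invented note lengths and threshold), a classifier that reads nothing but the length of a note can look impressively accurate if note length happens to correlate with diagnosis:

```python
import random

random.seed(0)

# Toy data: sick patients happen to get longer notes (a hypothetical quirk).
def fake_note_length(is_sick: bool) -> float:
    return random.gauss(400 if is_sick else 150, 60)

cases = [(fake_note_length(is_sick), is_sick)
         for is_sick in [True, False] * 500]

# A "model" that never reads the medicine, only counts words.
def shortcut_model(note_length: float) -> bool:
    return note_length > 275

accuracy = sum(shortcut_model(length) == is_sick
               for length, is_sick in cases) / len(cases)
print(f"Accuracy from note length alone: {accuracy:.1%}")  # well above 90% on this toy data
```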

Finally, the AI may simply be competing in a different game. The accuracy of the AI and the doctors was measured against whether their diagnosis matched the original diagnosis by the doctor who conducted the examination. The AI would play the right “game” here, trying to guess what the original doctor diagnosed. A human doctor who fell into the trap of trying to make the correct diagnosis, instead of guessing what the original physician wrote down, would have played the wrong game and lost.
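A toy simulation makes the distinction concrete (the numbers are hypothetical, not the study’s): if the answer key is a physician’s diagnosis that is itself right only 90 percent of the time, a perfect diagnostician scores about 90 percent, while a system that merely predicts what that physician wrote scores 100.

```python
import random

random.seed(1)

# Hypothetical setup: the "answer key" is the original physician's
# diagnosis, which is itself correct only 90 percent of the time.
LABELER_ACCURACY = 0.90
truth = [random.random() < 0.5 for _ in range(100_000)]  # true disease status
label = [t if random.random() < LABELER_ACCURACY else not t for t in truth]

# Player A diagnoses perfectly; Player B has learned to predict
# what the original physician would write, right or wrong.
score_perfect_doctor = sum(t == l for t, l in zip(truth, label)) / len(truth)
score_label_mimic = 1.0  # matches the answer key by construction

print(f"Perfect diagnostician, scored against labels: {score_perfect_doctor:.1%}")  # ~90%
print(f"Label mimic, scored against labels: {score_label_mimic:.1%}")  # 100%
```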

Without question, AI will profoundly change the world. But overhyping the technology without first considering the implications of its behavior increases the risk that AI will be deployed in harmful ways. The next time you’re in your doctor’s office, think about the decisions he or she makes on a daily basis about the intricacies of providing the right care to maximize not just health but quality of life. When those decisions impact your life, or the lives of your loved ones, do you want your doctor focused on beating the accuracy of an AI model? Or carefully weighing the best options for each individual?

When it comes to evaluating AI, we should recognize that model accuracy is not the only measurement to consider. We have more questions to ask—and answer—about how AI can best operate both accurately and ethically so it properly augments medical professionals without adversely impacting overall public health.
