Can a Crowdsourced AI Medical Diagnosis App Outperform Your Doctor?

The Human Dx platform aims to improve the accuracy of individual physicians

Shantanu Nundy recognized the symptoms of rheumatoid arthritis when his 31-year-old patient suffering from crippling hand pain checked into Mary’s Center in Washington, D.C. Instead of immediately starting treatment, though, Nundy decided first to double-check his diagnosis using a smartphone app that helps with difficult medical cases by soliciting advice from doctors worldwide. Within a day, Nundy’s hunch was confirmed. The app had used artificial intelligence (AI) to analyze and filter advice from several medical specialists into an overall ranking of the most likely diagnoses. Created by the Human Diagnosis Project (Human Dx)—an organization that Nundy directs—the app is one of the latest examples of growing interest in human–AI collaboration to improve health care.

Human Dx advocates the use of machine learning—a popular AI technique that automatically learns from classifying patterns in data—to crowdsource and build on the best medical knowledge from thousands of physicians across 70 countries. Physicians at several major medical research centers have shown early interest in the app. Human Dx on Thursday announced a new partnership with top medical profession organizations including the American Medical Association and the Association of American Medical Colleges to promote and scale up Human Dx’s system. The goal is to provide timely and affordable specialist advice to general practitioners serving millions of people worldwide, in particular so-called "safety net" hospitals and clinics throughout the U.S. that offer access to care regardless of a patient’s ability to pay.

“We need to find solutions that scale the capacity of existing doctors to serve more patients at the same or cheaper cost,” says Jay Komarneni, founder and chair of Human Dx. Roughly 30 million uninsured Americans rely on safety net facilities, which generally have limited or no access to medical specialists. Those patients often face the stark choice of either paying out of pocket for an expensive in-person consultation or waiting for months to be seen by the few specialists working at public hospitals, which receive government funding to help pay for patient care, Komarneni says. Meanwhile, studies have shown that between 25 percent and 30 percent of such expensive specialist visits could instead be conducted as online consultations between physicians, sparing patients the additional costs and long wait times.

Komarneni envisions “augmenting or extending physician capacity with AI” to close this “specialist gap.” Within five years Human Dx aims to become available to all 1,300 safety net community health centers and free clinics in the U.S. The same remote consultation services could also be made available to millions of people around the world who lack access to medical specialists, Komarneni says.

How It Works

When a physician needs help diagnosing or treating a patient, they open the Human Dx smartphone app or visit the project’s Web page and type in their clinical question as well as their working diagnosis. The physician can also upload images and test results related to the case and add details such as any medication the patient takes regularly. The physician then requests help, either from specific colleagues or the network of doctors who have joined the Human Dx community. Over the next day or so Human Dx’s AI program aggregates all of the responses into a single report. It is the new digital equivalent of a “curbside consult” where a physician might ask a friend or colleague for quick input on a medical case without setting up a formal, expensive consultation, says Ateev Mehrotra, an associate professor of health care policy and medicine at Harvard Medical School and a physician at Beth Israel Deaconess Medical Center. “It makes intuitive sense that [crowdsourced advice] would be better advice,” he says, “but how much better is an open scientific question.” Still, he adds, “I think it’s also important to acknowledge that physician diagnostic errors are fairly common.” One of Mehrotra's Harvard colleagues has been studying how the AI-boosted Human Dx system performs in comparison with individual medical specialists, but has yet to publish the results.
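
Human Dx has not published the details of how its AI program combines responses, so the following is only an illustrative sketch of the general idea of merging several physicians' ranked diagnoses into one overall ranking. It uses a weighted Borda count, a standard rank-aggregation method chosen here purely as a stand-in. The function name, the weights and the sample responses are all hypothetical.

```python
from collections import defaultdict


def aggregate_rankings(responses, weights=None):
    """Merge ranked diagnosis lists into one ranking (weighted Borda count).

    Illustrative assumption only: this is NOT Human Dx's actual algorithm.
    responses: list of lists, each ordered from most to least likely.
    weights:   optional per-respondent weights (hypothetical, e.g. to
               reflect a contributor's track record); defaults to 1.0 each.
    """
    if weights is None:
        weights = [1.0] * len(responses)

    scores = defaultdict(float)
    for ranked, weight in zip(responses, weights):
        n = len(ranked)
        for position, diagnosis in enumerate(ranked):
            # The top choice in a list of n earns n points, the next n - 1, ...
            scores[diagnosis] += weight * (n - position)

    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical responses from three physicians on one case.
responses = [
    ["rheumatoid arthritis", "lupus", "gout"],
    ["rheumatoid arthritis", "gout"],
    ["lupus", "rheumatoid arthritis"],
]

report = aggregate_rankings(responses)
for diagnosis, score in report:
    print(f"{diagnosis}: {score}")
```

In this toy case the diagnosis that most respondents ranked highly rises to the top of the combined report. A real system would presumably also need to reconcile different wordings of the same diagnosis and weigh contributors by reliability, which this sketch only gestures at through the optional weights.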

Mehrotra's cautionary note comes from research that he and Nundy published last year in JAMA Internal Medicine. That study used the Human Dx service as a neutral platform to compare the diagnostic accuracy of human physicians with third-party “symptom checker” Web sites and apps used by patients for self-diagnosis. In this case, the humans handily outperformed the symptom checkers’ computer algorithms. But even physicians provided incorrect diagnoses about 15 percent of the time, which is comparable with past estimates of physician diagnostic error.

Human Dx could eventually help improve the medical education and training of human physicians, says Sanjay Desai, a physician and director of the Osler Medical Training Program at Johns Hopkins University. As a first step in checking the service's capabilities, he and his colleagues ran a study whose preliminary results showed the app could tell the difference between the diagnostic abilities of medical residents and fully trained physicians. Desai wants to see the service become a system that could track the clinical performance of individual physicians and provide targeted recommendations for improving specific skills. Such objective assessments could be an improvement over the current method of human physicians qualitatively judging their less experienced colleagues. The open question, Desai says, is whether the “algorithms can be created to provide finer insights into an [individual] doctor’s strengths and weaknesses in clinical reasoning.”

AI-Assisted Health Care

Human Dx is one of many AI systems being tested in health care. The IBM Watson Health unit is perhaps the most prominent, with the company for the past several years claiming that its AI is assisting major medical centers and hospitals in tasks such as genetically sequencing brain tumors and matching cancer patients to clinical trials. Studies have shown AI can help predict which patients will suffer from heart attacks or strokes in 10 years or even forecast which will die within five. Tech giants such as Google have joined start-ups in developing AI that can diagnose cancer from medical images. Still, AI in medicine is in its early days and its true value remains to be seen. Watson appears to have been a success at Memorial Sloan Kettering Cancer Center, yet it floundered at The University of Texas M. D. Anderson Cancer Center, although it is unclear whether the problems resulted from the technology or its implementation and management.

The Human Dx Project also faces questions in achieving widespread adoption, according to Mehrotra and Desai. One prominent challenge involves getting enough physicians to volunteer their time and free labor to meet the potential rise in demand for remote consultations. Another possible issue is how Human Dx's AI quality control will address users who consistently deliver wildly incorrect diagnoses. The service will also require a sizable user base of medical specialists to help solve those trickier cases where general physicians may be at a loss.

In any case, the Human Dx leaders and the physicians helping to validate the platform's usefulness seem to agree that AI alone will not take over medical care in the near future. Instead, Human Dx seeks to harness both machine learning and the crowdsourced wisdom of human physicians to make the most of limited medical resources, even as the demands for medical care continue to rise. “The complexity of practicing medicine in real life will require both humans and machines to solve problems,” Komarneni says, “as opposed to pure machine learning.”