Health-care AI: The potential and pitfalls of diagnosis by app
If health is a fundamental human right, health-care delivery must be improved globally to achieve universal access. However, the limited number of practitioners creates a barrier for all health-care systems.
Approaches to health-care delivery driven by artificial intelligence (AI) are poised to fill this gap. Whether in urban hospitals or in rural and remote homes, AI has the reach that health-care professionals cannot hope to achieve. People seeking health information can obtain it quickly and conveniently. For health care to be effective, patient safety must remain a priority.
The news is filled with examples of novel applications of AI. Riding the wave of recent interest in conversational agents, Google researchers have developed an experimental diagnostic AI, Articulate Medical Intelligence Explorer (AMIE). People seeking health information provide their symptoms through a text-chat interface and AMIE begins to ask questions and provide recommendations as a human clinician might. The researchers claim that AMIE outperformed clinicians in both diagnostic accuracy and performance.
AMIE dialogue. (Google)
The potential of large language models (LLMs) like AMIE is clear. By being trained on a large database of text, LLMs can generate text, identify the underlying meaning, and respond in a human-like manner. Provided patients have access to the internet, health advice could be tailored to the patient and provided quickly and easily, while allowing for triage of cases that are best handled by human health-care professionals.
But these tools are still in the experimental stages and have limitations. AMIE researchers say further study is needed to “envision a future in which conversational, empathic and diagnostic AI systems might become safe, helpful and accessible.”
Precautions must be taken. Health-care delivery is a complicated task. Left unregulated — professionally or internationally — it presents challenges to quality of care, privacy and security.
Medical decision-making
Medical decision-making is among the most complicated and consequential of human activities. It might seem unlikely that an AI could work as effectively as a human clinician. However, decades of research suggest that algorithmic approaches to decision-making can be equal or superior to clinical intuition.
Pattern recognition represents the core of medical expertise. Like other forms of expertise, medical experts require extensive training to learn the diagnostic patterns, provide treatment recommendations and deliver care. Through effective instruction, learners narrow the focus of their attention to diagnostic features, while ignoring non-diagnostic features.
Yet, effective health-care delivery requires more than just the ability to recognize patterns. Health-care professionals must be capable of communicating this information to their patients. Beyond the difficulties of translating technical knowledge to patients with varying levels of health literacy, health information is often emotionally charged, leading to communication traps where doctors and patients withhold information. By developing a strong relationship with their patients, health-care professionals can bridge these gaps.
The conversational features of LLMs, like ChatGPT, have generated considerable public interest. While claims that ChatGPT has “broken the Turing Test” are overstated, the human-like responses of LLMs make them more engaging than previous chatbots. Future LLMs like AMIE might help fill gaps in health-care delivery, but they must be adopted with caution.
Promise of accurate, explainable AI in health care
AMIE is not Google’s first health-care technology. In 2008, Google Flu Trends (GFT) was used to estimate the prevalence of influenza within a population by using aggregated search terms. Its developers assumed that users’ search behaviour would be related to the prevalence of the flu, with past search trends predicting future cases.
GFT’s early predictions were quite promising, until they failed, with outdated data identified as the source of bias. Later efforts to retrain the model with updated search trends again proved successful.
IBM’s Watson provides another cautionary tale. IBM invested considerable capital in developing Watson and implemented over 50 health-care projects. Watson’s potential failed to materialize, with the underlying technologies quietly being sold off. Not only did the system fail to engender trust, but that distrust was well deserved, as it produced “unsafe and incorrect” treatment recommendations.
AIs developed to diagnose, triage and predict the progression of COVID-19 provide the best example of the readiness of AIs in health care to handle public health challenges. Broad reviews of these efforts cast doubt on the outcomes. The validity and accuracy of the models and their predictions were generally lacking. This was largely attributed to the quality of data.
One of the lessons that can be gleaned from the use of AI during COVID-19 is that there is no shortage of researchers and algorithms; there is, however, a dire need for human quality control. This has led to calls for human-centred design.
This is also true of expert reviews of the technologies themselves. Like Google’s AMIE, many publications that assess these technologies are released as pre-prints before or during the peer review process. There can also be extensive lags between a pre-print and its eventual publication. Research has demonstrated that the number of mentions on social media, rather than quality, is a greater predictor of a publication’s download rate.
Without ensuring the validity of the methods for training and implementation, health technologies might be adopted without any formal means of quality control.
Technology as folk medicine
The problem of AI in health care is made clear when we acknowledge that many health ecosystems can exist in parallel. Medical pluralism is observed when two or more systems are available to health consumers. This typically takes the form of traditional medicine and a western biomedical approach.
As apps are direct-to-consumer health technologies, they represent a new folk medicine. Users adopt these technologies based on trust rather than understanding how they operate. In the absence of medical knowledge and technical understanding of an AI’s operations, users are left to look for cues about a technology’s effectiveness. App store ratings and endorsements can replace the expert review of health-care professionals.
Users might prefer to use AI-enabled technologies rather than humans in cases where their health concerns are associated with stigma or chronic emotional distress. However, the accuracy of these systems might lag due to failures to update data.
The provision of user data also creates challenges. As with 23andMe, the personal information that users disclose might also reveal clues about others in their social networks.
If left unregulated, these technologies pose challenges for the quality of care. Professional and national regulations are required to ensure these technologies truly benefit the public.
This article is republished from The Conversation under a Creative Commons license.