Multimodal AI is the next big thing in health care. Except the previous next big thing — regular ‘garden variety’ unimodal AI — hasn’t even fully arrived yet.
“In health care, AI is a promising technology that has been deployed in very small pockets here and there, but it has not broadly impacted health care in a meaningful way,” said Dr. Amol Verma, a physician, scientist, Assistant Professor in General Internal Medicine at St. Michael’s Hospital and the University of Toronto and the 2023 Temerty Professor of AI Research and Education in Medicine at the University of Toronto.
“The vast majority of clinicians don’t really use AI in their clinical practice.”
That fact doesn’t mean we’re not at the outset of a multimodal AI revolution in health care, one that could transform diagnostics, enable remote care, and drive efficiencies in an industry not exactly known for them.
But before we get too far ahead of ourselves here — what exactly is multimodal AI?
Multimodal AI systems are capable of processing and synthesizing multiple forms of data in order to provide outputs that include decisions, recommendations, and predictions.
Just like people.
“We are multimodal,” said Elham Dolatabadi, a data scientist and Assistant Professor at York University focused on machine learning in health care. “We can smell, see, and hear. These are all different modalities. Our brains integrate them to come to one decision.”
So, where a unimodal AI system in health care might only be able to assess MRI scans, a multimodal AI system could process the information from that MRI scan, along with clinical notes, lab tests, genomic data and real-time patient health tracking information, amongst other inputs. In this way, a multimodal AI system would replicate the way actual doctors process patient health information — by considering and analyzing multiple sources of data.
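One simple way to picture how multiple modalities can feed one decision is "late fusion," where each data source is scored separately and the scores are then combined. The minimal Python sketch below illustrates the idea only: every function, input, and weight here is invented for illustration, and real multimodal systems learn their representations and weightings jointly from data rather than using hand-set rules like these.

```python
# Minimal sketch of "late fusion": each modality produces its own risk
# score, and a weighted average combines them into one number.
# All names, thresholds, and weights are hypothetical illustrations.

def score_imaging(mri_finding: str) -> float:
    """Toy stand-in for an imaging model's risk score (0 to 1)."""
    return 0.8 if mri_finding == "lesion detected" else 0.1

def score_labs(crp_mg_per_l: float) -> float:
    """Toy stand-in for a lab-based model (elevated CRP -> higher risk)."""
    return min(crp_mg_per_l / 100.0, 1.0)

def score_notes(note: str) -> float:
    """Toy stand-in for a clinical-notes model."""
    return 0.7 if "worsening" in note.lower() else 0.2

def fused_risk(mri_finding: str, crp: float, note: str) -> float:
    # In a real system these weights would be learned, not hand-set.
    weights = {"imaging": 0.5, "labs": 0.2, "notes": 0.3}
    return (weights["imaging"] * score_imaging(mri_finding)
            + weights["labs"] * score_labs(crp)
            + weights["notes"] * score_notes(note))

risk = fused_risk("lesion detected", crp=60.0,
                  note="Patient reports worsening pain")
print(round(risk, 2))
```

The point of the sketch is structural: no single modality decides alone, which is exactly what distinguishes a multimodal system from a unimodal one that sees only the scan.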
What multimodal AI could mean for health care
Multimodal AI systems could bring almost unfathomable computational power to health care, especially when it comes to diagnostics and predictive capabilities. This could mean a much more personalized and precise approach to health care for individual patients. Think earlier, more accurate diagnoses and better outcomes — all at scale.
For example, a multimodal system could fold radiology images into its predictions to make them more accurate. It could also incorporate actual conversations between a patient and physician, and even assess clinical deterioration from the sound of a patient’s voice.
Multimodal AI could also unlock the potential of telemedicine. Right now, health care providers are limited to assessments based on their conversations and observations over video calls. But if a patient had sensory technology in their home capturing their personal data and feeding it into a multimodal AI system, that could change health care dynamics for millions of patients.
“It could dramatically improve access to care for people who are far away from clinical care or who, for whatever reason, have a hard time getting out of their house,” said Dr. Verma.
Wait times could also be improved. For example, patients regularly face delays for ultrasound imaging, partly because ultrasound is what’s called ‘operator dependent’: a clinician needs proper training to capture usable images. But if AI were embedded in a device patients could use themselves, it could help them capture their own images.
“It could change who can deliver health care, not just who receives it,” said Dr. Verma.
Ethical considerations remain central to AI deployment in health care
For many, the widespread use of AI in health care still seems risky. What about privacy? What about the risks of entrusting critical patient care decisions to AI?
Shaina Raza is an Applied Machine Learning Scientist for Responsible AI at the Vector Institute, an independent, not-for-profit focused on AI research. Her work focuses on the ethics of responsible AI in public health.
“With critical decisions about the life or death of a person, the doctors should make those decisions,” said Raza. “It’s not appropriate for generative AI models to do that. But if AI is used to facilitate research or help the doctors’ decisions, that’s different. We can save hundreds or thousands of hours that way.”
Raza notes that ethical AI in health care is ultimately about humans creating the frameworks for the AI before it’s ever fully deployed. That’s how best to address issues like patient privacy and systemic biases.
“Patient privacy is very sensitive. We need to de-identify or mask patient data before feeding it into the AI models,” said Raza. “We can also clean the data for biases before we feed it into the models, with what we call prompt engineering, the instructions we provide to the models.”
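Masking patient data before it reaches a model can be as conceptually simple as replacing direct identifiers with placeholder tokens. The sketch below uses a few regular expressions as a stand-in; the patterns, sample text, and record-number format are invented for illustration, and production de-identification relies on validated tooling that covers far more identifier types than this.

```python
import re

# Illustrative-only masking of a few direct identifiers before a record
# is sent to a model. Real de-identification uses validated tools and
# covers many more identifier types than this sketch.
PATTERNS = [
    (re.compile(r"\b\d{3}-\d{3}-\d{4}\b"), "[PHONE]"),        # phone numbers
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),  # email addresses
    (re.compile(r"\bMRN[- ]?\d+\b"), "[RECORD_ID]"),          # record numbers
]

def mask_identifiers(text: str) -> str:
    """Replace matched identifiers with placeholder tokens."""
    for pattern, token in PATTERNS:
        text = pattern.sub(token, text)
    return text

note = "Patient (MRN 48213) reachable at 416-555-0199 or jdoe@example.com."
print(mask_identifiers(note))
# Patient ([RECORD_ID]) reachable at [PHONE] or [EMAIL].
```

The design choice matters: masking happens before the text ever reaches a model, so the model simply never sees the raw identifiers.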
Once these sticky issues are addressed, multimodal AI will likely have revolutionary, positive impacts on our health-care system. But since the health-care industry is typically slow to adopt new technologies, that revolution may have to wait a few years.
This might actually be a good thing, according to Dr. Verma.
“Technology is deployed much more widely in the general society than it is in medicine,” said Dr. Verma. “We’re talking about a profession that still frequently uses fax machines. I can do more sophisticated things on my mobile device than I can for applications that are medical-specific. That’s a good thing because the stakes in medicine are very high — literally life and death. We don’t want to be deploying unproven technology so rapidly in that context.”
So while multimodal AI isn’t yet ready for broad deployment, the industry is still preparing for what’s next.
“Our aim is to include as many modalities in the models as possible, including images, text, electronic medical records, wearables, signals like ECG and EEG, and genomics,” said Dolatabadi, who is currently conducting research in multimodal learning with generative AI.
“Then the models could be used for different applications. And once the models are built, hospitals or health-care organizations can fine tune them for their own applications and patient populations.”
How health-care organizations and providers can prepare for the age of multimodal AI
With multimodal AI a matter of when and not if, it’s incumbent upon everyone in the health-care industry to be prepared for the changes ahead.
Dr. Verma outlines four key areas of focus over the next few years:
- Design the right systems to safely deploy AI technologies. “We need to be designing the right institutions, relationships and incentives. That means creating, effectively, centres of excellence with the infrastructure, data and skilled personnel that can monitor AI technologies. These centres would then connect to primary providers, who you can’t expect to assess whether an AI solution is working. That’s just not feasible or practical for them.”
- Create a plan for collecting critical data to avoid exacerbating system biases. “We know systems are prone to bias, so the second thing that must happen at a system level is creating a plan to collect data about patient race, language and gender, which we currently do not collect. If we don’t collect that information, we won’t know whether solutions are biased and will therefore perform poorly. We could end up exacerbating biases in our health-care system.”
- Build up AI expertise, skills and organizational capacity. “Organizations need to identify AI champions, either through recruitment or upskilling. They need people that have technical skills and skills related to the legality and ethics of AI. And they need people that have skills related to the change management aspects of AI implementation. There’s also the question of scale. Not every organization can do this. Big organizations should scale up and smaller organizations that can’t should partner with big organizations that can.”
- Get educated on AI technologies. “I think at the individual practitioner level, people basically just need to become more aware and educated through professional development. Some basic understanding of these technologies would be good.”
For all the complexity surrounding AI, deploying multimodal AI across the health-care system could help more people live longer, healthier lives. The fact that health-care organizations are still working to embed unimodal, much less multimodal, AI doesn’t change the trajectory. In the coming years, almost all of us will find our health care experiences enabled by AI in ways both obvious and hidden. It’s the inevitable next step.
