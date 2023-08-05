Despite progress in Artificial Intelligence (AI), current state-of-the-art (SOTA) systems are mostly unimodal, single-task systems. This poses a challenge for developing medical AI systems, because medical tasks are inherently multimodal, involving text, imaging, genomics, and more.

To address this challenge, a research team from Google Research and Google DeepMind introduces Med-PaLM Multimodal (Med-PaLM M), a large multimodal generative model. The team’s main contributions include the curation of MultiMedBench, a multimodal biomedical benchmark containing 14 diverse tasks for training and evaluating generalist biomedical AI systems.

Med-PaLM M is the first generalist biomedical AI system that can process clinical language, imaging, and genomics using a single set of model weights. It can perform medical image classification, medical question answering, radiology report generation, genomic variant calling, and more. The team observed evidence of zero-shot medical reasoning, generalization to novel medical concepts and tasks, and positive transfer across tasks.

Radiologists evaluated the chest X-ray reports generated by Med-PaLM M, and their assessment points to the model's potential for radiology report generation.

The team addresses the absence of comprehensive multimodal medical benchmarks by proposing MultiMedBench, a benchmark that covers various multimodal data sources. They leverage MultiMedBench to develop Med-PaLM M by fine-tuning and aligning the PaLM-E model to the biomedical domain.

Med-PaLM M comes close to or exceeds state-of-the-art baselines on all tasks in MultiMedBench while showing strong zero-shot generalization capabilities. This research represents a crucial step towards the development of a generalist biomedical AI system.