Researchers from the University of Sheffield and the Alan Turing Institute have developed a new "blueprint" for building AI. This framework highlights how the technology can learn from diverse kinds of data—moving beyond just vision and language—to significantly enhance its real-world deployability.
Published in the journal Nature Machine Intelligence, the framework serves as a roadmap for developing multimodal AI: systems designed to learn by integrating various data types, such as text, images, sound, and sensor readings. While AI typically learns from only one type of information, these advanced multimodal systems combine different data sources to form a more complete and accurate picture of the world.
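To make the idea concrete, here is a minimal, hypothetical sketch (not taken from the paper) of one common way such systems are built: a "late fusion" model in which each modality has its own encoder and the resulting embeddings are combined before a shared prediction head. The modalities, dimensions, and class names below are illustrative assumptions, not the authors' architecture.

```python
# A minimal late-fusion sketch: separate encoders per modality, concatenated
# embeddings, shared prediction head. All sizes are illustrative assumptions.
import torch
import torch.nn as nn

class MultimodalFusionModel(nn.Module):
    def __init__(self, text_dim=300, image_dim=2048, sensor_dim=16,
                 hidden=128, n_classes=2):
        super().__init__()
        # One encoder per modality (pre-extracted text, image, and sensor features)
        self.text_enc = nn.Sequential(nn.Linear(text_dim, hidden), nn.ReLU())
        self.image_enc = nn.Sequential(nn.Linear(image_dim, hidden), nn.ReLU())
        self.sensor_enc = nn.Sequential(nn.Linear(sensor_dim, hidden), nn.ReLU())
        # Shared head operates on the concatenated modality embeddings
        self.head = nn.Linear(3 * hidden, n_classes)

    def forward(self, text, image, sensors):
        fused = torch.cat(
            [self.text_enc(text), self.image_enc(image), self.sensor_enc(sensors)],
            dim=-1,
        )
        return self.head(fused)

# Toy usage: random feature vectors stand in for real pre-extracted features.
model = MultimodalFusionModel()
logits = model(torch.randn(4, 300), torch.randn(4, 2048), torch.randn(4, 16))
print(logits.shape)  # torch.Size([4, 2])
```

The point of the sketch is simply that adding a new modality means adding a new encoder and widening the fusion step, which is why the framework's emphasis on data beyond vision and language translates directly into architectural and deployment choices.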
Addressing a data limitation
The study found that while multimodal systems, which draw on multiple types of information, offer many benefits, most current research focuses narrowly on vision and language data. The researchers argue this limits the technology's ability to address more complicated problems that require a wider range of information.
For instance, if self-driving cars could combine information from cameras, sensors, and their surroundings, they could navigate tricky situations more safely. Similarly, if AI tools could integrate various types of medical information, such as clinical records and genetic data, they might become much better at diagnosing illnesses and helping with drug development.
The need for the new framework is underscored by the researchers' finding that, of papers on the open repository arXiv in 2024 describing AI that draws on exactly two different types of data, 88.9 per cent involved vision or language. The blueprint is intended for use by developers in industry and researchers in academia alike.
The vision for broader AI
Professor Haiping Lu, who led the study from the University of Sheffield’s School of Computer Science and Centre for Machine Intelligence, emphasised the necessity of this shift: “AI has made great progress in vision and language, but the real world is far richer and more complex. To address global challenges like pandemics, sustainable energy, and climate change, we need multimodal AI that integrates broader types of data and expertise”.
He added, “The study provides a deployment blueprint for AI that works beyond the lab — focusing on safety, reliability, and real-world usefulness”.
The research, which brought together 48 contributors from 22 institutions worldwide, illustrates this new approach through three detailed use cases: pandemic response, self-driving car design, and climate change adaptation.
Foundation for future collaboration
The work originated through a collaboration supported by the Alan Turing Institute, specifically its Meta-learning for Multimodal Data Interest Group, led by Professor Lu. This collaborative foundation also helped inspire the UK Open Multimodal AI Network (UKOMAIN), a £1.8 million EPSRC-funded network now led by Professor Lu and dedicated to advancing deployment-centric multimodal AI across the UK.
Dr. Louisa van Zeeland, Research Lead at the Alan Turing Institute, noted the impact on environmental work: “By integrating and modeling large, diverse sets of data through multimodal AI, our work is setting a new standard for environmental forecasting. This sophisticated approach enables us to generate predictions over various spatial and temporal scales, driving real-world results in areas from Arctic conservation to agricultural resilience”.