Microsoft unveils Kosmos-1, a new AI model to rival ChatGPT

As the race among artificial intelligence (AI) chatbots has heated up over the past few months, Microsoft has unveiled Kosmos-1, a new AI model. In addition to text prompts, the new model can respond to visual cues such as images.

The multimodal large language model (MLLM) can help users with an array of new tasks, including visual question answering and image captioning.

Kosmos-1 could pave the way for the next stage beyond ChatGPT's text prompts.

Microsoft's AI researchers wrote in a paper: "A big convergence of language, multimodal perception, action, and world modelling is a key step toward artificial general intelligence. In this work, we introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context and follow instructions."

The paper further argues that multimodal perception, or knowledge acquisition and "grounding" in the real world, is needed to move beyond ChatGPT-like capabilities toward artificial general intelligence (AGI), ZDNet reports.

The paper further reads, "More importantly, unlocking multimodal input greatly widens the applications of language models to more high-value areas, such as multimodal machine learning, document intelligence, and robotics."

The goal is to align perception with LLMs so that the models are able to see and talk, according to an IANS report.

Experimental results showed that Kosmos-1 achieved impressive performance on language understanding and generation, and even on tasks where it is fed document images directly.

It also performed well on perception-language tasks, including multimodal dialogue, visual question answering and image captioning, as well as on vision tasks such as image recognition with descriptions (specifying classification via text instructions).

The Microsoft team said, "We also show that MLLMs can benefit from cross-modal transfer, i.e., transfer knowledge from language to multimodal, and from multimodal to language. In addition, we introduce a dataset of Raven IQ test, which diagnoses the nonverbal reasoning capability of MLLMs."