OpenAI, led by Sam Altman, has announced a significant upgrade to ChatGPT, introducing voice and image capabilities. This development allows the AI chatbot to now hear, see, and speak, creating a more interactive and intuitive interface. Altman himself expressed his enthusiasm for the new features, urging users to give the voice mode and vision a try.
According to the company, these capabilities will roll out to Plus and Enterprise users over the next two weeks. Voice will be available on iOS and Android (as an opt-in setting), while image capabilities will be available on all platforms.
The voice feature is powered by a new text-to-speech model capable of generating remarkably human-like audio from text input and a short sample of speech. The company collaborated with professional voice actors to craft each unique voice. It also uses Whisper, its open-source speech recognition system, to transcribe spoken words into text.
Image understanding is made possible by the advanced GPT-3.5 and GPT-4 models. These models use their language comprehension abilities to interpret various types of images, including photos, screenshots, and documents containing both text and images.
The introduction of voice technology opens up a range of creative and accessibility-focused possibilities. However, the company acknowledged that it also carries risks, such as impersonation or fraud. To mitigate these risks, the technology is being limited to voice chat, with voices created in collaboration with voice actors the company worked with directly.
Spotify is already using this technology in the pilot of its Voice Translation feature, which lets podcasters broaden the reach of their content by translating podcasts into additional languages in the podcasters' own voices.
OpenAI also says it has implemented technical measures to restrict ChatGPT's ability to make direct statements about individuals, prioritising privacy and accuracy.
Inputs from IANS