OpenAI is expanding its text-to-speech offering with a new tool. The company recently ran a small-scale preview of Voice Engine, a voice-cloning technology that needs only a 15-second audio sample to mimic a speaker. According to the company, it generates “natural-sounding speech” with “emotive and realistic voices.”

Voice Engine has been in development since 2022 and is closely tied to OpenAI’s existing text-to-speech API. The company already uses a version of the tool to power the preset voices in that API and in the Read Aloud feature.

According to the company, the technology could help with reading assistance and language translation, and could support people with sudden or degenerative speech conditions.

However, the technology carries risks of its own. Bad actors could use it for fraud, scams, and similar abuse, a problem that already exists. OpenAI says it is aware of these risks. In a blog post, the company wrote, “We recognize that generating speech that resembles people's voices has serious risks, which are especially top of mind in an election year.”

The company says it is gathering feedback from US and international partners across government, media, entertainment, education, civil society, and other fields to minimize the risks of launching the product. According to OpenAI, all preview testers have agreed to its usage policies, which prohibit impersonating an individual without their consent or proper legal right.

In addition, the company has asked testers to disclose to their audiences that the voices are AI-generated. It has also implemented safety measures, including watermarking “to trace the origin of any audio generated by Voice Engine” and “proactive monitoring” of its usage.

OpenAI has not said when the product will roll out. According to the company, there will be a list of no-go voices to detect and prevent the creation of voices that are similar to those of prominent figures.

