OpenAI has introduced a new AI tool that can closely copy human voices. This tool could be useful for services that help people with disabilities, but it also raises worries about spreading false information and misuse.
On Friday, OpenAI presented examples from initial trials of a tool named Voice Engine. This tool takes a 15-second clip of a person’s speech and creates a realistic copy of their voice. After that, users can input a paragraph of text, and the tool will speak it using the AI-created voice.
Many AI voice-generation services already exist, but OpenAI has a track record of bringing AI tools to a mass audience, as it did with the popular ChatGPT. OpenAI says that an AI-powered text-to-speech tool could be useful for translating languages, helping children read, or supporting people who can’t speak.
However, there are concerns that it might also help spread false information or make scams more common.
OpenAI states that Voice Engine is presently in use by a “limited number of reliable partners,” such as firms in the education and healthcare tech sectors. The outcomes of these tests will help decide if and how the tool might be offered more broadly.
According to the company, current users have agreed not to copy anyone’s voice without explicit permission and to make sure listeners know that what they are hearing is AI-generated.
“We recognize that generating speech that resembles people’s voices has serious risks, which are especially top of mind in an election year,” OpenAI said in a blog post.
The company acknowledged that significant changes will be needed as AI-generated audio becomes more common, even though it has no immediate plans to release Voice Engine publicly. For instance, it proposed phasing out the use of voice to confirm identity in banking services.
“Any broad deployment of synthetic voice technology should be accompanied by voice authentication experiences that verify that the original speaker is knowingly adding their voice to the service and a no-go voice list that detects and prevents the creation of voices that are too similar to prominent figures,” OpenAI said.
Voice Engine can take a voice sample in one language and make a similar voice that can speak in many different languages.
One example in the company’s blog post features a person talking about friendship, followed by what sounds like the same person saying the same words in Spanish, Mandarin, German, French, and Japanese. Those versions are AI-generated, and each one preserves the original speaker’s accent and manner of speaking.
The sneak peek of Voice Engine arrives as people look forward to the public launch of Sora, OpenAI’s AI video creation tool previewed last month.
Sora can generate realistic videos of up to 60 seconds from written instructions, including several characters, specific movements, and detailed backgrounds. Additionally, OpenAI’s ChatGPT can create pictures from written prompts.
In a separate announcement on Monday, OpenAI said that ChatGPT is now available to everyone without requiring registration. However, users who are not signed in cannot save or review their chat history, and they do not have access to certain features such as voice chats and custom instructions.
“We're sharing our learnings from a small-scale preview of Voice Engine, a model which uses text input and a single 15-second audio sample to generate natural-sounding speech that closely resembles the original speaker.” https://t.co/yLsfGaVtrZ
OpenAI (@OpenAI), March 29, 2024
What do we think?
I think OpenAI’s new voice tool is really cool but a bit scary. It can make a voice sound just like someone else’s. This could help people who can’t speak but might also cause trouble, like spreading fake news.
OpenAI is careful, only letting some trusted groups use it now. They know it’s important to keep things safe, especially when voices can sound so real. It’s exciting but needs to be used wisely.