{"id":185568,"date":"2023-01-13T12:37:58","date_gmt":"2023-01-13T12:37:58","guid":{"rendered":"https:\/\/harchi90.com\/after-chatgpt-and-dall-e-meet-vall-e-the-text-to-speech-ai-that-can-mimic-anyones-voice\/"},"modified":"2023-01-13T12:37:58","modified_gmt":"2023-01-13T12:37:58","slug":"after-chatgpt-and-dall-e-meet-vall-e-the-text-to-speech-ai-that-can-mimic-anyones-voice","status":"publish","type":"post","link":"https:\/\/harchi90.com\/after-chatgpt-and-dall-e-meet-vall-e-the-text-to-speech-ai-that-can-mimic-anyones-voice\/","title":{"rendered":"After ChatGPT and DALL-E, meet VALL-E – the text-to-speech AI that can mimic anyone’s voice"},"content":{"rendered":"
Last year saw the emergence of artificial intelligence (AI) tools that can create images, artwork, or even video from a text prompt.<\/p>\n
There were also major steps forward in AI writing<\/strong>, with OpenAI’s ChatGPT causing widespread excitement<\/strong> – and fear – about the future of writing.<\/p>\n Now, just a few days into 2023, another powerful use case for AI has stepped into the limelight – a text-to-speech tool that can convincingly mimic a person’s voice.<\/p>\n Developed by Microsoft, VALL-E can take a three-second recording of someone’s voice and replicate it, turning written text into speech with realistic intonation and emotion that depend on the context of the text.<\/p>\n Trained on 60,000 hours of English speech recordings, it can synthesize speech in a “zero-shot” setting, meaning it can imitate a speaker it was never trained on, using only the short recording as a prompt.<\/p>\n A paper introducing VALL-E, published by Cornell University<\/strong>, explained that the developers’ recording data included more than 7,000 unique speakers.<\/p>\n The team says its text-to-speech (TTS) system was trained on hundreds of times more data than existing TTS systems, which helped it overcome the zero-shot problem.<\/p>\n The tool is not currently available for public use, but it does raise safety questions, since it could feasibly be used to make anyone’s voice say any text.<\/p>\n Its creators have, however, provided a demo<\/strong> showcasing a number of three-second speaker prompts and examples of the text-to-speech output, with the voice correctly mimicked.<\/p>\n Alongside the speaker prompt and VALL-E’s output, you can compare the results with the “ground truth” – the actual speaker reading the same text – and the \u201cbaseline\u201d result from current TTS technology.<\/p>\n Microsoft betting big on AI<\/h2>\n Microsoft has invested heavily in AI and is one of the backers of OpenAI, the company behind ChatGPT and the text-to-image tool DALL-E.<\/p>\n The software giant invested $1 billion (\u20ac930 million) in OpenAI in 2019, and a report this week on semafor.com said it was considering investing another $10 billion (\u20ac9.3 billion) in the company.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":" Last year saw the emergence of artificial intelligence (AI) tools that can create images, artwork, or even video from a text prompt. There were also major steps forward in AI writing, with OpenAI’s ChatGPT causing widespread excitement – and fear – about the future of writing. Now, just a few days into 2023, another powerful use …<\/p>\n