Why AI Is An Essential Partner For Audio

The internet is buzzing about AI-generated images: evidence abounds on social media of the abilities of platforms like Midjourney, Dall-E, and Stable Diffusion. There’s certainly something to be said for the “see it to believe it” approach to understanding the power of artificial intelligence. And once you believe it, it should come as no surprise that AI is just as powerful in the audio space. While there are many use cases for AI-generated audio, there are three in particular worth noting: AI-generated soundtracks, voice cloning for narration, and real-time translation.

Make the Sound of Music

At his much-talked-about Hollywood Bowl performances in LA, lauded film composer John Williams famously plays a six-minute scene from Indiana Jones and the Last Crusade without music and then again underscored with his now-legendary composition. This lesson on the importance of a film’s score is one that no one who has been to his Hollywood Bowl performance will easily forget. However, knowing the importance of a film’s score and being able to commission or compose a powerful one are two very different things.

This is where AI can help. AI algorithms can compose original music or generate soundtracks that complement the mood and tone of a video. Case in point: Gareth Edwards, director of Rogue One: A Star Wars Story, tried composing a soundtrack for his upcoming movie about artificial intelligence, The Creator, by asking AI to compose something in the style of Hans Zimmer and got “pretty damn good” results. The AI generated a track that was maybe a “7 out of 10,” Edwards said, “but in the back of my head I was like, ‘But the reason you go to Hans Zimmer is for 10 out of 10.’” He ultimately had the actual Zimmer compose the soundtrack. That said, for creators who are just starting out, have no music experience, or have little to no budget, a 7 out of 10 is better than a 5 out of 10 and certainly better than no soundtrack at all.

When Photoshop came out, Edwards said, the public discussion was about how the software “was sacrilegious.” “We got over that eventually. Now Photoshop has created so many opportunities for so many people doing art … I wouldn’t want to go back,” he said. Tim Simmons of Theoretically Media couldn’t agree more. He’s created a YouTube tutorial on how to compose epic film soundtracks. “You’ll be arranging and making your own music here,” Simmons says in the video, “and you can do so without actually having any musical knowledge or theory.”

Speak the Same Language

While the notion of an AI voice can feel anything but human, the truth is, it can actually make humans feel more connected to other humans. Distance is created when people don’t speak the same language. AI can provide real-time translation of spoken words (in videos, text, and real-world situations) that allows people around the world to understand content, and each other, in their preferred language.

This capability may feel novel and futuristic, but the technology is already here. In 2020, Alibaba broadcast the world’s first e-commerce livestream with real-time translation for multiple languages. Skype Translator has been using AI-generated translation to translate voice and text messages in real time during video calls. In the medical space, the MyCareLink Heart mobile app uses AI-generated translation to provide patients with personalized care instructions in their preferred language, one that their doctors and nurses might not speak. And earlier this year, Meta “set the bar for AI translator models,” according to Mashable.com.

Hear the “Authentic” Voice

While AI’s ability to clone the voices of actual individuals may be more frequently discussed as a security concern, voice cloning is like any powerful technological tool: whether it’s “good” or “bad” depends on how it’s used. AI voice cloning can have many advantages when used ethically and responsibly. Automated narration can be a huge help in videos, audiobooks, and other multimedia content.

Since AI can replicate a person’s voice based on a small audio sample, it can be a real time-saver for in-demand narrators. For example, a company might want their corporate videos to be narrated in the CEO’s voice, but the CEO doesn’t have the time available to spend in the recording studio. With AI, the CEO can approve the script—which takes substantially less time than a visit to the recording studio—and effectively “narrate” the video without having to put in the time to do so.

There’s a similarly valuable use case in the Hollywood system. The dubbing process for a movie takes place long after the film has been shot. As a result, the star may be off shooting a different movie and not have the time to record the audio track. If the movie studio is able to clone their voice with their permission and compensate them fairly for its use, it can be a real win-win situation: the actor gets compensated without having to do any additional work and the studio is able to complete their film faster, saving them money.

Amplifying Human “Voice”

While there is a persistent concern around AI “replacing” humans, none of the aforementioned use cases render an individual unnecessary. Rather, they build upon what a human individual has to offer. A person who has a vision for a movie but no musical abilities can now create their own original score with no budget. A person who wants to lend their actual voice to a project, but doesn’t have the time available, can now effectively be in two places at once. And a person who would like to connect with someone who doesn’t speak their language can now express themselves in the other person’s language. Far from rendering humans unnecessary, in these use cases, AI allows each person to do more.
