We’ve got Apple’s Siri, Microsoft’s Cortana, Amazon’s Alexa and Google Assistant — so do we really need more synthesized voices to do our bidding?
“We’re just solving a different problem,” co-founder and chief technology officer Michael Petrochuk told GeekWire. “Alexa and Google Home are trying to solve the problem of clearly, slowly communicating — pronouncing everything the same way, in a monotone format so it could be understood by everyone.”
WellSaid, in contrast, is developing a stable of AI-powered voices customized for different contexts, and sounding so lifelike that you wouldn’t believe they’re robots. During a recent video demonstration for a roomful of AI aficionados, most folks guessed that the images were generated by an algorithm, but not the voices.
“Our voices sound different each time,” Petrochuk said. “They always interpret the sentence differently, and they can be used in a video, or an audiobook, without making you fall asleep. I don’t imagine the Alexa character being very … hot.”
The venture grew out of work done by Petrochuk and WellSaid’s other founder and CEO, Matt Hocking, under the aegis of AI2’s startup incubator. Now the technology is ready for its public reveal, and the two AI researchers are raising seed funding and seeking partners.
“We’re looking to partner with people who are looking to sell content production with voice, and also the next generation of voice experiences,” Hocking said. “We’re actively looking for people to explore opportunities.”
The technology could be applied to a wide range of opportunities: For example, a video game known as Red Dead Redemption 2 required the services of 700 voice actors. Theoretically, WellSaid could offer a huge catalog of synthesized voices to do the same job with AI.
WellSaid’s software platform could also spice up audiobooks, offer customized voice assistants or give companies “branded voices” that become part of their enduring image. Veteran announcer Don Pardo may no longer be with us, but his synthesized voice could continue to introduce “Saturday Night Live” for decades to come.
For those who have lost their ability to speak due to accident or illness, WellSaid could provide a synthesized voice with a natural lilt rather than the robotic monotone that became the trademark of the late physicist Stephen Hawking.
Hocking compared the concept to the use of stock images, stock video and stock music in creative productions. Now there’ll be stock voices.
“Anything which is written can now be voiced,” Hocking said.
Petrochuk and Hocking are very aware of the potential pitfalls associated with super-realistic synthetic voices. Deep-fake videos, such as a viral clip in which former President Barack Obama appears to make crazy statements like “Ben Carson is in the sunken place,” already show how the line between reality and fakery can be blurred beyond recognition.
“That’s just not a direction that our company wants to head in,” Petrochuk said. “Our focus is on allowing creators to create with voice, and we’re focusing on building a product for the common good, per AI2’s mission. With that, we have to recognize some possible negative implications of this technology.”
Petrochuk said WellSaid won’t allow just anyone to create a voice. “All we’re doing is, we’re opening up a library of curated voices, with the appropriate cautions to make sure those voices aren’t used in a negative light,” he said.
WellSaid’s voices are generated by recording text spoken by voice actors who have given their consent, and then putting it through an algorithm that captures the voice’s natural-sounding “fingerprint.” That voice can then be used to speak any text entered into WellSaid’s software program, with appropriate tweaks to convey emotional content.
Won’t WellSaid’s stable of synthesized voices put actors out of business?
“At the moment, we’re working on the core technology, but we definitely do see a business model where you can look at a voice actor and liken it to a photographer,” Hocking said. “A voice actor could potentially have a synthetic version of their voice which they may be able to license out for larger-volume, lower-quality projects — but then do work on the high-end movie or television commercial that truly needs to be acted.”
The flip side is that the software can literally give voice to the voiceless.
“The positives far outweigh the negatives,” Hocking said. “You look at CGI, you look at existing technology, and it’s inevitable that voice is going to be a part of that. The applications that we’re focused on, and the way they’ll empower people who have trouble speaking, or can’t speak, or need access to voice in order to produce something valuable, is what we’re focused on. … We’re focused on bringing this amazing technology to the people who need it most.”