Artificial intelligence has been used to discover drugs, predict floods, uncover fraud and cut greenhouse gas emissions.
A group of twentysomething Torontonians have put the technology to a different use. Their startup enables people to snap selfies and bring the still images to life. Within seconds, the photos transform into videos featuring exaggerated versions of themselves, with heads bobbing, eyes rolling, shoulders swaying and lips synching to snippets of pop songs.
Users of the app, called Wombo, can do the same with uploaded images of unsuspecting friends and relatives, celebrities, Yoda or even painted portraits. Imagine the Queen singing I Will Survive like a diva and you get the picture.
If it sounds silly, that’s the point. “We don’t want to fool anybody into thinking the video they’re looking at is real,” says Wombo Studios Inc. founder Ben-Zion Benkhin, a 25-year-old University of Toronto dropout with a chill, confident demeanour.
Wombo in action: Here is a compendium of videos made using the Wombo lip-synching app, which transforms still pictures into videos using artificial intelligence. Featured are images of Globe journalist Sean Silcoff, Yoda, the Queen and E.T.
The Globe and Mail
But Wombo is also intended to go extremely viral – and it has. According to the company, the app was downloaded 49 million times in its first three months and used to generate 640 million shareable clips. (The Google Play store says Wombo has been downloaded more than 10 million times; Apple doesn’t provide similar data.)
Wombo has also generated hundreds of thousands of dollars so far from in-app advertising and premium access to an extended library of songs. About two million people use it daily, making Wombo one of the most rapidly proliferating consumer apps to come out of Canada.
The app’s rapid initial success has also attracted investors. Wombo began as an idea last August, when Mr. Benkhin was smoking a joint on his apartment roof; it has since raised US$6-million, valuing the 10-person startup at US$40-million.
Investors include Global Founders Capital (GFC), actor Ashton Kutcher’s Sound Ventures, the chief executives of Product Hunt and Machine Zone, Launch House and Germany’s 468 Capital. “I spent a weekend making Wombos; every single person I sent them to died of laughter – and then they sent it to 10 people who also died of laughter,” said Alexander McIsaac, a Toronto-based partner with GFC. Four days after meeting Wombo’s principals in March, he offered to invest.
Leading the deal is Shervin Pishevar, an early Uber and Airbnb backer who calls himself “one of the greatest venture capitalists of the last decade” and who stepped down from his firm Sherpa Capital in late 2017 after allegations of sexual misconduct. (He has denied the accusations and maintains he was the subject of smear campaigns.)
Mr. Pishevar, who had been looking to invest in the “synthetic media” space for two years, was so taken by Wombo that he hosted the team at his lavish Miami home and flew them to Los Angeles to meet Hollywood luminaries who also invested.
“I believe Wombo has the potential to become … that synthetic media social networking company that brings happiness and joy to people and does things around music and dance and other things that drive a lot of other activity,” Mr. Pishevar says.
But is Wombo poised to become the next TikTok or just a passing fad that entertained people during the COVID-19 pandemic?
Either way, the technology that makes the fake lip-synching possible is likely to become an increasingly regular feature in our lives. Wombo is one of several simple-to-use apps, such as Reface, Avatarify and FaceApp, that put AI in the hands of the masses to distort images, animate the dead and graft user faces into movies.
Several observers have raised alarms about the potential to create truth-distorting “deepfake” videos – such as realistic-looking but faked Tom Cruise videos that appeared online in recent months – that become tools of harassment or propaganda.
“What concerns me and is also fascinating is that when we normalize the application of something like deep fakes in a way that’s fun and fast … it just becomes more normalized and part of life,” says Vass Bednar, executive director of McMaster University’s master of public policy in digital society program.
The founders of Wombo are playing it strictly for laughs – at least to start. “The application for [Wombo] is endless,” Mr. Benkhin says. “I think you can expect us to expand to just about any kind of media. In the next few years we’ll see a lot more of it, pretty much everywhere. Your YouTube videos and Netflix shows will be edited and generated using these techniques. You’ll be able to watch, instead of five famous actors, five people in your household. That will be powered by synthetic media … and our company will enable the technology and the infrastructure that makes that possible.”
The technology that enables this type of deep fakery is based on a brand of AI known as generative adversarial networks, which was pioneered by Apple director of machine learning Ian Goodfellow when he was a PhD student at the University of Montreal a decade ago.
First, Wombo films an actor lip-synching to a popular song, complete with head, shoulder and eye movements. That is called the “driving video”; Wombo has filmed dozens of them, matched to songs available through the app.
Once a user uploads a photo, Wombo’s AI, operating on cloud servers, generates a fake video by mapping the facial features in the still to the face of the person in the driving video, as if one were attached to the other by hundreds of virtual marionette strings. As the song progresses, the AI manipulates the still photo’s facial features in lockstep with the driving video, bringing the photo to life and creating the synthetic video that is downloaded to the user’s smartphone seconds later.
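The marionette-string idea can be illustrated with a toy sketch. It is not Wombo's code, and the article does not describe the company's implementation; a real system would rely on a trained neural network to find facial landmarks and to render the pixels. The sketch below assumes the landmarks are already known, and every name in it (`transfer_motion`, `face_width`, the sample coordinates) is hypothetical. Each landmark on the still photo is moved by the same offset its counterpart has travelled in the driving video, scaled to the size of the photo's face.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def face_width(keypoints: List[Point]) -> float:
    """Horizontal extent of a set of landmarks, used as a rough face size."""
    xs = [x for x, _ in keypoints]
    return max(xs) - min(xs)


def transfer_motion(
    source_kp: List[Point],
    driving_ref_kp: List[Point],
    driving_frames_kp: List[List[Point]],
) -> List[List[Point]]:
    """Move each landmark of the still photo in step with the driving video.

    source_kp: landmarks on the uploaded still photo (hypothetical input).
    driving_ref_kp: the same landmarks on the first driving-video frame.
    driving_frames_kp: those landmarks on each later driving-video frame.
    """
    ref_width = face_width(driving_ref_kp)
    if ref_width == 0:
        raise ValueError("driving reference landmarks have zero width")
    # Assumption: one uniform scale factor is enough to match face sizes.
    scale = face_width(source_kp) / ref_width

    animated = []
    for frame_kp in driving_frames_kp:
        moved = []
        for (sx, sy), (rx, ry), (dx, dy) in zip(source_kp, driving_ref_kp, frame_kp):
            # The "string": the offset travelled by the driving landmark,
            # rescaled and applied to the matching landmark on the photo.
            moved.append((sx + scale * (dx - rx), sy + scale * (dy - ry)))
        animated.append(moved)
    return animated


# Three made-up landmarks: left eye, right eye, mouth.
photo = [(10.0, 10.0), (30.0, 10.0), (20.0, 30.0)]
actor_ref = [(100.0, 100.0), (140.0, 100.0), (120.0, 140.0)]
actor_frames = [
    [(100.0, 100.0), (140.0, 100.0), (120.0, 148.0)],  # mouth opens downward
    [(104.0, 100.0), (144.0, 100.0), (124.0, 140.0)],  # head shifts right
]
frames = transfer_motion(photo, actor_ref, actor_frames)
```

In this example the actor's face is twice as wide as the face in the photo, so every movement is halved before it is applied to the still.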
The technology is not rocket science – “a very good high-school or undergrad student in a computer science program could probably replicate this in three to four months,” York University computer science professor Marcus Brubaker says. But scaling it up for mass usage is tricky.
That’s where Mr. Benkhin, who split his early years between his native Israel and Canada, found himself last fall after cycling through a couple of iterations of his founding team.
Mr. Benkhin, who had led a social group at U of T for AI students in the late 2010s, had also been watching the evolution of deepfake technology with interest. During his August epiphany, he decided to create a deepfake app that would be simple, funny and fast to use, and that would produce harmless videos.
When he showed his friend Paul Pavel, a management consultant, what Wombo could do, Mr. Pavel decided to quit and join him. But the pair needed specialists to build out the technology so it could be used by millions of people simultaneously, so they recruited a handful of students they met through AI social circles.
One of them, a third-year student named Angad Arneja who came to Toronto to study commerce, glowingly calls Mr. Benkhin “the Steve Jobs of memes.” Mr. Arneja says his parents back in India “don’t really understand” what he’s doing, including his plans to drop out of school and give up his full-ride scholarship to become Wombo’s head of people.
In January, Mr. Benkhin and Mr. Pavel joined Launch House, a Beverly Hills incubator for entertainment-oriented startups operated out of a mansion previously occupied by Paris Hilton. Their trip was funded by Mr. Pavel’s working-class parents, who invested $20,000 for a stake now worth hundreds of thousands of dollars.
Launch House CEO Jacob Peters was impressed during the pair’s one-month stay and convinced them their “fun little toy” could become a “wedge into something much, much bigger” – one that could democratize the creativity process for people too shy to make TikTok or YouTube videos. “Timing is everything and we’re living in the age of memes,” Mr. Peters says. “Wombo enables everyday people to create the next viral thing.”
The pair couldn’t interest venture capitalists until the app launched on Feb. 28. After one week, it had 500,000 users, then three million after two weeks. “After the second week the investor frenzy was in full effect,” Mr. Pavel said.
Wombo was close to closing a financing in March when Mr. Pishevar heard about the company, contacted the founders and convinced them to let him lead a bigger financing at a much higher valuation.
Mr. Benkhin says that while he was raising money, “I was aware of the allegations against [Mr. Pishevar]. I spoke with a number of people who are close to me who I trusted and respected about him. Based on what I heard, I felt and feel that I could trust him. As an investor, he’s been extremely valuable for our team.”
With the company looking to build on its gimmicky start, Mr. Pishevar insists the operating credo is “first, Do No Harm.” Mr. Benkhin points out that the potential for bad behaviour on Wombo is limited because users can only make videos based on a tightly controlled selection of songs and driving videos. He says the company deletes all facial data soon after it’s used and retains no personal information.
Mr. Pishevar says Wombo must “continuously be diligent about making sure the product is something that does entertain and makes people laugh, but does not lead to being a platform for bullying or abuse or … deepfake technology. That’s not what this company is about. I would not have invested if it was.”
High-minded values, however, also initially guided other social-media platforms that later became tools used to undermine democracy, worsen the level of civil discourse and fuel mental-health strain among users. “While this is a fun, light kind of viral-memey-type application, it’s technology that will play into a lot of different applications that are much more serious,” Mr. Brubaker says.