Earlier this year, I visited the Montreal office of Fluent.ai, a speech-recognition technology company, and played my first round of voice-controlled Tetris. The familiar geometric pieces could be manoeuvred using six vocal commands the game had been trained to recognize: move left, move right, move down, rotate, pause and restart. My feeble Tetris skills were no better than usual, but the system responded flawlessly. Next came a device smaller than my palm, which lit up when it was successfully triggered by a “wake” word, akin to Apple’s “Hey, Siri” or Google’s “Okay, Google.” For my demonstration, I tried “шоколад” (a Russian word, “shokolad,” meaning chocolate) to see if Fluent.ai’s technology could learn a random phrase in a new language. Sure enough, four repetitions later, the system recognized the trigger. And unlike the voice-recognition technology developed by tech giants, Fluent.ai’s system required little support to do so. “This works directly on the device with no internet access,” says Vikrant Tomar, the founder and chief technology officer of Fluent.ai. “Our system is using 50 kilobytes of RAM, so this is really small. It can live on a remote that runs on AAA batteries.”
Tomar’s interest in speech recognition is driven by what he perceives as big gaps in the traditional approach taken by major voice assistants like Apple’s Siri or Amazon’s Alexa. “Most people in the world are not able to use voice-user interfaces in their native language,” he says. “Forget about recognizing someone with a foreign accent.” And the systems are too big to run on small devices, meaning they require an internet connection. “I like to joke that we need to get the internet out of the Internet of Things. We can just have smart devices. Why do they have to be online?”
Big tech has bet big on voice assistants. They represent the next significant shift in how we interact with technology—following the web in the ’90s and smartphones about a decade ago—according to a recent Harvard Business Review report. And the field has developed at breakneck speed: “Voice shopping” alone is predicted to hit US$40 billion by 2022, up from US$2 billion in 2017. Amazon, Google and Apple have invested billions of dollars in voice-recognition technology, seeing applications in everything from digital advertising to enhanced search functions.
But Fluent.ai is not playing in the same ballpark as Siri and Alexa. Leaving smartphones and speakers to the big players, Fluent.ai focuses on embedding its technology in small devices like lights, remote controls and appliances. Says Vishwa Gupta, a senior researcher at the Computer Research Institute of Montreal specializing in speech recognition: “They’re in a niche market, but I think they’ll be very successful.”
The difference isn’t just one of focus. Under the hood, Fluent.ai’s tech is a huge departure from that of the big market players.
Traditional voice assistants do their speech processing online: The device picks up on a wake word (“Hey, Siri”) and sends the request that follows (“What’s the weather like today?”) to the cloud, where it then gets converted to text. The system runs the resulting text through a natural-language-processing model, a series of algorithms that converts the text into a command the device can understand.
The process works, but it’s not without its shortcomings. For one thing, the models involved are much too big to store and run locally on a device. That means the assistants need an internet connection to work, creating privacy, latency and connectivity issues. And because these systems rely on dictionaries to understand the text, they’re limited to languages the system knows, and more often than not, an accent or speech impediment will throw them off.
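For readers who think in code, the text-based pipeline described above can be sketched in a few lines of Python. This is only an illustration: the vocabulary, the function names and the command labels are all hypothetical stand-ins, not any vendor’s actual API, and a real assistant would run both steps on remote servers rather than in one script.

```python
from typing import List, Optional

# Hypothetical dictionary: the only words this toy "cloud" system knows.
VOCABULARY = {"what's", "the", "weather", "like", "today", "turn", "on", "lights"}


def transcribe(heard_words: List[str]) -> List[str]:
    """Stand-in for cloud speech-to-text; unknown words become <unk>."""
    return [word if word in VOCABULARY else "<unk>" for word in heard_words]


def parse_intent(text: List[str]) -> Optional[str]:
    """Stand-in for the natural-language-processing step: text in, command out."""
    if "<unk>" in text:
        return None  # the dictionary had no entry, so the request is lost
    if "weather" in text:
        return "GET_WEATHER"
    if "lights" in text and "on" in text:
        return "LIGHTS_ON"
    return None


def handle_request(heard_words: List[str]) -> Optional[str]:
    """Full pipeline: audio (modelled here as words) to text to command."""
    return parse_intent(transcribe(heard_words))


if __name__ == "__main__":
    print(handle_request(["turn", "on", "the", "lights"]))
    print(handle_request(["включи", "свет"]))  # Russian for "turn on the light"
```

The second request fails at the transcription step, before the system ever considers what the speaker wanted. That is the dictionary dependence in miniature.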
Meanwhile, Fluent.ai’s tech completely bypasses speech-to-text transcription. Its neural networks work on an “acoustic-only” basis: The system learns to recognize words purely from the patterns of sound that make them up and then connects those patterns directly to actions.
The training happens quickly; four or five repetitions are enough to teach the system a new word. And since the networks don’t differentiate between languages, they can learn words in any tongue or dialect, along with differences in pronunciation. You could even train them to recognize Klingon or Dothraki.
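To make the idea concrete, here is a minimal Python sketch of learning a command from a handful of repetitions and mapping sound straight to an action, with no text in between. It is not Fluent.ai’s method, which relies on neural networks whose details the company has not published here. The sketch swaps in a much simpler technique, nearest-template matching, and assumes each utterance has already been reduced to a short list of numbers. The class name, the feature values and the distance threshold are all invented for illustration.

```python
import math
from typing import Dict, List, Optional, Sequence


class AcousticCommandTable:
    """Toy sound-to-action lookup; a stand-in for a trained neural network."""

    def __init__(self, max_distance: float = 1.0) -> None:
        # Assumed threshold: sounds farther than this from every template are ignored.
        self.max_distance = max_distance
        self.templates: Dict[str, List[float]] = {}

    def enroll(self, action: str, repetitions: Sequence[Sequence[float]]) -> None:
        """Average a few repetitions of a word into one template for the action."""
        count = len(repetitions)
        self.templates[action] = [sum(column) / count for column in zip(*repetitions)]

    def recognize(self, features: Sequence[float]) -> Optional[str]:
        """Return the action whose template is closest, or None if nothing is close."""
        best_action: Optional[str] = None
        best_distance = math.inf
        for action, template in self.templates.items():
            distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(features, template)))
            if distance < best_distance:
                best_action, best_distance = action, distance
        return best_action if best_distance <= self.max_distance else None


if __name__ == "__main__":
    table = AcousticCommandTable()
    # Four made-up repetitions of a wake word; the language never matters here.
    table.enroll("WAKE", [[0.9, 0.1, 0.4], [1.0, 0.2, 0.5], [1.1, 0.0, 0.6], [1.0, 0.1, 0.5]])
    print(table.recognize([1.0, 0.15, 0.45]))
    print(table.recognize([5.0, 5.0, 5.0]))
```

Nothing in the table knows whether the enrolled word was English, Russian or Klingon. It stores only a sound pattern and the action attached to it, which is the property the acoustic-only approach depends on.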
In practice, this means an appliance manufacturer can ship the same voice-enabled laundry machine all over the world without having to change anything about the technology. Users would simply train the product to learn their language once they receive it. (Or, they might not—learning directly from the user is an optional feature that can be available, depending on what a manufacturer requests from Fluent.ai.)
The acoustic-only approach has its own limitations. It struggles with catalog searches, like seeking out a movie title, since these typically involve a textual database. And since it’s not designed to work online, this system is of little help for devices that pull up the weather or tell you the news. But that’s not what this tech is meant to do. Instead, think of it powering lights, appliances and wearable fitness trackers. “And since our tech decodes speech as you’re talking, it works much faster than assistants that send data to the cloud,” says Tomar, adding his company is the only one he knows of that takes this approach.
Why is acoustic-only voice recognition such a novelty? Tomar says it’s largely because of the field’s origins. “Historically, the first application people saw for speech recognition was in transcription, which obviously involves text. As a result, there’s a lot of tech baggage in that area,” he says. “There are other reasons, too—for instance, Google wants to send you ads, so there’s no real incentive for them to have voice recognition work offline and keep the tech localized on the device.”
Privacy is a major selling point for Fluent.ai. Cloud-connected voice assistants have been targeted by privacy watchdogs in recent years. Last year, under pressure from German regulators, Google temporarily suspended its practice of having contractors listen to Google Assistant recordings. Apple has agreed to permanently halt a similar procedure following a wave of criticism about Siri. (Having humans listen to voice recordings, according to both companies, was intended to improve their speech-recognition capabilities.) Meanwhile, a couple of high-profile hacks have hit the smart home market. In 2018, a couple in Wisconsin claimed their Google Nest system was breached by a hacker who talked to them through a camera—an incident Google says could have been prevented by two-factor verification. Privacy in voice-enabled devices is a hot-button issue for both regulators and consumers; a 2018 study by PwC found a lack of trust in voice assistants is a major hurdle for tech companies.
“It’s easy to understand why a consumer wouldn’t want a device that’s connected to the internet listening to what’s going on in their home, and the same is true for businesses,” says Probal Lala, CEO of Fluent.ai. “Imagine the case of discussing a sensitive merger in a boardroom. You’ll want that conversation to stay in the room.”
Lala, who became involved in Fluent.ai as chair of the investing group Maple Leaf Angels Corp.—one of the company’s early funders—sees this year as a crucial one for the firm. Fluent.ai started out as a project at Montreal’s TandemLaunch accelerator in 2015 and graduated to a full-fledged company in 2017. It has raised US$5.3 million in three seed funding rounds since then, including an ongoing round led by Desjardins, BDC Capital and Maple Leaf Angels. (The latter two have been investors from the get-go.)
“Now that our internal development has progressed as far as it has, our intent is really to make it to market,” says Lala. The team was named an innovation honouree at this year’s Consumer Electronics Show. The company also has a partnership with Ambiq Micro, which manufactures low-power semiconductors. The companies showcased the voice-based Tetris game together at CES and intend to develop energy-efficient voice tech. They have also teamed with DSP Concepts, a software firm that makes TalkTo software, which filters ambient noise and helps isolate voice commands. Lala says Fluent.ai, Ambiq and DSP are working on a contract with a remote control provider, although he can’t yet disclose which one.
There has been a noticeable surge in demand for the companies’ technology in recent months as consumers avoid, well, touching things. “Preventing the spread of coronavirus has caused a huge increase in demand for voice-enabled devices,” says Chin Beckmann, CEO of DSP Concepts.
This was originally set to be Fluent.ai’s “year of deployment”—it expected to make major brand announcements in the summer and be installed in a couple of million devices by the end of the year. “When coronavirus hit in mid-March, we pushed out our forecast by nine months,” says Lala. “That’s more or less held true. Things are moving along, but slowly, since what would normally involve in-person visits, like demos and client meetings, are taking place virtually. And our clients’ deployment schedules have generally been pushed out.” Lala remains focused on the small embedded devices market, and says he’s turned down money from potential investors who wanted to steer the company off its thesis and into the cloud.
Though they’ve worked remotely through the pandemic, the team of 25 normally does business out of a small office in downtown Montreal, surrounded by the city’s typical trappings: hip coffee shops, boutique eateries and, as of a few years ago, a groundswell of activity in artificial intelligence. The boom of research and business in this area is no accident—it’s fuelled by money from the public and private sectors, and rooted in research out of its three universities.
Google, Facebook, Huawei and Samsung have all opened AI-focused research hubs or labs in Montreal over the past five years. In that time, AI has become a true phenomenon in the city, with labs, accelerators and startups spreading like wildfire to form something of an ecosystem for the field. “I think we’ve reached a point where the technology has, to some degree, caught up with the theory, and so we’re seeing a preponderance of applied AI with real-world use cases coming to the fore alongside the research,” says Stephane Paquet, president and CEO of Montreal International, an economic promotion agency.
“When I started my PhD in 2010, there wasn’t that much going on in AI in the city. That has definitely changed, with so much activity going on in the field,” says Tomar. “Now there are tons of research projects, events and labs. The great thing is that everyone is open to collaboration.”
Yoshua Bengio, who’s been hailed as one of the founders of the field of deep learning and a “godfather of AI,” alongside Geoffrey Hinton and Yann LeCun, is an often-cited global superstar with local roots. His alma mater is McGill, he’s been a faculty member at the Université de Montréal since 1993, and he’s scientific director at Mila, a collaborative research institute for AI founded in 2017. His foundational work on deep learning, now an area of AI research that underpins up-and-coming tech like self-driving cars and image recognition, was arguably the seed that sprouted into the city’s AI ecosystem.
An influx of money from multiple levels of government and industry is supporting Bengio’s work, along with that of many others. In 2016, the Canada First Research Excellence Fund allocated $84 million and $93.5 million, respectively, to McGill and the Université de Montréal to pursue AI research. The federal and provincial governments funnelled a combined $140 million into AI research in 2017, and in 2018, supply chain supercluster Scale AI joined forces with the federal government to pour $290 million into the city.
Fluent.ai is still a small fish in that increasingly large pool, but its aims are high. “Fundamentally, we want to be a global brand,” says Lala. It’s a reasonable goal for a company built on technology that speaks any language, no dictionary required.