Go to the Globe and Mail homepage

Shaping the Future

Speech-recognition AI is smarter, but still tongue-tied

It's a scenario familiar to most: You're on the phone with a computerized customer service agent that can't understand a word you say, and continuously asks the wrong questions - all in a friendly, even-toned voice. Or you're in the car, using a wireless headset to order a pizza, but the voice recognition program instead dials your office.

If we can create phones that let us make restaurant reservations, catch up on the latest news and take high-definition videos all at the same time, surely we can find a way for computers to understand a simple voice request.

But as it turns out, sometimes the tasks that seem simplest are the most challenging.

For about 60 years, artificial intelligence (AI) experts have been chasing the dream of creating machines that have human-like smarts. So far, they've had many successes - from cars that can parallel-park themselves to computer applications that can spot bank fraud. But significant challenges remain, and some computer scientists wonder when, or if, machines will ever truly become intelligent.

Teaching computers to understand speech and to process vision are two of the biggest obstacles the AI field is facing.

While machines have been taught to perform logic-based activities, such as playing chess, it's much more challenging to program speech and other abstract characteristics into a computer-based system, says Geoffrey Hinton, a computer science professor at the University of Toronto, and one of the world's leading authorities on AI.

Part of the reason it's so challenging is the sheer amount of information needed to perform high-quality speech recognition or vision functions, Prof. Hinton says. While a child can look around a room and simply see everything in it, machines must be taught to recognize objects and differentiate between them. Equally, they have to learn to understand combinations of words, differences in the way people pronounce them and what people want when they speak a command into their phone or computer.

Computer scientists are making progress, however - particularly when it comes to speech recognition. More companies are introducing devices and applications that are better at understanding commands and performing functions when asked.

One company, the Cambridge, Mass.-based Vlingo Inc., has introduced a "cloud-based virtual assistant" that can understand commands, provide directions and answer questions, and allows users to dictate e-mail or text messages. One key aspect of Vlingo, an application that can be downloaded on iPhones and most BlackBerrys, is that it uses AI to learn and adapt to its users' habits. For instance, it might not understand a strange restaurant name the first time, but it learns upon the second instance.

"This is a major step forward in terms of the accuracy of the technology," says Chris Barnett, executive vice-president of markets at Vlingo. "People are expecting it to be less satisfying, like older products were. We get a lot of positive surprises from users."

Momentum is growing in the fields of speech recognition, and what's known as natural language processing, or the ability of machines to understand the context of a sentence, says Yoshua Bengio, a professor in the department of computer science and operations research, and Canada Research Chair in statistical learning algorithms, at the University of Montreal.

"I'm really confident there will be a lot of progress in that direction in the next few years," Prof. Bengio says.

He points to the development of IBM's Watson computer as evidence. Watson became famous earlier this year after beating human challengers on the quiz show Jeopardy!

Although the idea of a computer trumping a human is remarkable in itself, most people likely missed why Watson is seen as a significant milestone in AI: The computer could understand the questions it was asked and respond accordingly.

"This is something only people thought they could understand," says Cory Butz, computer science professor at the University of Regina.

Although experts agree they are still a long way from developing machines that can rival humans, some say there is a need to think ahead and create safeguards to protect against potential problems.

What controls will keep the power of computers in check? Will machines eventually pervade every aspect of our lives? What are the implications for social media sites that store our photos and personal thoughts?

The U.S. military already uses unmanned aerial vehicles, or drones, in battle. But experts say the ultimate goal is creating robots that can replace soldiers. What mechanisms, if any, will ensure the drones are always under the control of a human and never left to operate on their own?

"The big question is how do we as a society react ... so that we possibly develop the good uses more and control to the [maximum] extent possible the bad uses - the abuses of privacy, the data losses," says Stan Matwin, computer science professor at the University of Ottawa. "It creates a potential for problems."

Follow on Twitter: @carlyweeks
