Wednesday, April 04, 2012

Searching for speech technology's holy grail

Telephone your credit-card company, health insurer or just about any big consumer-facing company, then speak into the receiver: for new accounts, say "new"; for billing, say "billing." Forget it. You shout and stumble through the phone maze and often land back at the directory's start.

Recognize this experience? Somehow, despite some of technology's most awesome achievements (tablets! the remote control!), voice recognition remains a sore point. We still can't talk to computers like Captain Picard on Star Trek: The Next Generation.

Turns out building voice recognition to acknowledge a "yes" or "no," let alone complex conversations, isn't so easy. "There are 200 ways to say 'yes': sure, okay, yeah, uh huh," to name a few, says Alex Rudnicky, a speech recognition expert and researcher at Carnegie Mellon University in Pittsburgh, Penn. "Getting that right is surprisingly difficult."
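To see why Rudnicky's point bites, consider a rough sketch of the simplest possible approach: checking a caller's reply against a fixed list of affirmatives. The phrase list and the function names here are invented for illustration and are not drawn from any real phone system.

```python
# Hypothetical phrase list; a real system would need far more than a handful of entries.
AFFIRMATIVES = {"yes", "sure", "okay", "ok", "yeah", "yep", "uh huh"}


def normalize(utterance: str) -> str:
    """Lowercase the text, turn punctuation into spaces and collapse whitespace."""
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in utterance.lower()
    )
    return " ".join(cleaned.split())


def is_yes(utterance: str) -> bool:
    """Return True only when the whole reply matches a listed affirmative."""
    return normalize(utterance) in AFFIRMATIVES
```

Under these assumptions, "Uh-huh!" and "Sure." are accepted, but "You bet" is not, because nobody thought to list it. That gap is the problem in miniature.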

The foundations of most voice technology used today date back to the '70s. But with the proliferation of smartphones and new voice technology, devices like Apple's iPhone 4S have upped the ante on voice recognition.

Growing, billion-dollar business
At the 2012 Consumer Electronics Show in Las Vegas, manufacturers including Samsung Electronics and Nuance Communications unveiled TVs with voice-activated functionality.

In fact, Nuance helps power the Siri "personal assistant" system on the Apple iPhone 4S. Nuance translates spoken words into text. Siri then analyzes the text, figures out what it means and translates the words to intended actions.
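The division of labor described above (speech to text, then text to meaning, then meaning to action) can be sketched in a few lines. This is only an illustration of the two-stage idea: every name below is hypothetical, the "audio" is plain text so the example can run, and none of it reflects how Nuance or Apple actually build their systems.

```python
from dataclasses import dataclass


@dataclass
class Action:
    """An intended action plus the detail it applies to (hypothetical type)."""
    name: str
    argument: str


def transcribe(audio: bytes) -> str:
    """Stand-in for the speech-to-text stage.

    Assumption: the audio is just UTF-8 text; a real recognizer would go here.
    """
    return audio.decode("utf-8")


def interpret(text: str) -> Action:
    """Stand-in for the understanding stage: map transcribed text to an action."""
    words = text.lower().strip().rstrip(".!?")
    if words.startswith("call "):
        return Action("place_call", words[len("call "):])
    if words.startswith("remind me to "):
        return Action("create_reminder", words[len("remind me to "):])
    # Anything unrecognized falls back to a search in this sketch.
    return Action("web_search", words)


def handle(audio: bytes) -> Action:
    """Run both stages in order: transcription first, interpretation second."""
    return interpret(transcribe(audio))
```

Here `handle(b"Call Mom.")` yields a `place_call` action for "mom", while an unrecognized request simply becomes a search. Keeping the two stages separate means either one could be swapped out without touching the other.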

Voice recognition is a lucrative yet decentralized market that spans everything from toys to healthcare. The voice-related enterprise space alone, which includes automated customer service, is a roughly $10-billion market, says Richard Mack, vice president of communications for Nuance. Clinical documentation is an estimated $15-billion market.

Mobile and consumer voice products including cars and electronics are estimated at $5 billion, adds Mack. That figure doesn't even include TVs or third-party developers, who are just gaining traction in the growing voice-recognition space.

Next-generation speech recognition "is really going to be ubiquitous," says Seth Rosenblatt, a senior editor for CNET, which highlights tech trends and consumer-product reviews. But it won't happen overnight.

Great expectations
But soon after the new iPhone hit shelves in October 2011, some consumers experienced less success. Transcripts of dictated texts, for example, sometimes appeared jumbled. Consensus emerged that the voice technology, though ground-breaking, remained a work in progress.

Part of Siri's mixed reception stems from Forstall's demonstration. "That great demo set expectations up pretty high," says author McFedries, also a technical writer. "A lot of people don't realize it's a beta product," he says. In a rare Apple move, the tech giant launched Siri as a developing beta product.

So why release Siri before it's ready? Apple needs consumers to road-test the technology so that it can collect their data and fine-tune its product. "The way voice recognition works, it needs a lot of data," McFedries says. "That's the only way to get this to be a really good product."
