The quest for better understanding

Researchers around the world are getting closer to cracking the code of a universal translation device

The universal translator is, by far, the handiest tool in the fictional star traveller’s universe. On Star Trek, the device mysteriously taps into the brainwaves of even the most far-out aliens, allowing for instant communication. And in Douglas Adams’s Hitch Hiker’s Guide to the Galaxy series, one simply inserts a squishy little Babel fish into the ear, where it eats the sound waves of foreign languages and excretes a translation into its host’s brain. It’s not just that the universal translator is a convenient plot device for dealing with Martians. The ability for anyone on the planet to speak directly to anyone else, regardless of language, would be a revolutionary innovation for businesses, and for society as a whole. And while sticking a fish in your ear is a far-fetched way to go about it, even the idea that computers could reliably translate for us long seemed destined to remain in the realm of fantasy.

Over the last few years, though, a slew of technological advances has brought the universal translator tantalizingly closer to reality. Tourists are already starting to carry rudimentary handheld translators that can hear, recognize and spit out preprogrammed phrases. Some computer security firms now use software to instantly translate alerts into dozens of languages, saving precious time when a software virus is unleashed on the Web. And millions of Internet users already take for granted that they can get a decent translation of almost any foreign Web page within seconds. None of that will get you far in the scruffier reaches of the Alpha Quadrant, let alone the boardrooms of Shanghai. There are, however, dozens of research teams and vast sums of government money focused on making real-time, speech-to-speech translation technology a reality. It’s a tall order, and there are plenty of skeptics who say it can’t be done. But from the battlefields of Iraq to emergency dispatch centres in Hispanic areas of Florida, cutting edge translator devices are already being put to the test.

With more than 6,000 languages on the planet and a highly integrated global economy, it scarcely needs pointing out that the market for a universal translator could be huge. In the European Union, for instance, there are 20 official languages plus untold dozens of minor dialects. Each year, mountains of documents must be translated, while governments and companies employ armies of interpreters. The European Commission has found that translation costs top five billion euros per year. On a global basis, the language services industry already generates $14 billion in revenues annually, according to a recent report by Common Sense Advisory in Lowell, Mass., a consulting firm focused on the challenges businesses face when expanding overseas. The demand is there, it seems, but is the technology? A better question may be: what are the opportunity costs of doing without it? The cornerstone of any economy is trust, and that depends largely on the ability to make oneself understood, and to understand those around you. If the global economy is ever going to reach its full potential, language may be the single greatest barrier to overcome. As Sergei Nirenburg, a computer scientist at the University of Maryland who spearheaded the quest for machine translation in the 1980s, has said: “Building a system for understanding text is more complex than building an atomic bomb.” And perhaps almost as revolutionary.

A universal translator may sound like a single piece of technology, but it’s actually made up of three very distinct tools, each posing its own set of problems, say researchers. First, the device must capture a sequence of spoken words and digitize them to text. Second, it must translate that text into another language. And finally, it must give voice to the translated text. All far easier said than done. It helps that speech recognition technology has improved dramatically. In some instances it’s more accurate than a human being transcribing by keyboard. Yet there’s still lots of room for improvement when dealing with strong accents and changes in inflection. Depending on how you pronounce the phrase “That’s funny,” you could be sarcastic, perplexed, or genuinely amused. Likewise, computers are far better at reading text aloud than they were a decade ago. Now researchers are working to bring warmth to those electronic voices, thus enabling them to convey humour.
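In rough outline, the three stages chain together like an assembly line. The sketch below is purely illustrative: the function names and their toy string-based logic are invented placeholders, not the API of any real product.

```python
# A minimal sketch of the three-stage speech-to-speech pipeline
# described above. All names here are hypothetical stand-ins.

def recognize_speech(audio: bytes) -> str:
    """Stage 1: convert captured audio into source-language text."""
    # Placeholder: a real system runs acoustic and language models.
    return audio.decode("utf-8")

def translate_text(text: str, table: dict) -> str:
    """Stage 2: translate source text using a (toy) word table."""
    return " ".join(table.get(word, word) for word in text.split())

def synthesize_speech(text: str) -> bytes:
    """Stage 3: render translated text as audio for playback."""
    # Placeholder: a real system runs a text-to-speech engine.
    return text.encode("utf-8")

# Toy end-to-end run with a two-word English-to-Spanish table.
phrase_table = {"help": "ayuda", "me": "me"}
text = recognize_speech(b"help me")
audio_out = synthesize_speech(translate_text(text, phrase_table))
print(audio_out)  # b'ayuda me'
```

Each stage introduces its own errors, and because the stages run in sequence, a mistake in recognition cascades into translation and synthesis — one reason researchers treat the three problems separately.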

But the biggest hurdles still lie, not surprisingly, with the translation process itself, says Robert Frederking, a researcher with the Center for Machine Translation at Carnegie Mellon University. For more than half a century scientists have been trying to crack the code of automated translation. And for most of that time progress proceeded at an infuriatingly slow pace. Back in 1981, the futurist Arthur C. Clarke predicted that talking handheld books capable of doing translations would be available to the masses within a matter of years. We’re still waiting. Part of the problem was in the approach. In the past, researchers worked with teams of linguists to painstakingly program computers to understand the rules for grammar and syntax. More recently, though, researchers began to harness the power of computers to crunch vast stores of written text from different languages using “statistical machine translation.” In essence, a system is fed massive amounts of text that’s already been translated into two or more languages by humans. In the case of Google’s translation tool, for instance, translations of United Nations documents were used. Through statistical analysis, the system is trained to identify clusters of words in the two languages that likely correspond to each other. From that, it builds statistical models that are used to translate new text. Suddenly there was no need for cumbersome hand-coded linguistic rules. When you click on the “Translate this page” link beside some Google search results, this is what’s at work behind the scenes.
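The statistical idea can be shown with a toy example. Given a tiny, hand-made “parallel corpus” of human-translated sentence pairs, simple co-occurrence counts already hint at which words correspond. Real systems use iterative probabilistic estimation over millions of sentences rather than raw counts; this sketch, with its invented three-pair corpus, only illustrates the intuition.

```python
# Toy illustration of learning word correspondences from a parallel
# corpus, as in statistical machine translation. The corpus below is
# a made-up example, not real training data.
from collections import Counter

parallel_corpus = [
    ("the house", "la casa"),
    ("the book", "el libro"),
    ("a house", "una casa"),
]

# Count how often each English word co-occurs with each Spanish word
# in aligned sentence pairs.
cooccur = Counter()
for english, spanish in parallel_corpus:
    for e in english.split():
        for s in spanish.split():
            cooccur[(e, s)] += 1

def best_translation(word: str) -> str:
    """Return the foreign word that co-occurs most often with `word`."""
    candidates = {s: c for (e, s), c in cooccur.items() if e == word}
    return max(candidates, key=candidates.get)

print(best_translation("house"))  # casa
```

Because “house” appears alongside “casa” in two different sentence pairs but alongside “la” or “una” in only one each, the counts single out “casa” — no grammar rules required. Scaling that insight to whole phrases and billions of words is what made tools like Google’s translator possible.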

The problem is, even the most powerful computers can be fooled by the simple vagaries of human language. Some sentences can be perfectly grammatical yet meaningless, while others are just plain ambiguous, such as the phrase “Flying planes can be dangerous.” Is that a warning that it’s dangerous to fly in planes, or that you’d better watch out for flying planes overhead? And when it comes to live, instantaneous speech translation, the problems quickly compound. For instance, it’s far more difficult to derive context from a single spoken phrase than from a large block of text.

Another problem is that statistical machine translation models are developed using written words, and that’s not necessarily how people speak. For instance, in Iraqi Arabic there’s a big difference between what is spoken by people on the street, and how it’s written in newspapers, says Frederking. Or closer to home, a speech translator trained using standard English text probably wouldn’t work very well in a Newfoundland fishing village, or inner-city Baltimore. “At this point the translation stuff is still at the point where it kind of works, sort of,” he says.

Even as researchers work through many of the technical challenges facing speech-to-speech translation, they’ve already been able to put working devices into the hands of U.S. soldiers on the ground in Iraq. Last year, for instance, IBM donated hundreds of laptops and handheld translation machines to the U.S. military. The initiative got its start when the son of an IBM employee serving in Iraq lost both his legs during an explosion. As the father spoke with colleagues about what had happened to his son, word of the incident spread to company CEO Samuel Palmisano. He had already heard from many returning IBM employees who served in Iraq that there is a severe shortage of reliable interpreters in the country.

To be sure, IBM’s MASTOR system, as it’s known, can’t handle wide-ranging conversations about everything under the sun. The handheld devices, roughly three times the size of a BlackBerry, have a vocabulary of roughly 100,000 Arabic and 60,000 English words. And it’s not clear how regular Iraqis would respond to a computerized voice coming out of a gadget. But if it works, and can be expanded, the technology could help bridge the communication gap in a country where there’s deep distrust of the Americans. “For the past nine months it’s been in the field being tested,” says Salim Roukos, a researcher with the translation technologies division at IBM Research. “There’s anecdotal evidence that people are finding it very useful.”

But you don’t need a war zone to push the limits of translation technology. At Carnegie Mellon, Frederking has been working on speech-to-speech translation for 911 emergency centres where many of the people calling only speak Spanish. As it is now, when a call comes in from a Spanish speaker and the dispatcher speaks only English, an interpreter from an outside company must be patched in. This can take more than a minute and relies heavily on the ability of interpreters, who are often untrained to deal with emergencies, to remain calm. To get around this, several police departments have been sending recordings of their Spanish 911 calls to Frederking for analysis. The translation system he’s working on, known as Ayudame (Spanish for “Help me”), would essentially take the place of the outside interpreter. “All of this has an error rate,” he admits. “The crucial thing is to make it work well enough that it’s better than the other approach.”

There are plenty of critics who say computer translation could never be a substitute for a human interpreter. At least, not well enough to accurately conduct an in-depth conversation. Denis Bousquet, vice-president of the Canadian Translators, Terminologists and Interpreters Council, says the technology would require artificial intelligence in order to do the job, and that is still generations away. “Machines are a bunch of zeros and ones and language is way more interpretive than that,” he says. “Emotion and colloquial expressions come into play and you can’t translate those with a strictly mechanical process.”

Even so, it’s becoming obvious that in certain specific situations, computerized translators can play a big role. The technology has the potential for success in Iraq and 911 dispatch centres because the range of conversation is fairly limited, researchers say. Which is why they are focusing their efforts on perfecting computerized translation in so-called limited domains, such as for tourism, at border crossings, in hospitals and for humanitarian relief efforts. “After the tsunami a few years ago, people speaking a lot of different languages were affected,” says Kristin Precoda, head of the speech technology lab at SRI International, a research institute based in Menlo Park, Calif., which has also been testing its own English-Arabic translator tool, called IraqComm, with U.S. soldiers in Iraq. “Many of those wouldn’t be languages that the Red Cross had a thousand interpreters around for.”

At this point a major challenge remains how to actually make a universal translator just that: universal. Only about a dozen languages are being worked on at the moment, and adding a new language to a translator requires hundreds of hours of recorded speech and word data. Some researchers are trying to find ways to build translation models using less raw data, so that new languages can be added to a translator’s portfolio faster.

What’s more, many believe the technology must be squeezed into a cellphone before it will catch on with consumers. Researchers are optimistic that that time is coming, as handheld devices become more powerful. In the meantime, companies are finding ways to improvise. In August, Jajah, a voice-over-Internet phone provider, launched an English/Mandarin translation service ahead of the Olympic Games. An English speaker, for instance, could dial a free number and say a sentence into the phone, then hand the phone to a Mandarin speaker, and the automated Jajah translator would repeat the sentence back in their language. The service got mixed reviews, but was a sign that, steadily, the barriers of language are coming down. “I think in the next few years you’re going to get a handheld speech translator that really works for some limited domains between major languages like English and Chinese,” says Frederking. “But it’ll still be a while before you can have a general conversation with somebody.”

Until then, there’s always the Babel fish.
