Google seeks world of instant translations
By Reuters
Published: March 28, 2007, 4:57 AM PDT
C/Net
In Google's vision of the future, people will be able to translate documents instantly into the world's main languages, with machine logic, not expert linguists, leading the way.
Google's approach, called statistical machine translation, differs from past efforts in that it forgoes language experts who program grammatical rules and dictionaries into computers.
Instead, Google's engineers feed the computers documents that human translators have already rendered in two languages, then rely on the machines to discern patterns for future translations.
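As a rough illustration of what "discerning patterns" from already-translated text can mean, the sketch below trains a tiny word-translation table with IBM Model 1, a textbook alignment method. It is not a description of Google's system, which the article does not detail. The three German-English sentence pairs and the function names `train_word_translations` and `best_translation` are invented for this example.

```python
from collections import defaultdict


def train_word_translations(sentence_pairs, iterations=20):
    """Estimate t[(target, source)], roughly P(target word | source word).

    Uses expectation-maximization as in IBM Model 1 (a textbook method,
    assumed here purely for illustration).
    """
    # Start with equal weight for every word pair seen in the same sentence pair.
    t = {}
    for source, target in sentence_pairs:
        for tw in target:
            for sw in source:
                t[(tw, sw)] = 1.0

    for _ in range(iterations):
        count = defaultdict(float)  # expected (target, source) alignment counts
        total = defaultdict(float)  # expected counts per source word
        for source, target in sentence_pairs:
            for tw in target:
                # Split each target word's "vote" across the source words
                # in proportion to the current estimates.
                norm = sum(t[(tw, sw)] for sw in source)
                for sw in source:
                    share = t[(tw, sw)] / norm
                    count[(tw, sw)] += share
                    total[sw] += share
        # Re-estimate the table from the expected counts.
        for (tw, sw), c in count.items():
            t[(tw, sw)] = c / total[sw]
    return t


def best_translation(t, source_word):
    """Return the target word with the highest estimated probability."""
    candidates = {tw: p for (tw, sw), p in t.items() if sw == source_word}
    return max(candidates, key=candidates.get)


# Invented toy parallel corpus: (German tokens, English tokens).
pairs = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]
table = train_word_translations(pairs)
for word in ["das", "Haus", "Buch", "ein"]:
    print(word, "->", best_translation(table, word))
```

No dictionary or grammar rule is supplied. The pairing of "Haus" with "house" emerges only because "das" is better explained by "the" in the other sentences, which is the kind of pattern that more parallel text makes easier to find.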
While the quality is not perfect, it is an improvement on previous efforts at machine translation, said Franz Och, 35, a German who heads Google's translation effort at its Mountain View, Calif., headquarters, south of San Francisco.
"Some people that are in machine translations for a long time and then see our Arabic-English output, then they say, that's amazing, that's a breakthrough," said Och.
"And then other people who have never seen what machine translation was ... they read through the sentence and they say, the first mistake here in line five--it doesn't seem to work because there is a mistake there."
But for some tasks, a mostly correct translation may be good enough.
Speaking over lunch this week in a Google cafeteria famed for offering free, healthy food, Och showed a translation of an Arabic Web news site into easily digestible English.
Two Google workers speaking Russian at a nearby table said, however, that a translation of a news site from English into their native tongue was understandable but a bit awkward.
Feeding the machine
Och, who speaks German, English and some Italian, feeds hundreds of millions of words from parallel texts in language pairs such as Arabic and English into the computer, using United Nations and European Union documents as key sources.
Languages without considerable translated texts, such as some African languages, face greater obstacles.
"The more data we feed into the system, the better it gets," said Och, who moved to the United States from Germany in 2002.
The program applies statistical analysis, an approach he hopes will avoid diplomatic faux pas, such as when Russian leader Vladimir Putin's translator miffed then-German Chancellor Gerhard Schroeder by calling him the German "Fuehrer." The word is verboten in that context because of its association with Adolf Hitler.
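How frequency statistics might steer a system away from such a word choice can be sketched with a small bigram language model. This is a simplified illustration under stated assumptions, not Google's implementation: the training phrases and their 100-to-1 ratio are invented, add-one smoothing is an arbitrary choice, and the names `train_bigram_model`, `log_score` and `pick_most_likely` are hypothetical.

```python
import math
from collections import Counter


def train_bigram_model(sentences):
    """Count word pairs in the training text; '<s>' marks a sentence start."""
    bigrams = Counter()
    contexts = Counter()
    vocab = set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()
        vocab.update(tokens[1:])
        for prev, word in zip(tokens, tokens[1:]):
            bigrams[(prev, word)] += 1
            contexts[prev] += 1
    return bigrams, contexts, len(vocab)


def log_score(phrase, model):
    """Log-probability of a phrase under the add-one smoothed bigram model."""
    bigrams, contexts, vocab_size = model
    tokens = ["<s>"] + phrase.split()
    return sum(
        math.log((bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size))
        for prev, word in zip(tokens, tokens[1:])
    )


def pick_most_likely(candidates, model):
    """Choose the candidate wording the model considers most probable."""
    return max(candidates, key=lambda phrase: log_score(phrase, model))


# Invented training text: one title is 100 times more frequent than the other.
training_text = ["Bundeskanzler Gerhard Schroeder"] * 100 + ["Fuehrer Gerhard Schroeder"]
model = train_bigram_model(training_text)

candidates = ["Fuehrer Gerhard Schroeder", "Bundeskanzler Gerhard Schroeder"]
print(pick_most_likely(candidates, model))
```

Given two candidate renderings of the same source phrase, the model prefers the one whose word sequences it has seen more often, so the rarer and inappropriate title loses out without anyone writing a rule against it.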
"I would hope that the language model would say, well, Fuehrer Gerhard Schroeder is ... very rare but Bundeskanzler Gerhard Schroeder is probably 100 times more frequent than Fuehrer and then it would make the right decision," Och said.
The center of Google's effort looks surprisingly modest. Och shares a Spartan office with two others on his team, with little clutter other than a shelf of linguistic books above his desk. That's because the muscle work is performed by machines.