Thursday, February 28, 2013

Multilingual dictionary keeps humans in the loop

Jim Giles, consultant

There's an old dream shared by artificial intelligence researchers: an algorithm that can create perfect translations of any text, in any language, at the press of a button.

And then there is today's automated translation technology. Services like Google Translate can provide the gist of a passage of text. But if you're a newspaper publisher seeking foreign readers or a public health expert wanting to educate speakers of another language, human translation remains the only option.

Martin Benjamin wants to change that, and the anthropologist-turned-lexicographer is betting on a radical means of doing so. Machine translation is largely a statistical business: computers learn to translate by searching for correlations in texts that have been translated by humans. Benjamin thinks it's time to put humans back in the loop.

The latest iteration of his attempt to do so launched this week. It's called Kamusi and it's a multilingual dictionary that could, give or take a few million dollars in funding, contain all the world's languages. Unlike other online dictionaries, Kamusi is built around concepts as well as words. So the word "spring", for instance, is linked to several concepts, including the season that comes before summer and a sudden upwards or forward motion.

This structure could solve one of the biggest challenges for machine translation. Asked to translate "spring in her step" into French, for example, Google chooses printemps (the season) for "spring". Similar examples abound. The inability of computers to deal with homonyms - words that are spelled the same but have different meanings - is one reason why machine translations are often so garbled.

Kamusi avoids this problem by recognising that "spring" is associated with multiple concepts and prompting the user to say which is relevant. The demonstration version contains 100 words from 15 languages, including English, Swahili and Japanese.
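To make the idea concrete, here is a minimal sketch of a concept-keyed dictionary in Python. It is not Kamusi's actual data model: the `Concept` class, the `senses` and `translate` helpers, the concept IDs and the toy entries are all hypothetical, and "bond" as the French word for the motion sense is an illustrative choice rather than something taken from Kamusi. The point is only that a word maps to several concepts, and a person picks the right one before translation happens.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    """One meaning, with the word used for it in each language (hypothetical model)."""
    concept_id: str
    gloss: str
    terms: dict[str, str] = field(default_factory=dict)  # language code -> word


# Toy data, assumed for illustration only.
CONCEPTS = [
    Concept("spring.season", "the season that comes before summer",
            {"en": "spring", "fr": "printemps"}),
    Concept("spring.motion", "a sudden upwards or forward motion",
            {"en": "spring", "fr": "bond"}),
]


def senses(word, lang, concepts=CONCEPTS):
    """Return every concept that `word` is linked to in language `lang`."""
    return [c for c in concepts if c.terms.get(lang) == word]


def translate(word, src, dst, choose, concepts=CONCEPTS):
    """Translate via a concept; `choose` stands in for the human in the loop."""
    candidates = senses(word, src, concepts)
    if not candidates:
        return None
    # Only ask the user when the word is ambiguous.
    chosen = candidates[0] if len(candidates) == 1 else choose(candidates)
    return chosen.terms.get(dst)


if __name__ == "__main__":
    def pick_motion(candidates):
        # Simulates a user selecting the "motion" sense from the prompt.
        return next(c for c in candidates if c.concept_id == "spring.motion")

    print(translate("spring", "en", "fr", pick_motion))
```

In a real interface the `choose` callback would show the glosses to the user and return their selection; a purely statistical system has no equivalent step.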

There is a reason why Google opted for the algorithmic approach, however: once it's up and running, it's cheap and fast. Benjamin needs bilingual speakers to add words to his dictionary and, by comparison, humans are slow and expensive.

Kamusi has come this far by relying on volunteers and a grant from the US National Endowment for the Humanities. Benjamin thinks that speakers of minority languages will be motivated to add more terms for free, since they will gain the ability to translate their language into those that are already represented in the dictionary. Benjamin is also betting on some top-down support: companies that do business in Africa, a continent that is poorly served by existing dictionaries, for instance, might be motivated to pay for large numbers of local words to be added to Kamusi.

Either way, it won't be easy to build up Kamusi. Totting up wages and other expenses, Benjamin estimates it will cost around $5 to add each new concept in each language. Representing 10,000 concepts in 100 languages would require $5 million.
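The $5 million total only adds up if the $5 estimate is read as the cost of one concept in one language. Under that assumption, the arithmetic can be checked in a few lines of Python; the function and constant names are hypothetical.

```python
# Assumption: the ~$5 estimate applies to one concept in one language.
COST_PER_ENTRY_USD = 5


def estimated_cost(concepts, languages, cost_per_entry=COST_PER_ENTRY_USD):
    """Total cost of representing every concept in every language."""
    return concepts * languages * cost_per_entry


if __name__ == "__main__":
    print(estimated_cost(10_000, 100))  # 5000000
```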

Source: http://feeds.newscientist.com/c/749/f/10897/s/28fb0464/l/0L0Snewscientist0N0Cblogs0Conepercent0C20A130C0A20Cmultilingual0Edictionary0Ekamusi0Bhtml0Dcmpid0FRSS0QNSNS0Q20A120EGLOBAL0Qonline0Enews/story01.htm
