For decades, predictions have been made that computers will soon become proficient at translation, but that day has never arrived. While many of the challenges of machine translation (MT) can be overcome to improve quality, language is an intricate human system that cannot simply be split into pieces and then reassembled into another language by algorithms.
In “The past, present and future of machine translation,” Julie Errens summarizes how the state of machine translation has advanced since 1947, when Warren Weaver sent a letter wondering whether translation could be solved using cryptographic techniques. A major point she makes is that while humans understand language as a series of interconnected thoughts and concepts, MT sees only words or groups of words (n-grams):
The human mind attaches layers of meaning to every sentence – the machine only recognizes strings of commands.
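The n-gram view that Errens describes can be made concrete with a few lines of code. The sketch below (a minimal illustration, not any particular MT system) shows all a statistical model ever “sees”: short overlapping sequences of words, with no layer of meaning attached.

```python
def ngrams(sentence, n):
    """Return the n-grams (tuples of n consecutive words) of a sentence.

    This is the machine's-eye view: strings of tokens, nothing more.
    """
    words = sentence.split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

# The bigrams (2-grams) of a short phrase:
print(ngrams("the machine only recognizes strings", 2))
# [('the', 'machine'), ('machine', 'only'), ('only', 'recognizes'), ('recognizes', 'strings')]
```

A statistical MT system scores translations by how often such fragments co-occur in its training data; none of the tuples carries any notion of what the sentence is about.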
However, as Gestalt theory, a school of thought from visual culture, teaches: it is impossible to understand the whole by merely examining its parts. This is a fundamental weakness of the statistical approach.
Among the points she makes is that humans do not adhere to analytical meanings when writing (or speaking); instead, we bend the meanings of words and syntax when expressing ideas.
A recent development in MT that has gotten a great deal of press is the use of neural networks. As Wikipedia describes, artificial neural networks use a collection of connected nodes that loosely mimic the human brain; each node applies learned weights to combine individual data points into a gestalt whole, which is how such systems can, for example, identify the content of a photograph. An article by Khari Johnson describes how neural nets boost translation accuracy.
An article that discusses what “deep neural networks” are and details some of the practical problems behind MT is “The Shallowness of Google Translate” by Douglas Hofstadter. As he notes, humans interpret the world with language in a way that reflects their interactions and experience with the world. He comes up with some excellent sentences to illustrate this, such as:
In their house, everything comes in pairs. There’s his car and her car, his towels and her towels, and his library and hers.
A human reading this instantly understands that this is about a man and a woman who live together and have corresponding items, even though the concepts of “living together” and “corresponding” are not explicitly stated. It is the human knowledge of the world that allows us to interpret the intended meaning (and for Hofstadter to create the sentence in the first place). In essence, computers and their programming might be getting more sophisticated, but they still don’t understand language. They are simply slicing it up with better algorithms.
Nevertheless, in March Microsoft announced that it had built a system capable of translating news articles as accurately as a human being.
“Hitting human parity in a machine translation task is a dream that all of us have had,” Huang said. “We just didn’t realize we’d be able to hit it so soon.”
And so the predictions and claims continue.