Machine translation is an integral part of the translation and localization industry today, as companies try to scale, automate, and streamline their translation output. But what exactly is machine translation, and how does it work? How can we control translation quality, and where do human translators come in?
How does machine translation work?
Machine translation, simply put, is the use of software to translate text or speech from one language to another. Using algorithms, patterns, and language models derived from large databases of existing translations, it can either suggest a translation to language professionals or, in some cases, automatically translate large quantities of text with no human involvement at all. For context, the software factors in the subject category (medical, legal, or scientific, for example), online resources, and glossaries.
There are different types of machine translation with varying levels of sophistication, and some learn continuously and improve their suggestions over time. That said, human linguists are still very much needed to control quality and to localize content for specific target audiences.
You may also have read about computer-aided translation, machine-aided human translation, and interactive translation. These are not the same as machine translation: each has its own characteristics and toolset.
Machine translation types
Humans have been tinkering with machine translation technology since as early as the 1940s, with each new technology improving the process incrementally. In the past five years, emerging technologies such as AI and deep learning have also become a core part of how it works.
There are three main types of machine translation: rule-based machine translation (RbMT), statistical machine translation (SMT), and neural machine translation (NMT).
Rule-based machine translation (RbMT)
The first widely used machine translation software, which is still employed today, was a rule-based system. As the name suggests, it relies on a very large set of hand-written rules based on the grammar, syntax, and phraseology of each language.
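As a rough illustration of the rule-based idea, the sketch below translates a single English sentence into Spanish using a tiny word list and one reordering rule. The lexicon, the word classes, and the function name are all made up for this example; a real RbMT system encodes far more rules and is not built this way in practice.

```python
# Toy rule-based translation sketch (English -> Spanish).
# LEXICON, ADJECTIVES, NOUNS and translate_rule_based are illustrative
# assumptions, not part of any real RbMT product.
LEXICON = {"the": "el", "red": "rojo", "car": "coche", "is": "es", "fast": "rápido"}
ADJECTIVES = {"red", "fast"}
NOUNS = {"car"}


def translate_rule_based(sentence: str) -> str:
    words = sentence.lower().split()
    reordered = []
    i = 0
    while i < len(words):
        # Grammar rule: English "adjective noun" becomes Spanish "noun adjective".
        if i + 1 < len(words) and words[i] in ADJECTIVES and words[i + 1] in NOUNS:
            reordered.extend([words[i + 1], words[i]])
            i += 2
        else:
            reordered.append(words[i])
            i += 1
    # Dictionary lookup; words missing from the lexicon are left unchanged.
    return " ".join(LEXICON.get(word, word) for word in reordered)


print(translate_rule_based("The red car is fast"))  # el coche rojo es rápido
```

Every language pair needs its own rules and word lists, which is why these systems are predictable but costly to build and maintain.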
Statistical machine translation (SMT)
Statistical machine translation is a method that has been actively developed over the past decade, though it was first conceptualized in 1949. SMT uses statistical language models whose parameters are estimated from language resources made up of large, structured sets of texts. Though it makes effective use of human and data resources, its output is often only superficially fluent, like the awkward translations long associated with Google Translate. It also doesn’t work well with language pairs whose syntax differs greatly, so linguists need to lend a heavy hand.
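To make "statistical language model" a little more concrete, here is a minimal sketch of one ingredient: a bigram model estimated from a tiny made-up corpus and used to rank two candidate outputs. The corpus, the add-one smoothing, and the function name are assumptions for illustration only. A full SMT system also needs a translation model trained on parallel text, which is left out here.

```python
import math
from collections import Counter

# Tiny illustrative target-language corpus (an assumption for this sketch).
corpus = [
    "the house is small",
    "the house is green",
    "the small house is green",
]

context_counts = Counter()
bigram_counts = Counter()
for line in corpus:
    tokens = ["<s>"] + line.split() + ["</s>"]
    context_counts.update(tokens[:-1])              # how often each word starts a bigram
    bigram_counts.update(zip(tokens, tokens[1:]))   # how often each word pair occurs

vocab = {word for line in corpus for word in line.split()} | {"<s>", "</s>"}


def log_prob(sentence: str) -> float:
    """Log-probability of a sentence under the bigram model, with add-one smoothing."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        numerator = bigram_counts[(prev, cur)] + 1
        denominator = context_counts[prev] + len(vocab)
        total += math.log(numerator / denominator)
    return total


# Two candidate outputs containing the same words in different orders.
candidates = ["the house is small", "house the small is"]
best = max(candidates, key=log_prob)
print(best)  # the house is small
```

The model prefers the word order it has seen in its training data, which also hints at why SMT quality depends so heavily on the size and relevance of the corpus.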
Neural machine translation (NMT)
The most relevant of the three is neural machine translation, which made its mainstream debut in 2016. NMT uses artificial neural networks to predict sequences of words, and it continuously improves its translations by learning from resources, databases, glossaries, and the translation suggestions approved by translators. NMT software generally runs on graphics processing units (GPUs) rather than CPUs to supply the huge processing power it needs.
Many translation service companies use NMT because they have seen how much it increases translation productivity and cuts costs, which is a key B2B selling point. Organizations that use it include Microsoft (in Skype, Bing, and other products), Systran, Reverso, and IBM.
Hybrid machine translation
Hybrid machine translation means that two or more of the types mentioned above are used at the same time. Companies use this method as a fail-safe way of delivering accuracy and keeping control, instead of relying on a single solution. PROMT, Systran, and Omniscien Technologies are some of the companies that use it.
Which machine translation type is better?
Each kind of machine translation has its pros and cons. RbMT offers more consistent and predictable quality than SMT, while SMT produces more fluent output and is better at catching exceptions to rules. However, the most sought-after solution today is neural machine translation software.
Machine translation systems
There are three types of machine translation systems that can apply to any of the machine translation technologies:
Generic MT is the most basic kind of MT system. It provides instant translations with little to no customization, as in Google Translate, Bing, Reverso, and Yandex.
Customizable MT builds on generic MT, but allows its users to tailor the terminology to the context, subject category, style, target audience, and so on.
Adaptive MT is the system most often used in CAT tools. It offers live translation suggestions to language professionals and learns from the choices that are made over time in order to improve what’s suggested. Adaptive MT works alongside translation memories and has proved to be one of the most helpful tools for translators, as it greatly speeds up work and output.
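The suggest-and-learn loop described for adaptive MT and translation memories can be sketched in a few lines. The example below is only a fuzzy translation-memory lookup built on Python's standard `difflib`, not an adaptive MT engine; the stored segments, the 0.75 threshold, and the function names `suggest` and `approve` are assumptions for illustration.

```python
from difflib import SequenceMatcher

# Illustrative translation memory: source segment -> approved translation.
translation_memory = {
    "Click the Save button.": "Haga clic en el botón Guardar.",
    "Open the settings menu.": "Abra el menú de configuración.",
}


def suggest(source: str, threshold: float = 0.75):
    """Return (translation, similarity) for the closest stored segment, or None."""
    best_score, best_target = 0.0, None
    for stored_source, stored_target in translation_memory.items():
        score = SequenceMatcher(None, source.lower(), stored_source.lower()).ratio()
        if score > best_score:
            best_score, best_target = score, stored_target
    if best_target is None or best_score < threshold:
        return None
    return best_target, round(best_score, 2)


def approve(source: str, target: str) -> None:
    """Store a translation the translator accepted so later lookups can reuse it."""
    translation_memory[source] = target


print(suggest("Click the Save button"))   # near match to a stored segment
approve("Close the window.", "Cierre la ventana.")
print(suggest("Close the window."))       # exact match after approval
```

Real adaptive systems update the translation model itself rather than a lookup table, but the workflow is similar: the tool proposes, the translator decides, and the decision feeds the next proposal.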
Machine translation technology, tools, and services
Machine translation is widely available: in the cloud, on platforms, on servers, or through software integration via an API. For example, Google, Microsoft, and Amazon offer translation services as cloud APIs, while other developers like Systran and PROMT offer customizable MT as server or desktop products. Professional translators, however, mainly use MT directly in the CAT tools they need for their work, such as Trados and memoQ.
Users can also tap into independent and open-source machine translation options, which allow anyone with the technical know-how to build their own machine translation engine. To use any open-source toolkit, you need a large collection of parallel texts in two languages.
Machine translation quality
Machine translation software is massively helpful in improving translator productivity and translating large volumes of text, but it must also meet high standards of quality. This is why human language professionals are tasked with MT post-editing, to make sure the result is a natural translation that fits the context, has a human conversational feel, and is accurately localized for target audiences.
Translation quality is also assured on the technical side. Computational engineers review MT engines with A/B tests and experiments on an ongoing basis. Automatic metrics such as BLEU (Bilingual Evaluation Understudy), ROUGE, NIST, and METEOR measure the similarity between machine and human translations of the same text.
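To show what measuring similarity between a machine translation and a human reference can look like, here is a simplified, sentence-level sketch in the spirit of BLEU: clipped n-gram precision for unigrams and bigrams, combined with a brevity penalty. It uses a single reference and no smoothing, so its scores will not match production tools; the function names are assumptions for this example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate: str, reference: str, max_n: int = 2) -> float:
    """Simplified BLEU-style score between 0 and 1 (one reference, no smoothing)."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clipping: a candidate n-gram is credited at most as often as it
        # appears in the reference.
        overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: candidates shorter than the reference are penalized.
    if len(cand) >= len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


reference = "the cat sat on the mat"
print(simple_bleu("the cat sat on the mat", reference))
print(simple_bleu("the cat is on the mat", reference))
```

An identical candidate scores 1.0, and the score drops as fewer word sequences match the reference. This is also why such metrics complement human post-editing rather than replace it: a valid translation that is worded differently from the reference is scored down.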
Alongside quality, another concern is security. Because many machine translation platforms are shared, translations are not always kept confidential. Many companies address this by setting up an on-site machine translation engine that runs inside the corporate network with no external access. Cloud solutions, on the other hand, rely on data encryption. In either case, companies should avoid sending confidential content to options that are open to the public, which can be an easy gateway for hackers.
Machine translation technology is an exciting interdisciplinary field that combines the latest in technology, linguistics, and localization. The ever-growing need for content localization will continue to push technological advancements in MT at an accelerated pace. Language professionals, for their part, need to find effective ways to control the quality and preserve the human touch of machine translations.