February 10, 2021

Improving raw MT quality — user-created content case

LocFromHome

Presentation

At the end of 2019, Agilent implemented an on-the-fly MT-based community portal with documentation and a forum for their service engineers to use. After six months in production, it became clear that the quality of the raw MT output was not good enough, so it was time to make some changes. Join this session to find out what they did and what results they got.

Transcription

Max Morkovkin 00:00 For the attendees, let me introduce Natalia Kurysheva, localization project manager at Agilent. Natalia had a great mission at Agilent, because when the company implemented machine translation, at some point it became obvious that the raw machine translation output was not good enough, and you have to do something about its quality. Right? So you will tell us about this challenging mission and what you have achieved. Let me give the mic to you. Natalia Kurysheva 00:43 Thanks, Max, for this introduction. Yes, I will share everything, and I hope you guys will find it useful. So let me put my presentation on. Let me know if you can see my slides well. Okay, I suppose you can, right? All right, cool. So thank you very much, everyone, for coming here and attending my session, "Improving the raw MT quality: user-created content case." This is the first session, and I'm delighted to open this wonderful event. I hope you find it interesting. As Max has already said, my name is Natalia Kurysheva. I work as Localization Global Program Manager at Agilent Technologies. A couple of words about Agilent, for those of you who don't know us: we are one of the biggest life science companies, and we manufacture and sell instruments and consumables for chemical analysis. As a localization team, we mainly work with marketing and web content, technical literature, and multimedia, and we localize our content into nine standard languages. Before we get to the topic of the session, I would like to put it all in context for you.
When I talk about this solution, I will be referring to the Agilent Community Portal, which is not our main website but a portal for Agilent service engineers: a place where they can collaborate, communicate with each other, try to find solutions for issues they may encounter, and also discuss Agilent products and applications. Back in 2019, we integrated an on-the-fly MT plugin for this portal. It translates the content of the portal into 37 languages. We actually took as a baseline the list of languages which the public Google Translate engine supported and shortened it a little bit; to address our audience, we don't need that many languages. On the portal, we have different reference materials and also a forum where our engineers can share their knowledge. This forum is actually the biggest part of the portal content, and, as you understand, it is user-created. So now I'll talk about our journey and walk you through all the main milestones. As I have said, we kicked off the project in 2019, and back then we started with only one machine translation engine, Google NMT. Last year, the Community Portal was migrated to a new platform, and the plugin was migrated as part of it as well. Last December, we onboarded our second MT engine, Tencent. It covers our English-to-Chinese language pair. And yes, it actually took us one whole year to onboard more than one engine; I will tell you why a little later. It's a funny story, and a little sad. Also, in December 2020, we started our so-called enhancement program. That will be the main topic of this presentation. Now let me give just a quick overview of the plugin architecture. As you can see here, in the middle of this architecture we have the Intento Enterprise MT Hub, which connects us to multiple MT engines.
On the back end, we have Google NMT and Tencent at the moment. Now let me tell you what made us start thinking about how we can improve raw MT. Many of you will say: hey, this is all MT, what do you expect, right? And this is what I heard many times when I was searching for information online or at different events devoted to machine translation. But we always had this idea that probably there are some things we can do. Raw MT obviously has no post-editing part, and there is no easy way to verify quality for each request. The only thing we can rely on is sample checks, which we do every quarter. User feedback can also be of some help, but our users don't provide it regularly, and that's basically not their job. So yeah, not much help there. This is how we came up with this enhancement program, and I'm going to share our experience with you. I just want to underline that this is how we did it; it is our Agilent experience. Probably it's not perfect, and probably something could have been done differently; we actually started thinking about what we could have done differently. But this is what it is, and probably we can share some insights that can be useful for you. Before we start that, yes? Max Morkovkin 07:09 I'm deeply sorry for interrupting you, but we are getting some feedback about the sound quality. Can you please try to turn off the microphone and maybe use your laptop microphone, to see if we get better sound quality? Natalia Kurysheva 07:25 Okay, can you hear me now? Max Morkovkin 07:28 Yes, I can hear you now. And you can provide feedback in chat and tell us if it's better this way. Natalia Kurysheva 07:41 Yes, is it better? Max Morkovkin 07:46 Someone is saying it still sounds worse.
Can you please maybe continue with this slide, so we can hear? Natalia Kurysheva 07:57 Someone says it's fine, yeah. Okay, seems like everybody's okay. Can you hear me now? Is it better like this? Okay, let me try to change the settings. How about that? Is it better like this? Max Morkovkin 08:34 For me it sounds a bit better. Natalia Kurysheva 08:36 A bit better like this? Max Morkovkin 08:39 Okay, yeah, we can continue. I'm very sorry for this. Natalia Kurysheva 08:45 I'm sorry for that, guys. I checked my microphone before the presentation, and it sounded okay. Max Morkovkin 08:55 You have great slides and very good speech. Natalia Kurysheva 09:01 Before we start talking about engines, let me tell you how we chose them. First of all, we needed to identify how many language pairs are out there, and we came up with 37, as I mentioned. But how many of them really matter? That was the question we asked ourselves. After analyzing Google Analytics data, we got this list here, which you can see on the slide. It was not really a surprise, as we basically saw that Community Portal users mostly speak our nine standard languages, into which we localize most of our content anyway. The next step was to decide how many engines we want on the back end. The Intento benchmark report says that to get the best quality across 48 language pairs, one needs eight different engines, and eight engines is a lot for us to onboard at once. So we adopted so-called phasing in and started with one engine, Google NMT, at the beginning of the project, as the best-performing engine for the bigger number of language pairs at that moment. Then we added Tencent for English to Chinese in December last year, and we will onboard a third one in April for the English-to-Japanese language pair.
But we didn't just onboard those engines without any checks. We first tested several engines on a sample piece of content and had the translations reviewed by three different reviewers. These reviewers included linguists and technical subject matter experts. After that, we chose the best-performing engine for each language pair. Another thing one should think of is how often to reevaluate your choice of engines. Theory says that you should do it every six months, but Agilent reality says differently. In Agilent, onboarding of one MT engine can take up to three months, and this includes IT checks and legal checks; and this is if nothing extraordinary happens. For example, in the summer of 2020, the American president signed an executive order banning transactions with WeChat, the messaging app owned by Tencent. Tencent is a Chinese company, while Agilent is an American company, and it was not really clear if this order would cover other Tencent products. All that delayed the process of onboarding Tencent MT for us, because we needed additional legal checks. So for Agilent, the most realistic plan is to revisit our set of engines once a year, check recommendations of the subject matter experts, and also perform sample testing. Of course, if we speak about engines, we should mention engine training, and here several things should be considered. First of all, the possibility of training at all. As far as I understand, most of the top-performing engines nowadays are trainable, so this is not a big issue. The second thing to consider is corpus size. It is safe to start training from 12 to 15 thousand segments. But size really doesn't matter much here; it is more about corpus quality. Your translation memories should be clean and consistent in order to improve the quality of the output. Otherwise, your trained engine may perform even worse than the stock ones.
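The per-pair engine selection step described above (several engines tested on sample content, scored by three reviewers, best engine kept per language pair) can be sketched as follows. This is a minimal illustration; the scores, engine names, and the "EngineX" placeholder are invented, not Agilent's actual evaluation data.

```python
from collections import defaultdict

# Hypothetical reviewer scores (1-5) per (language pair, engine).
reviews = [
    ("en-zh", "Google NMT", 3), ("en-zh", "Tencent", 5), ("en-zh", "Tencent", 4),
    ("en-ja", "Google NMT", 4), ("en-ja", "EngineX", 5),
]

def best_engine_per_pair(reviews):
    """Average the reviewer scores and keep the top-scoring engine per pair."""
    totals = defaultdict(lambda: [0, 0])  # (pair, engine) -> [score sum, count]
    for pair, engine, score in reviews:
        totals[(pair, engine)][0] += score
        totals[(pair, engine)][1] += 1
    best = {}
    for (pair, engine), (total, count) in totals.items():
        avg = total / count
        if pair not in best or avg > best[pair][1]:
            best[pair] = (engine, avg)
    return {pair: engine for pair, (engine, _) in best.items()}
```

With the toy scores above, `best_engine_per_pair(reviews)` would route en-zh to Tencent and en-ja to the hypothetical EngineX.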
So Agilent TMs were audited and cleaned up; last year, it was a big project for us. The goal of the project was both to get better leverage for our regular projects and also to use these TMs for training. Preliminary data shows that our TMs are now in very good shape for training MT engines, and that's what we are going to do very soon. Now let's talk about another enhancement we had: glossaries. This is a new feature which more and more engines are starting to offer, so let's see what it is. First of all, a glossary is a custom dictionary for an MT engine to use to consistently translate a customer's domain-specific terminology. Here you can find some examples of glossary use cases. For example, product names: here I have the example of Agilent Seahorse. This is a product name for Agilent, and you can imagine how funny it would be in our context if we translated "seahorse" directly. So it's really an example of a non-translatable term: Agilent Seahorse should stay Agilent Seahorse. Then some ambiguous words: for example, the word "injector" can mean a nozzle, a plunger, or an insertion device, and it's better to specify which one you would like to choose for your domain. Another use case for glossaries is acronyms, and in Agilent we have a lot of acronyms, so this is our case for sure. This is how we actually prepared our data. First of all, we downloaded the offline version of our Community Portal content and did term extraction, and we got a long list of 1,000 terms. After that, we did the scoring: we checked the frequency of each term in our content and its frequency in a reference English corpus. If the frequency in the reference English corpus was low, that meant there was a high probability of a mistake when it was translated by MT.
So we took the terms with the highest frequency in our own content and the lowest frequency in the reference English corpus, and got a list of approximately 500 terms to be added to our glossary. In addition to that, we also provided a list of product names and a list of non-translatable terms. All of that actually created our glossary. We decided to translate glossaries into our nine standard languages, as having them translated into all 37 is too expensive and really doesn't make any sense; that is why we decided to stop at only nine. But we encountered a problem with glossaries. The situation is that glossaries work like a list of non-translatable terms: the engine picks up a term from a glossary and inserts it into the target context without changing its form. This obviously won't work for languages with inflections, which have different word forms. That's why the solution for us was to have glossaries containing only acronyms for such languages as German and Russian, and to monitor the other languages in order to check whether, for example, plural forms were handled correctly and didn't create any issues. Another issue which we found at the implementation stage was capitalization. Capitalization of each word in English titles may look a little weird in other languages. Here, for example, you have a French translation of a title, and if we capitalize all the words in it, including prepositions and articles, it will look odd; that's why we decided not to do that. This actually happens in situations when, for example, you have an English menu where all the words are capitalized, or you have capitalization in an English article title. So in our case, we agreed to capitalize only the first letter of the sentence and brand or product names.
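The scoring step described above (keep terms frequent in the domain content but rare in a general English reference corpus) can be sketched like this. All the terms and counts here are invented for illustration; the actual extraction was done with a term-extraction tool on the real portal content.

```python
# Toy frequency data: occurrences in the domain content vs. a reference corpus.
domain_freq = {"needle seat": 120, "vial sampler": 95, "pressure": 300, "seahorse": 60}
reference_freq = {"needle seat": 1, "vial sampler": 1, "pressure": 5000, "seahorse": 40}

def glossary_candidates(domain_freq, reference_freq, min_ratio=10.0):
    """Score each term as domain frequency / (reference frequency + 1).

    A high score means the term is domain-specific and likely to be
    mistranslated by a stock MT engine, so it is a glossary candidate.
    """
    scored = {
        term: domain_freq[term] / (reference_freq.get(term, 0) + 1)
        for term in domain_freq
    }
    return sorted(
        (term for term, score in scored.items() if score >= min_ratio),
        key=lambda term: -scored[term],
    )
```

With the toy counts, "needle seat" and "vial sampler" pass the threshold while the general-language words "pressure" and "seahorse" (as a plain noun) do not.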
Here I just wanted to show you some results. I picked three example terms from our glossary, where you can see some different situations. For example, if you look at the first term, "needle seat", you will see that in Korean it used to be translated. "Agilent vial sampler" was not translated, because it's our product name, but all the other parts of the phrase were translated. Our technical subject matter experts told us that for "needle seat" it is better to use the English word in Korean, and that's why we added it to our glossary; now you can see how the phrase looks. For this acronym here, MSD, we had a somewhat funny situation. For Brazilian Portuguese, before implementation of the glossary, it was used as an acronym, but our technical subject matter experts preferred to have it in full form instead of the acronym, and that's why we added it to the glossary for Brazilian Portuguese. With French, we actually had a situation when it was used as an acronym, and it is supposed to be used as an acronym, but we added it to the French glossary anyway. This is probably one of the examples where we could maybe have checked all the terms from our glossary prior to implementation, but we thought it would be a bit difficult to do due to the number of terms, and that's why we had this situation. And here is the last example, with the ISTD acronym, which used to be translated and here is exchanged for a full form, similar to the example with MSD; acronym coverage is important. So basically, here's what glossaries will do for you: glossaries will help ensure correct and consistent translations of terminology. They will improve readability, because if terminology is translated correctly, the text is more readable.
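The glossary mechanism described above (the engine drops the glossary entry into the target text unchanged, which is exactly why inflected languages need monitoring) can be illustrated with a minimal sketch. The MSD entry and its Portuguese expansion below are invented examples, not Agilent's real glossary data.

```python
import re

# Hypothetical pt-BR glossary entry: expand the acronym to its full form.
glossary = {"MSD": "detector seletivo de massas"}

def apply_glossary(text, glossary):
    """Replace whole-word glossary terms with their fixed target forms.

    This mirrors the behavior described in the talk: the target form is
    inserted as-is, with no inflection or agreement applied.
    """
    for term, target in glossary.items():
        text = re.sub(rf"\b{re.escape(term)}\b", target, text)
    return text
```

Because the replacement is literal, a language that needs a plural or case-marked form of the term would get the wrong surface form, which is why the team limited German and Russian glossaries to acronyms.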
Glossaries will ensure a better customer experience, and brand and product names will be translated correctly for sure. Now, one last thing I would like to share with you which can help you improve raw MT: pre-translated UI. UI pre-translation is something we're working on at the moment. Our team is working on extracting UI strings from our platform for pre-translation, and after they are translated, they will be handed over to Intento. Whenever there is a request in the future to translate a page, the UI string translations will be pulled from the pre-translated cache and not sent to the engines for translation. Our UI strings will be translated into the nine standard languages as well. So, as you can see here, UI translation is on our roadmap for the future; we plan to have it implemented in February this year. After that, we will also do engine training in March. In April, we plan to have our third engine integrated to cover the English-to-Japanese language pair. And later in 2021, we plan to investigate new features, such as caching, automated tests, and whatever the future offers us. So basically, this is it. Thank you very much for listening. Apologies for the problems with my sound. I hope the presentation was interesting for you, and I'm ready to answer questions if you have any. Max Morkovkin 23:30 From my perspective, it was really interesting. Thank you very much for the great slides; I think everybody liked the design. It's also great that you've shared the future plans with deadlines. This is something that technology companies try to avoid: telling the specific month when something will be achieved. Natalia Kurysheva 23:55 Yeah, it's our plan. Actually, as you could see, we work with the Intento team very closely, and this is our joint effort. We would like to come to a situation where our plugin provides a really good translation without any post-editing.
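The cache-first UI lookup described in the talk can be sketched as follows. The cache entry, the German string, and the stand-in engine call are all invented for illustration; the real setup would query the Intento hub for anything not in the cache.

```python
# Pre-translated UI strings, keyed by (source text, target language).
ui_cache = {
    ("Start a discussion", "de"): "Diskussion starten",
}

def mt_engine(text, lang):
    """Stand-in for a real MT request; returns a tagged placeholder."""
    return f"<MT:{lang}:{text}>"

def translate(text, lang, cache=ui_cache):
    """Serve UI strings from the pre-translated cache; send the rest to MT."""
    hit = cache.get((text, lang))
    if hit is not None:
        return hit  # pre-translated string, no engine call, no MT errors
    return mt_engine(text, lang)  # user-created content still goes to the engine
```

The design benefit is twofold: UI strings get stable, human-quality translations, and the MT engines see less traffic.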
Max Morkovkin 24:15 So yeah, this is cool. Okay, great. Then let's take a look at the questions that we have. The question is from, I'm very sorry if I pronounce it mistakenly, Mariela Salia. The question was left in the Smartcat community; by the way, guys, join our Smartcat community, we have many hot topics discussed there. So, the question: what were some specific negative effects of the bad MT quality? Did it result in money losses, a measurable drop in user reviews, or something like that? Natalia Kurysheva 24:51 As I have said, this specific solution was implemented in the Community Portal, and it is somewhat of a closed community, right? Those are just engineers who work with Agilent instruments, and they provide service and support for those instruments. We do not foresee any big risks there, because the people who work with these instruments are very qualified; they have specific skills to provide their services. The biggest risk would be that people won't understand each other in the forum. But I have never heard of any situation when the translation of this type of content led to a breakdown of an instrument or something like that. So probably we can speak about relatively low-risk content here. Max Morkovkin 25:52 Okay, cool. There is another question, from Johan de: when you talk about the training corpus, did you mean using your corpus on top of the baseline generic engine of Google NMT and/or Tencent MT? Natalia Kurysheva 26:08 Yes, that's what I'm saying. We're going to use our translation memories to train engines and see what the result will be. We have already started doing it for one language pair, and we checked the results; we had a kind of evaluation project, but we will continue working on that. Max Morkovkin 26:34 Okay, good. Is that then the best engine for Chinese, or are there also other pairs that it covers well? Natalia Kurysheva 26:45 Yeah.
Unfortunately, we didn't check it for other language pairs. We had a specific request to improve the quality of English-to-Chinese translation, because China is our biggest market, and we really want to provide all the information in the local language for them. Actually, in our regular projects, which we do with human translation every day, Chinese is the biggest volume we have, and that's why we tested several engines only for the English-to-Chinese language pair. So I'm sorry, I cannot answer that question for you. Max Morkovkin 27:29 Sure, of course. And another question from the Smartcat community: Jana is asking, do you use tags in your translations? Natalia Kurysheva 27:42 Are we talking about raw MT translations or our TMs? Max Morkovkin 27:49 I'll double-check it with Jana, but... Natalia Kurysheva 27:52 Yeah, for raw MT, as far as I understand, no, but I think that would be a more technical question for my colleagues from Intento, probably. But for the regular projects in TMs, yes, we have tags, and that's a normal thing in a TM. As far as I understand, those should be cleaned up before the TM is used for MT engine training. So basically, there is a difference between cleaning up a TM for regular use, to improve leverage in your regular translations, and the additional cleaning up of TMs before training an engine. That is actually something a machine translation specialist would do, and in our company we do not have a person who specializes in training machine translation engines; that's why this will be outsourced as well. Max Morkovkin 28:54 Okay, good. A question from the Smartcat community: Maria Gonzalez is asking, let's get back to the time when you were making the decision on using raw MT. What made you think using raw MT would be a good idea in the first place?
It seems like many non-industry companies have this impression that MT can be a solution to all localization problems. Where does this idea come from? Natalia Kurysheva 29:22 I don't know where this idea comes from. But for us, we didn't actually have another way out for this exact case. In our regular projects, we do not use raw MT, of course; if we use machine translation, we use it with post-editing. We do not have many projects like this at the moment, and this is another part of the MT strategy development in our company for the future. But in this exact case, we had a community portal with a big amount of very useful content. Engineers use it every day, searching for information and trying to find answers to their questions, and we had no other way to have it translated, but we wanted to provide it in their local language. So we decided to try raw MT. Before that, there was the public Google plugin, which everybody was using, but that was not really a safe thing to use from a data privacy perspective, and it was not welcomed at Agilent by the Agilent legal department. That's why we decided to build this plugin for ourselves separately. So raw MT was chosen only because we had huge volumes which needed to be translated, and we didn't have millions and millions of dollars to do that with human translators. Max Morkovkin 31:00 We have several more questions, so MT is a hot topic. Okay, I don't know the name, I guess it's Olga, maybe? She is asking: how did you extract the terms from the text corpus? Natalia Kurysheva 31:19 I think it's a simple term extraction tool, which is usually used to extract terminology from any English source text, for any simple terminology project; nothing specific there. Max Morkovkin 31:35 Probably she's asking about a specific tool that you use and can recommend.
Natalia Kurysheva 31:42 A term extraction tool, I would say; that's the only kind I use personally. This part was also done by an outsourced team; I can check with them. And please, Olga, if you need more information, contact me on LinkedIn, and we'll try to find the information for you. Max Morkovkin 32:01 And there are two questions about Arabic: machine translation for the Arabic language, what is your experience with it, and what engine can you recommend? Natalia Kurysheva 32:11 For now, we have it covered by Google NMT. Basically, right now the setup is like this: we have Tencent for English to Chinese, we have our third engine coming in for English to Japanese, and all the others for now are covered by Google NMT. Arabic is not one of our popular languages; we checked our Google Analytics data several times, month after month, and it never came up in the top 10 languages for us. So we just use raw MT, and I've never heard any feedback, neither positive nor negative, from them. So this is the whole experience I have. Max Morkovkin 32:57 Cool, thank you very much. And a question from Valentina Cosmo: what TMS do you use for managing the localization process, if any? Natalia Kurysheva 33:07 Yeah, we have a TMS in place, but we do not use it for this exact project which I was talking about; we use it for our regular daily operations. We are using, I'm not sure if I can say that, it's called Words TMS. We do not have an MT module there, so any time we need to do something with MT plus post-editing, for example, we do it outside of the TMS. Max Morkovkin 33:41 Cool. And a question from Jay Villar: can you say how big a translation memory should be, at least, to use it for training engines and getting good results in machine translation?
Natalia Kurysheva 33:52 Yeah, as I've mentioned on one of my slides, it should be around 12 to 15,000 segments: source segments, not words, not characters, but segments in your translation memory. This is the number you are safe to start with. But I was discussing this with one of the MT specialists who really works with it every day, and he told me that sometimes you can get a very good result with a 5,000-segment training corpus. On the other hand, you can have a very big corpus but with low quality, with inconsistencies inside and all kinds of problems, and then you will get a very poor result after training. So it will depend mostly on the quality of the corpus. But yeah, the amount of segments which you should aim for is 12 to 15,000. Max Morkovkin 34:52 Great, thank you very much. We've answered 11 questions; we still have a few of them, but we will continue answering them later. Thank you very much. Okay, good.