I remember the first time I got involved in a software localization project. I was assigned to translate the user interface of some industrial computer system using a complicated software localization tool. I had no problem with technical stuff, but translating a UI was new to me and the task turned out to be a real challenge.
In fact, the style of the texts wasn’t that strange to me. Being somewhat IT-savvy, I was well aware of the “computer” language that is typically used in software. (Under the influence of chatbots and voice interfaces, my view has since changed dramatically, though.) The really puzzling part was these weird-looking pieces: {0}, %d, %setup% and $stop$. I had no idea what they stood for or whether the words trapped between the symbols should be translated, so in no time I made lots of mistakes. The newline characters, represented as \n, were the final blow to my self-confidence. I had never seen those before, and the worst thing about them was that they would often be merged with other words. As if the situation wasn’t ridiculous enough, the text was in German, which made it really hard to tell if \nein was supposed to be “nein” or “\n” + “ein”. The project manager must’ve regretted his decision to assign that task to a dummy like me after being bombarded with questions. Might’ve even gone on a Twitter rant.
Finding out the placeholders in the translated files for an urgent project have magically disappeared. #PMproblems #StartPanicking
— Lau Velázquez (@geekylau)October 9, 2013
Übersetzer, welche den Platzhalter $DepartureTime$ mit $HeureDépart$ übersetzen sind einfach nur konsequent.
— Olivier Oswald (@ooswald)July 27, 2011
(Translation: “Translators who render the placeholder $DepartureTime$ as $HeureDépart$ are simply being consistent.”)
That moment when a translator decides to change form of quotation marks and half of your variables in-game disappear.
— Vojtěch Schubert (@falagor)January 12, 2016
Years later, the memories of my first experience bring a smile to my face. Needless to say, I always take time to explain to translators how variables work. And the ultimate way to avoid mistakes and save time for yourself and your team is to use regular expressions. They can be used to turn variables into uneditable and easily trackable objects. In different software, these objects are referred to by different names. We at Smartcat like to call them placeholders. And if you’re wondering what regular expressions are, they have nothing to do with small talk. They are patterns used to match and manipulate character combinations. Learning about regular expressions doesn’t require programming skills. Click through this no-brainer presentation made by Thomas Vackier, localization expert at Yamagata Europe, to get the hang of what regex (short for “regular expression”) is.
The Mysterious Symbols
For a quick example, let’s take a few lines from a song called I’m Gonna Be (500 miles) by The Proclaimers:
But I would walk 500 miles
And I would walk 500 more
Just to be the man who walked a thousand miles
To fall down at your door
Say we want these lyrics to change whenever the listener travels to a different location. We’re gonna need variables for that:
But I would walk %$1s miles
And I would walk %$2s more
Just to be the man who walked a {spelled-number} miles
To fall down at your {location}
What dangers lurk here? Important characters can be deleted or misused. The digits in %$1s must follow a sequential pattern, and “spelled-number” and “location” are not to be translated. Unfortunately, not everyone knows that.
“But why not just hire an expert translator instead of having to deal with all this newbie mess?” one might ask. Whatever the actual reasons may be, it’s worth remembering: no one is born a localization pro, yet anyone who has struggled through their first steps and gained invaluable experience in the process can become one.
Noob-Proof, or How to Save the World
So what can we do to stop the disasters from happening and save us editing time? Let’s describe the variables using regular expressions:
For variables with consecutive numbers: %\$\ds
For variables enclosed in curly braces: \{.+?\}
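To see what these two patterns actually catch, here is a minimal Python sketch that runs them over the lyrics from the previous section. The function name find_placeholders is made up for this illustration, and this is not how Smartcat applies the patterns internally; it only shows what the regexes match.

```python
import re

# The two patterns from the text above.
NUMBERED = re.compile(r"%\$\ds")   # %$1s, %$2s, ... (one digit only)
BRACED = re.compile(r"\{.+?\}")    # {spelled-number}, {location}

lyrics = (
    "But I would walk %$1s miles And I would walk %$2s more "
    "Just to be the man who walked a {spelled-number} miles "
    "To fall down at your {location}"
)

def find_placeholders(text):
    """Return every placeholder in order of appearance (hypothetical helper)."""
    combined = re.compile(f"{NUMBERED.pattern}|{BRACED.pattern}")
    return combined.findall(text)

print(find_placeholders(lyrics))
# ['%$1s', '%$2s', '{spelled-number}', '{location}']

# Why the lazy ".+?" matters: a greedy ".+" swallows everything
# from the first "{" to the last "}" as a single match.
print(re.findall(r"\{.+\}", lyrics))
# ['{spelled-number} miles To fall down at your {location}']
```

Two caveats. \d matches a single digit, so a file with ten or more numbered variables would need \d+ instead. Also, many printf-style formats write positional variables as %1$s rather than %$1s; for those, a pattern along the lines of %\d+\$s would probably be the one to use.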
I personally find this cheat sheet very useful for composing regular expressions, and regex101.com is a great online tool I recommend for testing. Now let’s upload the file containing variables into Smartcat. Here’s what we’ll see:
The variables are displayed as purple units that are obviously not translatable and can be safely transferred from the source text into the target language via a keyboard shortcut. These elements will remain intact in the translated document after it’s downloaded. The use of placeholders is not confined to localization projects. Imagine you have a large air pump spec sheet with thousands of items in it. Order codes are assigned to each item and they all look much the same, so it’s easy to goof ’em up.
Make one mistake here and the consequences may turn out to be very unfortunate. There may be errors when dispatching orders from the warehouse, and proper installation can be problematic too. The customer will suffer losses or worse: incorrect data can cause technical issues which, in turn, can result in an environmental catastrophe. Placeholders are not to be translated; they should just be safely transferred into the target language, so why waste time on them? Let’s find all the order codes using this elegant regex [A-Z]{2}\d{4}\-\d{4} and magically turn them into uneditable elements:
What a great way to make our translation process (and the editing part of it too) easier and faster!
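For the curious, here is a rough idea of what “turning codes into uneditable elements” can look like under the hood. This is only a sketch under my own assumptions, not Smartcat’s implementation: the protect and restore helpers, the <1>-style tokens, the sample order codes and the German sentence are all invented for the example. Only the regex comes from the text above.

```python
import re

# Order-code pattern from the text: two capitals, four digits, a hyphen, four digits.
ORDER_CODE = re.compile(r"[A-Z]{2}\d{4}\-\d{4}")

def protect(text):
    """Swap each order code for a numbered token; return the masked text and the codes."""
    codes = []

    def stash(match):
        codes.append(match.group(0))
        return f"<{len(codes)}>"

    return ORDER_CODE.sub(stash, text), codes

def restore(text, codes):
    """Put the original codes back in place of the numbered tokens."""
    return re.sub(r"<(\d+)>", lambda m: codes[int(m.group(1)) - 1], text)

# Hypothetical spec-sheet line with made-up order codes.
source = "Replace filter AP1234-5678 before fitting hose HS0042-0917."
masked, codes = protect(source)
print(masked)
# Replace filter <1> before fitting hose <2>.

# The translator only ever sees and moves the tokens.
translated = "Filter <1> ersetzen, bevor Schlauch <2> montiert wird."
print(restore(translated, codes))
# Filter AP1234-5678 ersetzen, bevor Schlauch HS0042-0917 montiert wird.
```

Because the codes are stored separately and put back untouched, the translator can reorder them to suit the target grammar but cannot mistype a digit.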
At the moment, placeholders are only supported by default for the common localization file formats. If you want to use them in any other document types, let us know.
I reached out to my colleagues asking them about their experience with placeholders.
Fyodor Bezrukov, Executive Director at Logrus IT (Kyiv office)
“Using placeholders and tags in document formats is a common work scenario for us. They're life-savers when clients send Excel files containing HTML or XML markup. Thanks to the support of placeholders and regular expressions in Smartcat, it has become much easier and more convenient to handle such resources.”
Marina Ilyinykh, Localization Manager at Bookmate
“We use variables and formatting in our apps' UIs a lot. Replacing tags and variables with placeholders allows us to run automatic consistency checks and secure important data, such as prices, subscription dates and links.”
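An automatic consistency check of the kind Marina describes can be approximated in a few lines of Python. The sketch below is an assumption-laden illustration, not the check that Bookmate or Smartcat actually runs: the placeholder_diff helper, the combined pattern (including the $name$ form borrowed from the tweet earlier in this post) and the sample sentences are all made up.

```python
import re
from collections import Counter

# Assumed placeholder syntaxes: %$1s, {name} and $name$.
PLACEHOLDER = re.compile(r"%\$\ds|\{.+?\}|\$\w+\$")

def placeholder_diff(source, target):
    """Return (missing, unexpected) placeholders in the target compared with the source."""
    src = Counter(PLACEHOLDER.findall(source))
    tgt = Counter(PLACEHOLDER.findall(target))
    missing = sorted((src - tgt).elements())      # in the source but not in the target
    unexpected = sorted((tgt - src).elements())   # in the target but not in the source
    return missing, unexpected

# A "consistent" translator has translated the variable name.
missing, unexpected = placeholder_diff(
    "Departure at $DepartureTime$ from {location}",
    "Départ à $HeureDépart$ de {location}",
)
print(missing)      # ['$DepartureTime$']
print(unexpected)   # ['$HeureDépart$']
```

Counting with Counter rather than comparing sets means a placeholder that appears twice in the source but only once in the target is still reported as missing.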
Yannis Evangelou, CEO at lexiQA, shared his mixed feelings:
Yannis Evangelou, CEO at lexiQA
“There’s a common issue with translators translating the text within placeholders, tokens and tags. Escape characters are also often ignored. Most of them are not aware that these are not translatable lexical units. Project managers don't know that either. Yet, this type of negligence can cause critical errors. A translator once told me he had to deal with some text that included HTML formatting and he went on to actually translate the line <p style="border: 1px solid red;"></p> into <π στυλ="περίγραμμα: 1 πίξελ συμπαγές κόκκινο;"></π>. He argued that maybe the developer who will use it doesn’t speak English. The PM didn't fix it because she thought the translator's reasoning was... reasonable!”
Then, Rolf Klischewski, game localization expert, chimed in:
«Game translators often have to deal with stuff like this:
And, of course, there are all sorts of problems connected to it.
Here, the problem is that some destinations require an article. So, it's “the Bahamas”, but “Barbados”. Or in German it's “die Schweiz” for “Switzerland”. In cases like these, we often reject the project. Normally I’d advise clients to change their text engine because I don't want to deliver crap. Devs will usually ask: “Is that really that big a problem?” And, I mean, no, because you can play the game. So, how to handle such issues? Tell the client, explain the problem. If they can't or won't fix the engine and you need the money, do the job, but make sure you are not credited.»