This is a translation of an article by a lead developer of the localization system at Badoo.
The rest is written in his voice:
We work with several large projects: Badoo, Bumble, Lumen and Chappy.
Our localization system currently holds 150,000 phrases and texts translated into 52 languages. On top of that, each of our applications has its own audience, its own markets, its own style of communicating with users, and separate versions for the web and for mobile platforms.
In this article I will explain how we built the localization process, how we approach quality control, how we release translations depending on the platform, and, most importantly, how we made sure that developers speak well of our translation system. This last point matters a great deal: more than 300 developers work on our projects, and their work should be comfortable. Developers are not translators and should not have to think about translations.
The article is based on my talk at the HighLoad++ conference in November.
Table of contents:
- The terms of reference come first!
- Features of the translation process
- We ask users to help
- Development Organization
- Localization quality control
- Releases, Versions
- The main thing. Summary
To begin with, let’s see what the localization process in our company looks like in general.
I did not show every nuance in this diagram; they are not needed for a general understanding. The point is that we start with the terms of reference (ToR). Next comes client and server development, with the translation process running in parallel.
The ToR and the final release phase are highlighted in the same color for a reason: it is a hint that the release must comply with the ToR. There is no other way. If the ToR is incomplete, the developers will not know who is responsible for what, or who should integrate a given text: whether the mobile developers should build it into the mobile application or the server developers should send it from the server in response to a request.
Let’s deal with all this. But first, I want to introduce and explain one term.
A lexeme is any indivisible text to be translated. It can be a button caption, a headline, or an entire paragraph.
Now we are ready to move on to the main material!
The first stage of our process is preparing proper terms of reference. Their main localization-related element is the lexeme table: essentially a list of the texts that should be used in the application or on the site.
The lexeme table indicates whether each text is sent by the server or integrated into the application. The key must also be indicated: if the text has been used before, its existing key appears in the table; if the text has not been used anywhere, the table shows its serial number, and the developer can assign a convenient key.
Reusing text is a tricky matter. On the one hand, it speeds up localization; on the other, it can land you in a funny situation.
Here is an example. We once had the question “Do you smoke?” with the answer options “Yes” and “No”. That makes three lexemes: one for the question and two for the answers. In Russian, the question was translated as the equivalent of “Do you smoke?”, and the answers as the equivalents of “I smoke” and “I don’t smoke”. Later we decided to run another survey and reuse the answer options. In English everything looked correct: “Fancy visiting a party?” with “Yes” / “No”. In Russian, because the lexemes were reused, the “dialogue” became: “Will you go to the party?” with “I smoke” / “I don’t smoke”.
Now, when we compile the ToR and decide whether to reuse a text, we take into account the contexts in which it was used before. We also indicate whether the lexeme is sent by the server or integrated into the client and delivered to users through the App Store or Google Play.
These techniques help save time because they exclude discussion at later stages.
The next step is translation, and the main thing here is not to lose the original meaning. That happens often, because languages differ in their nuances and turns of phrase. Sometimes the most accurate translation simply does not fit on the screen, and translators have to find a compromise.
I will go through how we start translating, how we convey context to translators, how we maintain a common style, and how we check the result.
Order exists where there are rules (and everyone follows them). That is why we have a written procedure for translation.
To begin with, we choose a language that most translators understand. We prepare the source texts in it so that they can then be easily translated into other languages. All the languages we translate into (52 of them) are divided into main (parent) languages and dialects. The language in which the texts are prepared is English (we call it Master). From English we translate into the other languages: Spanish, French, Russian and others. Sometimes a translation needs to be refined for one of the dialects, and then we translate into Mexican Spanish or Australian English. If that is not needed, we use the translation into the parent language: basic Spanish or basic English.
Example. Let’s say we need to make a greeting more formal. Initially, it was “Hey” in English, “Hola” in Spanish, “Salut” in French, the equivalent of “Hello” in Russian, “G’day mate” in Australian English, and “Que onda” in Mexican Spanish (“Like a wave?”; Mexicans are cool!). If we want to make the text more formal, we have to change the original text in English. At that point the translations into the other languages become incorrect: they need to be checked and refined, and we draw the translators’ attention to this.
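The parent/dialect fallback and the "Master changed, so re-check everything" rule can be sketched as follows. This is only an illustration under my own assumptions: the `Lexeme` class, the `PARENTS` map, and the language codes are hypothetical and not taken from our actual system.

```python
from __future__ import annotations
from dataclasses import dataclass, field

MASTER = "en"                                # the Master language from the text
PARENTS = {"es-MX": "es", "en-AU": "en"}     # hypothetical dialect -> parent map


@dataclass
class Lexeme:
    key: str
    translations: dict[str, str] = field(default_factory=dict)
    needs_review: set[str] = field(default_factory=set)

    def resolve(self, lang: str) -> str:
        """Return the dialect text if it exists, else the parent's, else Master."""
        for candidate in (lang, PARENTS.get(lang), MASTER):
            if candidate and candidate in self.translations:
                return self.translations[candidate]
        raise KeyError(f"{self.key}: no Master text")

    def update_master(self, text: str) -> None:
        """Change the Master text and flag every other language for re-checking."""
        self.translations[MASTER] = text
        self.needs_review = {lang for lang in self.translations if lang != MASTER}


greeting = Lexeme("greeting", {"en": "Hey", "es": "Hola", "es-MX": "Que onda"})
print(greeting.resolve("es-MX"))   # the dialect has its own text
greeting.update_master("Hello")    # "es" and "es-MX" now need review
```

The point of the design is that a dialect translation is optional: it overrides the parent only where the translators decided a refinement was needed.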
An important point is the context in which the translation exists.
I will explain with examples.
Note that some of the examples are screenshots of well-known resources. Their names do not matter to us; we are simply looking at the most common types of localization errors.
This is a sign at a gas station: “Before driving off, make sure that there is no gun in the tank.” The Russian word for the fuel nozzle, which literally means “pistol”, was translated literally into English as “gun”. But to an American, a gun is a weapon. In that context, the request to take the gun out of the tank sounds rather strange.
In the next example, the creators of the application decided to make one universal version of the text for men and women; apparently that benefits them in some way. The result feels as if texts and pictures were simply thrown together on one screen, and it is unclear what it is all about.
The next example shows how the original meaning of a text was lost in translation. Look at the Russian version on the right: we are invited to start a conversation with ourselves, although the intent was to let us link our Instagram account.
Such errors happen when translation is done out of context. Therefore, for each lexeme in our localization system, the following is specified:
- text description;
- a picture that shows which elements are next to the text on the screen;
- a note about whether the text will be shown to men or women – so that translators can determine whether two different translations are needed or just one;
- types of variables (this is a very important point, and I will tell you more about it when we analyze the development process);
- maximum text length: it is very important for push notifications, because the width of the screen of a mobile device is not unlimited.
We also always break large texts into parts. This is convenient when you later need to search or make changes.
Let’s look at this in more detail. When we split a text, we lose the connection between individual phrases and sentences. Therefore, we must show translators what comes before and after each fragment. This matters, for example, for legal documents, so that they are translated correctly.
We also highlight local terms and slang words in lexemes. For example, for the sentence “Unlock your Likes List to see everyone who’s interested at once”, the translator needs to know that Likes here is a special section of the application that contains the users who liked the profile. Another example is the term “Stories”. Ten years ago, nobody thought of Instagram on hearing the word “story”. Now that is the first association.
So, we have established that the right translation depends heavily on context, namely on the following factors:
- the user’s gender;
- the numerical variable that appears in the text: “You have only one friend” and “You already have ten friends”;
- the platform: web, Android, iOS;
- the project for which the translation is being made.
Let us dwell on the last point – the dependence of the translation on the project. This is important because each project has its own style.
These are the subject lines of the emails sent to a user whose account has been blocked.
For Badoo: “Your account is locked.”
For Lumen: “Your account is locked.”
For Bumble: “You were blocked.”
And for Chappy – “Aw!”
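One way to picture this dependence on project, platform, and gender is a list of variants, each with the conditions under which it applies. The sketch below is an assumption of mine about how such a lookup could work, not a description of our implementation; `pick_variant` and the context keys are hypothetical names.

```python
def pick_variant(variants, context):
    """Return the text of the most specific variant whose conditions all match."""
    matching = [
        (len(conditions), text)
        for conditions, text in variants
        if all(context.get(k) == v for k, v in conditions.items())
    ]
    if not matching:
        raise LookupError("no variant matches the context")
    # More conditions means a more specific variant; it wins over generic ones.
    return max(matching, key=lambda m: m[0])[1]


# Subject lines from the example above; the empty condition is the default.
BLOCKED_SUBJECT = [
    ({}, "Your account is locked."),
    ({"project": "bumble"}, "You were blocked."),
    ({"project": "chappy"}, "Aw!"),
]

print(pick_variant(BLOCKED_SUBJECT, {"project": "chappy", "platform": "ios"}))
print(pick_variant(BLOCKED_SUBJECT, {"project": "badoo", "platform": "web"}))
```

A translator then only adds a project-specific or gender-specific variant where the default text does not fit.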
To maintain a unified style within each project, you need to give translators access to the translation history. We have a tool for this called Translation Memory (TM). The translator always sees matching earlier texts along with their percentage of similarity and can either reuse the old translation or enter a new one. We show translators not only 100% matches but also less similar options, and we always highlight the differences.
Besides preserving style within a project, Translation Memory also speeds up the process, because the translator does not need to enter the same thing twice.
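To make the idea of similarity percentages and highlighted differences concrete, here is a minimal sketch built on Python's standard `difflib`. It is not how our TM is implemented; the function names, the 0.6 threshold, and the `[-old-] {+new+}` marking are assumptions chosen for illustration.

```python
import difflib


def tm_matches(source, memory, threshold=0.6):
    """Return (similarity, old_source, old_translation) tuples, best first."""
    results = []
    for old_source, old_translation in memory.items():
        ratio = difflib.SequenceMatcher(None, source, old_source).ratio()
        if ratio >= threshold:
            results.append((ratio, old_source, old_translation))
    return sorted(results, reverse=True)


def highlight_diff(old, new):
    """Mark words dropped from the old source as [-x-] and added ones as {+x+}."""
    a, b = old.split(), new.split()
    out = []
    for op, i1, i2, j1, j2 in difflib.SequenceMatcher(None, a, b).get_opcodes():
        if op == "equal":
            out.extend(a[i1:i2])
            continue
        if i2 > i1:
            out.append("[-" + " ".join(a[i1:i2]) + "-]")
        if j2 > j1:
            out.append("{+" + " ".join(b[j1:j2]) + "+}")
    return " ".join(out)


print(highlight_diff("You need 10 credits", "You need 20 credits"))
# You need [-10-] {+20+} credits
```

A real system would more likely compare normalized text and index the memory for speed; the sketch only shows the shape of what the translator sees.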
Cases and numbers
We have a tool called the Case Matrix. This is like a multiplication table, only for cases and numerals.
Translators fill in this matrix for different words in each language as the need arises. Filling it all in at once is unrealistic, so it happens gradually: when a word is needed, its forms are added.
As a result, the matrix helps to avoid these errors:
The advantage of the tool is that the required form is selected immediately before rendering, just before the text is shown to the user. Here is how it works:
Take, for example, a translation into Russian. “Credits” in the center is an identifier, a reference to the case matrix. “Credits amount” on the left is the number that comes from the developer. And @3 is the case chosen by the translator (here, the accusative).
“You need 10 credits”: the phrase “10 credits” is substituted automatically.
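The lookup described above can be sketched like this. The layout of `CASE_MATRIX` and the function names are hypothetical; the plural rule is the standard one for Russian, and case number 3 for the accusative simply follows the example in the text.

```python
def plural_category(n: int) -> str:
    """Plural category for Russian integers: 'one', 'few' or 'many'."""
    if n % 10 == 1 and n % 100 != 11:
        return "one"
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return "few"
    return "many"


# (language, word id) -> case -> plural category -> word form.
# Only one case is filled in, just as translators fill the matrix gradually.
CASE_MATRIX = {
    ("ru", "credits"): {
        3: {"one": "кредит", "few": "кредита", "many": "кредитов"},  # accusative
    },
}


def render_count(lang: str, word_id: str, case: int, n: int) -> str:
    """Pick the word form for the number right before rendering."""
    forms = CASE_MATRIX[(lang, word_id)][case]
    return f"{n} {forms[plural_category(n)]}"


print(render_count("ru", "credits", 3, 10))   # 10 кредитов
print(render_count("ru", "credits", 3, 21))   # 21 кредит
```

Other languages would need their own `plural_category` rule; the matrix itself stays the same shape.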
If we multiply 150,000 phrases and texts by 52 languages, we get about 7.8 million translations. Checking all of this manually is, of course, unrealistic. Therefore, we added automatic checking of translations at the moment they are saved.
We automatically check for problems such as missing emoji or variables. If a translator accidentally deletes a variable, the phrase loses its structure and meaning. Compare “You need 10 credits” with “You need credits”: the second phrase is broken, and the meaning is lost.
We also check for missing HTML tags; otherwise the layout breaks.
And we always warn the translator when a translation is longer than the original. The translator must then check whether it is acceptable and whether the text will fit on the screen.
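A save-time check of this kind might look roughly like the sketch below. The `{variable}` placeholder syntax, the function name, and the message wording are my assumptions for illustration; the emoji check is left out to keep the example short.

```python
import re
from collections import Counter

PLACEHOLDER = re.compile(r"\{(\w+)\}")          # hypothetical {variable} syntax
TAG = re.compile(r"<(/?[a-zA-Z][a-zA-Z0-9]*)")  # captures "b", "/b", "br", ...


def check_translation(source, translation, max_length=None):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    missing = set(PLACEHOLDER.findall(source)) - set(PLACEHOLDER.findall(translation))
    if missing:
        problems.append(f"missing variables: {sorted(missing)}")
    if Counter(TAG.findall(source)) != Counter(TAG.findall(translation)):
        problems.append("HTML tags differ from the source")
    if len(translation) > len(source):
        # Only a warning: the translator decides whether it still fits.
        problems.append("translation is longer than the source")
    if max_length is not None and len(translation) > max_length:
        problems.append(f"translation exceeds the maximum length of {max_length}")
    return problems


print(check_translation("You need {amount} credits", "Вам нужно кредитов"))
# ["missing variables: ['amount']"]
```

Returning a list rather than raising on the first problem lets the editor show every issue to the translator at once.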
To sum up the main points:
- translators need to understand the context;
- the translation system should be flexible enough to allow an appropriate translation for each language, so that translators do not have to fall back on universal wording; support for declensions and cases is essential;
- translations must be checked automatically.
In addition to the work of professional translators, we rely on help from users. There are two methods here: A/B testing and collaborative translation.
Say you need a translation into Russian. The translator has translated one phrase in two different ways, and you do not know which option to choose. In this case you can run an A/B test: show users the different options and choose one based on their reaction.
We had a choice between two options: “Ready for new acquaintances? Join in!” and “A few more steps … and you will become part of Badoo.” Testing showed that more users completed registration when they saw the second version of the push notification, so we kept it.
Below is a complete outline of the factors on which the translation option depends. The fifth element is the A/B test: if the user falls into a particular group, the corresponding version of the text is shown.
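Assigning a user to a test group has to be stable, so that the same person always sees the same text. A common way to do this, shown here purely as an illustrative assumption and not as our production code, is to hash the user ID together with the test name; `ab_group` and the test name are hypothetical.

```python
from __future__ import annotations
import hashlib


def ab_group(user_id: int, test_name: str, groups: list[str]) -> str:
    """Deterministically map a user to one of the groups of a given test."""
    digest = hashlib.sha256(f"{test_name}:{user_id}".encode()).digest()
    return groups[int.from_bytes(digest[:8], "big") % len(groups)]


# The two candidate texts from the example above.
VARIANTS = {
    "A": "Ready for new acquaintances? Join in!",
    "B": "A few more steps … and you will become part of Badoo.",
}


def registration_text(user_id: int) -> str:
    """Return the text variant for this user's A/B group."""
    return VARIANTS[ab_group(user_id, "registration_push", ["A", "B"])]


print(registration_text(42))
```

Including the test name in the hash keeps group assignments in different tests independent of each other.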
We once sent users in Mexico a notification asking them to translate some texts into their language for a small reward in credits, the application’s internal currency. They agreed: in just two days, 5,000 lexemes were translated for us. That is a huge help, and the Mexicans are great!
Why is this approach interesting and important? If you do not have a translator for a local dialect, let users do this work. As it turned out, they really are willing to take part in developing a project they like.
We have a collaborative translation platform. You can log in with your Badoo account and vote for the best translation.
This is a screenshot of the German translation window. Users can also add their own version. When one of the options reaches a threshold number of votes, we show it to our staff translator, and it can become the main translation (provided that it matches the style and rules of the project, does not offend anyone, and so on).
Do not be afraid to ask users for help. They will advise and help.
Now for the most interesting part: the development process. I deliberately described the translation process and its common problems first, so that I could then show how developers deal with them.
There are two main difficulties: how to organize parallel development, and how to catch errors in the use of lexemes so that the correct translations appear at the right time.
I’ll start with some history. Our development scheme used to look different. The source texts were stored in a file in the repository. Two developers could change something in parallel, and then the changes had to be merged. It is a small problem, but an inconvenient one.
The old scheme in which changes had to be combined
Now we change and add lexemes centrally, in the localization system. Developers only need to download a set of lexemes before starting work on a task and then use them. The key is specified, you write the code, you use the key, and that is it; you do not need to think about anything else.
Errors when using lexemes
Translations contain many variables.
If you are in a hurry, it is easy to confuse “credit_amount” and “credit”. To avoid this, we introduced a control mechanism: a text container, a kind of abstraction over the translation that knows which types of variables the translation uses. It performs the substitution and verifies that the types of the values passed in are the expected ones. If all the substitutions have been made, it returns a string that can be shown to the user. If not, the same container is returned. If we try to show the user a translation before all the substitutions are complete, we will see a warning in the logs and know where to look and how to fix the situation.
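The behavior of such a container can be sketched as follows. This is a simplified illustration, not our actual class: the name `TextContainer`, the `{variable}` template syntax, and the use of Python types for the variable types are all assumptions.

```python
import logging


class TextContainer:
    """A translation plus the variable types it expects (hypothetical sketch)."""

    def __init__(self, template: str, var_types: dict):
        self.template = template
        self.var_types = dict(var_types)
        self.values = {}

    def substitute(self, **values):
        """Store checked values; return a str once every variable is filled."""
        for name, value in values.items():
            if name not in self.var_types:
                raise KeyError(f"unknown variable: {name}")
            if not isinstance(value, self.var_types[name]):
                raise TypeError(f"{name} must be {self.var_types[name].__name__}")
            self.values[name] = value
        if set(self.values) == set(self.var_types):
            return self.template.format(**self.values)
        return self  # still incomplete: hand back the same container

    def __str__(self):
        # Showing an incomplete container is a bug, so leave a trace in the logs.
        missing = sorted(set(self.var_types) - set(self.values))
        logging.warning("text shown with unfilled variables: %s", missing)
        return self.template


text = TextContainer("You need {credits_amount} credits", {"credits_amount": int})
print(text.substitute(credits_amount=10))   # You need 10 credits
```

Passing `credit=10` instead of `credits_amount=10` fails immediately with a `KeyError`, which is exactly the kind of slip the container is meant to catch.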
Highlights in development:
- developers should only do their own work; they should not have to think about localization, changing texts, and so on;
- what the developers have done needs to be checked, and this check is also best automated; it will save the nerves of everyone involved in the process.
So, we have a developed product, and we have translated it. What remains is to check how well we did.
Let’s start with examples. How many mistakes are there in this screenshot?
I highlighted two. At the top, the translator apparently did not know that a distance would be shown before the phrase. At the bottom, the width of the screen on which the translation is displayed was not taken into account.
The second example also concerns translations that are too long for the width of the screen: here the text is simply cut off, and the label does not fit on the button.
In the next example, besides being shown text in several languages at once, we are also invited to “know the pain”.
To keep such errors out of production, quality control is essential.
Let’s see what control options exist.
The first thing that comes to mind is to check the translation on a test version of the site or application. That is, simply launch it and see whether the result matches the design, the idea, the terms of reference, and so on. This is how we caught the following error in a push notification:
The next quality control method is based on screenshots of the application.
We have developed a special tool that, in a test environment, takes screenshots of every screen of our mobile applications in every language. The screenshots can be viewed in a browser. There is also a special mode in which we see the identifiers of the texts being shown. This helps a lot when debugging: you can quickly find out which lexeme it is and why it ended up where it did (perhaps we inherited code in which this lexeme is substituted).
If you have a web version and simply need to get pictures from somewhere, you can embed lexeme markers in the source text and write a plug-in for Google Chrome. Running in the testers’ browsers, the plug-in will send your localization system screenshots of the pages on which it finds lexemes.
<ul>
  <li> ... </li>
  <li> <!-- lexeme_12345 --> Dating <!-- lexeme_12345_end --> </li>
  <li> ... </li>
</ul>
We used this method for quite some time. It let us collect a huge number of pictures in just two weeks. But we abandoned it, because it only produces images of an already released version, and we had learned how to get pictures and designs at the stage when the ToR is being written.
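For illustration, markers like the ones in the HTML snippet above could be located with a regular expression such as the one below. The sketch assumes the markers are ordinary HTML comments of the form `lexeme_<id>` and `lexeme_<id>_end`; the function name is hypothetical, and a browser plug-in would do the equivalent in JavaScript on the live page.

```python
import re

# Opening marker, marked text, and the closing marker with the same id.
MARKER = re.compile(
    r"<!--\s*lexeme_(\d+)\s*-->(.*?)<!--\s*lexeme_\1_end\s*-->",
    re.DOTALL,
)


def find_lexemes(html: str):
    """Return (lexeme id, visible text) pairs for every marked fragment."""
    return [(int(m.group(1)), m.group(2).strip()) for m in MARKER.finditer(html)]


page = "<ul> <li> <!-- lexeme_12345 --> Dating <!-- lexeme_12345_end --> </li> </ul>"
print(find_lexemes(page))   # [(12345, 'Dating')]
```

Knowing which lexeme IDs are on a page is what lets a screenshot be attached to the right entries in the localization system.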
Control during translation
As I said above, taking pictures only once the application is ready was not enough for us. We decided to produce screenshots before the application exists, when there is nothing yet and we still need to control quality and understand whether everything is going as it should.
That is how we got a tool for control during translation.
Here is how it works. Our designers use Sketch, an application for creating interfaces, including mobile application interfaces. We learned to replace texts in Sketch files and to generate screenshots of the screens we need through Sketch’s programming interface. Now, while a translator is working, we can immediately show screenshots of the screens in the translator’s language, and we can do so even before the developers have started on the first version of the new functionality.
Later we released this solution as open source (code).
If you cannot verify the translation in some specific language yourself, for example Japanese, you can order a sample audit: show a third-party company the translation of every hundredth lexeme together with a picture and ask whether everything is correct.
Highlights in quality control:
- a visual assessment of the quality of the translation is necessary;
- during testing, it is important to understand which devices your audience uses and to test the application on all of them.
So, we have well-written, tested functionality. What remains is to deliver it to users.
Our Badoo application had a service called Super Power. At some point we needed to rename it to “Badoo Premium”, and to do so atomically in all versions at once, so that the user would not see “Super Power” on one screen and “Badoo Premium” on another.
To do this, we attach a lexeme version to each task branch in Jira. When we include the changes from a branch in a new version of a project, the new version of the lexemes is pulled in with it. If something needs to be rolled back, we remove the task branch from the new version, and with it the version of the lexemes with translations into all languages.
When a lexeme has already been tested or is already visible to users, you need to be very careful: it is better not to change anything in it, but to create a new version, attach it to the ticket, and deploy the new version of the lexeme with new translations in the new release.
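The idea of tying lexeme versions to task branches can be sketched like this. The classes, the key `premium_title`, and the ticket ID `APP-1` are hypothetical and exist only to show that including or rolling back a branch switches every language at once.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class LexemeVersion:
    key: str
    version: int
    translations: dict[str, str]


@dataclass
class Release:
    base: dict[str, LexemeVersion]
    branches: dict[str, list[LexemeVersion]] = field(default_factory=dict)

    def include(self, ticket: str, versions: list[LexemeVersion]) -> None:
        """Add a task branch together with the lexeme versions attached to it."""
        self.branches[ticket] = versions

    def rollback(self, ticket: str) -> None:
        """Drop a task branch; its lexeme versions disappear with it."""
        self.branches.pop(ticket, None)

    def texts(self, lang: str) -> dict[str, str]:
        """Texts visible in this release: base versions overridden by branches."""
        active = dict(self.base)
        for versions in self.branches.values():
            for v in versions:
                active[v.key] = v
        return {key: v.translations[lang] for key, v in active.items()}


old = LexemeVersion("premium_title", 1, {"en": "Super Power"})
new = LexemeVersion("premium_title", 2, {"en": "Badoo Premium"})
release = Release(base={"premium_title": old})
release.include("APP-1", [new])   # hypothetical ticket id
print(release.texts("en"))
```

Because the old version is never edited in place, rolling back the ticket restores the previous text without any extra translation work.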
However, mistakes can also creep in during translation. The example below contains two of them.
Wrong: “It's a remath.”
Right: “It’s a rematch.”
The straight apostrophe should not be used in English text, and the letter “c” is missing.
Versioning lexemes and versioning translations are two different things. A translation can be fixed at any time: while the task is in development, while it is being tested, or even after the functionality has reached users (nothing bad will happen if users see the corrected translation in the new version of the application).
Server != Smartphone
Updates are delivered to different platforms in different ways. If you are developing a mobile application, you almost certainly have a server part and a client part.
What you show to the user either comes partly from the server or is stored on the smartphone (for example, an integrated translation).
The path of a translation from the server to the user runs through our production server, where updated translation files can easily be delivered.
The path of an integrated translation is long: it runs through the App Store or Google Play. The user downloads the update and only then sees the fix. This seemed too slow to us, so we came up with our own update mechanism, “Hot Update”. It lets us generate a new version of the translations at the touch of a button and tell every client in the world that there is something new to download and use.
When the application is launched on a mobile device, it sends a launch notification to the server and reports its current version of the translations. If the localization system has an update ready, it says so in its response. The client then downloads the update and applies it.
The user will see the new translations on switching to the next screen. Two of our articles are devoted to the implementation of this solution: one and two.
In the release process, you must take into account the path your application takes to reach your users. Different parts of your application are probably updated in different ways.
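The exchange at application start can be sketched as follows. This is a deliberately simplified model with in-memory objects instead of network calls; the class and method names are hypothetical and do not describe the real Hot Update protocol.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class TranslationBundle:
    version: int
    texts: dict[str, str]


class LocalizationServer:
    def __init__(self, bundle: TranslationBundle):
        self.latest = bundle

    def publish(self, fixes: dict[str, str]) -> None:
        """Generate a new bundle version containing the corrected texts."""
        self.latest = TranslationBundle(
            self.latest.version + 1, {**self.latest.texts, **fixes}
        )

    def on_app_start(self, client_version: int) -> TranslationBundle | None:
        """Answer the launch notification: an update if the client is behind."""
        return self.latest if client_version < self.latest.version else None


class Client:
    def __init__(self, bundled: TranslationBundle):
        self.bundle = bundled  # the translations shipped inside the app

    def start(self, server: LocalizationServer) -> None:
        update = server.on_app_start(self.bundle.version)
        if update is not None:
            self.bundle = update  # applied; visible from the next screen

    def text(self, key: str) -> str:
        return self.bundle.texts[key]


shipped = TranslationBundle(1, {"rematch": "It's a remath."})
server = LocalizationServer(shipped)
client = Client(shipped)
server.publish({"rematch": "It’s a rematch."})
client.start(server)
print(client.text("rematch"))
```

Reporting only a version number keeps the launch request small; the full bundle travels only when there is actually something new.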
Let’s return to the scheme that I cited at the beginning of the article.
What to pay attention to if you are developing a translation system:
- write detailed terms of reference;
- take into account the context and provide translators with access to it;
- keep a history of translations in order to maintain a unified style within the project;
- automate control (otherwise any translator, sitting several time zones away from you, will be able to do everything their own way);
- free developers from non-core tasks. They create new versions of your product, which brings joy to your users and gives you satisfaction from the project you are building.