Chromium translations explained: part 1

This is the first part of a series of posts about the Chromium translations. This part explains how the upstream translations work. The next parts will cover the interaction with Launchpad, the machinery to convert strings back and forth, the merge of all strings per branch, and how it all goes back upstream and benefits the community.

Grit format

Chromium uses a format called Grit, which stands for Google Resource and Internationalization Tool. As its name implies, it is a format created by Google, used in many internal projects and in some open-source projects like Chromium. It started on Windows and has been extended to Mac and Linux.

It is able to do much more than translations. It can manage entire resource packs. It can produce the following file types:

  • .reg, .rc, .adm and its successors .admx/.adml (Windows)
  • .plist, .strings (Mac OS X)
  • .h, .cc, .js, .json, .html and .pak files (all platforms)

Grit uses 2 file types:

  • grd: the template, listing all the files to create, and all the strings to translate
  • xtb: (per-lang) translations and preferences

There can be many templates in a project; Chromium has 30 (not all of them hold translations, fortunately). Each grd has as many xtb files as there are langs it is translated into. If a given template is not translated into a particular lang, there is no associated xtb file. In the same way, if a given string is not translated, it doesn’t appear in the xtb (unlike GetText, which keeps it with an empty msgstr).
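To make the difference with GetText concrete, here is a minimal Python sketch. It assumes a reduced, hypothetical xtb fragment (a translationbundle root holding translation elements, like the pt-BR example later in this post); load_xtb is an illustrative helper, not part of Grit, and the second id is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical, reduced xtb content; real files are much longer.
XTB = """<translationbundle lang="pt-BR">
<translation id="3664704721673470303">Preferências de <ph name="PRODUCT_NAME"/></translation>
</translationbundle>"""

def load_xtb(text):
    """Return {message id: translated text}, rendering placeholders as %{NAME}."""
    translations = {}
    for node in ET.fromstring(text).iter("translation"):
        parts = [node.text or ""]
        for ph in node:
            parts.append("%{" + ph.get("name") + "}")
            parts.append(ph.tail or "")
        translations[node.get("id")] = "".join(parts).strip()
    return translations

translations = load_xtb(XTB)
# An untranslated string simply has no entry (this id is made up).
missing = translations.get("1111111111111111111")
```

A GetText catalog would instead list the untranslated string with an empty msgstr, so "missing" has to be detected differently in each format.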

Grit is composed of 2 parts: a client and a server (a console). The client is open-source and lives in the Chromium source tree. The console is not; it’s accessible only to Google developers.

The grd files are maintained by the Chromium developers. Anyone adding a string to the UI must add it to the template. The xtb files are generated by the console, which is populated by paid translators (focusing on Google Chrome only). Both the grd and the xtb files are merged by the grit client at build time.

Those 2 file types are XML-based. grd files have a quite complex syntax and can even contain conditions. xtb files are usually very simple.

I will not go into the details of those syntaxes. If you are really interested, you can have a quick look at these examples: generated_resources.grd, generated_resources_en-GB.xtb.

Now, focusing on translations, here is how to have a localized string in the C++ code:

  1. include chrome/common/l10n_util.h and a template-specific .h file.
  2. call l10n_util::GetStringUTF8(), or l10n_util::GetStringFUTF8(), which also replaces the placeholder variables.

Let’s see with a real example.

Example

chrome/browser/gtk/options/options_window_gtk.cc:

  #include "grit/chromium_strings.h"
  #include "grit/generated_resources.h"

  std::string dialog_name = l10n_util::GetStringFUTF8(
          IDS_PREFERENCES_DIALOG_TITLE,
          l10n_util::GetStringUTF16(IDS_PRODUCT_NAME));

chrome/app/generated_resources.grd:

  <if expr="os == 'linux2'">
    <message name="IDS_PREFERENCES_DIALOG_TITLE"
             desc="The title of the Preferences dialog box">
     <ph name="PRODUCT_NAME">$1<ex>Google Chrome</ex></ph> Preferences
    </message>
  </if>

chrome/app/resources/generated_resources_pt-BR.xtb:

  <translation id="3664704721673470303">
  Preferências de <ph name="PRODUCT_NAME"/></translation>

What do we have here?

  1. The code of that string (IDS_PREFERENCES_DIALOG_TITLE, in red) is used in the C++ code, so it must be in one of the grd files too. Notice it is not in the xtb file (that should trouble you).
  2. PRODUCT_NAME (in blue) is a placeholder variable; it MUST be in all 3 files.
  3. Google Chrome (in orange) is an example value for the placeholder variable. It is meant to help translators, but it doesn’t have to be translated, hence it is not in the resulting xtb.
  4. The translated string in the xtb file is identified by a numerical id (3664704721673470303, in pink), where one could have expected the IDS_ code instead. The mystery deepens…
  5. There is a nice description for the string (the ‘desc’ attribute of the <message> tag). Once again, it’s there to help the translators by providing context.
  6. The string is wrapped in an <if> tag, which means there is an extra step, a kind of filtering, deciding what ends up in the lang-packs / resource files.
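The filtering step from the last point can be sketched in a few lines of Python. This assumes the expr attribute can be read as a Python expression over variables such as os and lang; condition_holds is a hypothetical helper for illustration, not Grit's own code.

```python
def condition_holds(expr, os_name, lang):
    """Evaluate an <if expr="..."> condition against a build target.

    eval() is tolerable here only because the expressions come from the
    project's own grd files; never use it on untrusted input.
    """
    variables = {"os": os_name, "lang": lang}
    return bool(eval(expr, {"__builtins__": {}}, variables))

# The message from the example is kept on Linux and filtered out elsewhere.
on_linux = condition_holds("os == 'linux2'", "linux2", "pt-BR")
on_windows = condition_holds("os == 'linux2'", "win32", "pt-BR")
```

A converter that skips this evaluation would report a string as missing on platforms or langs that are never supposed to carry it.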

Hmmm. Now what? There is nothing else: no file or anything mapping the red IDS_ code to the pink numeric id. How is that possible?

It turns out Grit creates the numeric id from the string itself, using a fingerprint (an unsigned 64-bit MD5). It took me a while to figure out, but the string is partially cleaned up (the <ex> is dropped, some HTML entities are decoded, etc.) before the fingerprint is computed.
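Here is a rough Python sketch of that idea. It is not Grit's actual code: the clean-up rules (dropping <ex>, keeping only the placeholder name, decoding entities, collapsing whitespace) and the choice of the first 64 bits of the digest are assumptions made for illustration, so the resulting number may well differ from the real id.

```python
import hashlib
import html
import re

def clean_message(raw):
    """Approximate the clean-up step. These rules are assumptions."""
    text = re.sub(r"<ex>.*?</ex>", "", raw, flags=re.S)
    # Keep only the placeholder name, dropping the $1 and the tags.
    text = re.sub(r'<ph name="([^"]+)">.*?</ph>', r"\1", text, flags=re.S)
    return " ".join(html.unescape(text).split())

def fingerprint(text):
    """First 64 bits of the MD5 digest, read as an unsigned big-endian integer."""
    digest = hashlib.md5(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

raw = '<ph name="PRODUCT_NAME">$1<ex>Google Chrome</ex></ph> Preferences'
cleaned = clean_message(raw)
message_id = fingerprint(cleaned)
```

The important property is that the id depends only on the cleaned-up text, so anyone with the grd file can recompute it without access to the console.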

You may ask why we need to do this:

  1. it’s mandatory to know what has already been translated
  2. I mentioned earlier that when a string is not translated, it’s not in the xtb. Hence, we don’t have the mapping; we have to compute it ourselves to add it to the improved xtb (as we don’t have access to the console, remember?)

Why is that needed in the first place? i.e. why not use the IDS code? The reason is that a given string could be present in the same grd template several times, with different contexts/conditions. There is no need to translate it several times. The fingerprint is there to remove duplicates.
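The de-duplication effect is easy to see in a small Python sketch. The IDS codes below are hypothetical, and the fingerprint helper only assumes "first 64 bits of the MD5 digest"; it is not claimed to match Grit's exact computation.

```python
import hashlib
from collections import defaultdict

def fingerprint(text):
    # Assumption: first 64 bits of the MD5 digest, as an unsigned integer.
    return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest()[:8], "big")

# Hypothetical IDS codes: two of them share the same English text.
messages = [
    ("IDS_EXAMPLE_MENU_CLOSE", "Close"),
    ("IDS_EXAMPLE_DIALOG_CLOSE", "Close"),
    ("IDS_EXAMPLE_MENU_OPEN", "Open"),
]

codes_by_id = defaultdict(list)
for ids_code, text in messages:
    codes_by_id[fingerprint(text)].append(ids_code)

# Three IDS codes, but only two distinct strings to translate.
to_translate = len(codes_by_id)
```

Keying on the IDS code instead would have asked translators for "Close" twice.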

As I will explain in the next part of this series, to use the Launchpad Translation facilities, I needed to convert those Grit files into GetText files.

Here is how I present the string from the example above in the GetText template:

#. IDS_PREFERENCES_DIALOG_TITLE
#. - description: The title of the Preferences dialog box
#. - condition: os == 'linux2'
#: id: 3664704721673470303
msgid "%{PRODUCT_NAME} Preferences"
msgstr ""

As you can see, I tried to preserve as much information as possible (while hiding the complexity of XML). I’ve lost the <ex> for the placeholder variables (I should be able to add them).
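For illustration, an entry in that layout could be produced with a few lines of Python. This is only a sketch, not the actual converter: po_entry and its parameters are hypothetical names, and the values are the ones from the example above.

```python
def po_entry(ids_code, description, condition, message_id, text):
    """Format one GetText template entry in the layout shown above."""
    lines = ["#. " + ids_code, "#. - description: " + description]
    if condition:
        # Only conditional strings carry this extra comment line.
        lines.append("#. - condition: " + condition)
    lines.append("#: id: " + message_id)
    # Escape backslashes and double quotes, as required inside a msgid.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    lines.append('msgid "' + escaped + '"')
    lines.append('msgstr ""')
    return "\n".join(lines)

entry = po_entry(
    "IDS_PREFERENCES_DIALOG_TITLE",
    "The title of the Preferences dialog box",
    "os == 'linux2'",
    "3664704721673470303",
    "%{PRODUCT_NAME} Preferences",
)
```

Keeping the numeric id in a "#:" reference comment is what allows a translated msgstr to be written back to the right <translation> element later.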

I hope this clarifies a few things regarding the Chromium translations. Please let me know what you think in a comment and don’t forget to rate the article.

  1. January 8, 2011 at 19:09 | #1

    This is very interesting, thanks!

    Some typos: “MUST me” -> must be, “one of the grd file” -> one of the grd files.

    How does this i18n framework handle strings that must be translated differently in different contexts (e.g. a verb and a noun may be spelled the same in English, but differently in other languages)?

    Is there some equivalent of ngettext?

    • fta
      January 8, 2011 at 18:45 | #2

      Thanks.. and typo fixed (I should not write that late).

      When words are ambiguous in English, the context of the string must help determine what it is supposed to mean. If the string itself is too short, the description (desc attribute in the <message> tag) must clarify the situation. If it’s still not enough, the whole string could be wrapped into an <if> tag with something like <if expr="lang in ['ar', 'ro', 'lv']">…</if> and clearly explain why it has to be treated differently, and that information is passed to the translators.
      Have a look at those strings for example:
      generated_resources.grd

      At the moment, I have a problem with strings having such a “lang” condition (and saying “For all other languages, do NOT translate.”). My converter simply ignores those conditions and then reports those strings as missing for langs that are not even supposed to translate them. That’s something I’d like to fix before Chromium 10 is released.

      If you find such strings, don’t hesitate to file bugs either upstream (preferred) or in launchpad, providing a link or at least the IDS code and the lang.

    • tony chang
      January 10, 2011 at 19:50 | #3

      To handle the case of two strings being identical in English, but different in another language, there’s a ‘meaning’ attribute. For example, we use it with the string ‘Duplicate’ (see IDS_APP_MENU_DUPLICATE_APP_WINDOW and IDS_TAB_CXMENU_DUPLICATE) which can mean duplicate tab or duplicate application window. In English, we always use the string ‘Duplicate’, but in other languages, we can provide different strings.

      For hashing purposes, we just append the meaning attribute to the text to get the ID.

  2. October 30, 2012 at 19:01 | #4

    This is really helpful !!

    Could you say how we can build this change? Is there some initial step before changing the file mentioned? And after change the file, which build command I have to perform?

    Thanks.

  3. Raf
    April 7, 2014 at 15:23 | #5

    Is this discussion still alive?? Do you know if there is some tool to edit xtb file in a more friendly way (i mean using a tool like poedit)?
