6.5. Translation synchronisation

6.5.1.  The need for a synchronization process

The files actually used to build Debian Installer packages are hosted in the SVN (Subversion) repository used for Debian Installer development. The sources of each Debian Installer package are hosted in a subdirectory of the packages/ directory, with a classical Debian source package tree, including the debian/po directory where the package's translations are kept.

These files need to be synchronised with the so-called master files from the packages/po directory, as the translators only work with these master files. This synchronization must work both ways, with changes to original strings going from individual packages to the master files and updates to the master files going back to individual packages:

  • When the templates of a Debian Installer package are changed, its debian/po/templates.pot file changes. These changes have to be propagated to the files in packages/po;

  • When translations are updated in the packages/po directory, these changes have to be moved back into individual debian/po/*.po files for each package. This does not need to be done at each translation update but must be done before building packages.

The task of keeping track of the correct synchronisation is one of the most important tasks of the Debian Installer i18n coordinators. It is handled by the scripts/l10n/l10n-sync script.

6.5.2.  Localization files synchronisation process

6.5.2.1.  Initial synchronization

This paragraph is here for historical reasons and is irrelevant after the initial process has run once.

The template.pot file is generated from the Debian Installer repository by merging together all debian/po/templates.pot files. The scripts/l10n/l10n-sync script may be used for this. Before running it for the first time, check that no PO files remain in packages/po.

6.5.2.2.  Switch a language translations to the master file

This paragraph is here for historical reasons and should now be irrelevant, as the translations for all languages have been switched to the master file.

This process is meant to collect all translations for one language, which are spread out over the individual packages, and merge them into a single master file. This action should only happen once for each language.

The detailed process follows. For each step, the following list will mention whether the action is to be done by the language translators or by the Debian Installer i18n coordinators.

  • (translators) bring this language to 100% translation in all packages. This step is not mandatory but will help in tracking down fuzzy strings;

  • (Debian Installer i18n coordinators) stop the cron jobs running scripts/l10n/l10n-sync to avoid conflicts;

  • (translators) stop committing <language>.po files for this language;

  • (Debian Installer i18n coordinators) build the general <language>.po file in packages/po by using the scripts/l10n/gettext-helper script;

  • (translators) fill in the PO file header properly. The Plural-Forms header may need to be added manually;

  • (Debian Installer i18n coordinators) launch the synchronization script once with the special --keep-revision=<language> switch, which forces the PO-Revision-Date header to be left unchanged in all packages. The synchronisation script will then spread out the "new" translations to all packages;

  • (translators) in the newly created master file, track fuzzy strings, which are often caused by identical strings with different translations in different Debian Installer packages. These fuzzy strings will temporarily show all translation variants, from which the translators will have to pick a single version;

  • (translators) commit the updated <language>.po file back to packages/po. This will automatically trigger an update of the individual PO files when packages are built.

A helper script named switch-language has been written to handle this switch to the master file. Using it requires that nothing else happens simultaneously on the files of the language being switched. Translators are requested not to use it themselves and to ask one of the Debian Installer i18n coordinators to run it for them.

6.5.2.3.  Synchronization process

The l10n-sync script is run periodically by one of the Debian Installer i18n coordinators. It should preferably be run on a reliable Debian host such as people.debian.org. It updates the general template.pot and all PO files from the individual templates.pot files of all Debian Installer packages.

While running, this script will also run debconf-updatepo for each Debian Installer package and will commit back the regenerated templates.pot files (package maintainers often forget to run debconf-updatepo when committing changes to English templates).

During normal development, this synchronization process is run once a day. During release preparations, the frequency will be increased to speed up the full cycle.

When operating on the sarge branch, the l10n-sync script will also merge translations from master files in the trunk branch, by using the --merge option. Synchronisations on the sarge branch only occur once a week.

The general process is:

  1. Initial step: synchronize the whole repository;

  2. Individual packages update step. For each Debian Installer package:

    1. synchronize the local copy with the Debian Installer SVN;

    2. run debconf-updatepo;

    3. commit back the files in debian/po.

  3. Master templates file update step:

    1. merge all templates.pot files to packages/po/template.pot. The packages/po/header.pot file is mandatory (this file provides the standard header for the merged POT file);

    2. commit this file.

  4. Master PO files update step. For each PO file in packages/po:

    1. synchronize with Debian Installer SVN;

    2. if the script is run with the --merge option, merge translations from the reference PO file (the reference file is given priority for identical strings);

    3. update with packages/po/template.pot. This step uses msgmerge. The formatting details of the resulting PO file depend on the version of this utility. For that reason, the synchronization script should always be run from machines using the same Debian release (there are known differences between the woody and sarge versions of the gettext utilities);

    4. commit back the changed file to Debian Installer SVN.

  5. Individual packages PO files update step. For each Debian Installer package:

    1. synchronize the local copy with the Debian Installer SVN (in case some update occurred in the meantime);

    2. update debian/po/*.po files with master files;

    3. commit back the changes to Debian Installer SVN.

This system minimizes race conditions which could trigger conflicts.
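Steps 3 to 5 of the process above can be sketched with the standard gettext tools. This is a minimal sketch under stated assumptions, not the actual l10n-sync code: the function names are hypothetical, packages are assumed to sit directly under the packages/ directory, the svn update and commit calls are left out, the PROSPECTIVE list is ignored, and the use of msgcat and msgattrib is an assumption (the text above only names msgmerge).

```shell
#!/bin/sh
# Sketch of steps 3-5. Function names and the flat package layout are
# assumptions; svn calls and the PROSPECTIVE list are left out.

# Step 3: merge all package templates into packages/po/template.pot.
# header.pot is listed first so that --use-first keeps its header entry.
merge_templates() {
    packages=$1
    msgcat --use-first -o "$packages/po/template.pot" \
        "$packages/po/header.pot" \
        "$packages"/*/debian/po/templates.pot
}

# Step 4: refresh every master PO file against the merged template.
update_master_po() {
    packages=$1
    for po in "$packages"/po/*.po; do
        [ -f "$po" ] || continue          # no master PO file yet
        msgmerge --quiet -o "$po.new" "$po" "$packages/po/template.pot" &&
            mv "$po.new" "$po"
    done
}

# Step 5: rebuild one package's PO files from the master files, keeping
# only the strings present in that package's templates.pot.
update_package_po() {
    packages=$1
    pkg=$2
    pot=$packages/$pkg/debian/po/templates.pot
    for master in "$packages"/po/*.po; do
        [ -f "$master" ] || continue
        out=$packages/$pkg/debian/po/$(basename "$master")
        msgmerge --quiet -o "$out.tmp" "$master" "$pot" &&
            msgattrib --no-obsolete -o "$out" "$out.tmp"
        rm -f "$out.tmp"
    done
}
```

In step 5, msgmerge takes the master PO file as the source of translations and the package's templates.pot as the reference, so strings belonging to other packages end up as obsolete entries, which msgattrib then drops.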

The conflict windows have been minimised as much as possible in the l10n-sync script. However, in order to limit the number of commits made by the script, files are not committed as soon as they are modified (unless the --atomic switch is used, which slows down the whole process considerably and triggers a lot of commits). The script may therefore occasionally trigger conflicts. For that reason, it will refuse to work on an SVN copy where SVN conflict files are present. This explains why this script must always be monitored, even when it is scheduled to execute periodically.
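The check for leftover conflict files could look like the following. This is a minimal sketch and not the test l10n-sync actually performs: the function name is hypothetical, and it assumes the file names svn usually leaves next to a file after a text conflict (FILE.mine and FILE.r<revision>).

```shell
#!/bin/sh
# Sketch: succeed if the working copy holds leftover conflict files.
# Assumes the usual svn conflict file names; l10n-sync may test differently.
has_svn_conflicts() {
    checkout=$1
    found=$(find "$checkout" -type f \
        \( -name '*.mine' -o -name '*.r[0-9]*' \) -print | head -n 1)
    [ -n "$found" ]
}
```

A caller would typically abort early, for example with: has_svn_conflicts "$checkout" && exit 1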

6.5.2.4.  Synchronization script use

The l10n synchronisation script is kept in the Debian Installer repository in the scripts/l10n directory and is named l10n-sync.

This is a shell script (which may contain some bashisms until a skilled shell programmer cleans it up).

The script accepts some command-line switches which affect its behaviour. Some of these switches are mostly present for historical reasons and are kept because they may have some new use in the future:

  • --debug will trigger more output by the script. Otherwise, the script reports its actions but the output of the commands it runs (such as svn, debconf-updatepo, msgmerge) is redirected to /dev/null;

  • --online triggers svn update commands before working on files. Except for testing, there is no reason to omit this switch;

  • --commit allows the script to commit files back to the Debian Installer repository. Otherwise, the modified files are kept in the local SVN copy. Of course, this switch should be used in production. Omitting it is useful only for testing;

  • the --atomic switch instructs l10n-sync to commit files as soon as they are modified. This makes the script quite slow and may trigger dozens of commits. As Debian Installer commits are sent to the development IRC channel and are followed by several Debian Installer maintainers, this switch should not be used except on very rare occasions;

  • the --atomic-updates switch instructs l10n-sync to issue an svn update command before working on each package. This may be very CPU-, network- and time-consuming and should only be used when there is some need to limit possible conflicts;

  • the --keep-revision switch needs an argument, which must be a single language code. It forces l10n-sync to restore the PO-Revision-Date field of this language on each modified file. This switch is used when switching languages to the master file (see Section 6.5.2.2, “Switch a language translations to the master file”). It should not be used on other occasions;

  • the --svn switch defines the command for calling the svn utility. It may be used in case some special behaviour is needed;

  • the --debconf-updatepo switch defines the command for calling the debconf-updatepo utility. It allows passing a specially crafted debconf-updatepo command line, often one with the --skip-merge switch, which only updates templates.pot files and not PO files (using this speeds up the general synchronization script);

  • the --sort-order switch gives, on the command line, the order in which the packages should be processed. This makes it possible to impose a crafted order on the generated templates.pot file so that the translators begin with the most important packages;

  • the --merge switch allows merging master files from another branch. It is used, for instance, on the sarge branch for merging translations coming from trunk.

The script needs the location of the local copy of the Debian Installer repository as an argument. It makes some simple checks about the copy. A partial SVN checkout may be used, with only the packages/ directory.

When using this script with commits, the Debian Installer copy must be as clean as possible. It should not be used for development tasks. The script checks for possible SVN conflict files and aborts if it finds some.
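The guidance above on --online, --commit and --debug can be summarised as two typical command lines. This is a minimal sketch: the function name, the L10N_SYNC variable and the two mode names are hypothetical, and placing the switches before the repository path is an assumption, as the text only states that the path is passed as an argument.

```shell
#!/bin/sh
# Sketch: print the l10n-sync command line for a given mode.
# L10N_SYNC and the mode names are hypothetical; only the switches come
# from the list above. Switch-before-path ordering is an assumption.
sync_command() {
    mode=$1
    checkout=$2
    script=${L10N_SYNC:-l10n-sync}
    case $mode in
        production) echo "$script --online --commit $checkout" ;;
        testing)    echo "$script --debug $checkout" ;;
        *)          echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}
```

The testing mode leaves out --online and --commit, so it neither updates from nor commits to the repository and only touches the local copy.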

Prospective languages (see Section 6.3, “Prospective languages”) are handled in a special way: for all such languages, the translations are not copied to the individual package directories.

6.5.2.5.  The PROSPECTIVE list

When languages are in the early stages of translation after they have been added through the new language process (Chapter 3, The New Language Process: adding a new language to Debian Installer), they are temporarily listed in a file named packages/po/PROSPECTIVE.

Languages listed in that file are excluded from synchronization and, therefore, PO files are not created for them in individual packages. As a consequence, activating a language means removing its language code from the PROSPECTIVE file.
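A lookup in the PROSPECTIVE file can be sketched as follows. The function name is hypothetical and the file is assumed to hold one language code per line; the actual format is not described here.

```shell
#!/bin/sh
# Sketch: is a language still listed as prospective?
# Assumes one language code per line in the file (format not confirmed).
is_prospective() {
    lang=$1
    list=$2
    [ -f "$list" ] && grep -qxF -- "$lang" "$list"
}
```

The -x and -F options make grep match whole lines literally, so a code such as "pt" does not match a line containing "pt_BR".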

6.5.2.6.  List of handled packages

A special file, named packages/po/packages_list, lists the Debian Installer packages which are included in the master files. This file also sorts these packages by order of priority for translation. The strings at the beginning of the master files are to be translated first.

The maintainers of Debian Installer packages must request that their packages be included in this file when they consider that the packages are ready for translation. Only Debian Installer i18n coordinators can add packages there, after checking that the strings have been reviewed.

6.5.2.7.  Synchronization script and automated commits

As the script is meant to run as an automated process with automated commits, a few prerequisites must be met for it to run without user interaction.

First of all, when run from a cron job, the script has to be able to commit files. This means that the account it runs under should use an SSH key with an empty passphrase, and that this key should be added to the account it commits to on anonscm.debian.org.

To do so, create an SSH key with an empty passphrase and store it in a dedicated file:

user@host:~> ssh-keygen -t dsa
Generating public/private dsa key pair.
Enter file in which to save the key (/home/user/.ssh/id_dsa): \
        /home/user/.ssh/nopass
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/user/.ssh/nopass.
Your public key has been saved in /home/user/.ssh/nopass.pub.
The key fingerprint is:
75:fa:62:54:2c:34:09:96:ad:f2:57:cf:16:ce:74:69 bubulle@mykerinos

Then, this key should be added to ~/.ssh/authorized_keys on anonscm.debian.org for the Alioth account (<alioth_account>) under whose identity the commits will be made.

Finally, on the host where l10n-sync will run, the following should be added to ~/.ssh/config:

host anonscm.debian.org
  user <alioth_account>
  IdentityFile /home/user/.ssh/nopass

Of course, this means that the local account will then be able to commit to anonscm.debian.org as <alioth_account> without any further check. The account the script runs under should therefore be very carefully protected.
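One basic precaution is to make sure that the passphrase-less key is readable by its owner only. The helper below is a hypothetical sketch that checks this with the permission string printed by ls; it is not part of l10n-sync.

```shell
#!/bin/sh
# Sketch: succeed only if a file is inaccessible to group and others.
# Reads the permission string printed by "ls -ld" (e.g. -rw-------) and
# checks that the six group/other characters are all dashes.
is_private() {
    perms=$(ls -ld "$1" 2>/dev/null | cut -c5-10)
    [ "$perms" = "------" ]
}
```

For instance: is_private /home/user/.ssh/nopass || chmod 600 /home/user/.ssh/nopass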

6.5.2.8.  Synchronization script control

Some mechanisms have been implemented to add more safety and remote control possibilities to the synchronisation script.

First of all, before doing any work, the script opens a special file kept in the SVN repository and named packages/po/run-l10n-sync. The "run=" line in this file mentions whether synchronisation should happen or not. If the file contains "run=0", then the script will exit without taking any action.

This mechanism gives all Debian Installer developers with commit access to the SVN repository a very simple way to disable the synchronization script actions. For this, developers just need to change the file and commit the new version.

While the script is disabled, runs can still be forced by using the --force switch, which makes the script ignore the packages/po/run-l10n-sync file.
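The control file check can be sketched as follows. The function name and the way the force flag is passed are hypothetical. Only "run=0" is documented as disabling the script, so this sketch assumes that any other value, or a missing "run=" line, leaves it enabled.

```shell
#!/bin/sh
# Sketch: decide whether a run is allowed.
# $1 = path to packages/po/run-l10n-sync
# $2 = "force" when --force was given (hypothetical calling convention)
# Assumption: anything other than "run=0" means the script is enabled.
sync_enabled() {
    control=$1
    force=${2:-}
    [ "$force" = force ] && return 0
    value=$(sed -n 's/^run=//p' "$control" 2>/dev/null | head -n 1)
    [ "$value" != 0 ]
}
```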

While it is running, the script also creates a file named .l10n-sync.lock at the root of the local copy of the Debian Installer repository. The file is removed only after successful runs of the script. When this file already exists, the script does not run and exits with an error message.

This mechanism prevents running the synchronization script twice on the same Debian Installer repository checkout copy.
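The lock file behaviour described above can be sketched as follows. The function name is hypothetical and the real script may create and test the lock differently; this sketch uses the shell's noclobber option so that testing for the file and creating it happen in one step.

```shell
#!/bin/sh
# Sketch: take the lock, run a command, release the lock on success only.
# The function name is hypothetical; the lock file name comes from the text.
with_sync_lock() {
    checkout=$1
    shift
    lock=$checkout/.l10n-sync.lock
    # With noclobber (set -C), the redirection fails if the file exists.
    if ! ( set -C; : > "$lock" ) 2>/dev/null; then
        echo "cannot create $lock (already present?)" >&2
        return 1
    fi
    "$@" || return $?     # failure: the lock is deliberately left in place
    rm -f "$lock"
}
```

Because the lock is left behind after a failed run, it has to be removed by hand once the cause of the failure has been dealt with.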

6.5.2.9.  Synchronization script automated runs

(this section should be updated when the script run conditions are changed)

The script currently runs under Christian Perrier's account on people.debian.org.

It is run through cron jobs:

TODO

A special cron entry is also added in order to allow other Debian Installer developers to trigger runs of the script at any moment. They just need to create a special "flag" file on people.debian.org (this restricts the feature to official Debian developers who have an account on this machine).
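The flag file mechanism can be sketched as a small wrapper that a cron job would call at short intervals. This is a hypothetical sketch: the function name is invented, the flag file location is whatever the coordinators chose, and nothing here is taken from the real cron setup.

```shell
#!/bin/sh
# Sketch: run a command only when a flag file exists, consuming the flag.
# $1 = flag file path (location is an assumption), remaining args = command.
run_if_flagged() {
    flag=$1
    shift
    [ -e "$flag" ] || return 0      # nothing requested
    rm -f "$flag" || return 1       # consume the request before running
    "$@"
}
```

Removing the flag before the command starts means that a request made during a run is kept for the next invocation instead of being lost.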

As people.debian.org is a woody machine and as the behaviour of gettext utilities such as msgmerge changes with the version of Debian GNU/Linux systems, Debian Installer developers should avoid using the script on other machines to avoid useless massive commits.