Posts Tagged ‘python’

Chromium translations explained: part 2

January 23, 2011 7 comments

In the first part of this series of posts about the Chromium translations, I covered Grit, the format of translations used by upstream for Chromium (and Google Chrome, ChromeOS..). In another post, I recently explained the release management of this project, showing that multiple branches evolve in parallel, inside the so-called Channels. In this part, I will cover the interaction with Launchpad, and show how the strings are converted back and forth, how the Launchpad-contributed strings are merged with the upstream strings, and the various problems that have come up since contributions started to flow.
Read more…


PPA stats, initial impressions

January 5, 2011 9 comments

Our rocking Launchpad team gave us a nice Christmas present last week in bug 139855. After a loooong wait, we finally have some raw stats for PPA downloads. The Launchpad API had been ready for a while, but the machinery to parse the logs apparently took a long time to set up properly. What matters is that it’s finally there for us to try.
Suffice to say, this topic is hot for me. As you may know, I manage a lot of PPAs, especially ones with daily builds.

The API provides access to the stats via 3 methods:

  1. getDailyDownloadTotals
  2. getDownloadCount
  3. getDownloadCounts

Those methods are related to binary_package_publishing_history, so they work on binary packages.
They just give counts, either total or hashed by date, as I will show below. They don’t say who or how, but they do say from where.
So the granularity is:

pkg-name / version / distribution / architecture { / yyyy-mm-dd } => number of downloads { / country }
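To make that granularity concrete, here is a minimal sketch of how those counts could be keyed in Python. The package names, versions and figures below are made up for illustration, not real PPA data:

```python
# Hypothetical illustration of the stats granularity: one count per
# (package, version, distribution, architecture, day) tuple.
downloads = {
    ("chromium-browser", "1.0-0ubuntu1~daily1", "maverick", "i386", "2011-01-04"): 11,
    ("chromium-browser", "1.0-0ubuntu1~daily1", "maverick", "amd64", "2011-01-04"): 25,
}

# Aggregating over any dimension is then a simple fold,
# e.g. totals per architecture:
per_arch = {}
for (pkg, version, distro, arch, day), count in downloads.items():
    per_arch[arch] = per_arch.get(arch, 0) + count
```

The optional country dimension would just be one more component in the key.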

The problem with my daily PPAs is the big number of updates. Like this one:

>>> ppa
<archive at;
>>> ppa.getBuildCounters()
{u'failed': 2641, u'superseded': 221, u'total': 13045, u'pending': 0, u'succeeded': 10183}

It means those 10k builds generated n x 10k debs, and hopefully, way more downloads. We’ll see shortly…
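As a side note, those build counters make it easy to compute a success rate; a quick sketch, reusing the figures from the session above:

```python
# Counters as returned by ppa.getBuildCounters() in the session above.
counters = {'failed': 2641, 'superseded': 221, 'total': 13045,
            'pending': 0, 'succeeded': 10183}

# Share of builds that succeeded, as a percentage.
success_rate = 100.0 * counters['succeeded'] / counters['total']
print("%.1f%% of builds succeeded" % success_rate)
```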

To experiment how usable the API is for this use case, and what kind of information I can learn, I gave it a try.
My initial need is to see what my users want (which distro, which arch, ..) so I can focus on that, and maybe trim some dead branches from my tree.

>>> bin
<binary_package_publishing_history at;
>>> bin.distro_arch_series
<distro_arch_series at;
>>> bin.getDailyDownloadTotals()
{u'2011-01-04': 11}
>>> bin.getDownloadCount()
>>> bin.getDownloadCounts()
<lazr.restfulclient.resource.Collection object at 0xa031f6c>
>>> x=bin.getDownloadCounts()
>>> x.lp_operations
>>> x.lp_attributes
['resource_type_link', 'total_size', 'start', 'entries', 'entry_links']
>>> for e in x.entries:
...     print repr(e)
{u'count': 1,
 u'archive_link': u'',
 u'country_link': u'',
 u'http_etag': u'"bc6b572b45ed648216b441ebd6390042f096bc0e-c607ca46c77b9673130a54553f81fd2595304c96"',
 u'self_link': u'',
 u'binary_package_version': u'10.0.628.0~svn20110104r70404-0ubuntu1~ucd1',
 u'binary_package_name': u'chromium-browser',
 u'resource_type_link': u'',
 u'day': u'2011-01-04'}

I think you get the idea..

In my first test, I just used getDailyDownloadTotals() against all binary packages of that particular PPA.

The code (Python) was easy to write, but it took a while to run, and OOPS-ed (elsewhere, not in any of these 3 methods) after something like 1h15, still on the first binary package.
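For the curious, the aggregation itself is trivial once you have the per-binary dicts. Here is a minimal sketch of the merging step; the launchpadlib plumbing around it is omitted, and in the real script each input dict would come from a bin.getDailyDownloadTotals() call:

```python
def merge_daily_totals(per_binary_totals):
    """Sum several {day: count} dicts (one per binary package)
    into a single {day: count} dict covering the whole PPA."""
    totals = {}
    for daily in per_binary_totals:
        for day, count in daily.items():
            totals[day] = totals.get(day, 0) + count
    return totals

# Fake sample data, same shape as the getDailyDownloadTotals() output:
i386 = {u'2011-01-03': 7, u'2011-01-04': 11}
amd64 = {u'2011-01-04': 25}
print(merge_daily_totals([i386, amd64]))
```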

Fortunately, I was able to retrieve enough meaningful data to produce this:

250k downloads in 2010, not so bad for a daily build of Chromium trunk.

Looking quickly at the chart, there are already lessons to learn:

  1. I’m losing a lot of users each time there’s a new Ubuntu release. In April, the loss was expected: Lucid came with Chromium stable in universe, so I guess people moved to it, and that’s not trackable by this API (the official distribution is mirrored everywhere on the planet, unlike PPAs, which are still centralized).
  2. Even if I’ve lost half of the daily users, there are still more than enough to justify the time I spend on this.
  3. All weekends show a huge decrease in activity, but still, there are more active users than I thought.

I made this page public if you want to have a closer look. It’s frozen (it doesn’t poll LP) but it’s interactive, you can zoom in/out.

I still have plenty of ideas for ways to report interesting facts using those stats, but it’s already clear to me that it’s too slow to be used for big PPAs like this one. It’s very new, so I’m not overly concerned at this point. I will experiment with smaller PPAs in the coming days, and report back.


Chromium translations dashboard

January 3, 2011 5 comments

Two weeks ago, I noticed that the Chromium translations page on Launchpad showed way more green than usual. For me, “green” meant that translations came from the upstream tree. I was quite surprised to see so many strings turn green overnight, even for completely new langs and for templates upstream had already said they can’t take.
I quickly checked the output logs of my converter, fearing the worst; fortunately, there was nothing wrong with it. I scanned the Launchpad-Translators mailing list and found it was an improvement. I read it twice, and I still don’t see what the benefit for Chromium could be. Obviously, Chromium will not move to Launchpad, not even its translations. What Launchpad gets is a gettext export, using my converter as a bidirectional gateway with the native format (Grit) living in the upstream tree, and I still don’t see that tree imported into Launchpad either (dozens of nested svn and git trees, controlled by a Python script called gclient that manages the modular dependencies). After giving it more thought, I realized the old colors were more useful for my very particular use case, so I needed them back, somehow.

My first idea was to use the Launchpad API, which I use in other projects, but it quickly proved a dead end. The next idea was to directly hook that up into my converter, which is seeing all the strings after all, so it was the right place to extract figures from, and, why not, create a nice dashboard.

I went on, added more Python code to my already pretty long converter, played with some CSS gradients and created this:

(click on the image to see it completely)

This page lives there and is updated daily.

A few comments:

  1. I’m not a web designer. The page looks nice to me in Chromium, but is probably ugly or broken in other browsers. If you have ideas to improve it, please ping me.
  2. This page also reports conversion errors (meaning rejected strings), which are not visible in Launchpad as Rosetta has no way of knowing there’s a problem with those strings. If you are a translator of one of those listed langs, please go fix it 🙂
  3. The numbers are taken at the end of the conversion chain, meaning all non-red strings successfully passed all the sanity checks and should end up in the debs.
  4. Once strings are visible in this page, they will be in the next daily build. An almost instant reward for translators.
  5. I don’t have a “Need Review” category, obviously, Launchpad doesn’t export those strings.
  6. Completely new langs (declared in Launchpad but for which there’s no string yet) are not visible here (6 langs have 0 strings at the moment), you need to land at least one string.
  7. If you look closely at both screenshots, some numbers differ, like for Spanish: 0 missing in LP, 1 in my dashboard. It’s caused by the asynchronous export from Launchpad: it always arrives too late for the daily build, skipping a cycle. Too bad. It would be nice to be able to schedule that export (like 1h before I kick off a build). EDIT: Spanish was a bad example, that one string is in fact bogus (it’s listed at the top of the page, so it has been rejected). That’s a better example of my 2nd and 3rd points combined.

The goal remains the same: eradicate the red, and land a maximum of strings upstream (less blue and purple).