PPA stats, initial impressions

January 5, 2011

Our rocking Launchpad team gave us a nice Christmas present last week in bug 139855. After a loooong wait, we finally have some raw stats for PPA downloads. The Launchpad API had been ready for a while, but the machinery to parse the logs apparently took a long time to set up properly. What matters is that it’s finally there for us to try.
Suffice it to say this is a hot topic for me. I manage a lot of PPAs as you may know, especially ones with daily builds.

The API provides access to the stats via 3 methods:

  1. getDailyDownloadTotals
  2. getDownloadCount
  3. getDownloadCounts

Those methods belong to binary_package_publishing_history, so they work on binary packages.
They just give counts, either total or keyed by date, as I will show below. The stats don’t say who downloaded or how, but they do say from where (the country).
So the granularity is:

pkg-name / version / distribution / architecture { / yyyy-mm-dd } => number of downloads { / country }

The problem with my daily PPAs is the large number of updates. Take this one:

>>> ppa
<archive at;
>>> ppa.getBuildCounters()
{u'failed': 2641, u'superseded': 221, u'total': 13045, u'pending': 0, u'succeeded': 10183}

It means those 10k builds generated n x 10k debs, and hopefully, way more downloads. We’ll see shortly…

To find out how usable the API is for this use case, and what kind of information I can learn from it, I gave it a try.
My initial need is to see what my users want (which distro, which arch, ...) so I can focus on that, and maybe trim some dead branches from my tree.

>>> bin
<binary_package_publishing_history at;
>>> bin.distro_arch_series
<distro_arch_series at;
>>> bin.getDailyDownloadTotals()
{u'2011-01-04': 11}
>>> bin.getDownloadCount()
>>> bin.getDownloadCounts()
<lazr.restfulclient.resource.Collection object at 0xa031f6c>
>>> x=bin.getDownloadCounts()
>>> x.lp_operations
>>> x.lp_attributes
['resource_type_link', 'total_size', 'start', 'entries', 'entry_links']
>>> for e in x.entries:
...     print repr(e)
{u'count': 1,
u'archive_link': u'',
u'country_link': u'',
u'http_etag': u'"bc6b572b45ed648216b441ebd6390042f096bc0e-c607ca46c77b9673130a54553f81fd2595304c96"',
u'self_link': u'',
u'binary_package_version': u'10.0.628.0~svn20110104r70404-0ubuntu1~ucd1',
u'binary_package_name': u'chromium-browser',
u'resource_type_link': u'',
u'day': u'2011-01-04'}

I think you get the idea.
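To make the granularity concrete, here is a minimal sketch (written for Python 3, unlike the Python 2 session above) of how records shaped like that entry could be summed by any one of their fields. The helper name `sum_counts_by` and the sample records are hypothetical and only for illustration; the field names are the ones visible in the printed entry.

```python
from collections import Counter


def sum_counts_by(entries, field):
    """Sum the 'count' of each download record, grouped by one field."""
    totals = Counter()
    for entry in entries:
        totals[entry[field]] += entry["count"]
    return dict(totals)


# Made-up records, shaped like the entry printed above (values are not real).
sample = [
    {"day": "2011-01-04", "binary_package_version": "1.0", "count": 1},
    {"day": "2011-01-04", "binary_package_version": "1.0", "count": 3},
    {"day": "2011-01-05", "binary_package_version": "1.1", "count": 2},
]

print(sum_counts_by(sample, "day"))
print(sum_counts_by(sample, "binary_package_version"))
```

With real data, the `entries` list of the collection returned by getDownloadCounts() could presumably be passed in directly, and grouping on `country_link` would give the "from where" view.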

In my first test, I just used getDailyDownloadTotals() against all binary packages of that particular PPA.

The code (Python) was easy to write, but it took a while to run, and it OOPS-ed (elsewhere, not in any of these 3 methods) after something like 1h15, while still on the first binary package.
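For illustration only, a loop of that kind might look like the sketch below. It is not the script used for this post: the function name `daily_totals_for_ppa` is hypothetical, it is written for Python 3, and it assumes `ppa` is an archive object from launchpadlib whose getPublishedBinaries() returns binary_package_publishing_history entries.

```python
from collections import Counter


def daily_totals_for_ppa(ppa):
    """Add up getDailyDownloadTotals() over every published binary of a PPA."""
    totals = Counter()
    # Each binary needs its own web service call for its totals, which is
    # one plausible reason a PPA with thousands of builds is so slow to scan.
    for binary in ppa.getPublishedBinaries():
        for day, count in binary.getDailyDownloadTotals().items():
            totals[day] += count
    # Sort by date so the result can be fed straight into a chart.
    return dict(sorted(totals.items()))


# Assumed way of getting `ppa` with launchpadlib (names are placeholders,
# and the stats methods may require a specific API version):
#   from launchpadlib.launchpad import Launchpad
#   lp = Launchpad.login_anonymously("ppa-stats", "production")
#   ppa = lp.people["some-owner"].getPPAByName(name="some-ppa")
#   print(daily_totals_for_ppa(ppa))
```

Keeping the aggregation separate from the login makes it easy to try on a small PPA first, before pointing it at one with 10k builds.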

Fortunately, I was able to retrieve enough meaningful data to produce this:

250k downloads in 2010, not so bad for a daily build of Chromium trunk.

Looking quickly at the chart, there are already lessons to learn:

  1. I’m losing a lot of users each time there’s a new Ubuntu release. In April, the loss was expected: Lucid came with Chromium stable in universe, so I guess people moved to it, and that’s not trackable by this API (the official distribution is mirrored everywhere on the planet, unlike PPAs, which are still centralized).
  2. Even if I’ve lost half of the daily users, there are still more than enough to justify the time I spend on this.
  3. All weekends show a huge decrease in activity, but still, there are more active users than I thought.
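The weekend dip in the third point can also be checked numerically rather than by eye. The following is a small sketch (Python 3.7+), assuming a date-to-count dictionary in the format returned by getDailyDownloadTotals(); the function name and the sample numbers are made up.

```python
from datetime import date


def weekday_weekend_means(daily_totals):
    """Return (mean weekday downloads, mean weekend downloads)."""
    weekday, weekend = [], []
    for day, count in daily_totals.items():
        # date.weekday(): Monday is 0, Saturday is 5, Sunday is 6.
        is_weekend = date.fromisoformat(day).weekday() >= 5
        (weekend if is_weekend else weekday).append(count)

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    return mean(weekday), mean(weekend)


# Made-up numbers: 2011-01-01 was a Saturday, 2011-01-03 a Monday.
sample = {
    "2011-01-01": 4,
    "2011-01-02": 6,
    "2011-01-03": 10,
    "2011-01-04": 20,
}
print(weekday_weekend_means(sample))  # (15.0, 5.0)
```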

I made this page public if you want to have a closer look. It’s frozen (it doesn’t poll LP) but it’s interactive, you can zoom in/out.

I still have plenty of ideas for ways to report interesting facts using those stats but it’s already clear to me that it’s too slow to be used for big PPAs like this one. It’s very new, so I’m not overly concerned at this point. I will experiment with smaller PPAs in the coming days, and report back.