PPA stats, initial impressions

Our rocking Launchpad team gave us a nice Christmas present last week in bug 139855. After a loooong wait, we finally have some raw stats for PPA downloads. The Launchpad API has been ready for a while, but the machinery to parse the logs apparently took a long time to set up properly. What matters is that it’s finally there for us to try.
Suffice it to say that this topic is hot for me. I manage a lot of PPAs, as you may know, especially ones with daily builds.

The API provides access to the stats via 3 methods:

  1. getDailyDownloadTotals
  2. getDownloadCount
  3. getDownloadCounts

Those methods belong to binary_package_publishing_history, so they work on binary packages.
They just give counts, either total or keyed by date, as I will show below. They don’t say who or how, but they do say from where.
So the granularity is:

pkg-name / version / distribution / architecture { / yyyy-mm-dd } => number of downloads { / country }

The problem with my daily PPAs is the sheer number of updates. Take this one:

>>> ppa
<archive at https://api.launchpad.net/devel/~chromium-daily/+archive/ppa>
>>> ppa.getBuildCounters()
{u'failed': 2641, u'superseded': 221, u'total': 13045, u'pending': 0, u'succeeded': 10183}

It means those 10k builds generated n x 10k debs, and hopefully, way more downloads. We’ll see shortly…

To find out how usable the API is for this use case, and what kind of information I can learn from it, I gave it a try.
My initial need is to see what my users want (which distro, which arch, ..) so I can focus on that, and maybe trim some dead branches from my tree.


>>> bin
<binary_package_publishing_history at https://api.launchpad.net/devel/~chromium-daily/+archive/ppa/+binarypub/16459888>
>>> bin.distro_arch_series
<distro_arch_series at https://api.launchpad.net/devel/ubuntu/natty/amd64>
>>> bin.getDailyDownloadTotals()
{u'2011-01-04': 11}
>>> bin.getDownloadCount()
11
>>> bin.getDownloadCounts()
<lazr.restfulclient.resource.Collection object at 0xa031f6c>
>>> x=bin.getDownloadCounts()
>>> x.lp_operations
[]
>>> x.lp_attributes
['resource_type_link', 'total_size', 'start', 'entries', 'entry_links']
>>> for e in x.entries:
...     print repr(e)
...
{u'count': 1,
u'archive_link': u'https://api.launchpad.net/devel/~chromium-daily/+archive/ppa',
u'country_link': u'https://api.launchpad.net/devel/+countries/AU',
u'http_etag': u'"bc6b572b45ed648216b441ebd6390042f096bc0e-c607ca46c77b9673130a54553f81fd2595304c96"',
u'self_link': u'https://api.launchpad.net/devel/~chromium-daily/+archive/ppa/+binaryhits/chromium-browser/10.0.628.0~svn20110104r70404-0ubuntu1~ucd1/amd64/2011-01-04/AU',
u'binary_package_version': u'10.0.628.0~svn20110104r70404-0ubuntu1~ucd1',
u'binary_package_name': u'chromium-browser',
u'resource_type_link': u'https://api.launchpad.net/devel/#binary_package_release_download_count',
u'day': u'2011-01-04'}

I think you get the idea..
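
Since each entry is essentially a dictionary, rolling the counts up by country, or pulling the series and architecture out of a distro_arch_series URL, takes only a few lines. Here is a minimal sketch in Python 3 (the session above is Python 2). The helper names are mine and not part of the API, it assumes entries shaped like the one printed above, and it assumes an entry may lack a country link, in which case the count is filed under None.

```python
from collections import Counter
from urllib.parse import urlparse


def country_code(country_link):
    """Return the last path segment of a +countries link, e.g. 'AU'."""
    if not country_link:
        return None  # assumption: some entries may carry no country
    return urlparse(country_link).path.rstrip("/").rsplit("/", 1)[-1]


def series_and_arch(distro_arch_series_url):
    """Split '.../ubuntu/natty/amd64' into ('natty', 'amd64')."""
    parts = urlparse(distro_arch_series_url).path.rstrip("/").split("/")
    return parts[-2], parts[-1]


def downloads_by_country(entries):
    """Sum the 'count' field of download-count entries per country code."""
    totals = Counter()
    for entry in entries:
        totals[country_code(entry.get("country_link"))] += entry["count"]
    return totals
```

Fed with x.entries from the session above, downloads_by_country() would give a per-country tally for that one binary, and series_and_arch() turns the URL shown for bin.distro_arch_series into a (series, arch) pair to group on.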

In my first test, I just used getDailyDownloadTotals() against all binary packages of that particular PPA.

The code (Python) was easy to write, but it took a while to run, and it OOPS-ed (elsewhere, not in any of these 3 methods) after something like 1h15, still on the first binary package.
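
The script itself is not reproduced here, but a loop of this kind could look roughly like the following. Treat it as a sketch, not the exact code I ran: it is written for Python 3, the function names and the consumer name "ppa-stats-sketch" are placeholders, and it assumes launchpadlib's anonymous login together with getPPAByName() on the owner and getPublishedBinaries() on the archive. As far as I can tell, every getDailyDownloadTotals() call is a separate round trip to Launchpad, which would go a long way towards explaining the run time on a PPA with thousands of binaries.

```python
from collections import Counter


def open_ppa(owner, ppa_name):
    """Open a PPA anonymously through launchpadlib (needs network access)."""
    # Imported lazily so the aggregation below can be used without launchpadlib.
    from launchpadlib.launchpad import Launchpad

    lp = Launchpad.login_anonymously(
        "ppa-stats-sketch", "production", version="devel")
    return lp.people[owner].getPPAByName(name=ppa_name)


def sum_daily_totals(binaries):
    """Add up getDailyDownloadTotals() over binaries, keyed by 'yyyy-mm-dd'."""
    totals = Counter()
    for binary in binaries:
        # One API call per binary publication.
        for day, count in binary.getDailyDownloadTotals().items():
            totals[day] += count
    return totals


def main():
    """Print one line per day for the PPA discussed in this post."""
    ppa = open_ppa("chromium-daily", "ppa")
    totals = sum_daily_totals(ppa.getPublishedBinaries())
    for day in sorted(totals):
        print(day, totals[day])
```

Calling main() runs it against the live API; sum_daily_totals() only needs objects that have a getDailyDownloadTotals() method, so it can be tried on a handful of binaries first.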

Fortunately, I was able to retrieve enough meaningful data to produce this:

250k downloads in 2010, not so bad for a daily build of Chromium trunk.

Looking quickly at the chart, there are already lessons to learn:

  1. I’m losing a lot of users each time there’s a new Ubuntu release. In April, the loss was expected: Lucid came with Chromium stable in universe, I guess people moved to it, and it’s not trackable by this API (the official distribution is mirrored everywhere on the planet, unlike PPAs, which are still centralized).
  2. Even if I’ve lost half of the daily users, there are still more than enough to justify the time I spend on this.
  3. All week-ends show a huge decrease in activity, but still, there are more active users than I thought.
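
To put a number on the week-end dip, the daily totals can be split into a weekday average and a week-end average. A minimal sketch in Python 3, assuming a dict of 'yyyy-mm-dd' to count like the one getDailyDownloadTotals() returns; the function name is hypothetical and the figures in the usage lines are made up, not real data from the PPA.

```python
from datetime import date


def _mean(values):
    """Arithmetic mean, or 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def weekday_weekend_means(daily_totals):
    """Return (mean downloads on Mon-Fri, mean downloads on Sat-Sun)."""
    weekday, weekend = [], []
    for day, count in daily_totals.items():
        # date.weekday(): Monday is 0, Saturday is 5, Sunday is 6.
        if date.fromisoformat(day).weekday() >= 5:
            weekend.append(count)
        else:
            weekday.append(count)
    return _mean(weekday), _mean(weekend)


# Made-up numbers: 2011-01-01 was a Saturday, 2011-01-04 a Tuesday.
sample = {"2011-01-01": 4, "2011-01-02": 6, "2011-01-03": 10, "2011-01-04": 20}
weekday_mean, weekend_mean = weekday_weekend_means(sample)
print(weekday_mean, weekend_mean)  # 15.0 5.0
```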

I made this page public if you want to have a closer look. It’s frozen (it doesn’t poll LP) but it’s interactive, you can zoom in/out.

I still have plenty of ideas for ways to report interesting facts using those stats but it’s already clear to me that it’s too slow to be used for big PPAs like this one. It’s very new, so I’m not overly concerned at this point. I will experiment with smaller PPAs in the coming days, and report back.

Thoughts?

  1. Christoph
    January 5, 2011 at 04:48

    It seems for “Architecture: all” packages, a download count for each arch gets added instead of just one.

    • fta
      January 5, 2011 at 19:19

      indeed. That’s an annoying bug. the arch-all binaries are exposed with the full set of architectures supported on those distributions, so we see armel, sparc, ia64, powerpc.. even for virtual PPAs. I filed a bug for that months ago: LP #645921

    • January 6, 2011 at 13:57

      This is because of a bit of a conflict between the internal and the exposed model. Internally a BinaryPackagePublishingHistory references a BinaryPackageRelease, which represents a particular .deb. The BPPH download count methods delegate to the BPR.

      Because of the way APT archives are structured, it is the .deb that we can track downloads for — we can’t tell which architecture, or which series. So the download count must be stored per-BPR. For architecture-dependent packages this is fine, since each BPPH has a different BPR. But architecture-independent packages have a single BPR shared among the BPPHs, so the results appear duplicated.

      It’s not terribly easy to change the model on Launchpad’s end. It seems like it should be easy enough to avoid problems on the client side, though.

  2. January 5, 2011 at 19:00

    Can you make the chart generation code public?

    • fta
      January 5, 2011 at 19:20

      I will. It’s just not ready at the moment.

    • Thierry Carrez
      January 7, 2011 at 12:56

      +1 !

  3. stefansundin
    September 26, 2014 at 05:21

    I made a userscript to view download count: https://gist.github.com/stefansundin/f9df6c5e0fd184c60709
