spelunking swift package manager – Workspaces

I finished up my last bug fix and tackled another. SR-4261 took me deep into some new areas of the code base that I wasn’t familiar with, so I did more spelunking to get a decent clue of what the code was doing before making the fix. The results of that spelunking are this post.

Many of the recent changes to SwiftPM have been about managing the workspace of a project – pinning dependencies, updating and editing dependencies, and handling that whole space of interactions. These are exposed
as commands under swift package, such as swift package edit and swift package pin.

In the codebase, most of the high level logic for these commands resides in the target Workspace.

If you want a map of all the targets in the project, I included a graph of targets in my first round of spelunking swift package manager.

There are a number of interesting classes in Workspace, but the ones I focused on
were related to manifests and managed dependencies: Manifest and ManagedDependency. These two are combined in Workspace in DependencyManifests, which represents SwiftPM’s knowledge of the workspace and its current state, and provides the means to manipulate the workspace with commands like edit and pin.

When swift package manager wants to manipulate the workspace, the logic
starts out loading the current state. This pretty consistently starts with loading
up the main targets by invoking loadRootManifests. That in turn uses the manifest loading logic and an interesting piece that is newish to SwiftPM: a DiagnosticsEngine. The DiagnosticsEngine collects errors and can emit interesting details for tooling that wants to provide more UI or feedback information.

loadRootManifests loads these from a list of AbsolutePath values that gets passed in from the swift CLI – generally the current working directory of your project. In any case, loadRootManifests returns an array of Manifest, which is the key to loading up information about the rest of the workspace.

The next step is often loadDependencyManifests, which returns an instance of DependencyManifests. This does the work of loading the dependencies needed to create the holistic view of the project to date and loading the relevant state. Loading the ManagedDependencies leverages the class LoadableResult, a generic class for loading persisted state from a JSON file – Pins and ManagedDependencies are both loaded using it. In the case of ManagedDependencies, it loads from the file dependencies-state.json, which includes the current state, current path, and relevant details about each repository. That file also indicates any packages in the “edit” state; validateEditedPackages checks that each edited repository actually exists at the correct location on disk.
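To make that pattern concrete, here is a minimal sketch of the “load persisted state from a JSON file” idea – written with today’s Codable machinery for brevity, and with a made-up state shape. It is not SwiftPM’s actual LoadableResult implementation.

import Foundation

/// A sketch of loading persisted state from a JSON file, in the spirit of
/// LoadableResult. The types and field names here are hypothetical.
struct PersistedState<T: Decodable> {
    let path: URL

    /// Read and decode the file, throwing if it is missing or malformed.
    func load() throws -> T {
        let data = try Data(contentsOf: path)
        return try JSONDecoder().decode(T.self, from: data)
    }
}

/// Made-up shape standing in for an entry in dependencies-state.json;
/// the real file's fields may differ.
struct DependencyState: Decodable {
    let name: String
    let path: String
    let isEdited: Bool
}

let statePath = URL(fileURLWithPath: ".build/dependencies-state.json")
// let dependencies = try PersistedState<[DependencyState]>(path: statePath).load()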

Pins work a bit differently, being stored in the source path. This allows them to be included in the project source in order to concretely specify versions, or constraints on versions, for each dependency. The ManagedDependencies are maintained in the
working directory for builds, are not expected to be in source control, and represent the state of things on your local machine.

The way that SwiftPM handles dependency resolution is by using a collection of RepositoryPackageConstraint and a constraint solver to resolveDependencies. The DependencyResolver is its own separate thing under the PackageGraph package; I have not yet dug into it. Most notably, the DependencyResolver will throw an error if it’s unable to resolve the constraints provided to it – and that’s key to the heart of the bug SR-4261, which is about adding in missing constraints for edited packages when invoking the pin command.
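To see why a constraint solver ends up throwing, here is a toy sketch of intersecting version constraints. It is nothing like SwiftPM’s actual DependencyResolver – the types and logic are my own simplification, purely to illustrate the failure mode.

/// Toy "version range" over integers; a real resolver works over semantic
/// versions and whole dependency graphs.
struct VersionRange {
    let lower: Int
    let upper: Int
}

enum ResolutionError: Error {
    case unsatisfiable
}

/// Intersect all constraints, throwing when no version satisfies them all.
func intersect(_ constraints: [VersionRange]) throws -> VersionRange {
    var lower = Int.min
    var upper = Int.max
    for constraint in constraints {
        lower = max(lower, constraint.lower)
        upper = min(upper, constraint.upper)
    }
    guard lower <= upper else { throw ResolutionError.unsatisfiable }
    return VersionRange(lower: lower, upper: upper)
}

// Ranges 1...3 and 5...7 have no overlap, so this throws: the same kind of
// failure a resolver reports when the constraints it is given can't be met.
// let resolved = try intersect([VersionRange(lower: 1, upper: 3),
//                               VersionRange(lower: 5, upper: 7)])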

Using 3 Tiers of Continuous Integration

A bit over a year ago, I wrote out Six rules for setting up continuous integration, which received a fair bit of attention. One of the items I called out was the speed of tests, suggesting keeping the run to around 15-20 minutes to encourage and promote developer velocity.

A few weeks ago, the team at SemaphoreCI used the word “proper” in a similar blog post, and suggested that a timing marker should be 10 minutes. This maps pretty directly to a new feature they’re promoting to review the speed of their CI system as it applies to your code.

I disagree a bit with the 10 minute assertion, only in that it is too simplistic. Rather than a straight “10 minute” rule, let me suggest a more comprehensive solution:

Tier your tests

Not all tests are equal, and you don’t want to treat them as such. I have found 3 tiers to be extremely effective, assuming you’re working on a system that can have some deep complexity to review.

The first tier: “smoke test”. Lots of projects have this concept and it is a good one to optimize for. This is where the 10 minute rule that SemaphoreCI is promoting is a fine rule of thumb. You want to verify as many of the most common paths of functionality as you can, within a time-boxed boundary – you pick the time. This is the same tier that I generally encourage for pull request feedback testing, and I try to include all unit testing in this tier, as well as some functional testing and even integration tests. If these tests don’t pass, I’ve generally assumed that a pull request isn’t acceptable for merge – which makes this tier a quality gate.

I also recommend you have a consistent and clearly documented means of letting a developer run all of these tests themselves, without having to resort to your CI system. The CI system provides a needed “source of truth”, but it behooves you to expose all the detail of what this tier does, and how it does it, so that you don’t run into blocks where a developer can’t reproduce the issue locally to debug and resolve it.

The second tier: “regression tests”. Once you acknowledge that timely feedback is critical and pick a marker, you may start to find some testing scenarios (especially in larger systems) that take longer to validate than the time you’ve allowed. Some you’ll include in the first tier, where you can fit things to the time box you’ve set – but the rest should live somewhere and get run at some point. These are often the corner cases, the regression tests, integration tests, upgrade tests, and so forth. In my experience, running these consistently (or even continuously) is valuable – so this is often the “nightly build & test” sequence. This is the tier that starts “after the fact” feedback, and as you’re building a CI system, you should consider how you want to handle it when something doesn’t pass these tests.

If you’re doing continuous deployment to a web service then I recommend you have this entire set “pass” prior to rolling out software from a staging environment to production. You can batch up commits that have been validated from the first tier, pull them all together, and then only promote to your live service once these have passed.

If you’re developing a service or library that someone else will install and use, then I recommend running these tests continuously on your master or release branch, and if any fail then consider what your process needs to accommodate: Do you want to freeze any additional commits until the tests are fixed? Do you want to revert the offending commits?  Or do you open a bug that you consider a “release blocker” that has to be resolved before your next release?

An aside here on “flakes”. The reason I recommend running the second tier of tests on a continuous basis is to keep a close eye on an unfortunate reality of complex systems and testing: flakey tests. Flakes are invaluable for feedback, and often a complete pain to track down. These are the tests that “mostly pass”, but don’t always return consistent results. As you build more asynchronous systems, these become more prevalent – from insufficient resources (such as CPU, disk IO or network IO on your test systems) to race conditions that only appear periodically. I recommend you take the feedback from this tier seriously, and collect information that allows you to identify flakey tests over time. Flakes can happen at any tier, and are the worst in the first tier. When I find a flakey test in the first tier, I evaluate whether it should “stop the whole train” – freeze the commits until it’s resolved – or whether I should move it into the second tier and open a bug. It’s up to you and your development process, but think about how you want to handle it and have a plan.

The third tier: “deep regression, matrix and performance tests”. You may not always have this tier, but if you have acceptance or release validation that takes an extended amount of time (say over a few hours) to complete, then consider shifting it back into another tier. This is also the tier where I tend to handle the time-consuming and complex matrices when they apply. In my experience, if you’re testing across some matrix of configurations (be that software or hardware), the resources are generally constrained and the testing time heads towards asymptotic. As a rule of thumb, I don’t include “release blockers” in this tier – it’s more about thoroughly describing your code. Benchmarks, matrix validations that wouldn’t block a release, and performance characterization all fall into this tier. If you have this tier, I recommend you run it prior to every major release and, if resources allow, on a recurring periodic basis so you can build trends of system characterizations.

There’s a good argument for “duration testing” that sort of fits between the second and third tiers. If you have tests where you want to validate a system operating over an extended period of time – validating availability, recovery, and system resource consumption, like looking for memory leaks – then you might want to consider failing some of these tests as a release blocker. I’ve generally found that I can find memory leaks within a couple of hours, but if you’re working with a system that will be deployed where intervention is prohibitively expensive, then you might want to consider duration tests to validate operation, and even chaos-monkey style recovery of services, over longer periods of time. Slow memory leaks, pernicious deadlocks, and distributed system recovery failures are all types of tests that are immensely valuable, but take a long “wall clock” time to run.

Reporting on your tiers

As you build out your continuous integration environment, take the time to plan and implement reporting for your tiers as well. The first tier is generally easiest – it’s what most CI systems do with annotations in pull requests. The second and third tiers take more time and resources. You want to watch for flakey tests, collecting and analyzing failures. More complex open source systems look towards their release process to help corral this information – OpenStack uses Zuul (unfortunately rather OpenStack specific, but they’re apparently working on that), and Kubernetes has Gubernator and TestGrid. When I was at Nebula, we invested in a service that collected and collated test results across all our tiers and reported not just the success/failure of the latest run but also a history of success and failure to help us spot those flakes I mentioned above.

 

Using docker to build and test SwiftPM

I sat down to claw my way through blog posts, examples, and docker’s documentation to come up with a way to build and test Swift Package Manager on Linux as well as the Mac.

There are a number of ways to accomplish this; I will just present one. You are welcome to use it or not and any variations on the theme should not be difficult to sort out.

First, you’ll need a docker image with the Linux swift toolchain installed. IBM has a swift docker image you can use, and recently another was announced and made available called ‘swiftdocker’, which is a Swift 3.0.2 release for Ubuntu 16.04. I riffed on IBM’s code and my previous notes for creating a swift development environment to build the latest master-branch toolchain into an Ubuntu 16.04 based image. If you want to follow along, you can snag the Dockerfile and related scripts from my vagrant-ubuntu-swift-dev github repo and build your own locally. The image is 1.39GB in size and named swiftpm-docker-1604.

SwiftPM is a bit of a special snowflake compared to other server-side swift software – in particular, it’s part of the toolchain itself, so working on it involves some bootstrapping so that you can get to the moral equivalent of swift build and swift test. Because of that setup, you leverage a script they created to run the bootstrapping: Utilities/bootstrap.

SwiftPM has also “moved ahead” a bit and leveraged newer capabilities – so if you want to build off the master branch, you’ll need the swift-3.1 toolchain, or the nightly snapshot release to do the lifting. The current 3.0.2 release won’t do the trick. The nightly snapshots are NOT released versions, so there is some measure of risk and potential breakage – it has been pretty good for me so far – and necessary for working on SwiftPM.

On to the command! To build and test swiftpm from a copy of the source locally:

docker run -t --rm -v $(pwd):/data:rw -w /data swiftpm-docker-1604 \
/bin/bash -c "./Utilities/bootstrap && .build/debug/swift-test --parallel"

To break this down, since Docker’s somewhat insane about command-line options:

  • -t indicates to allocate a TTY for this process
  • --rm makes the results of what we do ephemeral, as opposed to updating the image
  • -v $(pwd):/data:rw is the local volume mount that makes the current local directory appear as /data within the image, and makes it Read/Write
  • -w /data leverages that volume mount to make /data the current working directory for any commands
  • swiftpm-docker-1604 is the name of the docker image that I make and update as needed
  • /bin/bash -c "..." is how I pass in multiple commands to be run, since I want to first run through the bootstrap, but then shift over to using .build/debug/swift-test --parallel for a little more speed (and a lot less output) in running the tests.

The /bin/bash -c "..." bits could easily be replaced with ./Utilities/bootstrap test to the same end effect, but it’s a touch slower in overall execution time.

When this is done, it leaves the “built for Linux” executables (and intermediate files) in the .build directory, so if you try to run them locally on a Mac, it won’t be happy. To be “safe”, I recommend you clear the .build directory and rebuild locally if you want to test on MacOS. It just so happens that ./Utilities/bootstrap clean does that.

A few numbers on how (roughly) long this takes on my mid-2012 MacBook Air:

  • example command above: 5 minutes, 15.887 seconds
  • using /bin/bash -c "./Utilities/bootstrap test": 6 minutes, 15.853 seconds
  • a build using the same toolchain, but just MacOS locally: 4 minutes, 13.558 seconds

 

Adding thread safety in Swift 3

One of the pieces that I’ve brushed up against recently, but didn’t understand in any great depth, was the set of techniques for making various sections of code thread-safe. There are some excellent articles out there on it, so if you’re looking and found this, let me provide a few references:

In June 2016, Matt Gallagher wrote Mutexes and closure capture in Swift, which was oriented towards how to optimize beyond the “obvious advice”, and is a great read for the more curious.

The “usual advice” referenced a StackOverflow question – What is the Swift equivalent to Objective-C’s @synchronized? – a technique that spans swift 2 and 3. The gist is to leverage the libDispatch project, creating a dispatch queue to handle synchronization and access to shared data structures. I found another question and answer on StackOverflow to be a bit easier to read and understand: Adding items to Swift array across multiple threads causing issues…, and it matches a bit more of the technique that you can spot in Swift Package Manager code.

One of the interesting quirks in Swift 3 is that let myQueue = DispatchQueue(label: "my queue") returns a serial queue, as opposed to the concurrent queue you get by invoking DispatchQueue.global(). I’m not sure where in the formal documentation that default appears – the guides I found to Grand Central Dispatch are all still written primarily for C and Objective-C, so the mapping of the Swift libraries to those structures wasn’t at all obvious to me. I particularly liked the answer that described this detail, as it was one of the better descriptions that mapped to the Apple developer guides on concurrency (which desperately need to be updated to relate to Swift, IMO).
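To make the technique concrete, here is a minimal sketch of guarding a shared array with a serial dispatch queue. The type name and queue label are my own illustration, not code from SwiftPM or the linked answers.

import Dispatch

/// A small wrapper that serializes access to an array via a serial queue.
final class SynchronizedArray<Element> {
    // DispatchQueue(label:) creates a serial queue by default, so blocks
    // submitted here run one at a time, in order.
    private let queue = DispatchQueue(label: "com.example.synchronized-array")
    private var storage: [Element] = []

    func append(_ element: Element) {
        // sync blocks the caller until the work item completes, keeping
        // writes ordered with respect to reads.
        queue.sync {
            storage.append(element)
        }
    }

    var count: Int {
        return queue.sync { storage.count }
    }
}

Calls funnel through the queue one at a time, so the array never sees concurrent mutation – the same shape of approach the StackOverflow answers describe.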

Circling back to Matt Gallagher’s piece, the overhead of the scope capture in passing through the closure was more than he wanted to see – although it was fine for my needs. Nonetheless, his details are also available on github at CwlUtils, which has a number of other interesting pieces and tidbits in it that I’ll circle back to later to look at in depth.

 

spelunking Swift Package Manager

In the slowly building sequence of my swift dev diaries, I wrote about how to set up a swift development environment, and noted some details I gleaned from the SwiftPM slack channel about how to make a swift 3.0/3.1 binary “portable”. This likely will not be an issue in another year, as the plans for SwiftPM under swift 4 include being able to make statically compiled binaries.

Anyway, the focal point for all this was an excuse to learn and start seriously using swift, and since I’ve been a lot more server-side than not for the past few years, I came at it from that direction. With the help of the guys writing swift package manager, I picked a few bugs and started working on them.

Turns out Swift Package Manager is a fairly complex beast. Like any other project, a seemingly simple bug leads to a lot of tendrils through the code. So this weekend, I decided I would dive in deep and share what I learned while spelunking.

I started from the bug SR-3275 (which was semi-resolved by the time I got there), but I decided to try and “make it better”. SwiftPM uses git and a number of other command line tools. Before it uses them, it would be nice to check to make sure they’re there, and if they’re not – to fail gracefully, or at least informatively. My first cut at this improved the error output and exposed me to some of the complexity under the covers. Git is used in several parts of SwiftPM (as are other tools). It has definitely grown organically with the code, so it’s not entirely consistent. SwiftPM includes multiple binary command-line tools, and several of them use Git to validate dependencies, check them out if needed, and so forth.

The package structure of SwiftPM itself is fairly complex, with multiple separate internal packages and dependencies. I made a map because it’s easier for me to understand things when I scrawl them out.

(image: swiftpm package target graph)

swiftpm is the PDF version (probably more readable), which I generated by taking the output of swift package --dump-package, converting it to a graphviz digraph, and rendering it with my old friend OmniGraffle.

The command line tools (swift build, swift package, etc.) all use a common package, Commands, which in turn relies on the various underlying pieces. Where I started was nicely encapsulated in the package SourceControl, but I soon realized that what I was fiddling with was a cross-cutting concern. The actual usage of external command line tools happens in Utility, the relevant component being Process.swift.

Utility has a close neighbor: Basic, and the two seem significantly overlapping to me. From what I’ve gathered, Utility is the “we’re not sure if it’s in the right place” grouping, and Basic contains the more stabilized, structured code. Process seems to be in the grey area between those two groupings, and is getting some additional love to stabilize it even as I’m writing this.

All the CLI tools use a base class called SwiftTool, which sets a common structure. SwiftTool has an init method that sets up CLI arguments and processes the ones provided into an Options object that then gets passed down into the relevant code that does the lifting. ArgumentParser and ArgumentBinder do most of that lifting. Each subclass of SwiftTool has its own runImpl method, using the relevant underlying packages and dependencies. run invokes the relevant code in runImpl. If any errors get thrown, it deals with them (regardless of which CLI was invoked) with a method called handle in Error, which mostly prints some (hopefully useful) error message and then exits with an error code.

My goal is to check for required files – that they exist and are executable – and to report consistently and clearly if they’re not. From what I’ve learned so far, that seems to be best implemented by creating a relevant type conforming to Swift.Error (often an enum) and throwing when things are awry, letting the _handle implementation in Commands.Error take care of printing things consistently.
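As a concrete illustration of that approach, here is a minimal sketch of an error type conforming to Swift.Error. The enum, its cases, and its messages are hypothetical, not SwiftPM’s actual declarations.

/// Hypothetical error type for missing or non-executable tools; purely
/// illustrative, not part of SwiftPM.
enum ToolValidationError: Swift.Error, CustomStringConvertible {
    case missingExecutable(String)
    case notExecutable(String)

    var description: String {
        switch self {
        case .missingExecutable(let name):
            return "required tool '\(name)' was not found in the search paths"
        case .notExecutable(let path):
            return "file at '\(path)' exists but is not executable"
        }
    }
}

/// Throwing lets shared error handling (like the _handle implementation
/// mentioned above) print the message consistently and exit.
func validate(toolExists: Bool, named name: String) throws {
    guard toolExists else {
        throw ToolValidationError.missingExecutable(name)
    }
}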

So on to the file checking parts!

The “file exists” check seemed to have two implementations, one in Basic.Filesystem and another in Basic.Pathshims.

After a bit more reading and chatting with Ankit, it became clear that Filesystem was the desired common way of doing things, and Pathshims the original “let’s just get this working” stuff. It is slowly migrating and stabilizing over to Filesystem. There are a few pieces of the Filesystem implementation that directly use Pathshims, so I expect a possible follow-up will be to collapse those together, and finally get rid of Pathshims entirely.

The “file is executable” check didn’t yet exist, although there were a couple of notes that it would be good to have. Similar checking was in UserToolchain, including searching environment paths for the executable. It seemed like it needed to be pulled out of there and moved to a more central location, so it could be used in any of the subclasses of SwiftTool.

At this point, I’ve put up a pull request to add in the executable check to Filesystem, and another to migrate the “search paths” logic into Utility. I need to tweak those and finish up adding the tests and comments, and then I’ll head back to adding in the checks to the various subclasses of SwiftTool that could use the error checking.

Sabbatical

I took this panoramic over a month ago now, when the crocuses weren’t even opening yet. A pretty spectacular weekend, and the view of Lake Union and east to the Cascades was amazing.

It seemed like a damn good photo to start my sabbatical with. As I’m posting this, I’m just beginning a sabbatical. I’ve been wanting to do this for over a decade, and this year it’s feasible. Learning, Playing, and Travel are the top contenders for my time. I have been making lists for nearly 3 months about “what I could do” – I’ll never get to it all, but it makes a good target.

I’ve started some art classes, which I’m inordinately nervous about, but have been trying to work on for months now. I figured a full-out class would enforce some structure around what I’m doing, and hopefully lead to some interesting things. I have been loving the new camera on the iPhone 7 and posting photography experiments to facebook and twitter. It’ll be interesting to see if I can fuse a few different arts together.

I can’t even begin to think of not being a geek, so I am also contributing to a couple of open source efforts. I’m now a contributor to Apple’s swift project: just fixing a bug or two at this point, but it’s a start and gives me a reason to dig more into the language. I may do the same with Kubernetes a bit later, and expand on learning the Go language.

Travel is also in the plan – although like usual I’ll post more about where I’ve been than “where I am”, so don’t expect updates until after the fact for the travel. Lots of things to see on the horizon…

HOW TO: making a portable binary with swift

Over the weekend I was working with Vapor, trying it out and learning a bit about the libraries. Vapor leverages a library called LibreSSL to provide TLS to web services, so when you compile the project, you get a binary and a dynamic library that it uses.

The interesting part here is that if you move the directory that contains the “built bits”, the program ceases to function, reporting an error that it can’t find the dynamic library. You can see this in my simple test case, with the code from https://github.com/heckj/vaportst.

git clone https://github.com/heckj/vaportst
swift build -c release
mv .build/release newlocation
./newlocation/App version

then throws an error:

dyld: Library not loaded: /Users/heckj/src/vaportst/.build/release/libCLibreSSL.dylib
 Referenced from: /Users/heckj/src/vaportst/newlocation/App
 Reason: image not found
Abort trap: 6

It turns out that with Swift 3 (and the Swift 3.1 that’s coming), the compiler adds the static path to the dynamic library, and there’s an interesting tool, called install_name_tool, that can modify that from a static path to a dynamic path.

Norio Nomura was kind enough to give me the exact syntax:

install_name_tool -change /Users/heckj/src/vaportst/.build/release/libCLibreSSL.dylib @rpath/libCLibreSSL.dylib .build/release/App
install_name_tool -add_rpath @executable_path .build/release/App

This tool is specific to MacOS, as the linux dynamic loader works slightly differently. As long as the dynamic library is in the same directory as the binary, or the environment variable LD_LIBRARY_PATH is set to the directory containing the dynamic libraries, it’ll get loaded just fine on Linux.

Swift today doesn’t provide a means to create a statically linked binary (it is an open feature request: SR-648). It looks like that may be an option in the future as the comments in the bug show progress towards this goal. The whole issue of dynamic loading becomes moot, at the cost of larger binaries – but it is an incredible boon when dealing with containers and particularly looking towards running “server side swift”.

Kubernetes community crucible

I’ve been watching and lurking on the edges of the Kubernetes community for this past cycle of development. We are closing in on the feature freeze for the 1.6 release, and it is fascinating to watch the community evolve.

These next several releases are a crucible for Kubernetes as a project and community. They have moved from Google to the CNCF, but the reality of the responsibility transition is still in progress. They have made their first large moves and efforts to handle the explosive growth of interest in their project and the corresponding expansion of the community. The Kubernetes 1.6 release was supposed to be a “few features, but mostly stability” release after the heavy changes that led into 1.5, recognizing that a lot of change has hit and stabilization is needed. This is true technically as well as for the community itself.

The work is going well: the SIGs are delegating responsibility pretty effectively, and most everyone is working to pull things forward. That is not to say it is going perfectly. The crucible isn’t how the community dealt with the growth, but how it deals with the failures and faults of its efforts to handle that growth. One example: although 1.6 was supposed to be about stability, test flakiness has risen again. The conversation in last week’s community meeting highlighted it, and discussed what could be done to shift back to reliability and stability as a key ingredient. Other issues include the project-wide impacts that SIGs can have; resolving those has been a lot of the focus of the community’s project and product managers over the past months.

This week’s DevOpsWeekly newsletter also highlighted some end user feedback: some enthusiastically pro-Kubernetes, another not so much. The two posts are sizzling plates of feedback goodness for the future of Kubernetes – explanations of what’s effective, what’s confusing, and what could be better – product management gold.

  • Thoughts on Kubernetes by Nelson Elhage is an “enthusiastic for the future of the project” account from a new user who clearly did his research and understood the system, pointing out rough edges and weak points on the user experience.
  • Why I’m leaving Kubernetes for Swarm by Jonathan Kosgei highlights the rougher-than-most edges that exist around the concept of “ingress” when Kubernetes is used outside of the large cloud providers, and notes that Docker Swarm has stitched together a pretty effective end-user story and experience here.

I think the project is doing a lot of good things, and I have been impressed with the efforts, professionalism, and personalities of the team moving Kubernetes forward. There is a lot of passion and desire to see the right thing done, and a nice acknowledgement of our myopic tendencies at times when holistic or strategic thinking is needed. I hope the flaky tests get sorted without as much pain as the last round, and the community grows from the combined efforts. I want to see them alloyed in this crucible and come out stronger for it.

Benchmarking etcd 3.0 – an excellent example of how to benchmark a service

Last week, the CoreOS team posted a benchmark review of etcd 3.0 on their blog. Gyu-Ho Lee was the author, and clearly the primary committer to the effort – and he did an amazing job.

First and foremost is that the benchmark is entirely open, clear, and reproducible. All the code for this effort is in a git repository dedicated to the purpose: dbtester, and the test results (also in the repository) and how to run the tests are all detailed. The benchmarking code was created for the purpose of benchmarking this specific kind of service, and used to compare etcd, zookeeper, and consul. CoreOS did a tremendous service making this public, and I hope it gave them a concrete dashboard for their development improvements while they iterated on etcd to 3.0.

What Gyu-Ho Lee did within the benchmark is what makes this an amazing example: he reviewed the performance of the target against multiple dimensions. Too many benchmarks, especially ones presented in marketing materials, are simple graphs highlighting a single dimension – and utterly opaque as to how they got there. The etcd3 benchmark reviews etcd itself, zookeeper, and consul against multiple dimensions: memory, CPU, and disk IO. The raw data that backed the blog post is committed into the repo under “test-results”. It is reasonably representative (writing 1,000,000 keys to the backend) and tracks time to complete, memory consumed, CPU consumed, and disk IO consumed during the process.

I haven’t looked at the code to see how re-usable it might be – I would love to see more benchmarks with different actions, and a comparison of how this operates in production (in cluster mode) – but these wishes are just variations on the theme, not a complaint about the work done so far.

As an industry, as we build more with containers, this kind of benchmarking is exactly what we need. We’re composing distributed services now more than ever, and knowing the qualities of how these systems or containers will operate is as critical as any other correctness validation efforts.

How to measure your open source project

Before I get into the content, let me preface this by saying it is a WORK IN PROGRESS. I have been contributing and collaborating using open source for quite a while now, and my recent work is focused on helping people collaborate using open source, as well as on how to host and build open source projects.

I am a firm believer of “measure what you want to do” – to the point that even indirect measures can be helpful indicators. Most of this article isn’t something you want to react to on a day-to-day basis – this is about strategic health and engagement. A lot of the information takes time to gather, and even more time to show useful trends.

Some of this may be blindingly obvious, and hopefully a few ideas (or more) will be useful. These nuggets are what I’ve built up to help understand the “success” of an open source project I am involved with, or running. I did a number of google searches for similar topics when I was establishing my last few projects – there weren’t too many articles or hints out there like this, so hopefully this is a little “give back”.

There are two different – related but very separate – ways to look at your project in terms of “success”:

  • Consumption
  • Contribution

Consumption

Consumption is focused around “are folks effectively using this project”. The really awesome news in this space is that there’s been a lot of thinking about these kinds of measures – it’s the same question that anyone managing a product has been thinking about, so the same techniques to measure product success can be used.

If your project is a software library, then you will likely want to leverage public package management tooling that exists for the language of your choice: PyPi, CPAN, NPM, Swift Package Catalog, Go Search, etc. The trend of these package managers is heading towards relying on Github, Bitbucket, and Gitlab as a hosting location for the underlying data.

Some of the package indices provide some stats – npm, for example, can track downloads through NPM for the past day, week, and month. A number of the package indices will also have some measure of popularity, where they try to relate consumption by some metric (examples: http://pypi-ranking.info/alltime; NPM keeps track of how many dependents and stars a package has; and go-search tracks the number of stars on the underlying repo).

The underlying source system, like github, often has some metrics you can look at – and it may have the best detail. Github “stars” are a social indicator that’s useful to pay attention to, and any given repository has graphs that provide some basic analytics as well (although you need to be an admin on that repo to see them under the URI graphs/traffic). Github also recently introduced the concept of “dependents” in its graphs – although I’ve not seen it populated with any details as yet. Traffic includes simple line charts for the past two weeks covering:

  • Clones
  • Unique Cloners
  • Views
  • Unique Visitors

Documentation hosted on the Internet is another excellent place to get some metrics, and it is a reasonable correlation to people using your library. If you’re hosting documentation for your project through either github pages or ReadTheDocs, you can embed a Google Analytics code so that you can get page views, unique users per month, and other standard website analysis from those sites.

I’ve often wished to know how often people were looking at just the README on the front page of a Github repository, without any further setup of docs or github pages, and Ilya Grigorik made a lovely solution in the form of ga-beacon (available on github). You can use this to host a project on Heroku (or your PaaS or IaaS of choice), and it’ll return an image and pass along the traffic to Google Analytics to get some of those basic details.

If your project is a more complete service, and most specifically if you have a UI or a reference implementation with a UI, that offers another possibility for collecting metrics. Libraries from a variety of languages can send data to Google Analytics, and with many web-based projects it’s as easy as embedding the Google Analytics code. This can get into a bit of a grey space about what you’re monitoring and sending back. In the presentation entitled “Consider the Maintainer”, Nadia Eghbal cited a preference towards more information that I completely agree with – and if you create a demo/reference download that includes a UI, it seems quite reasonable to track metrics on that reference instance’s UI to know how many people are using it, and even what they’re using. Don’t confuse that with evidence about how people are using your project, though – stats from a reference UI are far more about people experimenting and exploring.

There is also StackOverflow (or one of the Stack Exchange variants) where your project might be discussed. If your project grows to the level of getting questions and answers on StackOverflow, it’s quite possible to start seeing tags either for your project, or for specifics of your project – at least if the project encourages Q&A there. You can get basic stats per tag – “viewed”, “active”, and “editors” – as well as the number of questions with that tag, which can be a reasonable representation of consumption as well.

Often well before your project is big enough to warrant a StackOverflow tag, Google will know about people looking for details about your project. Leverage Google Trends to search for your project’s name, and you can even compare it to related projects if you want to see how you’re shaping up against possible competition, although pinning it down by query terms can be a real dark art.

Contribution

Measuring contribution is a much trickier game, and often more indirect, than measuring consumption. The obvious starting point is code contributions back to your project, but there are other aspects to consider: bug reports, code reviews, and conversational aspects – helping people through a mailing list, IRC or Slack channel. When I was involved with OpenStack there was a lot of conversation about what it meant to be a contributor and how to acknowledge it (in that case, for the purposes of voting on technical leadership within the project). Out of those conversations, the Stackalytics website was created by Mirantis to report on contributions across quite a large number of dimensions: commits, Blueprints (OpenStack’s feature coordination document) drafted and completed, emails sent, bugs filed and resolved, reviews, and translations.

Mirantis expanded Stackalytics to cover a number of ancillary projects: Kubernetes, Ansible, Docker, Open vSwitch, and so on. The code to Stackalytics itself is open source, available on github at https://github.com/openstack/stackalytics. I’m not aware of other tooling that provides this same sort of collection and reporting – it looks fairly complex to set up and run, but if you want to track contribution metrics this is an option.

For my own use, the number of pull requests submitted per week or month has been interesting, as has the number of bug reports submitted per week or month. I found it useful to distinguish between “pull requests from the known crew” and external pull requests – trying to break apart a sense of “new folks” from the existing core. If your project has a fairly rigorous review process, then the time between creation and close or merge of a PR can also be interesting. Be a little careful with what you imply from it, though, as causality for time delays is really hard to pin down and very dependent on your process for validating a PR.
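For what it’s worth, here is a minimal sketch of computing those two measures from a list of pull requests. The PullRequest shape and the “known crew” list are made up; you would feed it from whatever your repository host exports.

import Foundation

/// Hypothetical, minimal record of a pull request.
struct PullRequest {
    let author: String
    let createdAt: Date
    let closedAt: Date?
}

let knownCrew: Set<String> = ["alice", "bob"]  // hypothetical core contributors

func summarize(pullRequests: [PullRequest]) {
    var external = 0
    var totalDaysToClose = 0.0
    var closedCount = 0

    for pr in pullRequests {
        // Split "known crew" submissions from external ones.
        if !knownCrew.contains(pr.author) {
            external += 1
        }
        // Track time from creation to close for the PRs that have closed.
        if let closed = pr.closedAt {
            totalDaysToClose += closed.timeIntervalSince(pr.createdAt) / 86_400
            closedCount += 1
        }
    }

    let averageDays = closedCount > 0 ? totalDaysToClose / Double(closedCount) : 0
    print("external PRs: \(external) of \(pullRequests.count)")
    print("average days to close: \(averageDays)")
}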

There’s a grey area for a few metrics that lie somewhere between contribution and consumption – they don’t really assert one or the other cleanly, but they are useful in trying to track the health of a project: mailing lists and chat channels. Track the number of subscribers to project mailing lists or forums, and the number of people subscribed to a project Slack channel (or IRC channel). The conversation interfaces tend to be far more ephemeral – the numbers can vary up and down quite a bit, by minute and hour as much as over days or weeks – but within that ebb and flow you can gather some trends of growth or shrinkage. It doesn’t tell you about the quality of the content, but just that people are subscribed is a useful baseline.

Metrics I Found Most Useful

I’ve thrown out a lot of pieces and parts, so let me also share what I found most useful. In these cases, the metrics were all measured weekly and tracked over time:

  • Github repo “stars” count
  • Number of mailing list/forum subscribers
  • Number of chat channel subscribers
  • Number of pull requests submitted
  • Number of bugs/issues filed
  • Number of clones of the repository
  • Number of visitors of the repository
  • Number of “sessions” for a documentation website
  • Number of “users” for a documentation website
  • Number of “pageviews” for a documentation website
  • Google Trend for project name

I don’t have any software to help with that collection – a lot of this is just my own manual collection and analysis to date.

If you have metrics that you’ve found immensely valuable, I’d love to know about them and what you found useful from them – leave me a comment! I’ll circle back to this after a while and update the list of metrics from feedback.

Updates:

Right after I posted this, I saw a tweet from Ben Balter hinting at some of the future advances that Github is making to help provide for Community Management.

And following on that, Bitergia poked me about their efforts to provide software development analytics, and in particular about the software they provide to collect metrics on what it means to collaborate, in their own open source effort: Grimoire Lab.

I also spotted an interesting algorithm that was used to compare open source AI libraries based on github metrics:

Aggregate Popularity = (30*contrib + 10*issues + 5*forks)*0.001
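Expressed as a small Swift helper (with made-up repository numbers for the example), that formula works out like this:

/// The aggregate-popularity formula quoted above.
func aggregatePopularity(contributors: Double, issues: Double, forks: Double) -> Double {
    return (30 * contributors + 10 * issues + 5 * forks) * 0.001
}

// Hypothetical repository with 120 contributors, 800 issues, and 450 forks:
// (30*120 + 10*800 + 5*450) * 0.001 = 13.85
let score = aggregatePopularity(contributors: 120, issues: 800, forks: 450)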