I’ve been using continuous integration and continuous deployment across multiple languages, companies, and teams to drive and verify the quality of development. The most frequently visited piece on this blog is a walk-through for setting up continuous integration for Python projects with Jenkins.
CI systems have evolved and expanded over time. From the earliest days of Buildbot and CruiseControl to modern systems built on Jenkins or hosted services like TravisCI and CloudBees, their capabilities have grown steadily broader and more diverse. We have moved from just compiling the code to running static analysis and unit and functional tests, composing full systems from multiple repositories, and running full integration tests.
Here are some of the most important lessons I’ve learned in using these systems. If you’re not doing these today, why not?
1) Keep all the logic for building your code in scripts in your code repositories/source control.
It’s super easy to fall into the trap of putting complex logic into tools like Jenkins; it’s one of the earliest and easiest anti-patterns to hit. The dependencies your build needs grow and change over time, and so does the logic once you start having dependent builds (if you separate out core libraries). Both grow almost silently until you end up with a build server that is a special snowflake and a source of fear: fragile, easy to break, and hard to set up.
The single best remedy I’ve seen for this is to not let anything into your build server that you can’t set up with a script, and to set up your build system entirely with a configuration management tool. Whether it’s Ansible, Puppet, or Chef, make sure that you can reproduce the whole functionality at a moment’s notice from a git repository. If you’re just setting up your build system now, or even just refactoring it or adding to it, do this now. Don’t put it off.
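As a minimal sketch of this lesson, assuming a Python project: a hypothetical `build.py` kept in the repository, so the CI job does nothing but invoke it and any developer can run exactly the same steps locally. The step commands (a `compileall` syntax check and `unittest` discovery in a `tests` directory) are placeholders for your project's real lint and test commands.

```python
"""build.py: a hypothetical single entry point for CI and developers alike."""
import subprocess
import sys

# Each step is (name, command). These commands are placeholders; swap in
# your project's real lint, test, and packaging commands.
STEPS = [
    ("syntax-check", [sys.executable, "-m", "compileall", "-q", "."]),
    ("unit-tests", [sys.executable, "-m", "unittest", "discover", "-s", "tests"]),
]


def run_steps(steps):
    """Run each step in order, stopping at the first failure.

    Returns the failing step's exit code, or 0 if every step passes.
    """
    for name, command in steps:
        print(f"==> {name}: {' '.join(command)}", flush=True)
        result = subprocess.run(command)
        if result.returncode != 0:
            print(f"==> {name} failed with exit code {result.returncode}", flush=True)
            return result.returncode
    return 0

# When saved as build.py, end the file with:
#     if __name__ == "__main__":
#         sys.exit(run_steps(STEPS))
```

Because the CI server only calls this script, the build logic is versioned with the code, and the server's own configuration stays small enough to rebuild from Ansible, Puppet, or Chef.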
2) Gate merging code on the results of tests, and mercilessly hunt down and resolve unstable tests.
Our “unit test” frameworks and CI systems have evolved to the point that we’re just as often running functional tests with unit test frameworks. The difference is that you’re as likely to expect dependent services to be available for your tests, and to exercise the code down and through those dependencies. Services like MongoDB, MySQL, and RabbitMQ are the most common.
Our testing infrastructure can also be where those systems get stressed the most. Repeated setup and teardown, many builds running in parallel and vying for resources, and the inevitable “scrounge computing” that often goes into setting up build infrastructure mean that the resources available to your builds may well be far, far below what developers have on their laptops while working.
Asynchronous tests often don’t have all the callbacks or hooks needed to know when setup is complete, so you get little hacks like “sleep 5” scattered through your code, and those values can behave massively differently on a developer’s laptop than on the tiny VM or container hosting your builds in your public or private cloud. The worst part is that it’s hard to tell a race condition failure apart from a low-resource failure, and in many cases low resources will exacerbate a race condition.
Do everything you can to make setup and teardown of your dependent services consistent, and to know when they complete (this is something that TravisCI handles quite well). You will probably still have some “sleep 5”s in there, but be ruthless about allowing new ones in or increasing their values, and always question such a change.
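One way to replace a fixed “sleep 5” is to poll for readiness with a hard timeout. The hypothetical helper below only checks that a TCP port accepts connections; that is a weaker signal than a service-level health check, so prefer one (such as a ping command from the service's own client) where the service provides it.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.25):
    """Poll until a TCP connection to (host, port) succeeds or `timeout` expires.

    Returns True as soon as the port accepts a connection, False on timeout.
    Unlike a fixed sleep, this returns quickly on a fast machine and waits
    longer on a slow build VM, up to an explicit limit.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (e.g. ConnectionRefusedError)
            # while nothing is listening on the port yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("localhost", 5672, timeout=60)` would wait for RabbitMQ's default AMQP port; if it returns False, failing setup with a clear message is far easier to debug than a test racing a half-started service.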
3) The speed of your gates directly impacts developer velocity and productivity; keep them lean and be consistent with mocks or dummies in your testing code.
The dark side of really good quality gates in continuous integration is that developers start relying on them like a crutch. The old “it’s taking 30 minutes to compile” is slowly being replaced by “I’m waiting for the CI system to do its run against my PR”. How much this happens varies with development team culture more than anything, but ruthlessly drive down the time it takes to run tests. My rule of thumb is that good gating tests should never exceed roughly 15-20 minutes of total run time.
Developers relish being lazy, so why not just push up that branch and submit a pull request to see how it goes, rather than run the tests themselves? I’ve not found it effective to try to stop this, only to be aware that it’s there. Treat the time to build and verify as a metric you watch, and deal with its growth as technical debt. Periodically refactoring the tests, or even the whole build process, will pay off in overall developer productivity, especially as the team contributing to the code grows.
As a general rule, most developers hate to document anything, but relish learning new tricks, so I’ve found that spreading these practices socially during code reviews is a tremendously effective pattern. Make it part of your development team’s culture, and it will reinforce itself as they see the wins and value.
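To treat gate time as a metric, each run first has to record it. A minimal sketch, assuming the gate is driven from Python: a hypothetical `GateTimer` that records per-stage durations and flags runs exceeding the 15-20 minute rule of thumb (the 20-minute default is simply that rule's upper end).

```python
import time
from contextlib import contextmanager

# Upper end of the 15-20 minute rule of thumb, in seconds (an assumption;
# tune it to your team).
GATE_BUDGET_SECONDS = 20 * 60


class GateTimer:
    """Hypothetical helper that records how long each gating stage takes."""

    def __init__(self):
        self.durations = {}  # stage name -> seconds

    @contextmanager
    def stage(self, name):
        # Record the duration even if the stage raises, so failed runs
        # still show up in the timing data.
        start = time.monotonic()
        try:
            yield
        finally:
            self.durations[name] = time.monotonic() - start

    def total(self):
        return sum(self.durations.values())

    def over_budget(self, budget=GATE_BUDGET_SECONDS):
        return self.total() > budget

    def slowest(self, n=3):
        """Return the n slowest stages as (name, seconds), slowest first."""
        return sorted(self.durations.items(), key=lambda item: item[1], reverse=True)[:n]
```

Wrapping each stage in `with timer.stage("unit-tests"):` and publishing the durations from every run, rather than only checking the total, shows which stage is growing and therefore where refactoring will pay off.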
4) Make sure all the developers can reproduce what gets run in the CI system by themselves.
The other side of that “good gates crutch” is that some developers will throw up their hands or point fingers, saying they can’t reproduce the issue and it “works for them”. Don’t hide any scripts, and don’t keep your developers from the resources they need to reproduce the issues. Ideally, any developer on your team can reproduce the entire build system in a few minutes and manually invoke any script that the automated systems run.
On top of this, encourage mocks and dummies, comment the hell out of them in the test code, and set up patterns in your tests that everyone can follow. A good example in your code repository is worth more than several pages of written documentation external to the code. Additionally, the less your tests depend on external resources, the faster they can be at validating core logic.
Most importantly, you want to keep quality accountability “all the way up front – with developers writing the code”. If the tests work on the developer’s laptop but not on the build server, have the developer run the tests on the build server, or on their own instance of it (which they can create because you made the whole build system reproducible), but don’t let that accountability slip or devolve into finger pointing.
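To illustrate the kind of commented pattern worth keeping in the repository, here is a minimal sketch using Python's standard `unittest.mock`. `OrderService` and its `publisher` are hypothetical; in production the publisher might wrap a RabbitMQ client, but the tests swap in a `Mock` so they validate only the core logic and run without a broker.

```python
import unittest
from unittest import mock


class OrderService:
    """Hypothetical service: stores an order and announces it on a queue."""

    def __init__(self, publisher):
        # `publisher` is anything with a publish(topic, payload) method.
        self.publisher = publisher
        self.orders = {}

    def place_order(self, order_id, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        self.orders[order_id] = amount
        self.publisher.publish("orders.placed", {"id": order_id, "amount": amount})


class OrderServiceTest(unittest.TestCase):
    def setUp(self):
        # The Mock stands in for the message broker: no network, no setup
        # race, and every call is recorded for the assertions below.
        self.publisher = mock.Mock()
        self.service = OrderService(self.publisher)

    def test_place_order_publishes_event(self):
        self.service.place_order("A1", 25)
        self.publisher.publish.assert_called_once_with(
            "orders.placed", {"id": "A1", "amount": 25}
        )

    def test_rejects_non_positive_amount(self):
        with self.assertRaises(ValueError):
            self.service.place_order("A2", 0)
        # A rejected order must never reach the broker.
        self.publisher.publish.assert_not_called()
```

Because the mock records its calls, the tests can assert both that the right message would be sent and that nothing is sent on the failure path.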
5) Cascade your builds and tests, driving full integration tests with CI systems.
To verify overall system quality, it’s critical that you build, deploy, configure, and run your code as a whole system. As your code and teams grow, you’ll have multiple repositories to coordinate, and external dependencies that constantly grow with you. This setup, and these tests, will take the longest, are the trickiest to debug, consume the most resources, and frankly provide the most value.
I’ve found that it’s worth gating pull requests on an exceptionally stable, minimal set of full system integration tests, the proverbial “smoke test” if you will. Choosing what to add to this set of tests, and what to remove, will be a constant balancing act between what’s proving stable and where the consistent pain points are in your development path.
It’s also worth having longer-running regression tests: the set where you add tests as you find bugs, or tests you automate from feature acceptance testing that go through full end-user scenarios. These are also the ideal tests for looking for memory leaks, doing fault injection on distributed systems, and in general making sure the system *fails*, as well as works, in the way you intend.
Adding this effort up front, validating end-user features and scenarios with acceptance testing, takes more time but pays off in the end. If you don’t do this, you carry implicit, ever-growing technical debt with a fairly high interest rate: people time. It’s the most expensive debt you have, and you’ll pay it by periodically dedicating teams to regression testing.
The side effect you won’t see until you have this operational is increased confidence and agility in your product development. When you have confidence that your systems work as designed, you can be faster and freer in changing underlying systems, knowing you can easily verify that everything still works, or find the relevant issues fast while you’re doing development. If you have acceptance tests and user scenarios fully automated, your whole product can move far more quickly and effectively, evolving with the business needs.
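Fault injection does not have to start with heavy tooling. As a hypothetical sketch, the wrapper below makes chosen calls to a dependency raise `ConnectionError`, deterministically, so a test can check both that the client recovers from transient faults and that it fails cleanly when the dependency stays down. `call_with_retries` stands in for whatever client behavior you actually want to verify.

```python
class FaultInjector:
    """Wrap a callable so that chosen calls raise instead of succeeding.

    `fail_on` is a set of 1-based call numbers that should fail; choosing
    them explicitly keeps the injected faults reproducible between runs.
    """

    def __init__(self, func, fail_on, exc=ConnectionError):
        self.func = func
        self.fail_on = set(fail_on)
        self.exc = exc
        self.calls = 0

    def __call__(self, *args, **kwargs):
        self.calls += 1
        if self.calls in self.fail_on:
            raise self.exc(f"injected fault on call {self.calls}")
        return self.func(*args, **kwargs)


def call_with_retries(func, attempts=3):
    """Hypothetical client behavior under test: retry on ConnectionError."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return func()
        except ConnectionError as err:
            last_error = err
    # Every attempt failed: surface the last error instead of hiding it.
    raise last_error
```

The first case below is the system working as intended; the second is the system *failing* as intended, with a clear error rather than a hang or a silent wrong answer.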
6) Develop, trend, and review benchmarks about your product from your CI systems.
Use a service like Datadog, or a local instance of Graphite, but plan for, develop, and then watch benchmark metrics in and from your code. I mentioned earlier the time it takes to run your “merge gates” as a meaningful metric, and there are metrics in your applications that will be just as important when running in production: memory footprints, CPU and IO budgets, and tracing of calls and timing for user interactions and computationally intensive functions.
This enables you to really judge the efficacy of optimizations, as well as to search for and find unexpected failures in complex systems. With cloud services, we are finally able to ascribe meaningful and useful costs to all of this. These aren’t just development measures but business measures, as we can balance the responsiveness of the system, its reliability, and the cost to run the services.
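As a sketch of feeding benchmarks from a CI run into Graphite, the code below measures wall time and peak Python heap allocation for one call, formats the results in Graphite's plaintext protocol (one `path value timestamp` line per metric), and sends them over TCP to carbon's conventional plaintext port, 2003. The metric names and prefix are hypothetical, and Datadog and similar services use their own client libraries instead.

```python
import socket
import time
import tracemalloc


def benchmark(func, *args, **kwargs):
    """Run func once and return (wall_seconds, peak_bytes_allocated).

    tracemalloc only sees allocations made through Python's allocator and
    slows execution while tracing, so treat both numbers as trend signals
    comparable with other runs measured the same way.
    """
    tracemalloc.start()
    start = time.perf_counter()
    try:
        func(*args, **kwargs)
    finally:
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()  # (current, peak)
        tracemalloc.stop()
    return elapsed, peak


def graphite_lines(prefix, metrics, timestamp=None):
    """Format metrics in Graphite's plaintext protocol: 'path value timestamp'."""
    ts = int(time.time() if timestamp is None else timestamp)
    return "".join(f"{prefix}.{name} {value} {ts}\n" for name, value in sorted(metrics.items()))


def send_to_graphite(payload, host="localhost", port=2003):
    """Send plaintext-protocol lines to a Graphite (carbon) listener over TCP."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload.encode("ascii"))
```

For example, `send_to_graphite(graphite_lines("ci.myapp.parse_report", {"wall_seconds": elapsed, "peak_bytes": peak}))` would publish one run's numbers; doing this from the same CI job on every merge is what produces a trend line worth reviewing.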
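Judging an optimization means comparing a new sample against history rather than against a single previous run, since shared CI hosts are noisy. A minimal sketch, assuming lower values are better (seconds, bytes) and using the median of recent runs as a hypothetical baseline with a 10% tolerance band:

```python
import statistics


def compare_to_baseline(current, history, tolerance=0.10):
    """Classify a new benchmark sample against the median of recent runs.

    Assumes lower is better. Returns "regression" if `current` is more than
    `tolerance` above the baseline, "improvement" if more than `tolerance`
    below it, and "unchanged" otherwise. The median resists a single noisy
    run on an overloaded build host better than the mean would.
    """
    if not history:
        return "unchanged"  # nothing to compare against yet
    baseline = statistics.median(history)
    if current > baseline * (1 + tolerance):
        return "regression"
    if current < baseline * (1 - tolerance):
        return "improvement"
    return "unchanged"
```

Flagging a "regression" this way is a prompt for review rather than proof of a problem; the tolerance is a judgment call that depends on how noisy your build hosts are.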