Engineering for Quality

I have been thinking a lot about the process of debugging: that moment when you’re aware that there’s a problem but you don’t quite know where it is, and are trying to nail it down so it can be resolved. It’s something you can learn in general principles, yet it tends to be very specific to your product once you’re down to the details. I am developing some internal teaching here, as this knowledge seems to be gathered in fits and starts, and it would benefit from some common understanding. How to think about the problem, how to break it down (isolation), and how to reason about software you may not be familiar with is a gap I’m working to close.
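The "isolation" idea above can be made concrete with a bisection sketch: given an ordered sequence of changes and a check you can run against any one of them, you can halve the search space on each step. This is a minimal, hypothetical example (the `revisions` list and `is_good` predicate are illustrative, not a real API):

```python
# Hypothetical isolation sketch: find the first "bad" change by bisection.
# Assumes changes are ordered and that every change after the first bad
# one is also bad (the same assumption git bisect makes).

def first_bad(revisions, is_good):
    """Return the earliest revision for which is_good() fails."""
    lo, hi = 0, len(revisions) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_good(revisions[mid]):
            lo = mid + 1   # defect was introduced later
        else:
            hi = mid       # this revision, or an earlier one, is bad
    return revisions[lo]

# Toy usage: ten revisions, defect introduced at revision 6.
revs = list(range(10))
print(first_bad(revs, lambda r: r < 6))  # prints 6
```

The same shape applies whether the "revisions" are commits, config changes, or components being swapped in and out: each check should cut the remaining search space roughly in half.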

In doing this, I want to put the “why” in perspective – to take it from first principles – and share a holistic vision of what we’re after: Engineering for Quality.

I think we’re all on the same page now with the idea that any software will have defects, and that the purpose of a quality program is to reduce the cost of those defects. There’s a lovely grey area, one that often turns into religious wars, about missing features vs. defects that I’m going to leap over and utterly ignore.

The oft-cited research is the defect-cost data from Formal Methods for Executable Software Models, attributed to the IBM Systems Sciences Institute:

[Figure: Relative Costs to Fix Software Defects (Source: IBM Systems Sciences Institute)]
The takeaway is simple: fixing bugs earlier in the development process costs a crap-ton less.

The organization structures to achieve this are, and have been, changing. A decade and more ago there were dedicated, separate teams for testing vs. development. As with most things in the technology industry, that’s changed dramatically, with various hype cycles and organizational experiments working through variants more or less aggressively and effectively.

Organization structure impacts product development tremendously (Conway’s aphorism that software is defined by org structure is far more true than most people realize), but regardless of how individuals are collected into teams, there are common tasks no matter the self-identity (tester vs. SDE vs. developer) of the individual doing the work at any given moment.

Trends in team-based software development over the last few years have dramatically impacted the engineering processes used for identifying defects and dealing with them. The most impactful of these trends include:

  • smaller teams with more focused product/service responsibilities: the proverbial two-pizza team that’s responsible not only for developing a service, but also for running it
  • faster iterations and smaller scopes in development cycles. The design – implement – deliver sequence (aka “agile” development)
  • the concept of test-driven development, driving some quality gates and assumptions to the very front of the development process
  • the expansion of continuous integration tools driving automated testing via pull request annotations and quality gates.
The result of these trends is that “quality engineering” responsibilities have migrated from separate teams to embedded teams, or to overlapping responsibilities shared by all developers.
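The test-driven development pattern mentioned above can be sketched minimally: the test is written first, acts as an executable specification, and fails until the implementation catches up. All names here are illustrative:

```python
# Minimal TDD sketch: the test below is written *before* the code it
# exercises. In practice a runner like pytest would discover it.

def test_slugify():
    assert slugify("Engineering for Quality") == "engineering-for-quality"
    assert slugify("  extra   spaces ") == "extra-spaces"

# The implementation is then written (and refactored) until the test passes.
def slugify(title):
    """Lower-case a title and join its words with hyphens."""
    return "-".join(title.lower().split())

test_slugify()  # raises AssertionError until slugify satisfies the spec
```

A CI quality gate is then just the automation of this loop: the pull request cannot merge until the executable specification passes.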

The effects of this overlap have set new expectations that are frequently “understood” but not often written down (implying, yes, that the level of understanding is not at all consistent). These new expectations for individuals responsible for quality include:

  • Understanding code architecture. Not just the general architecture, but the ability to read and trace code paths to understand what the software is doing.
  • The ability to design, reason about, write, and execute code, along with growing those skills to leverage programming to do the job more efficiently.
  • Working with significantly less documentation before and after the development process, with the code and its tests becoming the source of truth for what’s expected of the software, from internal structures to overall feature capabilities.
  • Knowing what the code does is no longer sufficient; understanding how the code is built, assembled, and configured is now expected. Especially as distributed systems development continues to grow, the whole develop/build/test cycle is now required as a common understanding across teams.

Software tooling is accelerating these trends. Cloud computing has made services easily replicated, software-defined, and disposable. Combined with continuous integration and continuous delivery tools such as TravisCI, Jenkins, Semaphore.IO, Concourse, and others, the art and science of testing now commonly spans the breadth of the development process. What started with SaaS-based services has spread to projects delivered as products. The end result is testing that happens more frequently, generates more information, and gets used both earlier and in different ways in the development process.

What had been quality reports are being cycled earlier into the development process as quality gates. Product requirement shifts are accelerating with the “desire to have improved business agility”, so even internal release goals are constantly shifting and being re-evaluated.

Bringing this back a bit, I thought I’d write down what I look for and what I expect when I am discussing these responsibilities. To keep it simple, I’ll limit this to the responsibilities of someone focused solely on product quality:

The primary job responsibility is creating information about the quality of the product.

In the course of pursuing this goal, I’d expect someone to be able to design and develop tests, integrate those tests into a CI system, generate and collect data and metrics, and refine that raw data into knowledge and information in multiple forms.
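That refinement step, turning raw data into information, is worth illustrating. Here is a hypothetical sketch that reduces per-test records from a CI run into summary metrics; the record shape and field names are assumptions for illustration, not any real CI system's API:

```python
# Hypothetical sketch: refining raw per-test CI results into summary
# metrics a team can act on.
from collections import Counter

def summarize(results):
    """Reduce raw test records to a pass rate and per-component failure counts."""
    total = len(results)
    failures = [r for r in results if r["status"] != "pass"]
    return {
        "total": total,
        "pass_rate": (total - len(failures)) / total if total else 0.0,
        "failures_by_component": Counter(r["component"] for r in failures),
    }

# Toy data standing in for a CI run's raw output.
raw = [
    {"name": "test_login",        "component": "auth",    "status": "pass"},
    {"name": "test_token_expiry", "component": "auth",    "status": "fail"},
    {"name": "test_checkout",     "component": "billing", "status": "pass"},
    {"name": "test_refund",       "component": "billing", "status": "fail"},
]
print(summarize(raw)["pass_rate"])  # prints 0.5
```

The value is not the arithmetic; it is that the failure counts are grouped by component, which turns a wall of red test results into a starting point for isolation.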

Along with these duties, I would expect the individual to learn the architecture, responsibilities, and function of all the components making up the product, and to be able to consistently debug and isolate unexpected results down to the responsible logical components.

The information provided includes analysis of the quality, actual and expected, of the product and its components from multiple viewpoints, leveraging objective metrics including use-case coverage, API coverage, and code coverage. Summary analysis from continuous integration systems and quality gates is expected to be delivered in repeated, aggregate form, coupled to the build and development process. When unexpected results are encountered, detailed analysis is expected: objective, easily repeatable steps to reproduce the result, along with all debugging and isolation steps taken to reduce the scope of any further search for a resolution.
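One concrete form of an objective metric coupled to the build process is a coverage-based quality gate. This is a minimal sketch, assuming line counts are already available from a coverage tool; the threshold and function names are illustrative:

```python
# Hypothetical quality-gate sketch: fail a build when measured line
# coverage drops below an agreed threshold. Numbers are illustrative.

def coverage_gate(covered_lines, total_lines, threshold=0.80):
    """Return (passed, coverage) for a simple line-coverage gate."""
    coverage = covered_lines / total_lines if total_lines else 0.0
    return coverage >= threshold, coverage

passed, cov = coverage_gate(850, 1000)
print(passed, round(cov, 2))  # prints True 0.85
```

The point is the objectivity: the gate's pass/fail decision is mechanical and repeatable, so the conversation moves from "is this good enough?" to "is this the right threshold?", which is a far more productive argument to have.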

Analysis provided must be based on objective and measurable mechanisms, must be technically accurate, and must be clearly and effectively communicated in written form and, when relevant, in graphical and verbal form as well.