Some background before I get into what I see the Kubernetes community doing.
When I say “full loop performance automation”, I am talking about scaling services and resources up and down in response to application metrics – often response time, though many other dimensions of metrics are interesting for useful solutions. It is a sort of holy grail for modern applications; it’s not easy, and it is far from consistent or common. And although I’m talking about Kubernetes, this concept applies to more than cloud-based services.
it’s a control loop
At its heart, this is a feedback control loop. To accomplish this kind of thing:
- You need data from your application to react against: API response times and queue lengths are some of the most common – the best are fairly direct measures that respond when additional resources are made available.
- You need to capture that data, baseline it, and have some discriminator that turns it into choices and actions – adding a few more instances of the front end to get better performance, or the inverse: nobody is hitting this now, so reduce the number of instances to a minimum. Most operations work so far has been obsessed with the first, although the second is arguably more valuable for controlling costs.
- You need a means of turning that choice – that discrimination – into an action or a control of some form: adding instances, removing instances, making the relevant system connections and configuration changes, etc.
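The three bullets above can be sketched as a minimal loop in Python – the thresholds and instance bounds are made-up numbers, and the two callables stand in for whatever monitoring and provisioning APIs you actually have:

```python
import time

MIN_INSTANCES, MAX_INSTANCES = 2, 20
SCALE_UP_MS, SCALE_DOWN_MS = 500.0, 100.0  # thresholds you'd derive from baselining

def decide(latency_ms, current):
    """The discriminator: turn a measurement into a target instance count."""
    if latency_ms > SCALE_UP_MS and current < MAX_INSTANCES:
        return current + 1   # too slow: add an instance
    if latency_ms < SCALE_DOWN_MS and current > MIN_INSTANCES:
        return current - 1   # nobody is hitting this: shed one, down to the minimum
    return current           # within the band: leave it alone

def control_loop(get_latency_ms, set_instance_count, interval_s=30):
    """Measure, discriminate, act - forever."""
    current = MIN_INSTANCES
    while True:
        current = decide(get_latency_ms(), current)  # capture + choose
        set_instance_count(current)                  # act through whatever API you have
        time.sleep(interval_s)
```

In a real system the interesting (and hard) parts are hidden inside the two callables and in picking thresholds that don’t oscillate.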
This has been feasible for a while, but it’s not been easy. It depends on what you can control, often through APIs. There are also limits to the feedback loops; sooner or later you bottleneck, depending on what you’re scaling and its dependencies. If what you need to scale is not accommodated in your control loop, you can make changes but they’ll have no effect. This sets limits on what you can scale and how far you can scale it. Prior to the world of “cloud computing” and IaaS, the limit was often the number of physical servers you had available, or possibly the memory in those servers, or the amount of underlying raw IO those servers had to other storage systems. I have had a lot of conversations about “IO budgets”, “memory budgets”, and more as key factors in building highly responsive and highly scaled systems.
what we can control through APIs has expanded
With VMware slicing physical machines into virtual machines, and then AWS taking the ball and running with it to create cloud services, the thing that could be most easily automated was a virtual machine. Automated configuration of virtual machines often followed the pattern of physical machines: in general, assigning a single purpose and not changing it.
In fact, the most common devops tools for VM and bare metal configuration are almost antithetical to the concept of “scaling up and down”, at least within a virtual machine. Puppet and Chef in particular are built on the concept of “converging to desired state” – state being singular and unchanging; well, until you change the Puppet or Chef configurations anyway. Not reactive, not responsive – more a “nailed down” thing. Puppet and Chef both kept the boundary of their effects to a single OS instance, most commonly a VM today. And frankly, they slept through the realization – embodied in AWS CloudFormation, its open source clone OpenStack Heat, Juju, and lately Terraform – that the need for management had gone well above individual virtual machines and into sets of them.
These practices all stem from the ITIL model, an overly simplistic synopsis being “reducing errors by controlling change”. Originally formed when everything was manually updated and changed, it’s a duct-tape bandage for making a reliable system out of us – humans. Human nature is not entirely reliable, even less so when we communicate poorly (or not at all) with our peers. The result was recommendations for formalized communication patterns, some of which ultimately became codified by software (originally BladeLogic) and then its open source clones and competitors: CFEngine, Chef, and Puppet.
As applications spread to both span and require multiple machines (or multiple VMs), scaling became a matter of how fast you could spin up and add new servers. Ops folks have wanted to automate this scale-up and scale-down for ages, but never had consistent mechanisms to do so. It is not surprising that one of the first features Amazon enabled after basic EC2 VM creation and teardown was autoscaling.
there’s some early steps towards this level of automation
Amazon’s autoscaling setups (and several other systems out there) have enabled simple versions of this concept. It’s been applied extremely successfully where it’s easy – stateless services, like front-end web servers. And what you scale is more (or fewer) virtual machines. Which is great, but often not granular enough for really good control.
challenges include humans in the process
One of the problem points is the consistency of the decision-making process. These operations, as they were codified, almost always used humans as the control points. In addition, they were rarely tested as completely as they needed to be, and were completely beholden to the specific installation of the underlying application. Access to create (and destroy) virtual machines (and related resources) is tightly controlled. You can see these controls parroted in the autoscaling systems today: Juju, Heroku, and Cloud Foundry all expose scaling as operator commands. A lot of the reason is cost – scaling up means increasing costs, and companies almost always put a person in that control loop.
As the cost of computing continues to fall, the cost of keeping people in these loops increases. Humans are slow: minutes (or sometimes hours, if they’re being woken from sleep and getting remote access) to respond to an alert from a monitoring infrastructure, make a choice, and invoke the relevant commands. IBM recognized this years ago and started to promote fully autonomic services. The concept of full loop performance controls has been lurking in the corner, desired by many in SRE and operations-focused jobs. As we shift the responsibility to run the service you write onto developers, that desire is also spreading to all levels of development teams.
and the granularity of what we can automate and scale
The other problem, which I hinted at earlier, has been the granularity of what you can control. A bare metal machine, or even a VM, is too large – too coarse. With the shift towards containers as a means of deployment, the granularity is hitting a much better spot: per OS process, not per VM. Adding more worker processes, or removing some – grouped by service – is about the right level of granularity for these control loops.
The container ecosystem also supports scaling as an operator command – Docker, DC/OS/Marathon, and Kubernetes all mirror quite a bit of the AWS autoscaling concepts. Marathon and Kubernetes offer autoscaling APIs, and there are companies trying to build a bit of autoscaling onto Docker.
You can scale based on CPU consumption readings or requests/second from a load balancer. That often does well for stateless front-end web servers, but you often want to scale on other measures – like adding temporary cache capacity, or manipulating the number of asynchronous background worker processes.
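For a load-balancer measure like requests/second, one hedged sketch is to size the fleet proportionally to load rather than stepping one instance at a time – the 100 requests/second-per-instance capacity here is an assumed number you’d measure for your own service, not anyone’s published figure:

```python
import math

TARGET_RPS_PER_INSTANCE = 100.0  # assumed per-instance capacity; measure your own

def desired_instances(total_rps, minimum=1, maximum=50):
    """Size the fleet so each instance sees roughly its target load."""
    wanted = math.ceil(total_rps / TARGET_RPS_PER_INSTANCE)
    # Clamp to sane bounds so a metrics glitch can't scale to zero or to infinity.
    return max(minimum, min(maximum, wanted))
```

The same shape works for background workers if you swap requests/second for queue length and drain rate.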
what kubernetes is doing
Kubernetes has a solid architecture for metrics and in general has been very aggressive about getting all the required primitive pieces in place for truly amazing levels of automation. The Kubernetes community has been pretty clear about how they want metrics to operate. Over the holidays, I saw an API proposal that supported custom metrics, which really brought it home for me.
This still requires that we build something into our applications to expose the metrics we want to react to, and we’ll then need to expose those through the service (or some other Kubernetes object) in order to react to them. Maybe a variant of a healthcheck exposes your chosen application metrics – either in the application itself (the healthz concept that Kelsey Hightower has been presenting) or in a sidecar container that “bolts on” a level of monitoring to previously built applications.
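A minimal sketch of what such a metrics-exposing endpoint could look like – everything here (the /healthz path, the queue_length metric, port 8080) is illustrative, not the actual healthz convention or the Kubernetes custom-metrics format:

```python
import json
import queue
from http.server import BaseHTTPRequestHandler, HTTPServer

work_queue = queue.Queue()  # stand-in for the application's real backlog

def metrics():
    """The application metrics we want a control loop to react to."""
    return {"queue_length": work_queue.qsize()}

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/healthz":
            self.send_error(404)
            return
        body = json.dumps(metrics()).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# To serve it: HTTPServer(("", 8080), MetricsHandler).serve_forever()
```

The same handler shape works whether it lives in the application or in a sidecar container polling the application from inside the pod.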
Once a means to collect custom metrics is in place, something needs to make the decisions – and the structure for that is already solidly in Kubernetes. The service and controller concepts within Kubernetes provide a good basis for implementing these control loops; they just need more (and different) information.
all this enables more complex optimizations for running your applications
Once this is enabled, new information can be factored into the decision process and become part of the scheduling and balancing setup: balancing multiple services and caches against the cost of running your system; pre-scaling or caching against predicted consumption that would otherwise bottleneck your application; or more complex tradeoffs between services that need to be responsive right now vs. scaling up background processes to handle work at low-usage (or low-cost) times. There are more possible business and use cases here, but they all require similar things – more and different information from the application itself, as well as information from the environment hosting the application.
Mesosphere, with Marathon and DC/OS, is pushing on some of these same points, but so far hasn’t included de facto standards or means of piping data from an application into their system to make this feedback loop easy. It’s possible with Marathon, but as a bolt-on. I wouldn’t be surprised to see Docker put something roughly equivalent into its APIs, extending their concept of runtime metrics, especially if things like the Microscaling effort go well. With Docker’s apparent effort to be everything to everyone, I expect they will jump on that bandwagon as soon as they see someone else doing it successfully.
I’m looking forward to seeing how this evolves, keeping a close eye on it all. With systems like Kubernetes, this could apply to your local datacenter or to any number of cloud providers, and it’s ultimately about helping us run our applications and services more efficiently and effectively.