the road to RackHD

I’ve delayed posting this until the news was more formally out there: EMC decided to take the technology that I helped develop at Renasar Technologies and make it open source. The press release that came out today includes a video link of John Roese (CTO for EMC) talking about this effort. Lots of opinions are flowing and getting bandied about, and the end result for me is simply being fortunate enough to share the work that I helped build with the community at large.

That project is RackHD, hosted on GitHub in the form of multiple git repositories. Yeah, it’s a largish, complex project. It’s also a bit of a unicorn – it’s fundamentally devops tooling written in Node.js. Most projects today that help manage infrastructure are written in Python or Go. Why we wrote it in JavaScript is a story for another time.

The motivation for starting RackHD

The original problem we were sorting out was how to handle some of the complexities of firmware and BIOS updates in a fully automated fashion. The dirty secret of the computer hardware industry is that many of those tools are far from automated, requiring tremendous (I might be tempted to describe it as “horrific”) amounts of manual effort and often a physical human presence in the datacenter. We aimed to automate as much of that effort as we possibly could. The goal was simple – to do as much as we could to treat the racks of machines as herds rather than pets.
A lot of the focus of similar tooling today is on leveraging or enabling cloud solutions. There is some great work and a multitude of choices right now, and it’s advancing exactly as you’d expect a useful technology to advance. That space is oriented toward abstracting away the specifics of hardware, and when I’m writing generalized services, that’s exactly the way I want to work. Getting to that abstraction, well – that’s the pain point. There are solutions for configuration management out there – CFEngine, Chef, Puppet, Ansible, etc. – but all of them expect that the hardware is already up, running, and connected, that the OS is installed and the network configured and ready to roll, and that the people giving the input and making the choices that drive those tools have all the knowledge they need.
Getting hardware to that point isn’t well automated, and it is also where a huge number of tribal boundaries exist in most corporate or enterprise datacenters. It’s a pain point ripe for improvement. There are existing tools that do some of this, and we took it further.
PXE booting is the obvious choice for a platform-agnostic mechanism. As an industry standard, it is not without its quirks but is reasonably consistent. Most compute servers conform to the PXE spec to allow network booting. The existing solutions I mentioned earlier include Cobbler, Razor, and others. Razor (or its clone/rewrite Hanlon) has lovely APIs. Even better than some of their peers, Razor and Hanlon extended the concepts to include a microkernel, which allows essentially arbitrary tasks on the remote machine. In the case of Razor and Hanlon those tasks are baked into the microkernel – but it is still a tremendous capability to leverage.
The follow-on problem is what hit us, and why we stepped aside from existing tools. We found ourselves needing to run a process that involved multiple steps, including a reboot in the middle of those steps. The final reboot is the boundary where existing tools stop. An example is checking, and possibly updating, the BIOS firmware on a server:
  •  PXE boot the server
  •  interrogate the hardware, see if we’re at the right version of firmware
  •  if not, flash the firmware to the version we want
  •  reboot (mandated by things like BIOS and BMC flashing)
  •  PXE boot again
  •  interrogate the hardware
  •  make sure we’re at the right version of firmware
  •  SCORE!
To achieve this, the existing systems (Cobbler, Razor, etc) needed another level of orchestration – resetting what we PXE boot, interacting with data from the machine in question, and making some (simple) decisions. This sequence of needing multiple steps that involved PXE booting is what ultimately led to RackHD.
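The firmware sequence above can be sketched as a small piece of orchestration logic. This is only an illustration in Python, not RackHD code: FakeServer and its methods (pxe_boot, read_bios_version, flash_bios) are hypothetical stand-ins for real hardware operations, and the assumption that a flashed BIOS only takes effect after the next boot is baked into the fake.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FakeServer:
    """Hypothetical stand-in for a physical server that can PXE boot."""
    bios_version: str
    pending_version: Optional[str] = None
    log: List[str] = field(default_factory=list)

    def pxe_boot(self) -> None:
        # Assumption: a flashed BIOS only becomes active after a reboot.
        if self.pending_version is not None:
            self.bios_version = self.pending_version
            self.pending_version = None
        self.log.append("pxe_boot")

    def read_bios_version(self) -> str:
        self.log.append("interrogate")
        return self.bios_version

    def flash_bios(self, version: str) -> None:
        self.log.append(f"flash:{version}")
        self.pending_version = version


def ensure_bios_version(server: FakeServer, wanted: str, max_flashes: int = 1) -> bool:
    """Boot, check, flash if needed, reboot, and check again."""
    server.pxe_boot()
    for _ in range(max_flashes):
        if server.read_bios_version() == wanted:
            return True
        server.flash_bios(wanted)
        server.pxe_boot()  # the reboot in the middle of the process
    return server.read_bios_version() == wanted
```

The point of the sketch is the shape of the problem: the decision to flash depends on data read from the machine, and the verification can only happen after a second PXE boot, so something has to hold state across the reboot.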
The solution we chose to implement was adding workflows and a simple machine inventory over mechanisms that could PXE boot a server and leverage the capabilities of a microkernel.
We use the same microkernel concept as Razor and Hanlon. Rather than only enabling it to do pre-set activities (whatever was built into the microkernel), though, we added a remote agent so that the workflow engine could specify tasks to be accomplished on the target machine. Examples include zeroing out disks, interrogating the PCI bus, even resetting the IPMI settings through a host’s internal KCS channel. Within this remote agent, we also optimized the path for interrogating and gathering data – leveraging existing Linux tools and parsing the outputs, sending that information back to be stored as relatively free-form JSON data structures, building up the simple machine inventory.
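As a rough idea of what "parse tool output, store it as free-form JSON" can look like, here is a minimal Python sketch. It is not how the RackHD agent is implemented: SAMPLE_OUTPUT is made-up text in a generic "Key: Value" layout, and parse_key_value_output is a hypothetical helper.

```python
import json

# Made-up sample text in a generic "Key: Value" layout (an assumption,
# not captured from any particular tool).
SAMPLE_OUTPUT = """\
Vendor: Example Corp
Version: 1.2.3
Release Date: 01/01/2015
"""


def parse_key_value_output(text: str) -> dict:
    """Turn 'Key: Value' lines into a dictionary, skipping other lines."""
    record = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            record[key.strip()] = value.strip()
    return record


def to_inventory_json(source: str, text: str) -> str:
    """Wrap the parsed record with its source name and serialize it."""
    return json.dumps({"source": source, "data": parse_key_value_output(text)},
                      sort_keys=True)
```

Because the result is just a dictionary, each tool can contribute whatever fields it reports without a fixed schema.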
As we developed workflows, we shifted from writing them in code to making them declarative – data instead of code. In the end, we made an event-driven workflow engine that allows someone (through a REST API) to put together the tasks they want, in the order they want, and execute that workflow.
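To make "data instead of code" concrete, here is a minimal Python sketch. The workflow layout, the task names, and the TASKS registry are all hypothetical and are not RackHD's actual schema; the runner also executes steps sequentially for simplicity rather than being event-driven.

```python
# Hypothetical task registry: task names map to plain functions.
TASKS = {}


def task(name):
    """Register a function under a task name."""
    def register(fn):
        TASKS[name] = fn
        return fn
    return register


@task("set-boot-pxe")
def set_boot_pxe(context, options):
    context["boot"] = "pxe"


@task("reboot")
def reboot(context, options):
    context["reboots"] = context.get("reboots", 0) + 1


@task("install-os")
def install_os(context, options):
    context["os"] = options["os"]


# The workflow itself is plain data; it could just as well arrive as JSON
# in the body of a REST request.
WORKFLOW = {
    "name": "example-install",
    "tasks": [
        {"task": "set-boot-pxe"},
        {"task": "reboot"},
        {"task": "install-os", "options": {"os": "centos"}},
    ],
}


def run_workflow(workflow, context=None):
    """Run each named task in order, sharing one context dictionary."""
    context = {} if context is None else context
    for step in workflow["tasks"]:
        TASKS[step["task"]](context, step.get("options", {}))
    return context
```

Reordering or swapping steps means editing the WORKFLOW data, not the functions, which is the property the declarative approach is after.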
The other place we extended was in creating live data streams of telemetry data from the hardware. We are using events heavily in our workflow engine, and it wasn’t much of a leap to use the same mechanisms to capture data (via polling using IPMI and SNMP) from hardware sensors and publish it as a live data stream. In RackHD these are the “pollers”, with API controls and a fairly wide variety of functionality enabled. In addition to “doing stuff” to a server, it gives us semi-realtime telemetry and metrics.
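The poller idea (poll a sensor, publish the reading as an event) can be sketched in a few lines of Python. None of these names come from RackHD: EventBus, Poller, and the topic naming are hypothetical, and read_sensors stands in for a real IPMI or SNMP query.

```python
class EventBus:
    """Tiny in-process publish/subscribe bus (illustrative only)."""

    def __init__(self):
        self._subscribers = {}

    def subscribe(self, topic, callback):
        self._subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, payload):
        for callback in self._subscribers.get(topic, []):
            callback(payload)


class Poller:
    """Reads sensors for one node and publishes each reading as an event."""

    def __init__(self, bus, node_id, read_sensors):
        self.bus = bus
        self.node_id = node_id
        # read_sensors is a hypothetical callable standing in for an
        # IPMI or SNMP query; it returns a dictionary of readings.
        self.read_sensors = read_sensors

    def poll_once(self):
        reading = {"node": self.node_id, "sensors": self.read_sensors()}
        self.bus.publish(f"poller.{self.node_id}", reading)
        return reading
```

A consumer subscribes to a topic and receives readings as they are published; calling poll_once on a timer gives the semi-realtime feed described above.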
If you want to see a little demo about all this, take a look at the video introduction screencast about RackHD.

What RackHD is good at

The mechanism of provisioning an OS is one of the more straightforward workflows you could imagine. I guess I might say that RackHD is focused on being the lowest level of automation in a datacenter: it supports interrogating vendor-agnostic hardware, setting a “personality” onto it (in the form of an OS), and capturing telemetry and metrics from those machines to provide a “live data feed”, all with a REST-based API. The data feed and API, in particular, are meant to be consumed by other software or systems as much as (if not more than) by any individual. RackHD is built to be a component in a larger solution.
As we went through use cases and expanded features, we added the ability for the workflow engine to react to what it discovered (what we call “SKU” support), dynamic rendering of templates for OS installs, and the passing of variables and data from the APIs that invoke a workflow through to the configuration files that drive OS installs – like a kickstart or Debian preseed file.
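As a small illustration of passing API variables through to an install file, here is a Python sketch using the standard library's string.Template. It is not RackHD's templating mechanism: the template text is only a fragment of a kickstart file, and render_kickstart, its parameters, and the variable names are hypothetical.

```python
from string import Template

# Illustrative fragment only, not a complete kickstart file.
KICKSTART_TEMPLATE = Template("""\
url --url=$repo_url
network --bootproto=dhcp --hostname=$hostname
rootpw --plaintext $root_password
""")


def render_kickstart(api_options, sku_defaults=None):
    """Merge per-SKU defaults with options from the API call and render.

    Values supplied through the API override the SKU defaults. A missing
    variable raises KeyError rather than producing a broken file.
    """
    values = dict(sku_defaults or {})
    values.update(api_options)
    return KICKSTART_TEMPLATE.substitute(values)
```

Failing loudly on a missing variable is a deliberate choice in this sketch: a half-rendered install file is much harder to debug on a machine that is in the middle of PXE booting.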
While we have a number of workflows in our code repository as examples of how you can do a variety of actions, the real power of the system is in being able to create your own workflows – and submit those through the REST API. You can define workflows for your needs, specific to your hardware if you want, to accomplish your automation goals.

Where we stopped, or what RackHD doesn’t do (today)

We intentionally stopped at two conceptual boundaries. First, we didn’t attempt to replicate all the work and effort that’s gone into software configuration management systems. Ansible, Puppet, Chef, and so forth have a long history of dealing with a lot of that complexity, and doing it well. We made sure our workflow system could integrate seamlessly with those kinds of systems – making a call to register a machine with a Puppet or Chef service, or in the case of Ansible, some example hooks for how to invoke a playbook.
Second, we intentionally made RackHD a comparatively passive system. You can embed a lot of logic in a workflow, but we stopped short of building in more complex logic that amounts to functions more commonly handled by scheduling – choosing which machines to install with what OS, etc. We expect that someone, or some thing, will be making those relevant choices – a layer above hardware management and orchestration that we saw as “infrastructure orchestration and management”. We document and expose all of the events around the workflow engine to be utilized, extended, and even incorporated by an infrastructure management system. It’s meant to be a control point for something doing infrastructure management, with something else making the decisions on what should be installed where, and which hardware is most useful for what services.
