pluggable, concurrent processing systems

It started with a thought: “Take the current path of advances in server hardware and push it to an extreme.” I was sitting on the bus thinking about programming practices and making things more efficient. “Things” in this case meaning everything from “easier to program” to “uses less power to display a gazillion web pages a second.” For the past several years, I’ve been far more focused on server-based applications and development processes than desktop applications (sadly, my Objective-C programming skills have suffered for it). There have been some upsides though, even if I have weird mental riffs while riding the bus. Better than staring at the other passengers, I suppose. The trends of better programming tools, ongoing work on figuring out concurrency, the GHz glass ceiling, the move to multiple cores per chip, attention to power efficiency, and a pile of other random junk all conspired toward this thought experiment.

Start by getting rid of the physical drives. Physical drives aren’t gone from most servers, but they’re quickly diminishing in server farms. We’re consolidating all the drives into SANs, and iSCSI is breaking open some serious doors toward making that far more of a commodity. So what if we didn’t have a hard drive in a server machine; could we get away with that? How about trimming off some of the other I/O thingies? In general, most of the I/O ports on servers aren’t really used either. Not like a desktop, where you’ve got a rat’s nest of USB devices, video screens, and other input devices. Servers, in my hasty generalization of the world, get almost all of their data through a network interface. Blade servers are already making advances in sharing common infrastructure like power and cooling. So reduce the complexity of one of these critters to just a CPU and some memory. Let’s get back to the original von Neumann architecture. Okay, so we add a little something to that basic architecture – the network card. Give the thing just two sockets – power and Ethernet – on a simple connector. Now make a rippin’ backplane for these things to communicate over – maybe just a commodity gigabit Ethernet switch. There are lots more details to be resolved, but with that core concept you’ve got yourself the potential for a pluggable processing engine.
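To make the “network is the only I/O” idea a bit more concrete, here’s a minimal Python sketch of what one of these plugs might speak on the wire. The port number and the length-prefixed framing are purely illustrative assumptions, not any real protocol:

```python
import socket
import struct

PLUG_PORT = 9000  # illustrative only; nothing standard about this


def frame(payload: bytes) -> bytes:
    """Length-prefix a payload: 4-byte big-endian size, then the bytes."""
    return struct.pack("!I", len(payload)) + payload


def unframe(blob: bytes) -> bytes:
    """Inverse of frame(): strip the 4-byte size header off a message."""
    (size,) = struct.unpack("!I", blob[:4])
    return blob[4:4 + size]


def handle(payload: bytes) -> bytes:
    """Stand-in for whatever work the plug actually does."""
    return payload.upper()


def serve_forever():
    """The whole 'operating system' of a plug: accept, work, reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind(("", PLUG_PORT))
        srv.listen()
        while True:
            conn, _addr = srv.accept()
            with conn:
                (size,) = struct.unpack("!I", conn.recv(4))
                payload = conn.recv(size)
                conn.sendall(frame(handle(payload)))
```

The point of the sketch is how little there is: no filesystem, no device drivers to speak of – the backplane switch carries everything in and out of the plug.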

Ironically, this vision isn’t too far removed from my earliest experiences with computers – “blades” of circuit boards in rough, open aluminum housings that made up the “computer.” The machines I remember were for controlling HVAC systems, didn’t really have any modern I/O to speak of (a teletype interface), and Jim Ahart (one of the folks who introduced me to those esoteric “computer” systems back in the late ’70s – he passed away a number of years ago) was constantly ripping out one blade, fiddling, and dropping in another. Oh – and I learned that these things weren’t hot-swappable. If you tried, you ended up using an oscilloscope, ohm meter, and soldering iron for hours to make it work again.

It may not surprise anyone that the CPU is the beast that eats the power in a server machine. Memory is a close second. On a desktop, it’s just as likely that the GPU on the video card is the power sink – but video isn’t quite as much of a killer in a server. So maybe you make it so that you can power those pluggable CPU/memory units on and off to preserve power when you don’t need them. Very cool – sounds great. Okay – what’s it take to make that a reality?

Yeah, “Oh shit – that’s hard” is immediately what I think. It’s a hell of a lot of custom hardware and trickery to make it work. My first vision is all custom boards and chips, new connections, and systems needing an entirely new operating system to make the whole thing fly. Not impossible, but kind of a non-starter from my point of view.

So let’s take a page from the PC manual of world domination and make this easier. Let’s use standard commodity components for this stuff. Maybe the first couple of iterations are nothing more than white-box motherboards that just don’t have a drive attached. It’s still a lot of engineering, but you can sort of see a path to making a prototype and then making improvements, where you’re not having to build the whole thing from scratch. Stand on the shoulders of giants – definitely. Fiddling with the little Arduino boards makes this kind of experiment seem much more doable.

One of the things that really stands out about current computer systems is the lack of proprioception for the computer itself. Computers in general have only rudimentary knowledge of themselves – disk space available, maybe memory (depending on how you look at it with virtual memory). They generally don’t know their power consumption, and within a process (from the programmer’s point of view), the only feedback you get when hitting memory limits is when malloc burps on you. You can thrash your own software into relative torpor by allocating and working with memory far in excess of your actual physical limits. I see a lot of time spent tuning programs and systems in the server world to deal with that lack of self-knowledge.
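As a tiny taste of what process-level proprioception could look like today: Unix systems do expose a little self-knowledge through `getrusage` and `getrlimit`. A sketch (Unix-only, and note the field units vary by platform):

```python
import resource


def memory_snapshot():
    """Ask the OS what this process knows about itself: peak resident
    memory used so far, and the soft/hard limits on address space."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    return {
        # ru_maxrss is in kilobytes on Linux, bytes on macOS.
        "peak_rss": usage.ru_maxrss,
        "soft_limit": soft,  # resource.RLIM_INFINITY means "unlimited"
        "hard_limit": hard,
    }
```

A program could poll something like this and back off gracefully before malloc burps, instead of discovering the limit by thrashing – but it’s telling how little is available, and how platform-dependent even this much is.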

Thinking about the operating system to make this all work – this is where I head out into left field. You can look down the road and see that we’re aiming into the multicore world very quickly. It’s not, I think, unreasonable to assume that we might see a desktop computer with 64 cores as a standard option within five years. Shoot – I’m jealous of one of my friends with an 8-core Mac system right now.

Coding to take advantage of those cores is generally agreed to be a pain. The common solution is to use threads – you get all this sweet low-latency cross-execution communication with threads. You also get deadlocks, race conditions, and a real bitch of a debugging problem. I’d love to say threads are evil, but in fact they’re just damn tricky to keep straight. Enter transactional memory – take the ACID concept of a database and apply it to shared memory access (okay, so drop the “durability” component of ACID – as soon as the power dies, the whole game is over anyway). It’s cool. It’s really cool – and we have a lot of people with experience programming against this kind of model.

I also like to riff from this point into the world of concurrent processing with models where objects talk by passing messages around. Tasklets in Stackless Python, Axon components in Kamaelia, that sort of thing. The abstraction the Objective-C runtime uses is message passing (very Smalltalk-y), although most Mac programmers I know happily use the leak in that abstraction to dig around in the C code and make the system really sing. We also spend a lot of time in constrained sandbox arenas – Java VMs, a Python interpreter, the shell on a Unix system. What if we could apply one of these sandboxes to one of our little pluggable units? The analog to starting a process in the Unix OS world would be to fire up one of these processor stubs and load in whatever processing you want to have happen there.
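The message-passing style is easy to sketch even in plain Python with threads and queues – no Stackless or Kamaelia required. This toy actor is my own illustration, not either library’s API; the key property is that each actor owns its mailbox and its own state, so there is nothing to lock and no shared memory to race on:

```python
import queue
import threading


class Actor:
    """A toy actor: owns a mailbox, processes one message at a time.
    All communication between actors is by message send."""

    def __init__(self):
        self.mailbox = queue.Queue()
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def send(self, msg):
        self.mailbox.put(msg)

    def _run(self):
        while True:
            msg = self.mailbox.get()
            if msg is None:  # sentinel: shut down
                break
            self.on_message(msg)

    def on_message(self, msg):
        pass  # subclasses supply behavior


class Adder(Actor):
    """Keeps a running total; reports each new total on a results queue."""

    def __init__(self, results):
        self.total = 0
        self.results = results
        super().__init__()  # start the thread only after state is ready

    def on_message(self, msg):
        self.total += msg
        self.results.put(self.total)
```

The deadlock-and-race minefield doesn’t vanish entirely – you can still deadlock on cyclic waits – but the bugs get a lot easier to reason about when the only interaction is a message in a queue.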
Another option: what if, instead, we used lots of little (meaning small) operating systems and coordinated the whole thing? Then starting a process is booting up another little embedded Linux. Concurrency of execution can function through both threads and inter-process communication. I wonder if it would be effective enough – if the latencies could be kept low enough to make this run efficiently. There’s some proof positive of this concept in Beowulf clusters on high-end compute systems (supercomputers).
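The coordination pattern itself is sketchable with ordinary OS processes standing in for the “little embedded Linux” nodes. A hypothetical illustration using Python’s multiprocessing (fork-based, so Unix-only) – the scatter/gather shape here is the same whether the workers are local processes or plugs on a backplane:

```python
import multiprocessing as mp


def node_main(inbox, outbox):
    """One 'node': boot, loop pulling jobs off the wire, report results.
    None is the shutdown sentinel."""
    for job in iter(inbox.get, None):
        outbox.put(job * job)  # stand-in for real work


def run_cluster(jobs, n_nodes=4):
    """The coordinator: boot n_nodes nodes, scatter jobs, gather results."""
    ctx = mp.get_context("fork")  # Unix-only; keeps the sketch simple
    inbox, outbox = ctx.Queue(), ctx.Queue()
    nodes = [ctx.Process(target=node_main, args=(inbox, outbox))
             for _ in range(n_nodes)]
    for n in nodes:
        n.start()
    for j in jobs:
        inbox.put(j)
    results = [outbox.get() for _ in jobs]
    for _ in nodes:
        inbox.put(None)  # one shutdown sentinel per node
    for n in nodes:
        n.join()
    return sorted(results)
```

The open question from the paragraph above is exactly the one this sketch hides: queue operations between local processes cost microseconds, while a trip across a gigabit switch costs a good bit more.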

I wonder if we can take the lessons learned, use a really stripped-down Linux- or BSD-based system, include some more proprioception APIs in the overall system, and build up a real, hot-pluggable commodity “desktop” system. I think of the amount of compute that’s in my iPhone. It’s not a 3 GHz powerhouse, but it’s pretty damn effective. What if the inside of my “desktop” wasn’t a load of fans and inefficient power supplies, but instead a simple backplane with lots of little deck-of-cards-sized boxes plugged in? I don’t know what technology is easily available to be co-opted into this sort of thing, but I’ve got to think that some of the lower-power Intel or AMD processors are getting close to being easily used in a “system on a chip” kind of manufacturing. There’s no reason you couldn’t even mix and match different processors in this “desktop” system to optimize for power efficiency based on its needs. Maybe when this thing is asleep it’s drawing the equivalent of an iPhone in standby mode, but it can still wake up and be ready to use in moments.

The piece that I’ve been ignoring in this new-kind-of-commodity-server fantasy is all the desktop-specific pieces that we know and love. Even still, a few slightly custom “compute plugs” would probably do the trick. One high-juice plug with a video card slapped on the side to provide video output. Maybe that’s even a pipeline of processors or internal computing pieces to let the system deal with the bandwidth needs of the video. You could do some really interesting things with dedicating a processor plug or two to different input forms – mouse and keyboard, maybe another to video, maybe a third to audio.

It’s a vision, a thought. Some serious hardware engineering and prototypes would come next to make this sort of thing a reality. It’s a leap forward in many ways from where we are today, but not that big of a leap. I think it’s in the realm of possibility…

One thought on “pluggable, concurrent processing systems”

  1. Sounds like you’ll be interested in something called MASCOT – which I recently got pointed at, but haven’t blogged about yet. The UK military invented it 30 years ago (!), and it’s defined as essentially a portable kernel… Thing is, it hit its third major incarnation around 20 years ago, and hasn’t done huge things since, but it’s a veritable gold mine of information. It’s influenced the STM work I’ve done (esp. the dining philosophers example). Extending this stuff downwards to hardware is natural for MASCOT and something I’d kept in mind as _possible_ for Kamaelia due to things like MyHDL (a Python-based VHDL-type system).

    Links:
    http://www.object-forge.com/
    http://async.org.uk/Hugo.Simpson/ – see the manual in PDF form at the end.

    Wikipedia page: http://en.wikipedia.org/wiki/Modular_Approach_to_Software_Construction_Operation_and_Test
    (mainly useful for search terms I found 🙂)

    You even find people saying things like this: “MASCOT Operating System

    Developed real-time pre-emptive operating system kernel, based on MASCOT principles (Modular Approach to Software Construction, Operation and Test), in assembler for 6809 hardware platform, used in small to medium sized control systems” http://atc-ltd.co.uk/military.htm

    Which is rather freaky considering the sort of platform they’re mentioning! 🙂

    Kinda scary to think that they’ve essentially had this technique in complete form (including going down to the levels you suggest) for at least 20 years, maybe even 30…

    Was nice to find that I’d largely reinvented their wheel whilst being unaware of their wheel. 🙂

