Combine and Swift Concurrency

Just before last weekend, the folks on the Swift Core Team provided what I took to be a truly wonderful gift: a roadmap and a series of proposals that outline a future of embedding concurrency primitives deeper into the Swift language itself. If you’re interested in the details of programming language concurrency, it’s a good read. The series of pitches and proposals:

If you have questions, scan through each of the pitches (which all link into the forums) and you’ll see a great deal of conversation there. Some of it may have your answer (other parts, at least if you’re like me, may just leave you more confused), but most importantly, the core team is clearly willing to answer questions and explore the options and choices.

When I first got involved with the Swift programming language, I was already using these kinds of concurrency constructs in other languages, and it seemed a glaring gap that Swift didn’t specify them, relying instead on the Objective-C runtime and in particular its dispatch libraries. The dispatch libraries, however, are darned solid – battle-honed, as it were. You can still abuse them into poor performance, but they work reliably, so while the gap kind of rankled, it made sense under the thinking of “do the things you need to do now, and pick your fights carefully” in order to make progress. Since then, time (and the language) has advanced significantly and refined quite a bit, and I’m very pleased to see formal concurrency concepts getting added to the language.

A lot of folks using Combine have reached for it looking for the closest thing they can find to Futures and the pattern of linking futures together, explicitly managing the flow of asynchronous updates. While that is something Combine does, Combine is quite a bit more – a much higher-level library than the low-level concurrency constructs being proposed.

For those of you unfamiliar, Combine provides a Future publisher; Using Combine details how to use it a bit, and Donny Wals has a nice article that’s far more tutorial-like.
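Here’s a minimal sketch of the Future publisher in use, assuming nothing beyond Combine itself (the dispatch delay just stands in for real asynchronous work):

```swift
import Combine
import Foundation

// A Future wraps a single asynchronous result in a publisher.
// The one-second delay stands in for real asynchronous work.
let delayedGreeting = Future<String, Never> { promise in
    DispatchQueue.global().asyncAfter(deadline: .now() + 1) {
        promise(.success("hello"))
    }
}

let cancellable = delayedGreeting
    .sink { value in
        print(value) // prints "hello" after roughly one second
    }
```

Worth noting: Future runs its closure immediately on creation and replays the result to later subscribers, which sets it apart from most of Combine’s lazier publishers.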

What does this imply for Combine?

First up, pitches and proposals such as these are often made when there’s something at least partially working that an individual or three have been experimenting with. But they aren’t fully baked, nor are they going to magically appear in the language tomorrow. Whatever comes of the proposals, you should expect it’ll be six months minimum before they appear in any seriously usable form, and quite possibly longer. This is tricky, detailed stuff – and while the team is excellent at it, there are still a lot of moving parts to manage.

Second, while there’s some high-level conceptual overlap, they’re very different things. Combine, being a higher-level library and abstraction, will (I expect) take advantage of the lower-level constructs in updates – very likely ones we’ll never see exposed in its API – to make the operators more efficient. The capabilities being pitched for language-level actors (don’t confuse those with a higher-level actor or distributed-actor library, such as Akka or Orleans) may offer some really interesting ways for Combine to handle its queue/runloop-hopping mechanisms more safely and clearly.
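As a rough illustration – and hedging heavily, since the pitched syntax may well change before anything ships – a language-level actor would let state like this be protected without any manual queue management:

```swift
// Hypothetical actor, written with the pitched syntax – the exact
// spelling may change as the proposals are refined. The actor
// serializes access to `value`, so no queue juggling is needed.
actor Counter {
    private var value = 0

    func increment() -> Int {
        value += 1
        return value
    }
}

// From outside the actor, calls become asynchronous:
//   let counter = Counter()
//   let current = await counter.increment()
```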

Finally, when this does come into full existence, I hope the existing promise libraries used in Swift (PromiseKit, Google’s promises, or Khanlou’s promise) start to leverage the async constructs in their API structure – giving a clear path for people who want a single result processed through a series of asynchronous functions. You can use Combine for that, but it is really aimed at dealing with a whole series or stream of values transformed over time, rather than a single value.
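As a sketch of what that single-result path might look like with the pitched async/await – fetchUser and loadAvatarURL here are hypothetical stand-ins, not any real library’s API:

```swift
import Foundation

// Hypothetical stand-ins for two dependent network calls.
struct User {
    let name: String
}

func fetchUser(id: Int) async throws -> User {
    // pretend this hits the network
    User(name: "user-\(id)")
}

func loadAvatarURL(for user: User) async throws -> URL {
    // pretend this hits the network too
    URL(string: "https://example.com/\(user.name).png")!
}

// The single-result chain reads top to bottom, one await per step.
func avatarURL(for id: Int) async throws -> URL {
    let user = try await fetchUser(id: id)
    return try await loadAvatarURL(for: user)
}
```

No operators, no cancellation tokens to hold onto – just a sequence of suspension points, which is exactly the shape a single-value promise chain wants.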

tl;dr

The async and concurrency proposals are a complement to Combine, not a replacement – likely to provide new layers that Combine can integrate and build upon to make itself more efficient and easier to use.

pluggable, concurrent processing systems

It started with a thought: “Take the current path of advances with server hardware and take it to an extreme.” I was sitting on the bus thinking about programming practices and making things more efficient – “things” here meaning everything from “easier to program” to “uses less power to display a gazillion web pages a second.” For the past several years, I’ve been far more focused on server-based applications and development processes than desktop applications (sadly, my Objective-C programming skills have suffered for it). There have been some upsides though, even if I have weird mental riffs while riding the bus. Better than staring at the other passengers, I suppose.

The trends of better tools for programming, the work on figuring out concurrency, the GHz glass ceiling, the move to multiple cores per chip, attention to power efficiency, and a pile of other random junk all conspired to produce this thought experiment.

Start off with getting rid of the physical drives. Physical drives aren’t gone in most servers, but they are quickly diminishing in server farms. We’re consolidating all the drives into SANs, and iSCSI is breaking open some serious doors to making that far more of a commodity. So what if we didn’t have a hard drive in a server machine; could we get away with that? How about trimming off some of the other I/O thingies? In general, most of the I/O ports on servers aren’t really used either. Not like a desktop, where you’ve got a rat’s nest of USB devices, video screens, and other input devices. Servers, in my hasty-generalization world, get almost all of their data through a network interface. Blade servers are already making advances in sharing common infrastructure like power and cooling. So reduce the complexity of one of these critters to just a CPU and some memory. Let’s get back to the original von Neumann basic architecture. Okay, so we add a little something to that basic architecture: the network card. Give the thing just two sockets – power and Ethernet – using simple, standard connectors. Now make a rippin’ backplane for these things to communicate over – maybe just a commodity gigabit Ethernet switch. There are lots more details to be resolved, but with that core concept you’ve got yourself the potential for a pluggable processing engine.

Ironically, this vision isn’t too far removed from my earliest experiences with computers – “blades” of circuit boards in rough, open aluminum housings that made up the “computer”. The ones I remember were for controlling HVAC systems, didn’t really have any modern I/O to speak of (a teletype interface), and Jim Ahart (one of the folks who introduced me to those esoteric “computer” systems back in the late ’70s – he passed away a number of years ago) was constantly ripping out one blade, fiddling, and dropping in another. Oh – and I learned that these things weren’t hot-swappable. If you tried, you ended up using an oscilloscope, ohm meter, and soldering iron for hours to make it work again.

It may not surprise anyone that the CPU is the beast that eats the power in a server machine. Memory is a close second. On a desktop, it’s just as likely that the GPU on the video card is the power sink – but video isn’t quite as killer in a server. So maybe you make it so that you can power those pluggable CPU/memory units on and off to preserve power when you don’t need them. Very cool – sounds great. Okay – what’s it take to make that a reality?

Yeah, “oh shit – that’s hard” is immediately what I think. It’s a hell of a lot of custom hardware and trickery to make it work. My first vision is all custom boards and chips, new connections, and systems needing an entirely new operating system to make the whole thing fly. Not impossible, but kind of a non-starter from my point of view.

So let’s take a page from the PC manual of world domination and make this easier. Let’s use standard commodity components for this stuff. Maybe the first couple of iterations are nothing more than white-box motherboards that just don’t have a drive attached. It’s still a lot of engineering, but you can sort of see a path to making a prototype and then making improvements, where you’re not having to build the whole thing from scratch. Stand on the shoulders of giants – definitely. Fiddling with the little Arduino boards makes this kind of experiment seem much more doable.

One of the things that really stands out in current computer systems is the lack of proprioception for the computer itself. Computers in general have only rudimentary knowledge of themselves – disk space available, maybe memory (depending on how you look at it with virtual memory). They generally don’t know their power consumption, and within a process (from the programmer’s point of view) the only feedback you get when hitting memory limits is when malloc burps on you. You can thrash your own software into relative torpor by allocating and working with memory far in excess of your actual physical limits. I see a lot of time spent tuning programs and systems to deal with that lack of self-knowledge in the server world.

Thinking about the operating system to make this all work – this is where I head out into left field. You can look down the road and see that we’re heading into the multicore world very quickly. It’s not, I think, unreasonable to assume that we might see a desktop computer with 64 cores as a standard option within 5 years. Shoot – I’m jealous of one of my friends with an 8-core Mac system right now.

Coding to take advantage of those cores is generally agreed to be a pain. The common solution is to use threads – you get all this sweet low-latency cross-execution communication with threads. You also get deadlocks, race conditions, and a real bitch of a debugging problem. I’d love to say threads are evil, but in fact they’re just damn tricky to keep straight. Enter transactional memory – take the ACID concept of a database and apply it to shared memory access (okay, so drop the “durability” component of ACID – as soon as the power dies, the whole game is over anyway). It’s cool. It’s really cool – and we have a lot of people with experience programming against this kind of model.

I also like to riff off this point into the world of concurrent processes, with models where objects talk by passing messages around: tasklets in Stackless Python, Axon components in Kamaelia, that sort of thing. The abstraction the Objective-C runtime uses is message passing (very Smalltalk-y), although most Mac programmers I know happily use the leak in that abstraction to dig around in the C code and make the system really sing. We also spend a lot of time in constrained sandbox arenas – Java VMs, a Python interpreter, the shell on a Unix system. What if we could apply one of these sandboxes to one of our little pluggable units? The analog to starting a process in the Unix OS world would be to fire up one of these processor stubs and load in whatever processing you want to have happen there. Another option – what if, instead, we used lots of little (meaning small) operating systems and coordinated the whole thing? Then starting a process is booting up another little embedded Linux. Concurrency of execution can function on both threads and inter-process communication. I wonder if it would be effective enough – if the latencies could be kept low enough to make this run efficiently. There’s some proof positive of this concept with Beowulf clusters in high-end compute systems (supercomputers).
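To make the message-passing idea concrete – sketched here in Swift purely for convenience, with a serial dispatch queue standing in for a mailbox – the point is that a component’s state is only ever touched by one message at a time:

```swift
import Dispatch

// Each component owns a serial queue as its mailbox, so its state
// is only ever touched by one message at a time – no shared-memory
// locks required.
final class Accumulator {
    private let mailbox = DispatchQueue(label: "accumulator.mailbox")
    private var total = 0

    // "Sending a message" is enqueueing work on the mailbox.
    func add(_ amount: Int) {
        mailbox.async { self.total += amount }
    }

    // Reads go through the mailbox too, so they see a consistent value.
    func read() -> Int {
        mailbox.sync { total }
    }
}

let accumulator = Accumulator()
for i in 1...100 { accumulator.add(i) }
print(accumulator.read()) // 5050 – runs after all queued messages
```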

I wonder if we can take the lessons learned, use a really stripped-down Linux- or BSD-based system, include some more proprioception APIs in the overall system, and build up a real, hot-pluggable commodity “desktop” system. I think of the amount of compute that’s in my iPhone. It’s not a 3GHz powerhouse, but it’s pretty damn effective. What if the inside of my “desktop” wasn’t a load of fans and inefficient power supplies, but instead a simple backplane with lots of little deck-of-cards-sized boxes getting plugged in? I don’t know what technology is easily available to be co-opted into this sort of thing, but I’ve got to think that some of the lower-power Intel or AMD processors are getting close to being easily used in a “system on a chip” kind of manufacturing. There’s no reason you couldn’t even mix and match different processors in this “desktop” system to optimize for power efficiency based on its needs. Maybe when this thing is asleep it’s drawing the equivalent of an iPhone in standby mode, but it can still wake up and be ready to use in moments.

The piece I’ve been ignoring in this new-kind-of-commodity-server fantasy is all the desktop-specific pieces that we know and love. Even still, a few slightly custom “compute plugs” would probably do the trick. One high-juice plug with a video card slapped on the side to provide video output. Maybe that’s even a pipeline of processors or internal computing pieces to enable the system to deal with the bandwidth needs of the video. You could do some really interesting things by dedicating a processor plug or two to different input forms – mouse and keyboard, maybe another to video, maybe a third to audio.

It’s a vision, a thought. Some serious hardware engineering and prototypes would come next to make this sort of thing a reality. It’s a leap forward in many ways from where we are today, but not that big of a leap. I think it’s in the realm of possibility…