A good five or so years ago Shevek, a friend I met while building cloud-generating appliances at the (now defunct) Nebula, spent an afternoon and evening describing the benefits of lock-free data structures to me. He went deep into the theoretical aspects of it, summarizing a lot of the research thinking at the time. While I tried to keep up to retain it all later; I really didn’t manage it. The (predictable) end result was that I got the basics of what it did, why it was valuable, and some basic examples of how it was used, but I missed a lot of the how can this be more directly useful to me. A lot of that conversation was focused on multithreaded code and making it efficient, and what could be done (or not) to avoid contention blocks.
Jumping back to today, when I’m not writing away on Using Combine, I am helping some friends with a project that includes real-time collaborative editing. The goal is the same kind of thing that you see in Google Docs where multiple people are editing a document at the same time – you see each other’s cursors, lives updates, etc.
The earliest form of this kind of functionality that I used was originally a mac program called SubEthaEdit, and a couple years later a browser-based “text editor” called EtherPad, which I used extensively when working with the open source project OpenStack. (This was also around five years ago – I haven’t been actively contributing to OpenStack in quite a while).
Google adopted this capability as a feature, and has done an outstanding job of making it work. Happily, they also talk about how they do such things. In the details I was able to find, they often used the term “operational transformation” to cover most of their recent efforts. That was also the key technology behind Google’s (now dead effort): Wave. Other systems have done the same thing: ShareJS, Atom’s Xray, and the editor Xi.
I spent some time researching how others had tackled the problem of how you enable this kind of feature. The single best cohesive document I read was a blog post by Alexei Baboulevitch (aka “Archagon“) entitled: Data Laced with History: Causal Trees and Operational CRDTs. The article describes his own code and implementation experiments, conveniently available on github as crdt-playground. It also includes a tremendously useful primer into the related topics and links to research papers. It also helped (me) that all his code was done with Swift, and readily available to try out and play with.
Data Laced with History: Causal Trees and Operational CRDTs is absolutely amazing, but not an easy read. While I highly recommend it if you want to know how CRDT works, expect to need to read through it several times before the details all sink in. Alexei writes clearly and very accurately, and it is extremely dense. I have read the whole article many times, as well as dug through all that code repeatedly, to gain an understanding of it.
While I was buried in my own learning of how this works, I had an epiphany one evening: the core of a CRDT is a lock-free data structure. Once I stepped back from the specifics of it, looking at the whole thing as a general algorithm, the pieces I picked up from Shevek clicked into place. Turns out that conversation (which I had mostly forgotten and thought I lost) had a huge amount of value for me, years later. The background I gleaned after the fact gave me some ideas to understand how and why CRDTs work, and ultimately proved incredibly useful to understand how to effectively use them.