From diff3 and elementary graph theory to DVCS in a few short steps

The Least Common Ancestor problem

Synchrotron is a Javascript library for text diffing and three-way-merge, as well as a simple experimental Distributed Version-Control System (DVCS). It turns out to be surprisingly easy to construct a DVCS from a few simple pieces: a record of history, a three-way-merge algorithm, and a least-common-ancestor (LCA) algorithm.

I used a Javascript implementation of diff3 that I built, along with some code for LCA, and a model of revision history, to make a simple Javascript DVCS that manages a collection of JSON structures. It supports named branches, merging, and import/export of revisions. So far, there’s no network synchronisation protocol, although it’d be easy to build a simple one using the revision import/export feature and XMLHttpRequest. The storage format and repository representation is brutally naive, and because it doesn’t yet delta-compress historical versions of files, it is a bit wasteful of memory.

The following LShift blog posts summarise some of the work I’ve done:

There are some interesting demos in the slides from the talk at Osmosoft I gave (video) that you can try:

Railroad history diagram

Giving TiddlyWiki an awareness of history

After building the heart of the system, I experimentally integrated it with TiddlyWiki, providing version-control, history, and branch merging features, as part of collaborative work done with Martin Budden in June, 2008. You can try out the demo SynchroTiddly online, though to experiment with merging between instances you will need to save it to your local hard disk using wget, curl or similar.

History-awareness in other kinds of database


Systems like CouchDB and Google Wave already have rudimentary history-merging operations. It’d be a great addition to devise some deeper awareness of history for them, permitting flexible ‘offline’ operation, improving their partition tolerance and in effect making them even more decentralized and distributed than they already are.

All that would be required would be to make a note of parent document revision IDs each time a change is made to a document. Clearly, some protocol for compacting, thinning, or outright pruning overlong histories would also be required eventually, but a history-aware database could lead to some very interesting applications even in the absence of history compaction. Some of the new NoSQL database systems, Riak for example, support custom programmable merge operators, opening the door to full history-aware document merging.

A great application for such a DVCS-style database would be a bug tracker. Taking the step from centralized to decentralized bug tracking could be a boon comparable to the switch from centralized to decentralized source-code control…

Consumer Applications

Even various calendar- and address-book-synchronisation applications could make good use of history-awareness. Many current systems don’t cope well with more than two parties—generally, one client and one server—being permitted to edit the database.

Getting the Synchrotron code

Synchrotron is hosted on github.

The core interfaces, algorithms and internal structures of the DVCS code seem quite usable to me. In order to get to an efficient DVCS from here, the issues of storage and network formats will have to be addressed. Fortunately, storage and network formats are only about efficiency, not about features or correctness, and so they can be addressed separately from the core system. It will also eventually be necessary to revisit the naive LCA-computation code I’ve written, which is used to select an ancestor for use in a merge.

The code is split into a few different files: