Synchrotron

From diff3 and elementary graph theory to DVCS in a few short steps

Synchrotron is a Javascript library for text diffing and three-way-merge, as well as a simple experimental Distributed Version-Control System (DVCS). It turns out to be surprisingly easy to construct a DVCS from a few simple pieces: a record of history, a three-way-merge algorithm, and a least-common-ancestor (LCA) algorithm.

I used a Javascript implementation of diff3 that I built, along with some code for LCA, and a model of revision history, to make a simple Javascript DVCS that manages a collection of JSON structures. It supports named branches, merging, and import/export of revisions. So far, there’s no network synchronisation protocol, although it’d be easy to build a simple one using the revision import/export feature and XMLHttpRequest. The storage format and repository representation is brutally naive, and because it doesn’t yet delta-compress historical versions of files, it is a bit wasteful of memory.

The following LShift blog posts summarise some of the work I’ve done:

A simple Hunt-McIlroy text diff implementation in Javascript
comm, diff, patch, and diff3 in Javascript
Using the above to perform merging and DVCS.

There are some interesting demos in the slides from the talk at Osmosoft I gave (video) that you can try:

a demo of diff, comm, and patch functionality.
a demo of three-way merge and conflict-handling functionality.
a demo of a Javascript DVCS, a bit like Mercurial, that manages a collection of JSON objects (presenting them in a file-like way, for the purposes of the demo).

This is the video from the talk at Osmosoft I gave (originally hosted at vimeo, http://vimeo.com/1195525):

Giving TiddlyWiki an awareness of history

After building the heart of the system, I experimentally integrated it with TiddlyWiki, providing version-control, history, and branch merging features, as part of collaborative work done with Martin Budden in June, 2008. You can try out the demo SynchroTiddly online, though to experiment with merging between instances you will need to save it to your local hard disk using wget, curl or similar.

History-awareness in other kinds of database

NoSQL DBMS

Systems like CouchDB and Google Wave already have rudimentary history-merging operations. It’d be a great addition to devise some deeper awareness of history for them, permitting flexible ‘offline’ operation, improving their partition tolerance and in effect making them even more decentralized and distributed than they already are.

All that would be required would be to make a note of parent document revision IDs each time a change is made to a document. Clearly, some protocol for compacting, thinning, or outright pruning overlong histories would also be required eventually, but a history-aware database could lead to some very interesting applications even in the absence of history compaction. Some of the new NoSQL database systems, Riak for example, support custom programmable merge operators, opening the door to full history-aware document merging.

A great application for such a DVCS-style database would be a bug tracker. Taking the step from centralized to decentralized bug tracking could be a boon comparable to the switch from centralized to decentralized source-code control…

Consumer Applications

Even various calendar- and address-book-synchronisation applications could make good use of history-awareness. Many current systems don’t cope well with more than two parties—generally, one client and one server—being permitted to edit the database.

Getting the Synchrotron code

Synchrotron is hosted on github.

download a current snapshot tarball
browse the code

The core interfaces, algorithms and internal structures of the DVCS code seem quite usable to me. In order to get to an efficient DVCS from here, the issues of storage and network formats will have to be addressed. Fortunately, storage and network formats are only about efficiency, not about features or correctness, and so they can be addressed separately from the core system. It will also eventually be necessary to revisit the naive LCA-computation code I’ve written, which is used to select an ancestor for use in a merge.

The code is split into a few different files:

The sources for the diff and diff3 demos and the DVCS demo. In the latter, check out the definition of presets.preset1 for an example of how to use the DVCS, and presets.ambiguousLCA for an example of the repository format and the use of the revision import feature.
The diff and diff3 code itself.
Graph utilities (for computing LCA etc)
The DVCS and pseudo-file-system code.
The repository history-graph-drawing code and a python script for drawing the little tile images used in rendering a repository history graph.