Synchrotron
From diff3 and elementary graph theory to DVCS in a few short steps
Synchrotron is a Javascript library for text diffing and three-way-merge, as well as a simple experimental Distributed Version-Control System (DVCS). It turns out to be surprisingly easy to construct a DVCS from a few simple pieces: a record of history, a three-way-merge algorithm, and a least-common-ancestor (LCA) algorithm.
I used a Javascript implementation of diff3 that I built, along with
some code for LCA, and a model of revision history, to make a simple
Javascript DVCS that manages a collection of JSON structures. It
supports named branches, merging, and import/export of revisions. So
far, there’s no network synchronisation protocol, although it’d be
easy to build a simple one using the revision import/export feature
and XMLHttpRequest
. The storage format and repository representation
is brutally naive, and because it doesn’t yet delta-compress
historical versions of files, it is a bit wasteful of memory.
The following LShift blog posts summarise some of the work I’ve done:
- A simple Hunt-McIlroy text diff implementation in Javascript
- comm, diff, patch, and diff3 in Javascript
- Using the above to perform merging and DVCS.
There are some interesting demos in the slides from the talk at Osmosoft I gave (video) that you can try:
-
a demo of diff, comm, and patch functionality.
-
a demo of three-way merge and conflict-handling functionality.
-
a demo of a Javascript DVCS, a bit like Mercurial, that manages a collection of JSON objects (presenting them in a file-like way, for the purposes of the demo).
This is the video from the talk at Osmosoft I gave (originally hosted at vimeo, http://vimeo.com/1195525):
Giving TiddlyWiki an awareness of history
After building the heart of the system, I experimentally integrated it
with TiddlyWiki, providing version-control, history, and branch
merging features, as part of collaborative work done with Martin
Budden in June, 2008. You can try
out the demo
SynchroTiddly
online, though to experiment with merging between instances you will
need to save it to your local hard disk using wget
, curl
or
similar.
History-awareness in other kinds of database
NoSQL DBMS
Systems like CouchDB and Google Wave already have rudimentary history-merging operations. It’d be a great addition to devise some deeper awareness of history for them, permitting flexible ‘offline’ operation, improving their partition tolerance and in effect making them even more decentralized and distributed than they already are.
All that would be required would be to make a note of parent document revision IDs each time a change is made to a document. Clearly, some protocol for compacting, thinning, or outright pruning overlong histories would also be required eventually, but a history-aware database could lead to some very interesting applications even in the absence of history compaction. Some of the new NoSQL database systems, Riak for example, support custom programmable merge operators, opening the door to full history-aware document merging.
A great application for such a DVCS-style database would be a bug tracker. Taking the step from centralized to decentralized bug tracking could be a boon comparable to the switch from centralized to decentralized source-code control…
Consumer Applications
Even various calendar- and address-book-synchronisation applications could make good use of history-awareness. Many current systems don’t cope well with more than two parties—generally, one client and one server—being permitted to edit the database.
Getting the Synchrotron code
Synchrotron is hosted on github.
- download a current snapshot tarball
- browse the code
The core interfaces, algorithms and internal structures of the DVCS code seem quite usable to me. In order to get to an efficient DVCS from here, the issues of storage and network formats will have to be addressed. Fortunately, storage and network formats are only about efficiency, not about features or correctness, and so they can be addressed separately from the core system. It will also eventually be necessary to revisit the naive LCA-computation code I’ve written, which is used to select an ancestor for use in a merge.
The code is split into a few different files:
-
The sources for the diff and diff3 demos and the DVCS demo. In the latter, check out the definition of
presets.preset1
for an example of how to use the DVCS, andpresets.ambiguousLCA
for an example of the repository format and the use of the revision import feature. -
Graph utilities (for computing LCA etc)
-
The DVCS and pseudo-file-system code.
-
The repository history-graph-drawing code and a python script for drawing the little tile images used in rendering a repository history graph.