Diff for Javascript
This page is a mirrored copy of an article originally posted on the (now sadly defunct) LShift blog; see the archive index here.
Tue, 15 August 2006
Toward the end of last week, I found myself wondering about implementing a diff algorithm in Javascript. It turns out there’s already at least one available attempt at this, by John Resig (direct link: jsdiff.js). Unfortunately, the implementation is a little buggy (try diffString("the quick", "the quick fox")
, for example), so I went hunting for source material to implement my own.
I ended up finding J. W. Hunt and M. D. McIlroy, “An algorithm for differential file comparison”, Bell Telephone Laboratories CSTR #41 (1976) (postscript), which turned out to be quite implementable.
Here’s my implementation: hunt-mcilroy.js. With luck, I’ve even implemented something close to the described algorithm.
As an example, the output from
diff("the quick brown fox jumped over".split(/\s+/),
"the quick fox jumps over".split(/\s+/))
is
[{common:["the","quick"]},
{file1:["brown"],
file2:[]},
{common:["fox"]},
{file1:["jumped"],
file2:["jumps"]},
{common:["over"]}]
The next step is to implement a diff3 equivalent, and then I’ve the tools to implement rudimentary version-control. It’d sure be a fine thing to see TiddlyWiki evolve darcs-like distributed change-propagation…
Comments
On 27 August, 2006 at 8:48 am,
wrote:On 22 May, 2007 at 4:48 am,
wrote:How is your diff3 going on ?
If it finish , may I see it’s source code by JS .
Thanks alot
On 22 May, 2007 at 1:15 pm,
wrote:Hi Frank, unfortunately I’ve not yet returned to this little project. I’ll be sure to post when I do, though.
On 22 May, 2007 at 5:01 pm,
wrote:OK , perhaps i’ll try to do a new one by myself . thanx
On 19 February, 2008 at 11:14 pm,
wrote:The code fails with the two sentences that I found on a different site:
str1 = “The red brown fox jumped over the rolling log.”;
str2 = “The brown spotted fox leaped over the rolling log”;
alert(uneval(diff(str1, str2)));
For example, a part of the result array is:
{common:["o", "v"]}, {file1:["e"], file2:["e"]}, {common:["r"]}
On 20 February, 2008 at 8:23 pm,
wrote:You’re right, Peter - computing a diff character-by-character seems to give spurious diffs! How odd. Have you dug into the code to find out why that might be?
Splitting the string into words first gives a sensible answer:
js> uneval(diff(str1.split(/\s+/), str2.split(/\s+/)));
[{common:["The"]},
{file1:["red"], file2:[]},
{common:["brown"]},
{file1:[], file2:["spotted"]},
{common:["fox"]},
{file1:["jumped"], file2:["leaped"]},
{common:["over", "the", "rolling"]},
{file1:["log."], file2:["log"]}]
On 1 August, 2009 at 6:01 pm,
wrote:… by John Resig (direct link: jsdiff.js). Unfortunately, the implementation is a little buggy (try diffString(”the quick”, “the quick fox”) …
It has been corrected (tested today)
On 3 September, 2010 at 10:41 am,
wrote:[quote]
… by John Resig (direct link: jsdiff.js). Unfortunately, the implementation is a little buggy (try diffString(”the quick”, “the quick fox”) …
It has been corrected (tested today)
[/quote]
Does anyone know where the latest copy of the John Resig code can be found? His Project page for diffing has been taken down.
Also see http://neil.fraser.name/software/diff_match_patch/