yoy.be "Why-o-Why"

A (little) bit about 'diff'

2005-10-27 20:59  i332  coding  [permalink]

What is 'diff'? Let's say you want to know the differences between two text files. A most frequent real-life example: two versions of the same source file, for example each with updates from another user. But how to merge them.

Nice tools like ExamDiff Pro - Visual Diff Tool for File and Directory Compare are handy, but often not free. There's one provided in Visual Studio: Visual SourceSafe also, have a look at the history of a file, select two items in the list and hit difference...

But, inquisitive like we are, we need to get to the bottom of it. I want to have done it myself.

You might think, OK, two lists of lines, search a line on the 'left' in the list on the 'right', and if you find any, just forward the list on the right to that line, showing a piece of the list as 'not in the version on the right', and vice versa.

Not.

The best way to go at it is to get the best set of biggest chunks of data that's present in both documents. At least, I found out after [[Google]]ing my way onto Joe White's Blog, having a great set of enlightening exposees on his discovery of the diff algorithm.

A bit of Googling further, I found out which algoritm Perl's Algorithm::Diff was using, and used that and the original in SmallTalk, to get to this attempt in Delphi:

    start1:=0;
    finish1:=ls1.Count-1;
    start2:=0;
    finish2:=ls2.Count-1;
 
    SetLength(matchVector,ls1.Count);
    matches:=TMatchList.Create(Eq);
    links:=TLinkList.Create;
    //first prune
    while (start1<=finish1) and (start2<=finish2) and
      Eq(ls1[start1],ls2[start2]) do
     begin
      matchVector[start1]:=start2;
      inc(start1);
      inc(start2);
     end;
    while (start1<=finish1) and (start2<=finish2) and
      Eq(ls1[finish1],ls2[finish2]) do
     begin
      matchVector[finish1]:=finish2;
      dec(finish1);
      dec(finish2);
     end;
    for a1:=start1 to finish1 do matchVector[a1]:=-1;
 
    //with positions of interval (ls2,start2,finish2);
    for a2:=start2 to finish2 do matches.NoteMatch(ls2[a2],a2);
 
    SetLength(entries,finish2-start2+1);
    entryMax:=start2;
    for a2:=0 to finish2-start2 do entries[a2].Link:=-1;
 
    for a1:=start1 to finish1 do
     begin
      m:=matches.GetMatch(ls1[a1]);
      if Assigned(m) then
       begin
        b2:=-1;
        for mi:=m.Count-1 downto 0 do
         begin
          b1:=m[mi];
 
          if not(b2=-1) and (entries[b2-start2].tr>b1) and (entries[b2-start2-1].tr<b1) then
            entries[b2-start2].tr:=b1
          else
           begin
            b2x:=start2;
            while (b2x<entryMax) and (entries[b2x-start2].tr<b1) do inc(b2x);
 
            if (b2x<entryMax) then with entries[b2x-start2] do
             begin
              if tr=b1 then b2:=-1 else
               begin
                //assert tr>b1
                tr:=b1;
                b2:=b2x+1;
               end;
             end
            else
             begin
              //add!
              entries[entryMax-start2].tr:=b1;
              inc(entryMax);
              b2:=entryMax;
             end;
           end;
 
          //
          if not(b2=-1) then
           begin
            if b2<start2 then nl:=-1 else nl:=entries[b2-start2-1].Link;
            entries[b2-start2].Link:=links.AddLink(nl,a1,b1);
           end;
         end;
       end;
     end;
 
    if entryMax>start2 then
     begin
      nl:=entries[entryMax-start2-1].Link;
      while not(nl=-1) do
       begin
        links.GetLink(nl,a1,a2);
        matchVector[a1]:=a2;
       end;
     end;

twitter reddit linkedin facebook