[daisy] [GSoC] progress update 1
Guy Van den Broeck
guyvdb at gmail.com
Tue Jul 3 14:53:32 CDT 2007
Hi everyone,
Here comes the first progress update for the one and only GSoC Daisy
project this year: DaisyDiff. As mentioned earlier, I only started
working last week. I spent 3 days with the nice people at
Outerthought and ate their speculoos!
-I've created a project at http://code.google.com/p/daisydiff/ . You're
welcome to check it out.
-I've updated the LCS implementation to a newer version of the one used
by the Eclipse people. This means that we can now diff documents of up
to 100 times the current maximum size without any problems.
-I've messed around with different ways of diffing documents. The
current code diffs by line, and then in each line by word. I tried
diffing the entire document by word.
-I've looked up some XML diffing algorithms and realised that they are
completely unfit for HTML text. I'll look into it more at a later time.
-I've added some extra functionality, like making sure that diffs do not
break up tags and that many small differences separated only by spaces
are merged into one big diff, etc.
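
To make the word-level approach from the list above concrete, here is a
minimal sketch in Python. It is not the DaisyDiff code: it uses the
standard library's difflib instead of the Eclipse LCS implementation, and
the names tokenize, word_diff and merge_across_whitespace are made up for
this illustration. It diffs the whole document by word, keeps every tag as
one unbreakable token, and folds changes that are separated only by
whitespace into a single change.

```python
import re
from difflib import SequenceMatcher

# A token is a whole tag, a run of whitespace, or a run of other characters.
# The trailing "|<" keeps a stray "<" that never closes instead of dropping it.
TOKEN = re.compile(r"<[^>]*>|\s+|[^\s<]+|<")


def tokenize(html):
    """Split HTML source into tokens; a tag is never split into pieces."""
    return TOKEN.findall(html)


def merge_across_whitespace(ops):
    """Fold changes separated only by unchanged whitespace into one change."""
    merged = []
    for k, (op, old_toks, new_toks) in enumerate(ops):
        # A whitespace-only unchanged run that sits between two changes.
        gap = (
            op == "equal"
            and all(t.isspace() for t in old_toks)
            and 0 < k < len(ops) - 1
            and ops[k - 1][0] != "equal"
            and ops[k + 1][0] != "equal"
        )
        if op != "equal" or gap:
            if merged and merged[-1][0] == "change":
                merged[-1][1].extend(old_toks)
                merged[-1][2].extend(new_toks)
            else:
                merged.append(("change", list(old_toks), list(new_toks)))
        else:
            merged.append(("equal", list(old_toks), list(new_toks)))
    return merged


def word_diff(old, new):
    """Diff two documents token by token over the entire document."""
    a, b = tokenize(old), tokenize(new)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    ops = [(op, a[i1:i2], b[j1:j2])
           for op, i1, i2, j1, j2 in matcher.get_opcodes()]
    return merge_across_whitespace(ops)


if __name__ == "__main__":
    for op, old_toks, new_toks in word_diff(
        "<p>The quick brown fox</p>", "<p>The slow red fox</p>"
    ):
        print(op, "".join(old_toks), "->", "".join(new_toks))
```

Because tags are single tokens, a change can replace or remove a whole tag
but can never cut one in half. Merging across whitespace trades a little
precision for a less fragmented result.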
There are a few results I can show you:
http://users.pandora.be/guyvdb/tag-word1.html
http://users.pandora.be/guyvdb/tag-word2.html
http://users.pandora.be/guyvdb/tag-word3.html
I hope the formatting speaks for itself. Please let me know what you
think of the concept of diffing this way compared to the current system.
Comments on the markup and visualisation are also more than welcome. Do
you want the versions to be next to each other in 2 columns? Do you want
the removed text to be hidden by default? Let me know.
So what's next? I'm going to try to link the markup to the words and make
a diff that doesn't show the HTML code at all. This will probably come at
the cost of accuracy in formatting differences, but it's a must-have for
less tech-savvy users.
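
One possible reading of "linking the markup to the words" is sketched
below in Python. This is only an illustration under my own assumptions,
not a design that exists in DaisyDiff: each visible word remembers the
tags that were open around it, the diff compares the words alone, and an
unchanged word whose surrounding tags differ is reported as reformatted.
The names WordCollector, words_with_markup and text_diff are hypothetical.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not stay on the stack.
VOID = {"br", "hr", "img", "input", "link", "meta"}


class WordCollector(HTMLParser):
    """Collects visible words together with the tags open around each one."""

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open tags, outermost first
        self.words = []  # list of (word, tag_path) pairs

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to and including the matching open tag.
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        path = tuple(self.stack)
        self.words.extend((word, path) for word in data.split())


def words_with_markup(html):
    collector = WordCollector()
    collector.feed(html)
    collector.close()
    return collector.words


def text_diff(old, new):
    """Diff visible words only; the output never contains HTML source."""
    a, b = words_with_markup(old), words_with_markup(new)
    matcher = SequenceMatcher(
        None, [w for w, _ in a], [w for w, _ in b], autojunk=False
    )
    result = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            # Same words: flag a formatting change where the tag path differs.
            for (word, old_path), (_, new_path) in zip(a[i1:i2], b[j1:j2]):
                status = "same" if old_path == new_path else "reformatted"
                result.append((status, word))
        else:
            result.extend(("removed", word) for word, _ in a[i1:i2])
            result.extend(("added", word) for word, _ in b[j1:j2])
    return result


if __name__ == "__main__":
    print(text_diff("<p>Hello <b>brave</b> world</p>",
                    "<p>Hello <i>brave</i> new world</p>"))
```

The sketch only knows that the formatting around a word changed, not
which attribute or tag changed, which is one form of the accuracy loss
mentioned above.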
thanks
guy van den broeck