[daisy] [GSoC] progress update 1

Bruno Dumon bruno at outerthought.org
Wed Jul 4 03:31:10 CDT 2007


On Tue, 2007-07-03 at 21:53 +0200, Guy Van den Broeck wrote:
> Hi everyone,
> 
> Here comes the first progress update for the one and only GSoC Daisy
> project this year: DaisyDiff. As mentioned earlier I've only just
> started working last week. I spent 3 days with the nice people at
> Outerthought and ate their speculoos! 
> 
> -I've created a project at http://code.google.com/p/daisydiff/ . You're
> welcome to check it out.
> 
> -I've updated the LCS implementation to a newer version of the one used
> by the Eclipse people. This means that we can now diff documents of up
> to 100 times the current maximum size without any problems.
> 
> -I've messed around with different ways of diffing documents. The
> current code diffs by line, and then in each line by word. I tried
> diffing the entire document by word.
> 
> -I've looked up some XML diffing algorithms and realised that they are
> completely unfit for HTML text. I'll look into it more at a later time.
> 
> -I've added some extra functionality like making sure that diffs do not
> break up tags, that a lot of small differences separated by spaces are
> seen as one big diff, etc.
> 
> There are a few results I can show you:
> 
> http://users.pandora.be/guyvdb/tag-word1.html
> http://users.pandora.be/guyvdb/tag-word2.html
> http://users.pandora.be/guyvdb/tag-word3.html
> 

good work!
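
To make the "diff the entire document by word, without breaking up
tags" idea concrete, here is a minimal sketch. It is not the DaisyDiff
code: it assumes Python's difflib as a stand-in for the Eclipse LCS
implementation, and the tokenizer and the names tokenize and word_diff
are hypothetical, chosen only for illustration.

```python
import difflib
import re

# Hypothetical tokenizer (an assumption, not DaisyDiff's): an HTML tag is
# kept as one token, so a diff can never split "<b>" into pieces.
# All other text is split into words on whitespace.
TOKEN = re.compile(r"<[^>]+>|[^\s<]+")


def tokenize(html):
    """Split HTML into a flat list of tag tokens and word tokens."""
    return TOKEN.findall(html)


def word_diff(old_html, new_html):
    """Return (op, old_tokens, new_tokens) for each changed region."""
    old, new = tokenize(old_html), tokenize(new_html)
    # difflib's matcher stands in for a real LCS implementation here.
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (op, old[i1:i2], new[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]


old = "<p>The quick <b>brown</b> fox</p>"
new = "<p>The slow <b>brown</b> dog</p>"
changes = word_diff(old, new)
for op, removed, added in changes:
    print(op, removed, added)
```

Because the whole document is one token sequence, a change is reported
at word granularity wherever it occurs, and a tag is always either
entirely inside or entirely outside a changed region.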

> I hope the formatting speaks for itself. Please let me know what you
> think of the concept of diffing this way compared to the current system.
> Comments on the markup and visualisation are also more than welcome.

Concerning visualisation: I'm not sure the symbols | and >> are
needed or helpful.

>  Do
> you want the versions to be next to each other in 2 columns?

What would be displayed in each column then?

>  Do you want
> the removed text to be hidden by default? Let me know.

I prefer to see the removed text (thus like it is now).

> 
> So what's next? I'm going to try to link the markup to the words and
> make a diff that doesn't show the HTML code at all. This will probably
> come at the cost of accuracy in showing formatting differences, but
> it's a must-have for less tech-savvy users.
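
A minimal sketch of what such a markup-free diff could look like,
assuming Python's html.parser and difflib rather than anything from
DaisyDiff itself. The names TextWords, text_words and render_text_diff
and the [-...-] / {+...+} markers are hypothetical choices for the
example.

```python
import difflib
from html.parser import HTMLParser


class TextWords(HTMLParser):
    """Collects the words of the text content and ignores all tags."""

    def __init__(self):
        super().__init__()
        self.words = []

    def handle_data(self, data):
        self.words.extend(data.split())


def text_words(html):
    parser = TextWords()
    parser.feed(html)
    parser.close()
    return parser.words


def render_text_diff(old_html, new_html):
    """Mark removed words as [-word-] and added words as {+word+}."""
    old, new = text_words(old_html), text_words(new_html)
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    out = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            out.extend(old[i1:i2])
            continue
        # Removed words stay visible instead of being hidden.
        out.extend(f"[-{w}-]" for w in old[i1:i2])
        out.extend(f"{{+{w}+}}" for w in new[j1:j2])
    return " ".join(out)


result = render_text_diff(
    "<p>The <b>quick</b> fox</p>",
    "<p>The <i>slow</i> fox</p>",
)
print(result)
```

Note that the change from <b> to <i> is not reported at all in this
sketch, which is the loss of formatting accuracy mentioned above.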

For some inspiration, I've found this link about Word's document
compare feature:
http://blogs.msdn.com/microsoft_office_word/archive/2007/01/29/who-changed-what-when.aspx

-- 
Bruno Dumon                             http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
bruno at outerthought.org                          bruno at apache.org

