[daisy] [GSoc] daisydiff progress update 2
Mindaugas Idzelis
idzelis at us.ibm.com
Tue Jul 10 09:57:54 CDT 2007
This looks really nice. I think that the simple edits are going to be the
most frequent, and they look great.
It's not exactly diffing, but another way to compare two versions is to
display them side by side in two table columns.
Thanks,
Mindaugas Idzelis
From: Guy Van den Broeck <guyvdb at gmail.com>
Sent by: daisy-bounces at lists.cocoondev.org
Date: 07/09/2007 02:18 PM
Please respond to: guyvdb at gmail.com; "Daisy: open source CMS - general mailinglist" <daisy at lists.cocoondev.org>
To: "Daisy: open source CMS - general mailinglist" <daisy at lists.cocoondev.org>
Subject: [daisy] [GSoc] daisydiff progress update 2
Hello,
Voilà: the results of my first diffing, shown in HTML.
-This is a large document with a few small, realistic changes. The
result is close to perfect in this case. Note the words with a blue curly
underline, where the link has changed, and hover over them.
http://daisydiff.googlecode.com/files/rendered1.html
-This is an example of how the layout of the removed parts is
reconstructed. No problems here.
http://daisydiff.googlecode.com/files/rendered2.html
-This is an example of several different changes. There's a small glitch
with the added '.' after the removed list. In the 'Students' section the
word 'you' should have a bullet. The problem lies with the newline that
is started in the HTML code between the bullets. I have a fix but this
margin is too small to contain it ;)
http://daisydiff.googlecode.com/files/rendered3.html
-This is an example of how the algorithm cuts branches in two when a
removed word is inserted that doesn't have the same tags.
http://daisydiff.googlecode.com/files/rendered4.html
I hope you like it. PLEASE send as much feedback as possible. The
algorithm for inserting removed words is not trivial and needs more work. I
will now spend 10 days in Greece to contemplate such topics as the
meaning of life and removed HTML insertion. See you back on the 20th!
guy
PS: The algorithm:
===================
The algorithm compares the words (with LCS) in the document without
considering the layout. Then the formatting of the new document is taken
and new parts are coloured green.
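
As a rough illustration of this first step, here is a minimal sketch of a word-level LCS diff. It is not the daisydiff code; the function names `lcs_table` and `word_diff` and the ("same" / "removed" / "added") labels are assumptions made up for this example, and the documents are assumed to be already split into words.

```python
def lcs_table(a, b):
    """table[i][j] = length of the LCS of a[i:] and b[j:]."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table


def word_diff(old_words, new_words):
    """Label each word as 'same', 'removed' or 'added', ignoring layout."""
    table = lcs_table(old_words, new_words)
    ops = []
    i = j = 0
    while i < len(old_words) and j < len(new_words):
        if old_words[i] == new_words[j]:
            ops.append(("same", old_words[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            # Dropping the old word keeps the LCS at least as long.
            ops.append(("removed", old_words[i]))
            i += 1
        else:
            ops.append(("added", new_words[j]))
            j += 1
    # Whatever is left over on either side is a pure removal or addition.
    ops.extend(("removed", w) for w in old_words[i:])
    ops.extend(("added", w) for w in new_words[j:])
    return ops


if __name__ == "__main__":
    old = "the quick brown fox".split()
    new = "the slow brown fox jumps".split()
    for kind, word in word_diff(old, new):
        print(kind, word)
```

In this sketch the words labelled "added" are the ones that would be coloured green in the new document; the "removed" words are what the later, harder step has to place back.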
For unchanged parts the layout is compared: the LCS (= linear diff) of the
layout is calculated in a vertical fashion between corresponding words. That
difference is then expressed in a tooltip that pops up when you hover over
that particular word.
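
One way to picture the layout comparison, as a sketch only: assume each word carries the list of tags that enclose it, outermost first, and diff the old list against the new one. The function name `describe_layout_change` and the tag-path representation are assumptions for this example, and Python's `difflib.SequenceMatcher` is used as a stand-in for a true LCS.

```python
from difflib import SequenceMatcher


def describe_layout_change(old_path, new_path):
    """Return tooltip text describing how a word's enclosing tags changed.

    old_path and new_path are lists of tag names, outermost tag first.
    """
    changes = []
    matcher = SequenceMatcher(None, old_path, new_path, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op in ("delete", "replace"):
            changes.extend("removed <%s>" % tag for tag in old_path[i1:i2])
        if op in ("insert", "replace"):
            changes.extend("added <%s>" % tag for tag in new_path[j1:j2])
    return "; ".join(changes)


if __name__ == "__main__":
    # The same word was bold in the old version and is italic in the new one.
    print(describe_layout_change(["p", "b"], ["p", "i"]))
```

An empty result means the word's layout did not change, so no tooltip would be needed.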
Next is the difficult part: adding the removed words. There's a mode
that removes all formatting from the removed parts and creates
guaranteed correct HTML. The other mode tries to reconstruct the layout
of the removed parts inside the new document. Depending on the tags
around the removed words and the words before and after the removed part
there is a set of non-trivial rules. In the end each word is added to
the new document somewhere in the tree where the word order is kept
consistent and as many tags as possible are kept and shared between
words. The code to do so is a huge mess and needs major refactoring but
works more or less.
When a word is removed in the middle of a tag and that tag does not
appear in the old word's layout, that tag needs to be cut in two so that
the old word can be inserted in between (see example 4).
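
The tag cutting can be sketched as follows. This is a simplified model, not the actual rule set: each word is assumed to be a (text, tag path) pair, and the hypothetical `render` function shares as many open tags as possible between neighbouring words. A removed word that lacks a tag then forces that tag to be closed before it and reopened after it.

```python
def render(words):
    """Serialise (text, tag_path) pairs to HTML, sharing tags between words.

    tag_path lists the enclosing tags, outermost first.
    """
    out = []
    open_tags = []
    for text, path in words:
        # Count how many of the currently open tags this word also has.
        keep = 0
        while (keep < len(open_tags) and keep < len(path)
               and open_tags[keep] == path[keep]):
            keep += 1
        # Close the tags the word does not share, innermost first.
        while len(open_tags) > keep:
            out.append("</%s>" % open_tags.pop())
        # Open the tags the word needs in addition.
        for tag in path[keep:]:
            out.append("<%s>" % tag)
            open_tags.append(tag)
        out.append(text)
    while open_tags:
        out.append("</%s>" % open_tags.pop())
    return "".join(out)


if __name__ == "__main__":
    words = [
        ("one", ["p", "b"]),
        ("old", ["p", "del"]),  # removed word, was not bold in the old version
        ("two", ["p", "b"]),
    ]
    print(render(words))
```

Here the `<b>` tag around "one" and "two" ends up cut in two around the removed word, while the shared `<p>` stays open across all three words.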
Removed or added words that are separated by a few delimiters are
'bridged' in a round of preprocessing.
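
A possible reading of the bridging step, sketched under assumptions: tokens are (kind, text) pairs as produced by a word diff, delimiters are single-character tokens, and a short run of unchanged delimiters between two changes of the same kind is relabelled so the change is displayed as one block. The `DELIMITERS` set, the `max_gap` threshold and the `bridge` function are all invented for this example.

```python
# Assumption: which single-character tokens count as delimiters.
DELIMITERS = set(" .,;:!?()'\"-")


def bridge(ops, max_gap=2):
    """Merge changes of the same kind separated by at most max_gap delimiters."""
    ops = list(ops)
    for i in range(len(ops)):
        kind = ops[i][0]
        if kind == "same":
            continue
        # Look ahead over a run of unchanged delimiter tokens.
        j = i + 1
        while j < len(ops) and ops[j][0] == "same" and ops[j][1] in DELIMITERS:
            j += 1
        gap = j - (i + 1)
        if 0 < gap <= max_gap and j < len(ops) and ops[j][0] == kind:
            # Relabel the delimiters so both changes form one block.
            for k in range(i + 1, j):
                ops[k] = (kind, ops[k][1])
    return ops


if __name__ == "__main__":
    ops = [("added", "foo"), ("same", ","), ("same", " "),
           ("added", "bar"), ("same", " "), ("same", "baz")]
    print(bridge(ops))
```

The delimiters between "foo" and "bar" become part of the added block, while the space before the unchanged word "baz" is left alone.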
_______________________________________________
daisy community mailing list
Professional Daisy support:
http://outerthought.org/site/services/daisy/daisysupport.html
mail to: daisy at lists.cocoondev.org
list information: http://lists.cocoondev.org/mailman/listinfo/daisy