[daisy] [GSoc] daisydiff progress update 2

Vincent Mouton vincent at postback.be
Mon Jul 9 14:02:51 CDT 2007


> Hello
> 
> Voila the results of my first diffing shown in HTML.
> 
> -This is a large document with a few small, realistic changes. The
> result is close to perfect in this case. Note the words with a blue curly
> underline where the link has changed, and hover over them.
> http://daisydiff.googlecode.com/files/rendered1.html
> 
> -This is an example of how the layout of the removed parts is
> reconstructed. No problems here.
> http://daisydiff.googlecode.com/files/rendered2.html
> 
> -This is an example of several different changes. There's a small glitch
> with the added '.' after the removed list. In the 'Students' section the
> word 'you' should have a bullet. The problem lies with the newline that
> is started in the HTML code between the bullets. I have a fix but this
> margin is too small to contain it ;)
> http://daisydiff.googlecode.com/files/rendered3.html

I have a comment on this page about the 2006 being changed into 2007
(the h2 at the bottom of the page).
Rendering it like this, with two h2's, actually 'alters' the structure of the
document.

I guess the previous/old version had an h2 containing the string 2006, and
then the contents of that tag changed to 2007.

Now it looks as if there's one h2 that will be empty in the new version (I
know daisy takes care of this) and another one with the 2007 text.

The structure of the document hasn't changed though, only the contents of
the h2 tag.
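
To make the point concrete, here is a rough sketch of what I mean by keeping one heading and marking the change inside it. It is Python, not DaisyDiff code; the function name render_inline is made up for this mail, and it uses difflib's SequenceMatcher from the Python standard library as a stand-in for the LCS step, which is an assumption on my side and not how daisydiff does it.

```python
import difflib
from html import escape


def render_inline(tag, old_text, new_text):
    """Render a word diff inside ONE element instead of two sibling elements.

    Hypothetical helper for illustration only: removed words are wrapped in
    <del>, added words in <ins>, and the enclosing tag appears exactly once.
    """
    old, new = old_text.split(), new_text.split()
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    parts = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            parts.append(escape(" ".join(old[i1:i2])))
            continue
        # 'replace', 'delete' and 'insert' all reduce to: old words out, new words in
        if i2 > i1:
            parts.append("<del>" + escape(" ".join(old[i1:i2])) + "</del>")
        if j2 > j1:
            parts.append("<ins>" + escape(" ".join(new[j1:j2])) + "</ins>")
    return "<{0}>{1}</{0}>".format(tag, " ".join(parts))


print(render_inline("h2", "2006", "2007"))
```

With this the heading stays a single h2 whose contents changed, so the outline of the document is the same before and after.
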

cheers,
v

> 
> -This is an example of how the algorithm cuts branches in two when a
> removed word is inserted that doesn't have the same tags.
> http://daisydiff.googlecode.com/files/rendered4.html
> 
> I hope you like it. PLEASE send as much feedback as possible. The
> removed word inserting algorithm is not trivial and needs more work. I
> will now spend 10 days in Greece to contemplate such topics as the
> meaning of life and removed HTML insertion. See you back the 20th!
> 
> guy
> 
> PS:The algorithm:
> ===================
> The algorithm compares the words (with LCS) in the two documents without
> considering the layout. Then the formatting of the new document is taken
> and new parts are coloured green.
> Unchanged parts are compared and the LCS (=linear diff) of the layout is
> calculated in a vertical fashion between corresponding words. Then that
> difference is expressed in a tooltip that pops up when you hover over that
> particular word.
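
For readers following along, a minimal sketch of the word-level LCS step as I understand it from the description above. This is Python and not the actual DaisyDiff code; the names lcs_table and word_diff are my own, and the sketch ignores layout completely, like the first pass described here.

```python
def lcs_table(a, b):
    """Build the classic dynamic-programming table of LCS lengths.

    table[i][j] holds the LCS length of a[i:] and b[j:].
    """
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table


def word_diff(old_text, new_text):
    """Return (op, word) pairs where op is 'same', 'removed' or 'added'."""
    old, new = old_text.split(), new_text.split()
    table = lcs_table(old, new)
    i = j = 0
    result = []
    while i < len(old) and j < len(new):
        if old[i] == new[j]:
            result.append(("same", old[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            # dropping the old word loses nothing from the LCS
            result.append(("removed", old[i]))
            i += 1
        else:
            result.append(("added", new[j]))
            j += 1
    # whatever is left over on either side is a pure removal or addition
    result.extend(("removed", w) for w in old[i:])
    result.extend(("added", w) for w in new[j:])
    return result


for op, word in word_diff("Summer of Code 2006", "Summer of Code 2007"):
    print(op, word)
```

The 'added' words are the ones that would be coloured green; the 'removed' ones are what the second, harder step has to place back into the new document.
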
> 
> Next is the difficult part: adding the removed words. There's a mode
> that removes all formatting from the removed parts and creates
> guaranteed correct HTML. The other mode tries to reconstruct the layout
> of the removed parts inside the new document. Depending on the tags
> around the removed words and the words before and after the removed part
> there is a set of non-trivial rules. In the end each word is added to
> the new document somewhere in the tree where the word order is kept
> consistent and as many tags as possible are kept and shared between
> words. The code to do so is a huge mess and needs major refactoring but
> works more or less.
> When a word is removed in the middle of a tag and that tag does not
> appear in the old word's layout then that tag needs to be cut in 2 to be
> able to insert the old word in between (see example 4).
> Removed or added words that are separated by a few delimiters are
> 'bridged' in a round of preprocessing.
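
The 'bridging' mentioned last could look roughly like the sketch below. This is a guess at the idea in Python, not the real preprocessing code: the function name bridge, the DELIMITERS set and the max_gap threshold are all assumptions of mine. It takes a token list plus a parallel list of changed/unchanged flags and folds short delimiter-only gaps into the surrounding change.

```python
# Hypothetical delimiter set; the real tool may use a different one.
DELIMITERS = set(" \t\n.,;:!?()[]\"'")


def bridge(tokens, changed, max_gap=2):
    """Mark short delimiter-only gaps between changed tokens as changed too.

    tokens  -- list of words and delimiter tokens
    changed -- list of booleans, True where the token was added or removed
    max_gap -- longest unchanged run that may still be bridged (assumed value)
    """
    result = list(changed)
    n = len(tokens)
    i = 0
    while i < n:
        if result[i]:
            i += 1
            continue
        # find the end of this unchanged run
        j = i
        while j < n and not result[j]:
            j += 1
        gap = tokens[i:j]
        surrounded = i > 0 and j < n  # a changed token on both sides
        if surrounded and len(gap) <= max_gap and all(t in DELIMITERS for t in gap):
            for k in range(i, j):
                result[k] = True
        i = j
    return result


tokens = ["old", ",", " ", "text", " ", "stays"]
flags = [True, False, False, True, False, False]
print(bridge(tokens, flags))
```

Here the comma and space between the two changed words get absorbed into one contiguous change, while the trailing unchanged text is left alone because it has no change after it.
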
> 
> 
> 



