[daisy] [GSoc] daisydiff progress update 2
Guy Van den Broeck
guyvdb at gmail.com
Mon Jul 9 15:43:38 CDT 2007
thank you for the feedback!
the issue with the 2006/2007 header is that, because there are no other
words in the header, there is no sensible way to detect whether the 2007
header is the 2006 header with altered contents or a different header
that replaced a deleted one. a human reader would presume the former
because of the analogy between the years, but we're not trying to do
natural language processing here.
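to make the ambiguity concrete, here is a rough sketch of a word-level LCS diff. it is an illustration only, not the daisydiff code: the function name word_diff, the operation labels and the "Students of 2006" sample header are all made up for this example.

```python
def word_diff(old, new):
    """Word-level diff built on a longest-common-subsequence table.

    Hypothetical helper for illustration; not taken from daisydiff.
    """
    n, m = len(old), len(new)
    # lcs[i][j] holds the LCS length of old[i:] and new[j:].
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if old[i] == new[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    # Walk the table from the start and emit one operation per word.
    ops, i, j = [], 0, 0
    while i < n and j < m:
        if old[i] == new[j]:
            ops.append(("same", old[i]))
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            ops.append(("removed", old[i]))
            i += 1
        else:
            ops.append(("added", new[j]))
            j += 1
    ops += [("removed", w) for w in old[i:]]
    ops += [("added", w) for w in new[j:]]
    return ops


# A header consisting of a single word shares nothing with its successor.
header_only = word_diff(["2006"], ["2007"])

# A made-up header with surrounding words keeps an anchor in common.
with_context = word_diff("Students of 2006".split(), "Students of 2007".split())
```

in the first call the two headers have no word in common, so the result is just one removed word and one added word, which is exactly what a deleted header followed by an unrelated new header would produce. in the second call the shared words tie the old and new header together, so the change can be shown inside a single header.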
maybe i did not make it clear enough that these pages will not be stored
by daisy or processed to produce the new versions. they are merely a
visual aid for editors, and as such the "empty h2 case" is no problem.
guy
On Monday 09-07-2007 at 21:02 [timezone +0200], Vincent Mouton wrote:
> > Hello
> >
> > Voila the results of my first diffing shown in HTML.
> >
> > -This is a large document with a few little realistic changes. The
> > result is close to perfect in this case. Note the blue curly underlined
> > words where the link has changed and hover over them.
> > http://daisydiff.googlecode.com/files/rendered1.html
> >
> > -This is an example of how the layout of the removed parts is
> > reconstructed. No problems here.
> > http://daisydiff.googlecode.com/files/rendered2.html
> >
> > -This is an example of several different changes. There's a small
> > glitch with the added '.' after the removed list. In the 'Students'
> > section the word 'you' should have a bullet. The problem lies with the
> > newline that is started in the HTML code between the bullets. I have a
> > fix but this margin is too small to contain it ;)
> > http://daisydiff.googlecode.com/files/rendered3.html
>
> I have a comment on this page about the 2006 being changed into 2007
> (the h2 at the bottom of the page).
> Shown like this, having two h2's actually 'alters' the structure of the
> document.
>
> I guess the previous/old version had an h2 containing the string 2006, and
> then the contents of that tag changed to 2007.
>
> Now it seems as if there's an h2 that, in the new version, will be empty (I
> know daisy takes care of this) and one with the 2007 text.
>
> The structure of the document hasn't changed though, only the contents of
> the H2 tag.
>
> cheers,
> v
>
> >
> > -This is an example of how the algorithm cuts branches in two when a
> > removed word is inserted that doesn't have the same tags.
> > http://daisydiff.googlecode.com/files/rendered4.html
> >
> > I hope you like it. PLEASE send as much feedback as possible. The
> > removed word inserting algorithm is not trivial and needs more work. I
> > will now spend 10 days in Greece to contemplate such topics as the
> > meaning of life and removed HTML insertion. See you back the 20th!
> >
> > guy
> >
> > PS:The algorithm:
> > ===================
> > The algorithm compares the words (with LCS) in the document without
> > considering the layout. Then the formatting of the new document is
> > taken and new parts are coloured green.
> > Unchanged parts are compared and the LCS (=linear diff) of the layout
> > is calculated in a vertical fashion between corresponding words. Then
> > that difference is expressed in a tooltip that pops up when you hover
> > over that particular word.
> >
> > Next is the difficult part: adding the removed words. There's a mode
> > that removes all formatting from the removed parts and creates
> > guaranteed correct HTML. The other mode tries to reconstruct the layout
> > of the removed parts inside the new document. Depending on the tags
> > around the removed words and the words before and after the removed
> > part, a set of non-trivial rules applies. In the end each word is added
> > to the new document somewhere in the tree where the word order is kept
> > consistent and as many tags as possible are kept and shared between
> > words. The code to do so is a huge mess and needs major refactoring but
> > works more or less.
> > When a removed word has to be inserted in the middle of a tag and that
> > tag does not appear in the old word's layout, that tag needs to be cut
> > in 2 to be able to insert the old word in between (see example 4).
> > Removed or added words that are separated by a few delimiters are
> > 'bridged' in a round of preprocessing.
> >
> >
> >
> > _______________________________________________
> > daisy community mailing list
> > Professional Daisy support:
> > http://outerthought.org/site/services/daisy/daisysupport.html
> > mail to: daisy at lists.cocoondev.org
> > list information: http://lists.cocoondev.org/mailman/listinfo/daisy