[daisy] [GSoc] daisydiff progress update 2
Bruno Dumon
bruno at outerthought.org
Tue Jul 10 03:28:12 CDT 2007
First impression: this visual diff is definitely a very useful addition
for Daisy. It is much more pleasant and clear to look at.
The examples look quite good, except for those glitches you mentioned.
Nice progress!
For those cases where the diff could be confusing, a three-pane view
(with the old & new doc) could help, but this has little to do with the
diff code itself of course.
PS: I'm still planning on doing a first integration of this stuff with
Daisy 2.1(-beta), which should come later this month or in early August.
This will make it easy for everyone to try this out on their documents.
PS2: while trying out the tagdiffer, I found it didn't tokenize words
(anymore) and tracked it down to this:
Index: src/java/org/outerj/daisy/diff/lcs/tag/DelimiterAtom.java
===================================================================
--- src/java/org/outerj/daisy/diff/lcs/tag/DelimiterAtom.java (revision 58)
+++ src/java/org/outerj/daisy/diff/lcs/tag/DelimiterAtom.java (working copy)
@@ -25,10 +25,7 @@
}
public static boolean isValidDelimiter(String s) {
- if (s.length() == 1) {
- isValidDelimiter(s.charAt(0));
- }
- return false;
+ return s.length() == 1 && isValidDelimiter(s.charAt(0));
}
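
To make the effect of the patch concrete, here is a minimal standalone sketch of the before and after behaviour. The class name DelimiterCheck and the body of the char overload are assumptions for illustration only; the real delimiter set in DelimiterAtom is not shown in this mail.

```java
class DelimiterCheck {

    // Hypothetical stand-in for the char overload. The actual set of
    // delimiters used by DelimiterAtom is not shown in this mail.
    static boolean isValidDelimiter(char c) {
        return c == ' ' || c == '.' || c == ',';
    }

    // Shape of the method before the patch: the result of the char check
    // is computed and then discarded, so the method always returns false.
    static boolean isValidDelimiterBroken(String s) {
        if (s.length() == 1) {
            isValidDelimiter(s.charAt(0));
        }
        return false;
    }

    // Shape of the method after the patch: the length test short-circuits,
    // so charAt(0) is only called on a one-character string.
    static boolean isValidDelimiterFixed(String s) {
        return s.length() == 1 && isValidDelimiter(s.charAt(0));
    }

    public static void main(String[] args) {
        System.out.println(isValidDelimiterBroken(" ")); // false
        System.out.println(isValidDelimiterFixed(" "));  // true
        System.out.println(isValidDelimiterFixed("ab")); // false
    }
}
```

With the old version no string is ever recognised as a delimiter, which would explain why the text was no longer split into words.
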
On Mon, 2007-07-09 at 20:18 +0200, Guy Van den Broeck wrote:
> Hello
>
> Voila the results of my first diffing shown in HTML.
>
> -This is a large document with a few small, realistic changes. The
> result is close to perfect in this case. Note the blue curly underlined
> words where the link has changed, and hover over them.
> http://daisydiff.googlecode.com/files/rendered1.html
>
> -This is an example of how the layout of the removed parts is
> reconstructed. No problems here.
> http://daisydiff.googlecode.com/files/rendered2.html
>
> -This is an example of several different changes. There's a small glitch
> with the added '.' after the removed list. In the 'Students' section the
> word 'you' should have a bullet. The problem lies with the newline that
> is started in the HTML code between the bullets. I have a fix but this
> margin is too small to contain it ;)
> http://daisydiff.googlecode.com/files/rendered3.html
>
> -This is an example of how the algorithm cuts branches in two when a
> removed word is inserted that doesn't have the same tags.
> http://daisydiff.googlecode.com/files/rendered4.html
>
> I hope you like it. PLEASE send as much feedback as possible. The
> removed word inserting algorithm is not trivial and needs more work. I
> will now spend 10 days in Greece to contemplate such topics as the
> meaning of life and removed HTML insertion. See you back on the 20th!
>
> guy
>
> PS: The algorithm:
> ===================
> The algorithm compares the words (with LCS) in the document without
> considering the layout. Then the formatting of the new document is taken
> and new parts are coloured green.
> Unchanged parts are compared and the LCS (=linear diff) of the layout is
> calculated in a vertical fashion between corresponding words. Then that
> difference is expressed in a tooltip that pops up when you hover over
> that particular word.
>
> Next is the difficult part: adding the removed words. There's a mode
> that removes all formatting from the removed parts and creates
> guaranteed correct HTML. The other mode tries to reconstruct the layout
> of the removed parts inside the new document. Depending on the tags
> around the removed words and the words before and after the removed part
> there is a set of non-trivial rules. In the end each word is added to
> the new document somewhere in the tree where the word order is kept
> consistent and as many tags as possible are kept and shared between
> words. The code to do so is a huge mess and needs major refactoring but
> works more or less.
> When a word is removed in the middle of a tag and that tag does not
> appear in the old word's layout then that tag needs to be cut in 2 to be
> able to insert the old word in between (see example 4).
> Removed or added words that are separated by a few delimiters are
> 'bridged' in a round of preprocessing.
>
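
As an illustration of the first step described above (comparing the words with LCS while ignoring layout), here is a minimal sketch. It is a textbook dynamic-programming LCS, not the daisydiff code itself; the class name WordLcsSketch and the "+"/"-" output markers are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

class WordLcsSketch {

    // Returns one entry per word: "  w" for unchanged, "- w" for removed,
    // "+ w" for added. Layout (tags) is deliberately not considered here.
    static List<String> diff(String[] oldWords, String[] newWords) {
        int n = oldWords.length;
        int m = newWords.length;

        // len[i][j] = length of the LCS of oldWords[i..] and newWords[j..]
        int[][] len = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                if (oldWords[i].equals(newWords[j])) {
                    len[i][j] = len[i + 1][j + 1] + 1;
                } else {
                    len[i][j] = Math.max(len[i + 1][j], len[i][j + 1]);
                }
            }
        }

        // Walk the table from the start to emit the edit script.
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < n && j < m) {
            if (oldWords[i].equals(newWords[j])) {
                out.add("  " + oldWords[i]);
                i++;
                j++;
            } else if (len[i + 1][j] >= len[i][j + 1]) {
                out.add("- " + oldWords[i]);
                i++;
            } else {
                out.add("+ " + newWords[j]);
                j++;
            }
        }
        while (i < n) {
            out.add("- " + oldWords[i++]);
        }
        while (j < m) {
            out.add("+ " + newWords[j++]);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] oldDoc = "the quick brown fox".split(" ");
        String[] newDoc = "the slow brown fox jumps".split(" ");
        for (String line : diff(oldDoc, newDoc)) {
            System.out.println(line);
        }
    }
}
```

In the rendered output, the "+" words would correspond to the parts coloured green, and the "-" words are the ones that have to be re-inserted into the new document's tree.
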
--
Bruno Dumon http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
bruno at outerthought.org bruno at apache.org