[daisy] HtmlCleaner : strange case ?

Wim Van Acker wva at schaubroeck.be
Thu Nov 9 06:31:19 CST 2006


christophe blin wrote:
> Hi,
>
> I am looking for html cleaners and find that the one in daisy is
> particularly good :
> - no fucking regexps all over the place
> - clean configuration by xml
> - nice test cases
>
> So I was trying some cases and found that the following seems to behave
> strangely :
> cleaner = template.newHtmlCleaner();
> result =
> cleaner.cleanToString("<html><body><p><ul><li><p>hello!</p></li></ul></p></html>");
>
> I am expecting something like :
> <ul>
> <li>
> hello!
> </li>
> </ul>
>
> but the cleaner answers :
> <ul>
> <li/>
> </ul>
>
> <p>hello!</p>
>
> What I find pretty strange is that the p is moved out of the li.
> IMHO, the only mistake here is that p is forbidden inside a li (i.e. it
> is unlikely that the user wants to have an empty li).
>
> I am currently searching for where this behavior comes from, but if you
> have any hints, do not hesitate to post them here.
>
> Best regards,
> chris
>
>   
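As a point of comparison for the example quoted above, the output chris
expected (the text kept inside the li, with only the p wrapper dropped)
can be sketched with the plain JDK DOM API. This is only an illustration
and not how Daisy's HtmlCleaner works: the class
UnwrapParagraphInListItem and its unwrap method are hypothetical names,
and the input is reduced to the well-formed fragment
<ul><li><p>hello!</p></li></ul> so that a standard XML parser accepts it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch, not part of Daisy.
final class UnwrapParagraphInListItem {

    // Replaces every <p> that is a direct child of an <li> with the
    // children of that <p>, so the content stays inside the list item.
    static String unwrap(String wellFormedXhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(wellFormedXhtml)));

        // getElementsByTagName returns a live list, so copy the matches
        // before modifying the tree.
        NodeList live = doc.getElementsByTagName("p");
        List<Node> paragraphs = new ArrayList<>();
        for (int i = 0; i < live.getLength(); i++) {
            paragraphs.add(live.item(i));
        }

        for (Node p : paragraphs) {
            Node parent = p.getParentNode();
            if (parent != null && "li".equals(parent.getNodeName())) {
                // insertBefore moves a node that is already in the tree
                while (p.getFirstChild() != null) {
                    parent.insertBefore(p.getFirstChild(), p);
                }
                parent.removeChild(p);
            }
        }

        // Serialize the modified tree without an XML declaration.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(unwrap("<ul><li><p>hello!</p></li></ul>"));
    }
}
```

Running main prints the list with hello! directly inside the li, which is
the shape chris describes as the expected result.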
Hi,

I was editing a document in a Daisy wiki some time ago and also noticed
some strange behaviour, or at least I think it is strange.
I tried to reproduce it on the demo.daisycms.org site (a great
site for this kind of thing), but that was not possible.

So here is a story that comes close ... although my original problem 
involved bullets (lists) and paragraph style.

a) if you create a new document and enter some simple lines of text like

the first line
the second line
the other lines

b) and save the document ...
c) the next time you edit the document the style is changed to 
"paragraph" instead of (none)
d) nevertheless the lines are displayed as

the first line
the second line
the other lines

e) if you put the cursor on "the second line" and explicitly select the 
(none) style ... then the paragraph style becomes visible like

the first line

the second line

the other lines

f) now try to remove the paragraph style of the second line ... it 
doesn't seem to work


It could well be that c) and f) are not related and that it is a
coincidence, but the behaviour in f) is quite annoying.

I experience it often when editing large documents with bullets, and it
is not easy to get rid of it. Manual HTML editing is required. Sometimes
there seems to be no way to remove the paragraph style. My first
problem was with a text with bullets, but the above example shows that
it also happens without bullets.

Hope this helps.

Wim


