[daisy] HtmlCleaner : strange case ?
christophe blin
cblin at tennaxia.com
Tue Nov 14 05:25:04 CST 2006
Hi,
I dug into this a little this morning and found that it really is a strange
case.
<p><ul><li><p>hello!</p></li></ul></p>
is cleaned as (on one line for brevity)
<ul><li/></ul><p>hello!</p>
but
<ul><li><p>hello!</p></li></ul>
is cleaned as
<ul><li><p>hello!</p></li></ul>
which, BTW, I believe is not valid XHTML (as I read the standards, an li cannot contain a p).
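
The difference between the two results above comes down to whether the li ends up empty. Here is a minimal sketch of a check for that, using only the JDK XML parser. It does not call the Daisy cleaner at all: the class name EmptyListItemCheck, the method countEmptyListItems and the wrapping <root> element are all hypothetical, introduced only so that a fragment with several top-level elements parses as XML.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Daisy: counts li elements that have no children.
class EmptyListItemCheck {

    static int countEmptyListItems(String fragment) {
        // Assumption: the cleaned output is well-formed XML once wrapped in a single root.
        String xml = "<root>" + fragment + "</root>";
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList items = doc.getElementsByTagName("li");
            int empty = 0;
            for (int i = 0; i < items.getLength(); i++) {
                // An li written as <li/> or <li></li> has no child nodes.
                if (!items.item(i).hasChildNodes()) {
                    empty++;
                }
            }
            return empty;
        } catch (Exception e) {
            throw new IllegalArgumentException("fragment is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        // The two cleaned results quoted in this message.
        System.out.println(countEmptyListItems("<ul><li/></ul><p>hello!</p>"));     // prints 1
        System.out.println(countEmptyListItems("<ul><li><p>hello!</p></li></ul>")); // prints 0
    }
}
```

A check like this could be run on the string returned by the cleaner to flag the case where the list item has lost its content.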
Could someone give me some information on this behavior?
Thanks,
regards,
chris
christophe blin wrote:
> Hi,
>
> I am looking for HTML cleaners and find that the one in Daisy is
> particularly good:
> - no fucking regexps all over the place
> - clean configuration by xml
> - nice test cases
>
> So I was trying some cases and found that the following seems to behave
> strangely:
> cleaner = template.newHtmlCleaner();
> result =
> cleaner.cleanToString("<html><body><p><ul><li><p>hello!</p></li></ul></p></html>");
>
> I was expecting something like:
> <ul>
> <li>
> hello!
> </li>
> </ul>
>
> but the cleaner answers:
> <ul>
> <li/>
> </ul>
>
> <p>hello!</p>
>
> What I find pretty strange is that the p is moved out of the li.
> IMHO, the only mistake here is that p is forbidden inside a li (i.e. it
> is unlikely that the user wants to have an empty li).
>
> I am currently looking for where this behavior comes from, but if you have
> any hints, do not hesitate to post them here.
>
> Best regards,
> chris
>
>