[daisy] HtmlCleaner : strange case ?

Bruno Dumon bruno at outerthought.org
Wed Nov 15 04:10:51 CST 2006


On Tue, 2006-11-14 at 13:57 +0100, christophe blin wrote:
> Thanks for the clarification, I had misunderstood the ul tag.
> 
> BTW, I still do not understand why the result of cleaning
> 
> <p><ul><li><p>hello!</p></li></ul></p>
> is
> <ul><li/></ul><p>hello!</p>
> 
> If someone can explain why the p ends up outside the li, I would be
> glad.

That must be a bug.

> 
> I also have trouble understanding why, in the Cleaner, a new
> SaxBuffer() and a new ArrayList<StartElementInfo>() are created for
> every step?

new SaxBuffer: the output of each processing step is stored in a
SaxBuffer. Such a SaxBuffer is simply a data structure for XML, just
like DOM, but for what we need to do in the cleaner it is much easier to
use and a lot more efficient.

new ArrayList<StartElementInfo>(): IIRC this keeps track of the
currently open ancestor elements (in the output).
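
To make the two ideas above concrete, here is a rough sketch of a
step that records its SAX events in a list and keeps a list of the
currently open elements. It is not the Daisy code: StepBufferSketch
and record() are names made up for this illustration, and the events
are stored as plain strings instead of real event objects.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only; this is not Daisy's SaxBuffer or Cleaner.
class StepBufferSketch extends DefaultHandler {
    // The "buffer": events recorded in order, so a later step could replay them.
    final List<String> events = new ArrayList<>();
    // Names of the elements that are open at the current position.
    private final List<String> openElements = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        openElements.add(qName);
        events.add("start:" + qName + " path=" + String.join("/", openElements));
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // The element being closed is always the most recently opened one.
        openElements.remove(openElements.size() - 1);
        events.add("end:" + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        events.add("text:" + new String(ch, start, length));
    }

    // Parses the given XML and returns the recorded events of this one step.
    static List<String> record(String xml) {
        StepBufferSketch step = new StepBufferSketch();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), step);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
        return step.events;
    }
}
```

Creating a fresh instance per step, as record() does here, means that
no state from one step can leak into the next.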

> 
> Other questions concerning this component (these are more important;
> the two previous ones are only for my deeper understanding):
> 1. I would like to get rid of all the br tags and replace them with p
> elements.
> E.g.: <li>this is <br/>crazy</li> becomes
> <li><p>this is </p><p>crazy</p></li>
> 
> Where should I implement this? (In HtmlRepairer.Cleaner directly, or
> could I add another cleaning step without touching the code?)

Either way is possible, I guess (you'll have to touch the code anyway).

I find this a strange requirement though.
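
For what it's worth, the transformation itself could look roughly
like the sketch below. It works on a DOM rather than on SAX events to
keep it short, it is not tied to the HtmlRepairer code in any way,
and it assumes the br sits directly in a non-p container such as the
li in the example (a br inside a p would give nested p elements).
BrToParagraphSketch and its methods are made-up names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration only; not part of Daisy.
class BrToParagraphSketch {

    static String convert(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // Collect the parents first: the NodeList is live and changes while editing.
            NodeList breaks = doc.getElementsByTagName("br");
            List<Element> parents = new ArrayList<>();
            for (int i = 0; i < breaks.getLength(); i++) {
                Node parent = breaks.item(i).getParentNode();
                if (parent instanceof Element && !parents.contains(parent)) {
                    parents.add((Element) parent);
                }
            }
            for (Element parent : parents) {
                splitAtBreaks(doc, parent);
            }
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not convert input", e);
        }
    }

    // Regroups the children of parent into p elements, using each br as a separator.
    private static void splitAtBreaks(Document doc, Element parent) {
        List<Node> children = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            children.add(n);
        }
        List<Element> paragraphs = new ArrayList<>();
        Element current = doc.createElement("p");
        for (Node child : children) {
            parent.removeChild(child);
            if (child instanceof Element && "br".equals(child.getNodeName())) {
                paragraphs.add(current);          // the br closes the current paragraph
                current = doc.createElement("p");
            } else {
                current.appendChild(child);
            }
        }
        paragraphs.add(current);
        for (Element p : paragraphs) {
            if (p.hasChildNodes()) {              // skip empty paragraphs, e.g. for <br/><br/>
                parent.appendChild(p);
            }
        }
    }
}
```

With the example from the question,
BrToParagraphSketch.convert("<li>this is <br/>crazy</li>") gives
<li><p>this is </p><p>crazy</p></li>.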

> 
> 2. I would like to have required attributes.
> Is this already possible, and if not, how could I implement it?
> E.g.: I want the 'id' attribute to be required on every p in the html.

What would happen if a required attribute is missing? Throw an
exception?

I think this is more something for a separate validation step.
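
Such a validation step could be sketched as below. This is only an
assumption about how one might do it, not something that exists in
Daisy: RequiredAttributeCheckSketch and check() are made-up names,
and instead of throwing an exception the sketch collects one message
per missing attribute, so the caller can decide what to do.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only; not part of Daisy.
class RequiredAttributeCheckSketch extends DefaultHandler {
    private final String element;
    private final String attribute;
    private int seen = 0;
    final List<String> problems = new ArrayList<>();

    RequiredAttributeCheckSketch(String element, String attribute) {
        this.element = element;
        this.attribute = attribute;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (!element.equals(qName)) {
            return;
        }
        seen++;
        // getValue returns null when the attribute is absent.
        if (atts.getValue(attribute) == null) {
            problems.add("<" + element + "> number " + seen
                    + " is missing required attribute '" + attribute + "'");
        }
    }

    // Returns one message per element that lacks the attribute; empty if all is well.
    static List<String> check(String xml, String element, String attribute) {
        RequiredAttributeCheckSketch step = new RequiredAttributeCheckSketch(element, attribute);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), step);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
        return step.problems;
    }
}
```

Running it after the cleaner, on the cleaned output, keeps the
cleaning and the validation concerns apart.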

-- 
Bruno Dumon                             http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
bruno at outerthought.org                          bruno at apache.org


