[daisy] indexing and searching PDF

Jano Kula jano.kula at tiscali.cz
Wed Jan 23 05:45:04 CST 2008


Hi,

while browsing the indexstore, I noticed that the text extracted from PDFs 
is stored in the index file line by line, exactly as it is laid out in 
the PDF itself. As a result, words hyphenated across a line break are not 
found when searching. This is probably a Lucene issue; should I report it 
to its developers? Running one global substitution that rejoins 
hyphenated words before the text is saved to the indexstore should solve 
this.
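The global substitution I have in mind could look roughly like the 
minimal sketch below. It assumes the extracted PDF text is available as 
a plain string before it is handed to the indexer; join_hyphenated is a 
hypothetical helper of mine, not part of Daisy or Lucene.

```python
import re

# Hypothetical helper, not a Daisy or Lucene API.
# Matches a hyphen that directly follows a word character and is followed
# by optional spaces, a line break (LF or CRLF), optional indentation and
# another word character. Lookbehind/lookahead keep the letters themselves
# out of the match, so consecutive breaks like "a-\nb-\nc" are all joined.
_HYPHEN_BREAK = re.compile(r"(?<=\w)-[ \t]*\r?\n[ \t]*(?=\w)")


def join_hyphenated(text: str) -> str:
    """Rejoin words split by a hyphen at the end of a line.

    Caveat: genuine compounds that happen to break right after their
    hyphen (e.g. "well-" / "known") are merged into one word as well.
    """
    return _HYPHEN_BREAK.sub("", text)


if __name__ == "__main__":
    sample = "The sample chap-\nter explains index-\n  ing in detail."
    print(join_hyphenated(sample))
```

A hyphen preceded by a space (a dash used as punctuation) is left alone, 
since the lookbehind requires a word character right before it.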

Is there any information on document parts in the indexstore? Imagine 
this minimal example:

document-type: book description
   part-type (daisy-html): annotation
   part-type (PDF): table of contents
   part-type (PDF): sample chapter      <--- here is the string
   part-type (PDF): index

If someone searches for the string and this compound document is found, 
I can't see a way to find out which of the PDFs matches the string, and 
it is confusing for the user not to find it in the displayed daisy-html. 
If the indexstore holds information on document parts, some mark (an 
arrow or a highlighted link) could indicate the part where the string 
was found. Opening just that one part would, I think, lose the 
information about its context. But there might be no information on 
parts in the indexstore at all. How would you deal with this situation?
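To illustrate what I mean by part information: if every part were 
indexed separately together with its document id and part type, a hit 
could report the matching part while the whole document is still shown. 
The toy sketch below is plain Python, not Lucene and not Daisy's 
indexstore; PartAwareIndex and its methods are hypothetical names.

```python
from __future__ import annotations

import re
from collections import defaultdict


class PartAwareIndex:
    """Hypothetical toy inverted index that remembers which document part
    each term came from. An illustration only, not Daisy's indexstore."""

    def __init__(self) -> None:
        # term -> set of (document_id, part_type) pairs containing the term
        self._postings: dict[str, set[tuple[str, str]]] = defaultdict(set)

    def add_part(self, document_id: str, part_type: str, text: str) -> None:
        # Naive tokenisation: lower-cased runs of word characters.
        for term in re.findall(r"\w+", text.lower()):
            self._postings[term].add((document_id, part_type))

    def search(self, term: str) -> dict[str, list[str]]:
        """Return {document_id: sorted matching part types} for one term."""
        hits: dict[str, list[str]] = defaultdict(list)
        # .get() avoids creating empty entries for unknown terms.
        for document_id, part_type in self._postings.get(term.lower(), set()):
            hits[document_id].append(part_type)
        return {doc: sorted(parts) for doc, parts in hits.items()}


if __name__ == "__main__":
    index = PartAwareIndex()
    index.add_part("book-1", "annotation", "A short annotation.")
    index.add_part("book-1", "sample chapter", "Here is the string we want.")
    index.add_part("book-1", "index", "string, 12")
    print(index.search("string"))  # {'book-1': ['index', 'sample chapter']}
```

In Lucene terms this would roughly mean indexing each part as its own 
Lucene document with stored fields for the Daisy document id and the 
part type, but that is only my assumption about how it could be done, 
not how Daisy does it.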

And one last small issue with PDFs. When I upload a PDF with Firefox on 
Linux, the file is marked as application/binary, and this can't be 
overridden to application/pdf by changing the text in the Mime-type 
field. The document then fails to save because the file does not conform 
to the mime-type allowed for the document part. Uploading the same file 
with IE on Windows works. Is this solely a browser issue?
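If it is the browser guessing a generic type, a server-side fallback 
could sniff the content instead. A minimal sketch, assuming the declared 
type and the uploaded bytes are both available; effective_mime_type and 
GENERIC_TYPES are hypothetical, not Daisy's upload code. It relies only 
on the fact that PDF files begin with the "%PDF-" signature.

```python
# Hypothetical names, not part of Daisy.
# Generic types a browser may send when it does not recognise the file.
GENERIC_TYPES = {"application/binary", "application/octet-stream"}

# Every PDF file starts with this header, e.g. "%PDF-1.4".
PDF_MAGIC = b"%PDF-"


def effective_mime_type(declared: str, content: bytes) -> str:
    """Trust the declared type unless it is a generic binary type and the
    bytes start with the PDF signature; then report application/pdf."""
    normalized = declared.strip().lower()
    if normalized in GENERIC_TYPES and content.startswith(PDF_MAGIC):
        return "application/pdf"
    return declared
```

Only generic types are overridden, so a specific type the user or 
browser did supply is never silently replaced.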

Thank you.

Jano