[daisy] indexing and searching PDF
Jano Kula
jano.kula at tiscali.cz
Wed Jan 23 05:45:04 CST 2008
Hi,
while browsing the indexstore I noticed that the text of PDF parts is
stored in the index file line by line, just as it appears in the PDF file
itself. As a result, words hyphenated across a line break are not found
when searching. This is probably a Lucene issue; should I report it to
its developers? Running one global substitution on the text before it is
saved to the indexstore should solve this.
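
The kind of substitution meant here could look roughly like the sketch
below. It is only an illustration in Python, not code from Daisy or
Lucene; the function name dehyphenate and the assumption that a split
word always appears as "letters, hyphen, line break, letters" are made up
for the example.

```python
import re

# Assumption: a word split across lines looks like "<word char>-<newline><word char>",
# possibly with stray spaces or tabs around the line break.
_HYPHEN_BREAK = re.compile(r"(\w)-[ \t]*\r?\n[ \t]*(\w)")


def dehyphenate(text: str) -> str:
    """Join words that were hyphenated across a line break."""
    return _HYPHEN_BREAK.sub(r"\1\2", text)


if __name__ == "__main__":
    extracted = "Thus hyphen-\nated words are not found."
    print(dehyphenate(extracted))
```

One caveat: a genuine compound such as "daisy-html" that happens to break
at its hyphen would be joined as well, so a real fix might need a
dictionary check or might index both variants.
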
Is there any information on document parts in the indexstore? Imagine
this minimal example:
    document-type: book description
    part-type (daisy-html): annotation
    part-type (PDF): table of contents
    part-type (PDF): sample chapter   <--- here is the string
    part-type (PDF): index
If one searches for the string and this compound document is found, I
can't see any way to find out which of the PDFs matches the string, and
it is confusing that the user can't find it in the displayed daisy-html.
If there is information on document parts in the indexstore, some mark
(an arrow or a highlighted link) could indicate the part where the string
was found. Opening just that one part would lose the information on its
context, I think. But there might be no information on parts in the
indexstore. How would you deal with this situation?
And one last small issue with PDFs. When uploading a PDF with Firefox
on Linux, the file is marked as application/binary, and this can't be
overridden to application/pdf by changing the text in the Mime-type field.
The document then fails to save because the file does not conform to the
mime-type of the document part. Uploading the same file with IE on Windows
works. Is this solely a browser issue?
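
For what it's worth, a receiving application does not have to trust the
type declared by the browser: a PDF file starts with the bytes "%PDF-",
so the content itself can be checked. The sketch below only illustrates
that idea in Python; sniff_mime_type is a hypothetical helper, not
something Daisy provides.

```python
PDF_SIGNATURE = b"%PDF-"


def sniff_mime_type(data: bytes, declared: str) -> str:
    """Prefer the file signature over the browser-declared type for PDFs."""
    if data.startswith(PDF_SIGNATURE):
        return "application/pdf"
    # Anything else keeps whatever type the browser sent.
    return declared


if __name__ == "__main__":
    upload = b"%PDF-1.4\n..."
    print(sniff_mime_type(upload, "application/binary"))
```
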
Thank you.
Jano