[daisy] [JIRA] Commented: (DSY-463) A text part with mimetype "text/plain" should accept a part with mime type "text/plain;encoding=UTF8"

Bruno Dumon (JIRA) issues at cocoondev.org
Mon May 7 10:46:20 CDT 2007


    [ http://issues.cocoondev.org//browse/DSY-463?page=comments#action_13165 ] 

Bruno Dumon commented on DSY-463:
---------------------------------

FYI, extracting the text from files (using the correct encoding) is the job of Daisy's text extractors, not Lucene, so it can be fixed in our codebase. Currently UTF-8 is indeed always used. There is a library out there that can automatically guess the encoding from the file content itself, which I was thinking of putting to use in Daisy some day.
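
To make the encoding point concrete, here is a minimal sketch of how a text extractor could at least notice that its input is not valid UTF-8, instead of silently producing garbled text. This is not the guessing library mentioned above and not Daisy code: the class name TextDecoding and the idea of a caller-supplied fallback charset are assumptions for illustration, and only standard java.nio.charset calls are used.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of Daisy. */
final class TextDecoding {
    private TextDecoding() {}

    /**
     * Decodes the bytes as strict UTF-8. If they are not valid UTF-8,
     * decodes them with the given fallback charset instead.
     */
    static String decode(byte[] data, Charset fallback) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    // REPORT makes the decoder throw instead of inserting U+FFFD.
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data))
                    .toString();
        } catch (CharacterCodingException e) {
            // Assumption: the caller knows a sensible fallback for its content.
            return new String(data, fallback);
        }
    }
}
```

A real encoding detector would inspect the content far more thoroughly; this only separates "valid UTF-8" from "something else".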

Anyhow, none of this is a priority for me either, so it will likely never happen without some external stimulation :-)

To anyone out there who might want to hack on this: just a utility class that implements correct mime-type parsing and matching would be a great contribution. Or maybe this exists already?
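
As a rough starting point, such a utility class might look like the sketch below. Everything in it is an assumption for illustration, not existing Daisy code: the class name MimeType, the method names, and the idea that the allowed mime types are given as a comma-separated list. It compares only the type/subtype pair (case-insensitively) and ignores parameters such as charset when matching. It splits naively on ";", so a quoted parameter value containing a semicolon would not be handled.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper, not part of the Daisy API. */
final class MimeType {
    private final String baseType;                 // e.g. "text/plain", lower-cased
    private final Map<String, String> parameters;  // lower-cased name -> value

    private MimeType(String baseType, Map<String, String> parameters) {
        this.baseType = baseType;
        this.parameters = parameters;
    }

    /** Parses values such as "text/plain" or "text/plain; charset=UTF-8". */
    static MimeType parse(String value) {
        String[] pieces = value.split(";");
        String base = pieces[0].trim().toLowerCase(Locale.ROOT);
        int slash = base.indexOf('/');
        if (slash <= 0 || slash == base.length() - 1) {
            throw new IllegalArgumentException("Not a mime type: " + value);
        }
        Map<String, String> params = new LinkedHashMap<>();
        for (int i = 1; i < pieces.length; i++) {
            String piece = pieces[i].trim();
            int eq = piece.indexOf('=');
            if (eq <= 0) {
                continue; // skip empty or malformed parameters
            }
            String name = piece.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            String val = piece.substring(eq + 1).trim();
            // Strip one pair of surrounding double quotes, if present.
            if (val.length() >= 2 && val.startsWith("\"") && val.endsWith("\"")) {
                val = val.substring(1, val.length() - 1);
            }
            params.put(name, val);
        }
        return new MimeType(base, params);
    }

    String getBaseType() {
        return baseType;
    }

    /** Returns the parameter value, or null if the parameter is absent. */
    String getParameter(String name) {
        return parameters.get(name.toLowerCase(Locale.ROOT));
    }

    /**
     * True if the candidate's type/subtype equals one of the entries in the
     * comma-separated allowed list. Parameters are ignored on both sides.
     */
    static boolean isAllowed(String allowedList, String candidate) {
        String base = parse(candidate).getBaseType();
        for (String entry : allowedList.split(",")) {
            if (entry.trim().isEmpty()) {
                continue;
            }
            if (parse(entry).getBaseType().equals(base)) {
                return true;
            }
        }
        return false;
    }
}
```

With this, a check like MimeType.isAllowed("text/plain", "text/plain;charset=UTF-8") would accept the part from the report, and getParameter("charset") would give a text extractor the declared encoding to use.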

> A text part with mimetype "text/plain" should accept a part with mime type "text/plain;encoding=UTF8"
> -----------------------------------------------------------------------------------------------------
>
>          Key: DSY-463
>          URL: http://issues.cocoondev.org//browse/DSY-463
>      Project: Daisy
>         Type: Improvement
>   Components: Repository
>     Versions: 2.0.1
>     Reporter: Geoffrey De Smet
>     Priority: Trivial

>
> I have a schema which contains a part like this:
>   <partType name="myTextPart" mimeTypes="text/plain" ...
> When I set a part like this:
>   document.setPart("myTextPart", "text/plain;encoding=UTF-8", myString.getBytes("UTF-8"));
> I get the following exception:
> Caused by: org.outerj.daisy.repository.DocumentTypeInconsistencyException: The mime-type "text/plain;charset=UTF-8" isn't part of the allowed mime types (text/plain) required by the PartType "myTextPart" (ID: 14).
> 	at org.outerj.daisy.repository.commonimpl.DocumentVariantImpl.setPart(DocumentVariantImpl.java:545)
> 	at org.outerj.daisy.repository.commonimpl.DocumentVariantImpl.setPart(DocumentVariantImpl.java:513)
> 	at org.outerj.daisy.repository.commonimpl.DocumentVariantImpl.setPart(DocumentVariantImpl.java:488)
> 	at org.outerj.daisy.repository.commonimpl.DocumentImpl.setPart(DocumentImpl.java:307)
> On the other hand, if I'd use:
>    <partType name="myTextPart" mimeTypes="text/plain;encoding=UTF-8" ...
> it doesn't get indexed by Lucene

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.cocoondev.org//secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


