[daisy] [JIRA] Commented: (DSY-463) A text part with mimetype "text/plain" should accept a part with mime type "text/plain;encoding=UTF8"
Bruno Dumon (JIRA)
issues at cocoondev.org
Mon May 7 10:46:20 CDT 2007
[ http://issues.cocoondev.org//browse/DSY-463?page=comments#action_13165 ]
Bruno Dumon commented on DSY-463:
---------------------------------
FYI, getting the text (using the correct encoding) from files is the job of Daisy's textextractors, not Lucene, so it can be fixed in our codebase. Currently they do indeed always use UTF-8. There is a library out there that can guess the encoding from the file content itself, which I was thinking of putting to use in Daisy some day.
Anyhow, none of this is a priority for me either, so it will likely never happen without some external stimulation :-)
To anyone out there who might want to hack on this: just a utility class that implements correct mime-type parsing and matching would be a great contribution. Or maybe this exists already?
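
For illustration, here is a rough sketch of what such a utility class could look like. It is not Daisy code: the class name MimeType and its methods are made up for this example. It assumes that "matching" means comparing only the type/subtype, case-insensitively, and ignoring parameters such as charset. It does not handle quoted parameter values that themselves contain a semicolon.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper (not part of Daisy): parses "type/subtype;name=value" strings. */
final class MimeType {
    private final String baseType;             // e.g. "text/plain", lower-cased
    private final Map<String, String> params;  // parameter names lower-cased

    private MimeType(String baseType, Map<String, String> params) {
        this.baseType = baseType;
        this.params = Collections.unmodifiableMap(params);
    }

    /** Splits a mime-type string into its base type and its parameters. */
    static MimeType parse(String value) {
        if (value == null) {
            throw new IllegalArgumentException("mime type is null");
        }
        // Limit -1 keeps trailing empty pieces, so there is always a first element.
        String[] pieces = value.split(";", -1);
        String base = pieces[0].trim().toLowerCase(Locale.ROOT);
        int slash = base.indexOf('/');
        if (slash <= 0 || slash == base.length() - 1 || base.indexOf('/', slash + 1) >= 0) {
            throw new IllegalArgumentException("Not a valid mime type: " + value);
        }
        Map<String, String> params = new LinkedHashMap<>();
        for (int i = 1; i < pieces.length; i++) {
            String piece = pieces[i].trim();
            if (piece.isEmpty()) {
                continue; // tolerate a stray ';'
            }
            int eq = piece.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Bad parameter in mime type: " + value);
            }
            String name = piece.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            String val = piece.substring(eq + 1).trim();
            // Strip one pair of surrounding double quotes, if present.
            if (val.length() >= 2 && val.startsWith("\"") && val.endsWith("\"")) {
                val = val.substring(1, val.length() - 1);
            }
            params.put(name, val);
        }
        return new MimeType(base, params);
    }

    String getBaseType() {
        return baseType;
    }

    /** Returns the parameter value, or null if the parameter is absent. */
    String getParameter(String name) {
        return params.get(name.toLowerCase(Locale.ROOT));
    }

    /** Assumed semantics: parameters on the actual mime type are ignored. */
    static boolean matches(String allowed, String actual) {
        return parse(allowed).baseType.equals(parse(actual).baseType);
    }
}
```

With something like this, a part type declared with "text/plain" would accept "text/plain;charset=UTF-8", and getParameter("charset") could tell a text extractor which encoding to use instead of always assuming UTF-8.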
> A text part with mimetype "text/plain" should accept a part with mime type "text/plain;encoding=UTF8"
> -----------------------------------------------------------------------------------------------------
>
> Key: DSY-463
> URL: http://issues.cocoondev.org//browse/DSY-463
> Project: Daisy
> Type: Improvement
> Components: Repository
> Versions: 2.0.1
> Reporter: Geoffrey De Smet
> Priority: Trivial
>
> I have a schema which contains a part like this:
> <partType name="myTextPart" mimeTypes="text/plain" ...
> When I set a part like this:
> document.setPart("myTextPart", "text/plain;encoding=UTF-8", myString.getBytes("UTF-8"));
> I get the following exception:
> Caused by: org.outerj.daisy.repository.DocumentTypeInconsistencyException: The mime-type "text/plain;charset=UTF-8" isn't part of the allowed mime types (text/plain) required by the PartType "myTextPart" (ID: 14).
> at org.outerj.daisy.repository.commonimpl.DocumentVariantImpl.setPart(DocumentVariantImpl.java:545)
> at org.outerj.daisy.repository.commonimpl.DocumentVariantImpl.setPart(DocumentVariantImpl.java:513)
> at org.outerj.daisy.repository.commonimpl.DocumentVariantImpl.setPart(DocumentVariantImpl.java:488)
> at org.outerj.daisy.repository.commonimpl.DocumentImpl.setPart(DocumentImpl.java:307)
> On the other hand, if I'd use:
> <partType name="myTextPart" mimeTypes="text/plain;encoding=UTF-8" ...
> it doesn't get indexed by lucene