[daisy] [JIRA] Commented: (DSY-463) A text part with mimetype "text/plain" should accept a part with mime type "text/plain;encoding=UTF8"

Geoffrey De Smet (JIRA) issues at cocoondev.org
Mon May 7 09:03:20 CDT 2007


    [ http://issues.cocoondev.org//browse/DSY-463?page=comments#action_13163 ] 

Geoffrey De Smet commented on DSY-463:
--------------------------------------

More issues to take into consideration:
- If "text/plain;charset=UTF-8" is added to the part type's mime types, parts with that mime type are not among those selected for Lucene indexing, so they will not be indexed.
- Lucene currently has no way of knowing the charset (as far as I can tell it always assumes UTF-8, which is a sensible default), but it should be able to get the part's mime type and look at the charset parameter in it. Try this out at home:
-- Create two files with Notepad on Windows, one saved as ANSI (cp1252) and one saved as UTF-8, both containing the text "0€ financiële belasting".
-- Attach both as parts of documents of a new document type with a text/plain part type.
-- Do a Lucene-based query on "belasting" (2 hits) and on "financiële" (1 hit: only the file saved as UTF-8).
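The result is what you would expect from decoding everything as UTF-8. In cp1252, "ë" is the single byte 0xEB, and that byte is not valid UTF-8 on its own, so "financiële" is mangled while "belasting" (plain ASCII) survives. Below is a minimal sketch of the charset-aware decoding suggested above. It assumes the indexer gets the part's raw bytes together with its full mime type string. The class and method names (`CharsetAwareText`, `charsetOf`, `decode`) are hypothetical and are not Daisy or Lucene API. The sketch falls back to UTF-8 when no charset parameter is present, matching the current behaviour:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class CharsetAwareText {

    // Returns the charset named by a "charset" parameter in a mime type such as
    // "text/plain;charset=windows-1252". Falls back to UTF-8 when the parameter
    // is absent or names an unknown/illegal charset.
    static Charset charsetOf(String mimeType) {
        String[] pieces = mimeType.split(";");
        for (int i = 1; i < pieces.length; i++) {          // pieces[0] is type/subtype
            String param = pieces[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) continue;
            String name = param.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            if (!name.equals("charset")) continue;
            String value = param.substring(eq + 1).trim();
            // Strip optional quotes, e.g. charset="UTF-8"
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            try {
                return Charset.forName(value);
            } catch (IllegalArgumentException e) {          // illegal or unsupported name
                return StandardCharsets.UTF_8;
            }
        }
        return StandardCharsets.UTF_8;
    }

    // Decodes the raw part bytes using the charset declared in the mime type.
    static String decode(byte[] data, String mimeType) {
        return new String(data, charsetOf(mimeType));
    }

    public static void main(String[] args) {
        String text = "0\u20AC financi\u00EBle belasting";   // "0€ financiële belasting"
        byte[] ansi = text.getBytes(Charset.forName("windows-1252"));

        // Current behaviour: always UTF-8, so the single byte 0xEB ("e with diaeresis") is malformed.
        String naive = new String(ansi, StandardCharsets.UTF_8);
        // Suggested behaviour: honour the charset parameter of the mime type.
        String aware = decode(ansi, "text/plain;charset=windows-1252");

        System.out.println(naive.contains("financi\u00EBle")); // false
        System.out.println(aware.contains("financi\u00EBle")); // true
    }
}
```

With this approach the cp1252 file would only be indexed correctly if its part were stored with a mime type such as "text/plain;charset=windows-1252". That in turn depends on such mime types being accepted by the part type in the first place.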

We have a workaround for now, so I'm afraid creating a patch isn't a priority at the moment :/ Hopefully the feedback helps at least.

> A text part with mimetype "text/plain" should accept a part with mime type "text/plain;encoding=UTF8"
> -----------------------------------------------------------------------------------------------------
>
>          Key: DSY-463
>          URL: http://issues.cocoondev.org//browse/DSY-463
>      Project: Daisy
>         Type: Improvement
>   Components: Repository
>     Versions: 2.0.1
>     Reporter: Geoffrey De Smet
>     Priority: Trivial

>
> I have a schema which contains a part like this:
>   <partType name="myTextPart" mimeTypes="text/plain" ...
> When I set a part like this:
>   document.setPart("myTextPart", "text/plain;charset=UTF-8", myString.getBytes("UTF-8"));
> I get the following exception:
> Caused by: org.outerj.daisy.repository.DocumentTypeInconsistencyException: The mime-type "text/plain;charset=UTF-8" isn't part of the allowed mime types (text/plain) required by the PartType "myTextPart" (ID: 14).
> 	at org.outerj.daisy.repository.commonimpl.DocumentVariantImpl.setPart(DocumentVariantImpl.java:545)
> 	at org.outerj.daisy.repository.commonimpl.DocumentVariantImpl.setPart(DocumentVariantImpl.java:513)
> 	at org.outerj.daisy.repository.commonimpl.DocumentVariantImpl.setPart(DocumentVariantImpl.java:488)
> 	at org.outerj.daisy.repository.commonimpl.DocumentImpl.setPart(DocumentImpl.java:307)
> On the other hand, if I use:
>    <partType name="myTextPart" mimeTypes="text/plain;charset=UTF-8" ...
> the part doesn't get indexed by Lucene.
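The improvement requested in this issue amounts to comparing only the base type/subtype against the part type's allowed mime types and ignoring parameters such as `charset`. Below is a minimal sketch of that check. It assumes the allowed mime types are available as a set of strings. `MimeTypeMatcher`, `baseType` and `isAllowed` are hypothetical names, not the actual validation code in `DocumentVariantImpl.setPart`:

```java
import java.util.Locale;
import java.util.Set;

public class MimeTypeMatcher {

    // "text/plain;charset=UTF-8" -> "text/plain".
    // Type and subtype are compared case-insensitively, so they are lower-cased here.
    static String baseType(String mimeType) {
        int semi = mimeType.indexOf(';');
        String base = semi < 0 ? mimeType : mimeType.substring(0, semi);
        return base.trim().toLowerCase(Locale.ROOT);
    }

    // True when the base type of mimeType equals the base type of one of the allowed entries;
    // parameters such as charset are ignored on both sides.
    static boolean isAllowed(String mimeType, Set<String> allowed) {
        String base = baseType(mimeType);
        for (String candidate : allowed) {
            if (baseType(candidate).equals(base)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> allowed = Set.of("text/plain");
        System.out.println(isAllowed("text/plain;charset=UTF-8", allowed)); // true
        System.out.println(isAllowed("text/html", allowed));                // false
    }
}
```

In this sketch the parameters are dropped only for validation. The full mime type, including the charset, could still be stored with the part, so an indexer could read the charset later.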
