[daisy] [JIRA] Commented: (DSY-467) Support language specific Analyzer (for example DutchAnalyzer) and fall back on StandardAnalyzer
Geoffrey De Smet (JIRA)
issues at cocoondev.org
Wed May 9 10:03:20 CDT 2007
[ http://issues.cocoondev.org//browse/DSY-467?page=comments#action_13168 ]
Geoffrey De Smet commented on DSY-467:
--------------------------------------
In our use case everything is Dutch, so simply using the locale of the server would hack-fix it, but that would be pretty unstable, as it depends on the VM locale.
However, in the bigger use case it's indeed pretty hard. There are a couple of issues:
- Creating a new language doesn't verify the language code, but this wouldn't be needed with the fallback on StandardAnalyzer.
- Each language might need its own index, though some could share an index.
- If a language is modified, its documents would need to be re-indexed.
- To be able to do a general search (in every language), there indeed still needs to be a language-neutral index.
- What impact will indexing each document variant into at most 2 indexes (instead of 1) have?
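To make the "maximum 2 indexes" point concrete, here is a rough sketch of how a document variant could be routed. Everything in it is an assumption for illustration: the class IndexRouting, the index names "neutral" and "lang-<code>", and the idea that a language gets its own index only when a dedicated analyzer exists. None of this is in Daisy today.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper, not part of Daisy: decides which indexes a variant goes into. */
final class IndexRouting {
    /** Assumed name of the language-neutral index used for searches across all languages. */
    static final String NEUTRAL_INDEX = "neutral";

    /**
     * Returns the indexes a document variant would be written to:
     * always the neutral one, plus a language-specific one when the
     * language has a dedicated analyzer. So at most 2 entries.
     */
    static List<String> targetIndexes(String languageCode, Set<String> languagesWithAnalyzer) {
        List<String> targets = new ArrayList<>();
        targets.add(NEUTRAL_INDEX);
        if (languageCode != null) {
            String code = languageCode.trim().toLowerCase(Locale.ROOT);
            if (languagesWithAnalyzer.contains(code)) {
                targets.add("lang-" + code);
            }
        }
        return targets;
    }
}
```

With this routing, a variant in a language without its own analyzer costs the same as today (one index write), and only languages like "nl" pay for the second write.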
I was thinking of implementing it as a quick-patch on FullTextIndexImpl, but it looks like it's a bit out of my league for now :)
Maybe something for Daisy 2.2 or later? We can recommend the use of Lucene fuzzy searches ("werk~") for now.
> Support language specific Analyzer (for example DutchAnalyzer) and fall back on StandardAnalyzer
> ------------------------------------------------------------------------------------------------
>
> Key: DSY-467
> URL: http://issues.cocoondev.org//browse/DSY-467
> Project: Daisy
> Type: Feature Wish
> Components: Querying and indexing (repository)
> Versions: 2.0.1
> Reporter: Geoffrey De Smet
>
> In this file:
> http://svn.cocoondev.org/repos/daisy/tags/RELEASE_2_0_0/daisy/repository/server/src/java/org/outerj/daisy/ftindex/FullTextIndexImpl.java
> we find this code:
> private IndexWriter constructIndexWriter() throws IOException {
>     return new IndexWriter(indexDirectory, new StandardAnalyzer());
> }
> which shows that Lucene uses StandardAnalyzer. However, when we insert words like "werken" and "gewerkt" and then search for "werkte", the search doesn't find them.
> For Dutch content, the solution is to use a DutchAnalyzer instead.
> A more general solution might be: ask for the language code of the Daisy branch,
> then check if there's an Analyzer for it (for example "nl" => DutchAnalyzer)
> and fall back on StandardAnalyzer if there isn't one.
> The problem is that the current FullTextIndexImpl probably uses a singleton analyzer, independent of the branch scope in which it's processed.
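
The lookup-with-fallback idea from the description above can be sketched roughly as follows. This is only a minimal sketch, not a patch for FullTextIndexImpl: the class AnalyzerRegistry and its methods are made up for illustration, and it is kept generic over the analyzer type so it compiles without Lucene on the classpath. In Daisy the type parameter would presumably be Lucene's Analyzer.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical helper: maps a language code to an analyzer factory, with a default. */
final class AnalyzerRegistry<A> {
    private final Map<String, Supplier<A>> byLanguage = new HashMap<>();
    private final Supplier<A> fallback;

    /** The fallback factory is used for every language without a registered analyzer. */
    AnalyzerRegistry(Supplier<A> fallback) {
        this.fallback = fallback;
    }

    /** Registers a factory for one language code, for example "nl". */
    void register(String languageCode, Supplier<A> factory) {
        byLanguage.put(normalize(languageCode), factory);
    }

    /** Returns the language-specific analyzer, or the fallback if none is registered. */
    A forLanguage(String languageCode) {
        if (languageCode == null) {
            return fallback.get();
        }
        return byLanguage.getOrDefault(normalize(languageCode), fallback).get();
    }

    // Language codes are compared case-insensitively and without surrounding whitespace.
    private static String normalize(String code) {
        return code.trim().toLowerCase(Locale.ROOT);
    }
}
```

Assuming the Lucene analyzers keep their no-argument constructors, the wiring could look like new AnalyzerRegistry<Analyzer>(StandardAnalyzer::new) followed by register("nl", DutchAnalyzer::new). Because an unknown or unverified language code simply ends up on the fallback, the language code would not need to be validated when a language is created.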
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://issues.cocoondev.org//secure/Administrators.jspa
-
For more information on JIRA, see:
http://www.atlassian.com/software/jira