[daisy] Re: issue with large amount of documents
Bruno Dumon
bruno at outerthought.org
Thu Jun 7 14:48:04 CDT 2007
Hi Bart,
Something I'd still like feedback on is whether you have already tried
how things perform once all documents are loaded in the cache.
Regardless of whether that's an acceptable option for your use case, it
would be good to know how much it helps. At least the second execution
of a query should be quite fast then; is that the case? If not, could
you send the timings included in the query results, so we can see what
takes the most time?
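To make the cold-versus-warm comparison concrete, here is a toy model in Java. It is not Daisy code: ToyDocumentStore, timeMillis and the document ids are hypothetical, and the "cache" is a plain HashMap that counts how often it has to go to backing storage. The point it shows is that only the first run pays for loading documents; a real measurement should still use the timings Daisy reports in the query results.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model (hypothetical, not Daisy's document cache): run the same "query"
 * twice and compare a cold cache with a warm one.
 */
public class QueryTiming {

    /** Stand-in for a document cache; counts how often backing storage is hit. */
    static class ToyDocumentStore {
        private final Map<Long, String> cache = new HashMap<>();
        int storageLoads = 0;

        String get(long id) {
            // On a miss, pretend to load the document from the database and keep it.
            return cache.computeIfAbsent(id, key -> {
                storageLoads++;
                return "document-" + key;
            });
        }
    }

    /** Returns the wall-clock time of one run in milliseconds. */
    static double timeMillis(Runnable query) {
        long start = System.nanoTime();
        query.run();
        return (System.nanoTime() - start) / 1_000_000.0;
    }

    public static void main(String[] args) {
        ToyDocumentStore store = new ToyDocumentStore();
        List<Long> matchingIds = List.of(101L, 102L, 103L);
        // The "query" fetches every matching document, as building the result would.
        Runnable query = () -> matchingIds.forEach(store::get);

        double cold = timeMillis(query);   // first run: every document is a cache miss
        int loadsAfterCold = store.storageLoads;
        double warm = timeMillis(query);   // second run: served entirely from the cache
        System.out.printf("cold %.3f ms, warm %.3f ms, storage loads %d then %d%n",
                cold, warm, loadsAfterCold, store.storageLoads);
    }
}
```

With a real repository the gap between the two runs depends on the cache size relative to the number of matching documents; if the cache is too small to hold them all, the second run will not be much faster.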
In case you don't know how to configure the document cache, there's an
example here:
http://issues.cocoondev.org/browse/DSY-488
(it's on my todo list to add to the docs someday)
On Thu, 2007-06-07 at 15:17 +0200, Bart Van den Abeele wrote:
> Bruno Dumon wrote:
> > Because you would be bypassing the repository abstraction. I'd rather
> > see the repository improved than doing these sort of things.
>
> Is it perhaps possible to make a fast version of the query method? A
> version that skips the post-processing like ACL and sorting (although we
> could make very good use of Lucene's sorting!)
I'm not sure what you mean by Lucene's sorting. Do you mean the
score-based sorting, or some other sorting? IIRC, when there's a fulltext
search we keep the documents in the order Lucene returns them, unless
an explicit order by clause orders them differently.
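The ordering rule described above can be sketched in a few lines of Java. The Hit type and the order method are hypothetical stand-ins, not Daisy's internal classes; the sketch only shows the decision: keep the fulltext (Lucene) order, unless an order by clause supplies a comparator.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Sketch of the ordering rule (hypothetical types): keep Lucene's order
 * unless the query has an explicit order by clause.
 */
public class ResultOrdering {

    /** Hypothetical search hit: document id, Lucene score and one sortable field. */
    record Hit(String id, float score, String title) {}

    /**
     * @param luceneOrder hits in the order the fulltext index returned them
     * @param orderBy     comparator derived from the order by clause, or null if there is none
     */
    static List<Hit> order(List<Hit> luceneOrder, Comparator<Hit> orderBy) {
        List<Hit> result = new ArrayList<>(luceneOrder);
        if (orderBy != null) {
            result.sort(orderBy);   // an explicit order by takes precedence
        }
        return result;              // otherwise Lucene's order is kept unchanged
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
                new Hit("doc-7", 3.2f, "Title B"),
                new Hit("doc-2", 1.4f, "Title A"));
        System.out.println(order(hits, null));                            // fulltext order
        System.out.println(order(hits, Comparator.comparing(Hit::title))); // order by title
    }
}
```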
Indeed, the implementation could be optimized for the case where we can
determine beforehand that the ACL gives everyone read access to all
documents and there's no order-by clause, but again I'm not sure that
solves much, since IMHO in many cases you'll want these features.
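A minimal Java sketch of that optimization, assuming hypothetical names (Doc, postProcess, everyoneCanReadAll) rather than anything in Daisy's actual query code: when the ACL is known to grant read access to everyone and there is no order by, the per-document ACL check and the sort can both be skipped.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;

/**
 * Sketch of the discussed fast path (hypothetical types, not Daisy's code):
 * skip per-document ACL checks and sorting when neither can change the result.
 */
public class QueryPostProcessing {

    /** Hypothetical search hit. */
    record Doc(long id, String documentType) {}

    /**
     * @param everyoneCanReadAll true if it was determined beforehand that the ACL
     *                           grants every user read access to every document
     * @param aclAllowsRead      per-document ACL check, evaluated only when needed
     * @param orderBy            comparator for the order by clause, or null if absent
     */
    static List<Doc> postProcess(List<Doc> hits, boolean everyoneCanReadAll,
                                 Predicate<Doc> aclAllowsRead, Comparator<Doc> orderBy) {
        if (everyoneCanReadAll && orderBy == null) {
            return hits;   // fast path: nothing to filter and nothing to sort
        }
        List<Doc> result = new ArrayList<>();
        for (Doc doc : hits) {
            if (everyoneCanReadAll || aclAllowsRead.test(doc)) {
                result.add(doc);
            }
        }
        if (orderBy != null) {
            result.sort(orderBy);
        }
        return result;
    }
}
```

As the paragraph notes, the fast path only triggers when both conditions hold, so queries that need ACL filtering or an order by gain nothing from it.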
>
> All I want is for my query to execute correctly and take no longer
> than 3 minutes. It should return only the fields I selected.
>
> Example that works:
>
> select documentType, $sb_dd_gemeente, $sb_dd_aktenummer, $sb_dd_jaar,
> $sb_dd_datum, $sb_dd_pictures, $sb_dd_betrokkenen_voornaam,
> $sb_dd_betrokkenen_achternaam, $sb_dd_betrokkenen_hoedanigheid where
> FullText('+Maude') and $sb_dd_jaar = '2000' and (documentType =
> 'sb_dd_Geboorte' or documentType = 'sb_dd_Huwelijk')
>
> Example that doesn't work (because every documentType has about 10K
> documents):
>
> select documentType, $sb_dd_gemeente, $sb_dd_aktenummer, $sb_dd_jaar,
> $sb_dd_datum, $sb_dd_pictures, $sb_dd_betrokkenen_voornaam,
> $sb_dd_betrokkenen_achternaam, $sb_dd_betrokkenen_hoedanigheid where
> (documentType = 'sb_dd_Geboorte' or documentType =
> 'sb_dd_AangifteHuwelijk' or documentType = 'sb_dd_Huwelijk' or
> documentType = 'sb_dd_Overlijden' or documentType =
> 'sb_dd_BijgevoegdGeboorte' or documentType = 'sb_dd_BijgevoegdHuwelijk'
> or documentType = 'sb_dd_BijgevoegdOverlijden' or documentType =
> 'sb_dd_Supplementair' or documentType = 'sb_dd_BijgevoegdSupplementair'
> or documentType = 'sb_dd_Echtscheiding' or documentType =
> 'sb_dd_BijgevoegdEchtscheiding' or documentType = 'sb_dd_Nationaliteit'
> or documentType = 'sb_dd_UitgebreideGeboorte' or documentType =
> 'sb_dd_UitgebreideOverlijden')
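Compared with the working query, the failing one has no FullText or $sb_dd_jaar restriction; it matches every document of 14 document types, which at about 10K documents per type is roughly 140K documents. The or-chain itself is easy to generate instead of writing by hand. The helper below is a hypothetical Java sketch that only reproduces the query syntax used above; it makes the query easier to maintain but does not change how much work the repository has to do.

```java
import java.util.List;
import java.util.stream.Collectors;

/**
 * Hypothetical helper: assemble a query like the one above from a list of
 * document type names instead of writing the or-chain by hand.
 */
public class DocumentTypeQuery {

    /** Builds "(documentType = 'A' or documentType = 'B' ...)"; assumes names contain no quotes. */
    static String documentTypeClause(List<String> documentTypes) {
        return documentTypes.stream()
                .map(type -> "documentType = '" + type + "'")
                .collect(Collectors.joining(" or ", "(", ")"));
    }

    /** Combines the selected fields and the document type restriction into one query string. */
    static String buildQuery(List<String> selectedFields, List<String> documentTypes) {
        return "select " + String.join(", ", selectedFields)
                + " where " + documentTypeClause(documentTypes);
    }

    public static void main(String[] args) {
        List<String> fields = List.of("documentType", "$sb_dd_gemeente", "$sb_dd_jaar");
        List<String> types = List.of("sb_dd_Geboorte", "sb_dd_Huwelijk", "sb_dd_Overlijden");
        System.out.println(buildQuery(fields, types));
    }
}
```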
--
Bruno Dumon http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
bruno at outerthought.org bruno at apache.org