[daisy] issue with large amount of documents
Bruno Dumon
bruno at outerthought.org
Thu Jun 7 03:18:16 CDT 2007
Hi,
On Wed, 2007-06-06 at 08:16 +0200, Bart Van den Abeele wrote:
> Thanks for the info. I have some additional questions.
>
> We work with a lot of documents: 100,000 is no exception, and even
> 1,000,000 is possible. A user can start a query and, if he isn't
> careful, he could query for all of them.
In that case, with the current design of the repository, you have to
make sure you have enough memory for all of those documents to fit into
it. Also configure the document cache to be big enough so that things
work fast. Combine this with a limit clause on the query so that the
result XML isn't too big.
> At the moment I don't have a way to tell the user that the action
> he started will result in an error.
>
> My first question is whether it is possible to get the list no matter
> how long it is. At the moment this query results in an out-of-memory
> exception. Could it perhaps be possible to retrieve only the list with
> its fields, without loading the complete documents, so that not all
> that memory is taken and it doesn't take so long?
There's not much difference between "just the fields" and "a document".
A document object consists of some metadata, the fields and the list of
parts (without the actual part content), so these objects aren't very
big.
> Perhaps if I go directly to the database instead of working via the
> API?
Then there's not much point in using the repository server: just use an
RDBMS instead.
>
> My other question is how I can handle this situation. At the moment
> I can't detect the out-of-memory exception, because I don't get that
> exception when I use the API to talk to the repository; I get a
> DaisyPropagatedException. Could this be wrapped in a specific
> exception, so I can handle it properly? Or should I compare the string
> DaisyPropagatedException.getRemoteClassName() to
> "java.lang.OutOfMemory"?
While I don't think handling OutOfMemoryErrors is a good idea (these
are pretty critical errors which shouldn't occur on a regular basis),
this approach to checking for remote errors is fine. It does tie your
code to the remote implementation of the API, so I would write it such
that it continues to work if the code runs in the same VM as the
repository, using a test like:
    e instanceof OutOfMemoryError
    ||
    (e instanceof DaisyPropagatedException &&
     ((DaisyPropagatedException)e).getRemoteClassName().equals("..."))
and apply this test to every exception in the chain (using getCause),
since the exception might be nested.
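
Put together, the check could look roughly like the sketch below. It is only an illustration: the DaisyPropagatedException class in it is a hypothetical stand-in so that the example compiles on its own (with the real client library you would import Daisy's class instead), the names OomCheck and isOutOfMemory are made up, and the remote class name it compares against is an assumption, namely the fully qualified name of OutOfMemoryError.

```java
// Hypothetical stand-in for Daisy's DaisyPropagatedException, only here so
// the sketch compiles by itself. Use the real class from the client library.
class DaisyPropagatedException extends RuntimeException {
    private final String remoteClassName;

    DaisyPropagatedException(String message, String remoteClassName) {
        super(message);
        this.remoteClassName = remoteClassName;
    }

    String getRemoteClassName() {
        return remoteClassName;
    }
}

final class OomCheck {
    // Assumption: the remote side reports the fully qualified class name.
    static final String REMOTE_OOM_CLASS = OutOfMemoryError.class.getName();

    // Walks the cause chain and returns true if any link is a local
    // OutOfMemoryError or a propagated remote one.
    static boolean isOutOfMemory(Throwable e) {
        for (Throwable t = e; t != null; t = t.getCause()) {
            if (t instanceof OutOfMemoryError) {
                return true; // code runs in the same VM as the repository
            }
            if (t instanceof DaisyPropagatedException
                    && REMOTE_OOM_CLASS.equals(
                            ((DaisyPropagatedException) t).getRemoteClassName())) {
                return true; // error happened in the remote repository server
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Throwable remote = new DaisyPropagatedException("remote failure", REMOTE_OOM_CLASS);
        Throwable wrapped = new RuntimeException("query failed", remote);
        System.out.println(isOutOfMemory(wrapped));
        System.out.println(isOutOfMemory(new IllegalStateException("other")));
    }
}
```

Calling equals on the constant rather than on the result of getRemoteClassName() keeps the check safe if that method ever returns null.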
--
Bruno Dumon http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
bruno at outerthought.org bruno at apache.org