[daisy] Daisy information

Murugesan Pakkirisamy, Leninbabu [CIB-IT] leninbabu.murugesanpakkirisamy at citigroup.com
Fri Aug 4 08:30:21 CDT 2006


Bruno,

	Thanks very much for the detailed answer. This will certainly help. With this information I will take it to the next step. I will do some more analysis before presenting it to the approval committee. Unless they decide to go with a commercial product, I don't see any roadblocks. Being an open source fan, I will try my best to use Daisy in our project. I will post updates soon.

Regards,
Babu


-----Original Message-----
From: daisy-bounces at lists.cocoondev.org
[mailto:daisy-bounces at lists.cocoondev.org]On Behalf Of Bruno Dumon
Sent: Friday, August 04, 2006 4:36 AM
To: Daisy: open source CMS - general mailinglist
Subject: RE: [daisy] Daisy information


On Thu, 2006-08-03 at 13:23 -0400, Murugesan Pakkirisamy, Leninbabu
[CIB-IT] wrote:
> Marc,
> 
>      Thanks for your prompt response. To answer your questions:
> initially the volume of documents won't be large. We are expecting
> around 500 documents per *month*, each under 5 MB, to enter the
> system. It will keep growing over time. That is why I am also
> looking for a good backup/retrieval system. As for the
> alternatives, we are also looking at Alfresco, KnowledgeTree and a few
> other commercial vendors. I am in favour of Daisy because of its
> simplicity and ease of use. Volume handling is our main concern. Is
> there a limit on the number of documents that can be stored in Daisy?
> I hope MySQL doesn't have any limitation in this regard. I also have
> questions regarding document storage. Are these documents stored in
> the database or in the file system?
> 
> I greatly appreciate your assistance.

We don't have much experience with large data sets, and haven't yet
taken the time to do serious testing in this regard.

There is, however, no specific limit on the number of documents that
can be stored in Daisy (other than limits such as the largest
available disks, the maximum number of database rows, the maximum number
of files storable on a file system, or the java.lang.Long.MAX_VALUE
constant).
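To put these limits next to the volumes mentioned in the question (about 500 documents per month, each under 5 MB), here is a small back-of-the-envelope calculation. It assumes, hypothetically, that document IDs behave like a simple long counter bounded by Long.MAX_VALUE; that is an illustration, not something taken from Daisy's source. The class and method names are made up for this sketch.

```java
public class CapacityEstimate {
    // Documents added per year at a steady monthly rate.
    static long docsPerYear(long docsPerMonth) {
        return docsPerMonth * 12;
    }

    // Upper bound on new binary data per year, in MB, if every document hits the size cap.
    static long maxMbPerYear(long docsPerMonth, long maxDocSizeMb) {
        return docsPerYear(docsPerMonth) * maxDocSizeMb;
    }

    // Assumption: IDs come from a counter capped at Long.MAX_VALUE.
    // Returns how many years it would take to exhaust that counter at this rate.
    static long yearsUntilLongOverflow(long docsPerMonth) {
        return Long.MAX_VALUE / docsPerYear(docsPerMonth);
    }

    public static void main(String[] args) {
        long perMonth = 500;  // from the question: ~500 documents per month
        long sizeCapMb = 5;   // from the question: each document < 5 MB
        System.out.println("Documents per year: " + docsPerYear(perMonth));
        System.out.println("Max new binary data per year (MB): " + maxMbPerYear(perMonth, sizeCapMb));
        System.out.println("Years until a long ID counter overflows: " + yearsUntilLongOverflow(perMonth));
    }
}
```

At these rates the practical constraints are disk space (at most roughly 30 GB of new binary data per year) and query performance, not the ID range.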

Things will of course get slower with larger volumes of data. Some
possible hot spots that come to mind are:

 - Daisy's document cache: an internal cache in Daisy, by default
limited to 10,000 documents, though it can be made as large as desired
and as memory permits. The cache should ideally be at least as big as the
'working set' of documents, i.e. the number of documents that are
regularly accessed.

 - The table storing the "fields" of documents. This is a table with one
row per field (per document, per version). If you have 10,000 documents
with 5 fields each, and each document has 3 versions, this table will
thus contain 150,000 rows (10,000 x 5 x 3). For searching on fields
(e.g. combining searches on multiple fields using 'and') this table needs
to be joined with itself (on numeric keys). I don't know how efficiently
databases can do this, but I assume that something like 100,000 records
is a small number for a typical RDBMS. (There's also room for improvement
in Daisy here, as we could store the fields of old versions in a
separate, non-indexed table.)

 - Many things can be tuned in the actual setup: e.g. placing the
blobstore, database files and fulltext indexes on different disks,
tuning database/OS parameters, etc.
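The point about the cache and the working set can be illustrated with a size-bounded least-recently-used cache. The sketch below uses java.util.LinkedHashMap in access order; BoundedDocumentCache is a hypothetical name, and this is only an illustration of the eviction behaviour, not Daisy's own cache code. If the working set is larger than maxEntries, documents get evicted before they are requested again, so most lookups miss the cache.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded document cache; not Daisy's actual implementation.
public class BoundedDocumentCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public BoundedDocumentCache(int maxEntries) {
        // accessOrder = true: iteration runs from least to most recently accessed,
        // so the "eldest" entry is the least recently used one.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least recently used document once the limit is exceeded.
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        BoundedDocumentCache<Long, String> cache = new BoundedDocumentCache<>(2);
        cache.put(1L, "doc 1");
        cache.put(2L, "doc 2");
        cache.get(1L);           // touch doc 1, so doc 2 becomes least recently used
        cache.put(3L, "doc 3");  // exceeds the limit: doc 2 is evicted
        System.out.println(cache.keySet()); // prints [1, 3]
    }
}
```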
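To make the row count and the 'and' search on the fields table more concrete, here is an in-memory sketch of a table with one row per field, per document, per version. FieldRow, the field names and the values are all hypothetical; Daisy's actual schema and SQL are not shown in this thread. In a database, an 'and' over two fields corresponds to joining the table with itself on the document key; here that is modelled as an intersection of document-ID sets.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of a "one row per field, per document, per version" table.
public class FieldTableSketch {
    record FieldRow(long docId, int version, String field, String value) {}

    // One row for every (document, version, field) combination.
    static List<FieldRow> buildRows(int docs, int fieldsPerDoc, int versionsPerDoc) {
        List<FieldRow> rows = new ArrayList<>();
        for (long d = 1; d <= docs; d++) {
            for (int v = 1; v <= versionsPerDoc; v++) {
                for (int f = 1; f <= fieldsPerDoc; f++) {
                    rows.add(new FieldRow(d, v, "field" + f, "value" + (d % 10)));
                }
            }
        }
        return rows;
    }

    // IDs of documents whose given version has field == value.
    static Set<Long> docsMatching(List<FieldRow> rows, int version, String field, String value) {
        Set<Long> ids = new HashSet<>();
        for (FieldRow r : rows) {
            if (r.version() == version && r.field().equals(field) && r.value().equals(value)) {
                ids.add(r.docId());
            }
        }
        return ids;
    }

    // 'and' over two fields: the in-memory analogue of a self-join on the document key.
    static Set<Long> andSearch(List<FieldRow> rows, int version,
                               String f1, String v1, String f2, String v2) {
        Set<Long> result = docsMatching(rows, version, f1, v1);
        result.retainAll(docsMatching(rows, version, f2, v2));
        return result;
    }

    public static void main(String[] args) {
        List<FieldRow> rows = buildRows(10_000, 5, 3);
        System.out.println("Rows: " + rows.size()); // 10,000 x 5 x 3 = 150,000
        System.out.println("Matches: "
                + andSearch(rows, 3, "field1", "value7", "field2", "value7").size());
    }
}
```

Note that every search scans rows of all versions, which is why moving old-version fields into a separate, non-indexed table could help.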

Regarding your question about document storage: all metadata is stored
in the database, while the actual data of document parts (the binary
data) is stored on the file system. The size of the binary data doesn't
matter much, since Daisy handles everything using streams.
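The idea of keeping binary part data on the file system and handling it as streams can be sketched with java.nio.file.Files.copy, which copies an InputStream to a file without loading it fully into memory. StreamingBlobStore and the key-to-file naming are hypothetical and do not reflect Daisy's actual blobstore layout; only the streaming principle comes from the explanation above. Presumably the returned key and size would then be recorded in the database alongside the document's other metadata.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical blob store: metadata would go to the database, bytes go to files.
public class StreamingBlobStore {
    private final Path root;

    public StreamingBlobStore(Path root) {
        this.root = root;
    }

    // Copies the incoming stream to disk in chunks; the whole part is never held in memory.
    // Returns the number of bytes written.
    public long store(String blobKey, InputStream data) throws IOException {
        Files.createDirectories(root);
        Path target = root.resolve(blobKey);
        return Files.copy(data, target, StandardCopyOption.REPLACE_EXISTING);
    }

    // Returns a stream over the stored bytes; the caller must close it.
    public InputStream open(String blobKey) throws IOException {
        return Files.newInputStream(root.resolve(blobKey));
    }

    public static void main(String[] args) throws IOException {
        StreamingBlobStore store = new StreamingBlobStore(Files.createTempDirectory("blobstore"));
        byte[] content = "example part content".getBytes(StandardCharsets.UTF_8);
        long written = store.store("part-1", new ByteArrayInputStream(content));
        try (InputStream in = store.open("part-1")) {
            System.out.println(written + " bytes written, read back: "
                    + new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
    }
}
```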

Hope this helps.

-- 
Bruno Dumon                             http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
bruno at outerthought.org                          bruno at apache.org

_______________________________________________
daisy community mailing list
Professional Daisy support: http://outerthought.org/site/services/daisy/daisysupport.html
mail to: daisy at lists.cocoondev.org
list information: http://lists.cocoondev.org/mailman/listinfo/daisy

