[daisy] [JIRA] Created: (DSY-495) Query performance on large repositories

Bruno Dumon (JIRA) issues at cocoondev.org
Thu Jun 21 05:45:50 CDT 2007


Query performance on large repositories
---------------------------------------

         Key: DSY-495
         URL: http://issues.cocoondev.org//browse/DSY-495
     Project: Daisy
        Type: Improvement
  Components: Repository - querying and indexing, Enterprise: ASP, HA, ...
    Reporter: Bruno Dumon
    Priority: Minor


This issue serves as a place to collect thoughts on query performance for big repositories. The current Daisy repository was designed with the short-term goal of providing rich functionality for small to medium-sized repositories (tens of thousands of documents), which is sufficient for many websites, wikis, etc. When the repository is used for larger-scale projects (newspapers, records management, etc.), search performance can become an issue.

In general, Daisy is not in the business of developing database technology, but rather providing a higher-level repository on top of an existing database. So we are dependent on what is available in the open-source database space.

The current scalability problems stem largely from the fact that Daisy allows fulltext and metadata searches to be combined, even though these are performed on different systems (the Lucene index and the SQL database, respectively). As a consequence, neither system sees the complete result set, so ordering of the results needs to be done by Daisy itself. (Order-by clauses might also contain expressions which cannot be translated to SQL, but this is of secondary importance.)
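
To illustrate why the ordering ends up in the application, here is a minimal sketch in Python (not Daisy's actual code). The function name, the row layout and the sample data are all made up for the example; the point is only that the merged set has to be materialized and sorted in memory before the first result can be returned.

```python
from operator import itemgetter


def combine_and_sort(fulltext_hits, metadata_rows, sort_key, limit=None):
    """Intersect fulltext hits with metadata matches, then sort in memory.

    fulltext_hits: document ids returned by the fulltext index (hypothetical).
    metadata_rows: dicts returned by the SQL metadata query (hypothetical).
    """
    hit_ids = set(fulltext_hits)
    # Keep only the metadata rows whose document also matched the fulltext query.
    merged = [row for row in metadata_rows if row["id"] in hit_ids]
    # Neither backend has seen the combined result, so the sort happens here,
    # over the whole merged set, even when only a few rows are wanted.
    merged.sort(key=itemgetter(sort_key))
    return merged if limit is None else merged[:limit]


fulltext_hits = [3, 1, 7, 9]
metadata_rows = [
    {"id": 1, "name": "Zebra"},
    {"id": 2, "name": "Apple"},
    {"id": 3, "name": "Mango"},
    {"id": 9, "name": "Banana"},
]
result = combine_and_sort(fulltext_hits, metadata_rows, "name", limit=2)
print([row["id"] for row in result])
```

The cost grows with the size of the merged set rather than with the number of rows requested, which is where large repositories start to hurt.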

Another problem, though in general probably a smaller one, is that the query results need to be ACL-filtered. Evaluating the ACL requires the document object to be in memory, which works quite well when the document is cached, but loading it can be slow when it is not. What could help here is a high-performance persistent document cache, avoiding the need to load documents via SQL.
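
A rough Python sketch of the caching effect, under simplifying assumptions: the DocumentCache class, the "read_roles" field and the role names are invented for the example, the ACL is reduced to a set of roles per document, and the cache is a plain in-memory dict rather than the persistent cache suggested above.

```python
class DocumentCache:
    """Toy document cache; counts how often the slow loader is called."""

    def __init__(self, loader):
        self._loader = loader  # stands in for loading a document via SQL
        self._docs = {}
        self.loads = 0

    def get(self, doc_id):
        if doc_id not in self._docs:
            # Cache miss: this is the expensive path.
            self._docs[doc_id] = self._loader(doc_id)
            self.loads += 1
        return self._docs[doc_id]


def acl_filter(doc_ids, cache, role):
    """Keep only the documents the given role may read."""
    return [d for d in doc_ids if role in cache.get(d)["read_roles"]]


# Hypothetical backing store with a simplified ACL per document.
STORE = {
    1: {"read_roles": {"guest", "editor"}},
    2: {"read_roles": {"editor"}},
    3: {"read_roles": {"guest"}},
}

cache = DocumentCache(lambda doc_id: STORE[doc_id])
first = acl_filter([1, 2, 3], cache, "guest")
second = acl_filter([1, 2, 3], cache, "editor")
print(first, second, cache.loads)
```

The second query is answered without touching the loader at all. A persistent cache would aim to give the same behaviour across restarts and for result sets larger than what fits in memory.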

Thoughts for possible solutions:
 * having a SQL database which supports fulltext searching would help (or some form of integration on that level)
 * a variant of this would be: first perform the fulltext query (Lucene), then upload the fulltext results into a temporary table in the database, and join against that table to merge the results.
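
The temporary-table variant could look roughly like the following Python sketch, which uses an in-memory SQLite database purely for illustration. The "documents" table, its columns and the function name are assumptions made for the example and do not reflect Daisy's schema; the fulltext hits are simply passed in as a list of ids.

```python
import sqlite3


def query_with_fulltext_hits(conn, hit_ids, doc_type, order_by="name"):
    """Upload fulltext hits to a temp table, then let SQL join and sort."""
    if order_by not in {"id", "name"}:  # whitelist, since it is spliced into SQL
        raise ValueError("unsupported sort column: " + order_by)
    cur = conn.cursor()
    cur.execute(
        "CREATE TEMP TABLE IF NOT EXISTS fulltext_hits (doc_id INTEGER PRIMARY KEY)"
    )
    cur.execute("DELETE FROM fulltext_hits")  # clear hits from a previous query
    cur.executemany(
        "INSERT OR IGNORE INTO fulltext_hits (doc_id) VALUES (?)",
        [(doc_id,) for doc_id in hit_ids],
    )
    # The database now sees both conditions and can do the ordering itself.
    cur.execute(
        "SELECT d.id, d.name FROM documents d "
        "JOIN fulltext_hits h ON h.doc_id = d.id "
        "WHERE d.doc_type = ? ORDER BY d." + order_by,
        (doc_type,),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, name TEXT, doc_type TEXT)")
conn.executemany(
    "INSERT INTO documents VALUES (?, ?, ?)",
    [(1, "Zebra", "article"), (2, "Apple", "article"),
     (3, "Mango", "image"), (9, "Banana", "article")],
)
rows = query_with_fulltext_hits(conn, [3, 1, 9], "article")
print(rows)
```

The trade-off is the cost of uploading the hit list for every query, which may itself become significant when the fulltext query matches a large part of the repository.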
