[daisy] [new feature plan] hierarchical fields
Bruno Dumon
bruno at outerthought.org
Wed Aug 9 03:13:53 CDT 2006
Hi,
A new feature we're planning to add to Daisy is hierarchical fields.
I'll jump right into explaining my current ideas about this.
A hierarchical field is a field whose value is a hierarchical path (an
ordered sequence of values), typically selected from some hierarchy (a
hierarchical selection list). In our case, the actual value of the field
would be the complete path to the selected node in the hierarchy. For
example, take this hierarchy:
A
  B
  C
    D
    E
If the value 'E' is selected from the hierarchy, the actual stored field
value would be the path /A/C/E. This makes hierarchical fields very
similar to multivalue fields. However, we want hierarchical fields to
also support multivalueness, so in effect we need two-dimensional
multivalue fields.
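To make the "two-dimensional" idea concrete, here is a minimal Java sketch that treats one hierarchical value as an ordered list of strings and a multivalue hierarchical field as a list of such paths. The class and method names (HierarchicalPathDemo, parse, toPathString) are hypothetical and not part of Daisy's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: a hierarchical value is the ordered list of its path elements. */
class HierarchicalPathDemo {

    // Joins path elements into the "/A/C/E" notation used in the text.
    static String toPathString(List<String> path) {
        return "/" + String.join("/", path);
    }

    // Parses "/A/C/E" back into its elements; empty segments are skipped.
    static List<String> parse(String pathString) {
        List<String> result = new ArrayList<>();
        for (String part : pathString.split("/")) {
            if (!part.isEmpty()) {
                result.add(part);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Single-valued hierarchical field: one path.
        List<String> single = parse("/A/C/E");
        System.out.println(single); // [A, C, E]

        // Multivalue hierarchical field: a list of paths, i.e. a two-dimensional value.
        List<List<String>> multi = Arrays.asList(parse("/A/C/E"), parse("/A/B"));
        for (List<String> p : multi) {
            System.out.println(toPathString(p));
        }
    }
}
```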
A side effect of storing the complete path as the value of a
hierarchical field is that, when the structure of the hierarchy changes,
the values of existing document fields stay the same. In many cases this
is likely the desired behaviour, and it has a couple of advantages, such
as easier-to-implement searching, support for free entry, and easier
display of the selected path (no need to look it up in the original
hierarchy). The alternative to storing the complete path would be
storing some unique ID that identifies the node in the hierarchy.
The storage of hierarchical fields in the 'thefields' table will be very
similar to that of multivalue fields. This means storing '/A/C/E' will
need 3 rows in that table. Doing an exact search for '/A/C/E' will
require joining that table with itself, once for each path element. If
it's a multivalue hierarchical field and you need to search on two
3-length paths, that will require 6 joins. I don't know how much this
impacts performance, but as long as it's not overused this should be OK.
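The following Java sketch shows what such a self-join could look like for an exact match on a single path: one table alias per path element, all tied to the same document, field and multivalue entry, plus a NOT EXISTS clause so that longer paths starting with the same elements are excluded. The column names (doc_id, fieldtype_id, value_seq, hier_seq, stringvalue) are assumptions for illustration only, not Daisy's actual schema.

```java
/** Hypothetical sketch: SQL for an exact hierarchical path match on 'thefields'. */
class HierarchicalQuerySketch {

    // Parameters, in order: the field type id, then each path element (e.g. A, C, E).
    static String exactPathQuery(int pathLength) {
        StringBuilder from = new StringBuilder();
        StringBuilder where = new StringBuilder("t1.fieldtype_id = ?");
        for (int i = 1; i <= pathLength; i++) {
            String t = "t" + i;
            if (i > 1) {
                from.append(", ");
            }
            from.append("thefields ").append(t);
            if (i > 1) {
                // Tie every alias to the same document, field and multivalue entry as t1.
                where.append(" AND ").append(t).append(".doc_id = t1.doc_id")
                     .append(" AND ").append(t).append(".fieldtype_id = t1.fieldtype_id")
                     .append(" AND ").append(t).append(".value_seq = t1.value_seq");
            }
            // The i-th alias must hold the i-th path element.
            where.append(" AND ").append(t).append(".hier_seq = ").append(i)
                 .append(" AND ").append(t).append(".stringvalue = ?");
        }
        // Exclude longer paths that merely start with the searched elements.
        where.append(" AND NOT EXISTS (SELECT 1 FROM thefields x WHERE x.doc_id = t1.doc_id")
             .append(" AND x.fieldtype_id = t1.fieldtype_id AND x.value_seq = t1.value_seq")
             .append(" AND x.hier_seq = ").append(pathLength + 1).append(")");
        return "SELECT DISTINCT t1.doc_id FROM " + from + " WHERE " + where;
    }

    public static void main(String[] args) {
        System.out.println(exactPathQuery(3));
    }
}
```

Searching a multivalue field on two paths would repeat the alias block for the second path, which is where the doubling of joins comes from.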
Datatypes and hierarchical fields
---------------------------------
Hierarchical fields will be combinable with any of the existing
datatypes (string, long, link, ...).
The string datatype will probably be used most frequently, and some
datatypes make very little sense (like date and datetime).
For a while I thought about limiting hierarchical fields to string
values, in which case we could also make 'hierarchical' an additional
field datatype rather than a new field type property (which would keep
things simpler).
There is, however, one non-string use case which might be quite useful:
'link', as this allows the hierarchical nodes to be represented by
documents, and the hierarchical selection list to be built from
documents (see below).
Searching on hierarchical fields
--------------------------------
It should be possible to search for exact paths in the hierarchy
(e.g. /A/B/C) but also for descendants (/A/**), children (/A/*), or
paths ending in a certain value (**/C).
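One way to pin down the semantics of these patterns is a small matcher, sketched below in Java under the assumption that '*' matches exactly one path element and '**' matches zero or more elements. With that assumption /A/** also matches /A itself; whether "descendants" should include the node itself is a design choice this sketch does not settle. Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: matching hierarchical paths against '*' and '**' patterns. */
class PathPatternSketch {

    static List<String> split(String s) {
        List<String> parts = new ArrayList<>();
        for (String p : s.split("/")) {
            if (!p.isEmpty()) {
                parts.add(p);
            }
        }
        return parts;
    }

    // '*' matches exactly one element, '**' matches zero or more elements.
    static boolean matches(String pattern, String path) {
        return match(split(pattern), 0, split(path), 0);
    }

    private static boolean match(List<String> pat, int i, List<String> path, int j) {
        if (i == pat.size()) {
            return j == path.size();
        }
        String token = pat.get(i);
        if (token.equals("**")) {
            // Let '**' absorb 0, 1, 2, ... elements and try the rest of the pattern.
            for (int k = j; k <= path.size(); k++) {
                if (match(pat, i + 1, path, k)) {
                    return true;
                }
            }
            return false;
        }
        if (j == path.size()) {
            return false;
        }
        if (token.equals("*") || token.equals(path.get(j))) {
            return match(pat, i + 1, path, j + 1);
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matches("/A/C/E", "/A/C/E")); // true  (exact)
        System.out.println(matches("/A/*", "/A/C"));     // true  (child)
        System.out.println(matches("/A/*", "/A/C/E"));   // false (grandchild)
        System.out.println(matches("/A/**", "/A/C/E"));  // true  (descendant)
        System.out.println(matches("**/C", "/A/C"));     // true  (ends in C)
        System.out.println(matches("**/C", "/A/C/E"));   // false
    }
}
```

In the database these patterns would of course be translated into conditions on the stored rows rather than evaluated in memory; the matcher only serves to define what each pattern means.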
The hierarchical selection list
-------------------------------
The hierarchy from which the value for a hierarchical field is selected
is provided by a hierarchical selection list (free entry of the
hierarchical value will be possible too).
As with the current selection lists, there can be static selection lists
and query-based selection lists.
The static selection lists will at first be limited to string types, as
other types are probably not that useful.
For the query-based lists, one special case that will be added is for
when a hierarchy is modelled by means of documents, that is, by
documents which have a multivalue link field pointing to their
'children'. In such a case, the hierarchical list can be built from a
start query plus a specification of the link fields which should be
followed, starting from the initial documents.
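A minimal Java sketch of that traversal: the start documents stand in for the result of the start query, and a map from document to the values of its (hypothetical) 'children' link field stands in for the link fields to follow. Document names are used in place of real document IDs. A set of ancestors guards against link cycles, while still allowing a document to appear under more than one parent. Requires Java 9+ for List.of/Map.of.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: building a hierarchy by following child links between documents. */
class DocumentHierarchySketch {

    // Returns every reachable path, e.g. "/A", "/A/C", "/A/C/E", in depth-first order.
    static List<String> buildPaths(List<String> startDocs, Map<String, List<String>> childLinks) {
        List<String> paths = new ArrayList<>();
        for (String doc : startDocs) {
            walk(doc, "", new HashSet<>(), childLinks, paths);
        }
        return paths;
    }

    private static void walk(String doc, String parentPath, Set<String> ancestors,
                             Map<String, List<String>> childLinks, List<String> paths) {
        if (!ancestors.add(doc)) {
            return; // link cycle: doc is already an ancestor on this path
        }
        String path = parentPath + "/" + doc;
        paths.add(path);
        for (String child : childLinks.getOrDefault(doc, List.of())) {
            walk(child, path, ancestors, childLinks, paths);
        }
        ancestors.remove(doc); // doc may still appear under another parent
    }

    public static void main(String[] args) {
        // Same hierarchy as the example earlier: A has children B and C; C has D and E.
        Map<String, List<String>> links = Map.of(
                "A", List.of("B", "C"),
                "C", List.of("D", "E"));
        System.out.println(buildPaths(List.of("A"), links));
        // [/A, /A/B, /A/C, /A/C/D, /A/C/E]
    }
}
```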
Relation between hierarchical fields and dependent fields
---------------------------------------------------------
To a certain extent, the problems for which one might use a hierarchical
field could also be solved by having multiple fields with dependent
selection lists.
However, there are some differences:
- a hierarchical field can represent hierarchies of varying depth
- since a hierarchical field is one atomic field, it can be
multivalue, and faceted browsing on this atomic value can be performed.
I'll start working on the hierarchical fields on a separate branch
(BRANCH_HIERFIELDS).
Comments and questions are welcome.
--
Bruno Dumon http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
bruno at outerthought.org bruno at apache.org