[daisy] advice on tagging and Categories.
Marc Portier
mpo at outerthought.org
Thu Sep 28 05:10:44 CDT 2006
Chris wrote:
>
> Hi,
>
> I am working on a new Daisy project
> and have a question about tagging and Categories.
>
> I see in the daisy scratchpad how a field like
> $DaisyCommWikiCategory is used in the navigation
> document query.
>
> I am looking for something similar, but more like tagging.
> By tagging I mean multiple attributes such as;
>
> Attributes:
> White-Paper, Meeting-Planner, Tutorial, News.
> Math, Science, Automobiles.
>
> DocA - Science, White-Paper
> DocB - Math, White-Paper
> DocC - News
> DocD - Science, News
>
> I think I found a way to do this, I created a multi-value field
> called Tag with this set of values. Then I made a New Document
> just like SimpleDocument but with the Tag field and I called it
> BasicDocument. Why am I doing this:
>
> Now I can do a query for these 'tags' for navigation
> or query includes.
>
> I could also use these tags in a faceted browser? I hope.
>
yes, and IMHO, that's precisely where those come to practical use,
however I think they become more useful when you actually define
separate dimensions (or facets) upfront
(see more below)
> Or for a 'similar docs' list based on a custom publisher?
>
mm, yummy, nice!
although I'm unsure whether the current state of daisy's query language will
allow us to practically express this notion of 'similar' (which to me
sounds like a threshold applied to a calculated ratio of matching values
over the total number of present or possible values?)
anyways: I'm surely interested in where this might lead us to, do keep
us informed of how things are going, and don't be shy to propose new
features (or even better provide patches :-))
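To make that notion of 'similar' a bit more concrete, here is a minimal sketch in plain Python (not daisy's query language or publisher API). It assumes 'similar' means the ratio of shared tag values over all tag values present on either document, with a cut-off threshold. The names tag_similarity and similar_docs and the 0.3 threshold are purely illustrative; the sample documents are the ones from Chris' mail.

```python
def tag_similarity(tags_a, tags_b):
    """Ratio of shared tags over all tags present on either document."""
    a, b = set(tags_a), set(tags_b)
    union = a | b
    if not union:
        return 0.0  # two untagged documents: treat as not similar
    return len(a & b) / len(union)


def similar_docs(doc_id, docs, threshold=0.3):
    """Return (other_id, score) pairs at or above the threshold, best first."""
    own = docs[doc_id]
    scored = [
        (other_id, tag_similarity(own, tags))
        for other_id, tags in docs.items()
        if other_id != doc_id
    ]
    hits = [(other_id, score) for other_id, score in scored if score >= threshold]
    # Highest score first; ties broken by document id for a stable order.
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical in-memory stand-in for documents and their Tag field values.
docs = {
    "DocA": ["Science", "White-Paper"],
    "DocB": ["Math", "White-Paper"],
    "DocC": ["News"],
    "DocD": ["Science", "News"],
}
```

With these four documents, DocA shares one of three distinct tags with both DocB and DocD, so both pass a 0.3 threshold, while DocC shares nothing and drops out. Dividing by the number of 'possible' values instead of the 'present' ones would be a different (and probably stricter) measure.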
> I am also thinking if I want this kind of tagging on attachments
> I would make a new attachment document type with this new Tag
> field.
>
> WDYT?
>
I've recently attended GovCamp in Brussels (which was a barcamp-like
unconference about e-Gov stuff) and touched upon the subject of using
facet browsers as a guide to finding stuff in a cloud of tags.
In the discussion afterwards someone pointed me to the more advanced
use of 'semantical tags' and related it to how RDF triples work...
(and the essence of semantic web approaches...)
Now, looking at your proposal of having one field (multivalue) with the
catch-all-semantics name 'TAG', I'm getting the feeling you're losing
some semantical value compared to a solution where you would allow for
more separate fields, each with a more precise semantical meaning.
e.g. by introducing various fields:
* 'PublicationType': White-Paper, Meeting-Planner, Tutorial, News.
* 'Topics' (multivalue) : Math, Science, Automobiles
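As a rough illustration of what the separate fields buy you in a faceted browser, here is a small sketch in plain Python. It is not daisy code: the dictionaries stand in for documents, and facet_counts and select are hypothetical helper names. Each field becomes its own facet with its own counts, and selections on different facets can be combined.

```python
from collections import Counter

# Hypothetical stand-in for Chris' documents, using the two fields above.
docs = {
    "DocA": {"PublicationType": "White-Paper", "Topics": ["Science"]},
    "DocB": {"PublicationType": "White-Paper", "Topics": ["Math"]},
    "DocC": {"PublicationType": "News", "Topics": []},
    "DocD": {"PublicationType": "News", "Topics": ["Science"]},
}


def facet_counts(docs, field):
    """Count how many documents carry each value of one field (one facet)."""
    counts = Counter()
    for fields in docs.values():
        value = fields.get(field)
        # Treat single-value and multivalue fields alike.
        values = value if isinstance(value, list) else [value]
        counts.update(v for v in values if v is not None)
    return dict(counts)


def select(docs, **wanted):
    """Ids of documents whose fields contain every wanted value."""
    def has(fields, field, value):
        present = fields.get(field)
        if isinstance(present, list):
            return value in present
        return value == present

    return sorted(
        doc_id
        for doc_id, fields in docs.items()
        if all(has(fields, field, value) for field, value in wanted.items())
    )
```

For example, select(docs, PublicationType="News", Topics="Science") narrows the set down to DocD only. With a single catch-all Tag field the browser could not tell that 'News' and 'Science' are answers to two different questions.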
And actually: on the subject of 'topics' you might want to have a look
at the hierarchical values (see daisy 2.0-dev) that would allow you to
provide a semantical taxonomy of keywords for people to pick from. This
would allow you to identify sub-topics as being members of larger topic
groups, e.g. Algebra >is> Math >is> Science. With this new feature,
tagging an article as being about 'Algebra' could make it pop up in a
query searching for articles about 'Math' (or one of its descendants).
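The idea behind such a hierarchy can be sketched in a few lines of plain Python. This is not how daisy 2.0-dev implements or queries hierarchical values; it only assumes the taxonomy can be written as a child-to-parent mapping, and PARENT, ancestors_and_self and matches are illustrative names.

```python
# Hypothetical taxonomy as child -> parent, following the example above.
PARENT = {"Algebra": "Math", "Math": "Science"}


def ancestors_and_self(topic, parent=PARENT):
    """Return the topic followed by each of its broader topics, nearest first."""
    chain = [topic]
    # Stop at a root topic, or if the mapping accidentally contains a cycle.
    while chain[-1] in parent and parent[chain[-1]] not in chain:
        chain.append(parent[chain[-1]])
    return chain


def matches(doc_topics, wanted, parent=PARENT):
    """True if any topic on the document is the wanted topic or falls under it."""
    return any(wanted in ancestors_and_self(t, parent) for t in doc_topics)
```

So an article tagged only with 'Algebra' matches a search for 'Math' and for 'Science', while an article tagged 'Math' does not match a search for 'Algebra'.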
Now, I do understand how the above suggestions bring more formalism to
this tagging (more than you'd want?). And making it worse: more
formalism that is centrally controlled by the one admin or small group
that is defining the document-types.
Also: introducing this upfront formalism feels like breaking with what
are, IMHO, the biggest advantages of the folksonomy and tagging movements
that are now largely catching on everywhere:
- decentralisation and
- adding semantics as we go (like in natural human learning and
interaction)
On the other hand: the vast resulting chaos in some tag-clouds out there
and the growing number of disgruntled users with these free tagging
systems seem to indicate some more formalism or at least guidance is
required anyway... I'm afraid neither the web 2.0 hype nor academic work
on the semantic web has brought us to a decisive point in
this debate yet...
Coming back to daisy I think we're in pretty good shape to answer both
approaches with the structure we have in place: I think we provide quite
some flexibility here and there to find a balance:
Note that
1/ document-types can pre-define more formal semantical fields (with
strong types and/or selection lists), but documents can still have
freely added 'custom fields'.
2/ document-types can change over time, so there is a mechanism to
respond to changing needs
3/ document-tasks provide an easy programmable way to do bulk operations
on documents already in the repo.
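To give an idea of the kind of bulk operation meant in 3/, here is a sketch of the mapping logic for moving from one catch-all Tag field to the two more precise fields suggested earlier. It is plain Python and deliberately does not use the document-task API; split_tags and the 'unmapped' bucket are hypothetical, and the value lists are simply the ones from Chris' mail.

```python
PUBLICATION_TYPES = {"White-Paper", "Meeting-Planner", "Tutorial", "News"}
TOPICS = {"Math", "Science", "Automobiles"}


def split_tags(tags):
    """Sort the values of a catch-all Tag field into more precise fields."""
    result = {"PublicationType": [], "Topics": [], "unmapped": []}
    for tag in tags:
        if tag in PUBLICATION_TYPES:
            result["PublicationType"].append(tag)
        elif tag in TOPICS:
            result["Topics"].append(tag)
        else:
            # Keep unknown values visible so nobody's tagging is silently lost.
            result["unmapped"].append(tag)
    return result
```

A document that ends up with more than one PublicationType, or with anything in 'unmapped', would need a human decision before the old field is dropped.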
> Thanks, Chris.
>
thanx for sharing, and excuse me for being more contemplative and
philosopher-like than pragmatic and useful in my above 'advice'
> p.s. The documentation was very good. I recently installed Daisy 1.5.1
> on Ubuntu 6 and had no trouble. There were a few steps but easy
> to follow.
>
cool! (being a bit of an ubuntu fan myself)
you might be interested in the fact that our svn-trunk has (since
recently) a distro/debian section holding the code to build your own
packages
-marc=
--
Marc Portier http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at http://blogs.cocoondev.org/mpo/
mpo at outerthought.org mpo at apache.org