[daisy] "A document ID can only be specified for foreign
namespaces, which DSY is not."
Marc Portier
mpo at outerthought.org
Mon Dec 11 04:37:17 CST 2006
Robert Cecil wrote:
> Hi folks,
>
> I will apologize up front for my long-ish post; I hope someone has the
> patience to read through my blathering and provide some thoughts. :)
>
no need to apologize,
In fact it gives me the opportunity for a lengthy reply :-)
>
> I am trying to use the new Daisy 2.0 import/export tool in a way
> possibly not fully imagined by its creators. Here's my situation: I have
You are right there: the primary use case for imp-exp was different than
what you describe. However the idea for using it to support
development-teams via scm was never far away...
(checking the archives: I found the thread "Changing the format of the
Daisy document IDs for Daisy 2.0" mentioning that at least once in this
message: http://lists.cocoondev.org/pipermail/daisy/2006-June/004259.html)
Of course knowing 'those thoughts' were there is no more than just
hopeful news: it doesn't guarantee that anyone went the extra mile to
effectively test out or provide possible missing components. I am
however quite sure the model supports what you want 'in principle'.
For more reading on the namespacing-model, please do grab this document:
http://cocoondev.org/daisydocs-2_0/repository/general/337.html which
sums up how the namespaces come to work.
Much like XML namespaces, the Daisy namespaces indicate some
content-'authority' that will guard the uniqueness of all identifiers in
the 'namespace' (= why I called them id-spaces at one point)
In your case I think this idea would lead to each 'project' your team
works on to be representing one distinct 'daisy-namespace' (having an
own fingerprint)
Additionally it should be noted that the id's of document-types and
field-types are not put in the namespace: (given your original question
started at sharing those, this is quite important to note IMHO)
Those always get new id's assigned on the repo they are created in.
Therefore:
1/ calls over the API or URL's should always reference them by name
2/ those names should be carrying some own unique-ness identifier
(prefix?) if you foresee more of those projects being deployed on one
and the same repository.
For completeness a similar remark should be made for 'skins', 'sites',
'facetbrowsers', 'pubreq-sets'. Those have names (they are sub-folders
in the daisy-data or daisy-wiki-data) that could possibly collide. To
avoid the hassle of much renaming at deployment time it might be wise to
make those carry a similar project-identifying-prefix?
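To make the prefix idea concrete, here is a minimal sketch in plain Java (no Daisy API involved). The class name ProjectNames, the "p1" prefix and the underscore separator are assumptions chosen for illustration; whatever convention you pick only needs to be applied consistently to types, sites, skins, and so on.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Daisy: applies and checks a project prefix.
final class ProjectNames {
    private ProjectNames() {}

    /** Prepends the project prefix unless the name already carries it. */
    static String qualify(String projectPrefix, String localName) {
        String marker = projectPrefix + "_"; // separator is an assumption
        return localName.startsWith(marker) ? localName : marker + localName;
    }

    /** Returns the names that do not start with the project prefix. */
    static List<String> unprefixed(String projectPrefix, List<String> names) {
        String marker = projectPrefix + "_";
        return names.stream()
                .filter(n -> !n.startsWith(marker))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(qualify("p1", "Article"));
        // Lists the names a pre-deployment check would flag.
        System.out.println(unprefixed("p1",
                List.of("p1_Article", "SimpleDocument", "p1_skin")));
    }
}
```

A check like unprefixed() could run in the build before an export is committed, so that colliding names are caught before deployment time.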
Finally I should note the importance of making the correct distinction
between the various aspects into play. The term 'project' might mean
different things for different people: It could be a one on one with
'customer' but there could very well be a lot of reusable stuff between
those customers: any setup you foresee should be capable of coping with
these 'projects' being possibly depending on more generic base-
'projects'. Typically these base-projects will be gravitating around
document-types, publisher requests and facet browsers, end-customer
projects will more likely contain skins, at least one site, etc etc).
So far for the theory and 'thoughts'. In practice, this is what I would
propose (warning: untested thought-experiment below)
Setting:
- Team of Developers A, B, C are working on projects P1 and P2.
Organization:
- Assign two distinct namespaces (with different fingerprints!) for the
projects P1 and P2.
- Similarly assign distinct prefixes to be used for 'types', sites,
skins, ...
- Each project has its own SCM-repository to store regular exports as
well as other project related information (Java, daisy-js-scripts,
skins, ..., automated tests, ... whatnot)
- Customers will eventually run their own repository with own assigned
fingerprint and default DSY namespace! (Possibly even adding ACL rules
to prevent them changing stuff in the project-namespaces) Deploying any
or all of the 'projects' to those will be a mix of
1/ copying in the daisy-(wiki)-data stuff (sites, skins, ...),
2/ loading in the project specific types,
3/ and loading in the project specific documents
- Likewise each developer has his own repo installed, with own assigned
fingerprint and default DSY namespace. Everything created there is for
test purposes.
- They should 'register' (automatically during first import, or via a
custom tool on top of the API: see NamespaceManager) the
namespace-fingerprint for those projects they'll be working on.
- Of course the developer has access to the scm system and will have a
local sandbox in which they collaboratively can create the
daisy-(wiki)-data stuff.
The Tricky part (I've been avoiding)
is to allow developers now to create documents WITHIN the project-namespace.
Purely API speaking: The registration of the project's
namespace-fingerprint in the developers-daisy-instance will allow
creating a document into that namespace by using the
Document.setRequestedId() on newly created documents.
One could thus create some mini-doc-creator tool (and embed it into
maven/ant/shell) that would create such documents on the developers'
instance.
Note however that the uniqueness of those ID's is to be guaranteed by
this mythical 'project-authority'. How you organize this is entirely up
to you, the easiest way to achieve might just be running a central
accessible daisy-repo per project being worked on. These should then be
configured with the project's fingerprint and namespace so that creating
a document there will effectively assign a new ID that is guaranteed
unique in that namespace.
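As a rough illustration of what 'one authority assigning the numbers' means, the sketch below models it in plain Java. It is not Daisy code: the class ProjectAuthority and the namespace "P1" are hypothetical, and the only thing taken from this thread is the ID shape seen in the error message ("24-DSY", i.e. sequence number, dash, namespace).

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of a single per-project authority handing out IDs.
final class ProjectAuthority {
    private final String namespace;
    private final AtomicLong lastSequence;

    ProjectAuthority(String namespace, long lastAssigned) {
        this.namespace = namespace;
        this.lastSequence = new AtomicLong(lastAssigned);
    }

    /** Returns the next ID; uniqueness holds only because there is one counter. */
    String nextDocumentId() {
        return lastSequence.incrementAndGet() + "-" + namespace;
    }

    public static void main(String[] args) {
        ProjectAuthority p1 = new ProjectAuthority("P1", 0);
        System.out.println(p1.nextDocumentId());
        System.out.println(p1.nextDocumentId());
    }
}
```

If every developer instance ran its own counter for the same namespace, two of them would sooner or later hand out the same ID, which is why the counter has to live in one place (in practice, the central repository for that project).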
After creating this new document centrally the developer could export
from the central system to his own scm sandbox and upload into his dev-repo.
I understand this involves more steps than a developer would typically
want, so it might be smart to look into again some build-env tool that
does most of the work. On the other hand: since the ID's of these
documents are to be used in links between them, they are in effect part
of the project's interface between various developers in the team:
Meaning it makes sense to agree upon them soon enough in the project. So
many of them could be created in bulk at the beginning of the project.
Leaving only the occasional addition.
Possible additional bonus: It's my feeling that these installations
could double as some integration-test platform as well: One could
reimport from a (checked in and possibly tagged) scm sandbox to the
central server again.
Other thoughts:
- import/export allows to work versus a non-zipped directory structure,
That could be exactly mapping a folder in the scm sandbox which could
help arbitrate conflicts between developers working on the same documents.
- parts would then be directly accessible to other tools on the
developers workstation. Sounds like a bonus IMHO, but might require
htmlcleaner to be invoked when the import is done.
- propagation of "removal of documents" in daisy and scm
- organize having both scm-commitmails and project-daisy-repository
event-mails delivered to some dev-team mailing-list
- enforce export of schema-types for which no needed documents exist
Note:
Options to set up two central accessible daisy-repo's for two distinct
projects being worked on:
- might be two separate hosts (could be nice for separate maintenance
and version upgrades)
- could be 2 instances running from the same installation == separate
data dirs and mysql-tables.
But: It can _not_ be two sites/skins on one instance! (since they would
share the default namespace)
> a set of developers who each an instance of Daisy-2.0 installed on their
on a side-note: I've been suggesting some dev-teams to share
pre-installed daisy's in vm-ware images, it allows developers to easily
reset to some initial blank slate, it also gives the opportunity to
fire-up a separate release of daisy to test compatibility with your own work
in the case of working on trunk it also ensures everybody in the team
has exactly the same revision under their fingers and only the one
building a new vmware image has to get into the trouble of building it
from source etc etc
on the downside: each developer gets a running daisy instance on a
different (host-only networking) ip-address (not localhost any more), so
you need to allow configuring that.
> local computers. A central "build" machine is maintained, primarily for
> QA purposes and to demonstrate project progress to the client on regular
> basis. Developers will typically do the following: create custom
> doctypes, create custom forms (cforms) and other Daisy extensions,
> create/modify skins and create content (primarily navigation documents,
> and placeholder documents). At some point in the future this project
> will be deployed in a hosting environment, and the client will assume
> responsibility for maintaining their own content (obviously). Until
> then, my dev team is creating small bits of content, along with the
> other project artifacts mentioned above.
>
> I thought the Daisy 2.0 export/import tool would allow for us to version
> the 'logical' content in the site: documents, doc types (schema), etc,
> and support parallel work on the site. My thought process would be
> something like:
>
> 1. Developer checks out via SCM the module for the project containing
> the current Wiki directory (sites, resources, skins), plus the last
> export directory structure (documents, info, etc)
> 2. Developer runs import on local machine to update his/her private
> MySQL instance with the most recent documents and schema information
> from SCM.
> 3. Developer runs a script (or manually) any updates to the various Wiki
> sites folders, resource folders, etc, depending on the situation
> 4. Developer's local Daisy instance reflects the most recent team
> commits to all of these artifacts and continues working.
yep: for completeness: at this time it would also include any pending
stuff he had locally and was not exported (and/or) committed yet, right?
> 5. When the time is right, the developer runs Export on his/her own
> local Daisy instance, reconciles the output of the export against
> his/her CVS workspace, and updates any Wiki folder changes for
> programmatic/functional changes.
> 6. Developer runs CVS Update reconciles conflicts, and then commits all
> work.
> 7. At a predetermined time and schedule, someone logs on the central
> build server, runs CVS update, which updates the CVS workspace on the
> server to grab all the recent checkins of both Wiki folder stuff, and
> any checkins to Daisy site Export output.
> 8. The person runs Daisy Import to import all CVS changes into the
> shared central MySql instance, thereby synchronizing any documents and
> schema changes
>
yes, that's pretty much how I'd see things, but as noted above the key
point here is the 'authority' of this project namespace: there can only
be one assigning the numbers!
> This all hinges on the namespace and namespace fingerprint for the
> developers local Daisy instance and the central Daisy build to be
> identical. Or so I thought. I am getting the errors:
>
> <document id="24-DSY" branch="main" language="default">
> <description>*A document ID can only be specified for foreign
> namespaces, which DSY is not.*</description>
> <stackTrace>(stacktraces disabled, enable them via import
> options)</stackTrace>
> </document>
>
> When I attempt to test this process.
>
each repo is assumed to be the 'authority' for its own namespace, which is
coupled to its own fingerprint
see SELECT * FROM daisyrepository.daisy_namespaces and
myconfig.xml#/targets/target[@path="/daisy/repository/repository-manager"]/configuration/namespace
this means that each repo you install will have this automatically (each
with a unique fingerprint created during one of the installation scripts)
Repositories will always claim authority over their own namespace: they
will thus not allow that during import (or via the API by calling
setRequestedId) some external process is assigning numbers under their
own authority == in their own namespace.
So, repositories that are known (right away most of the time) to hold
documents that will need to be shared with other repositories should
change the namespace suffix as soon as possible. (From DSY to some XYZ)
From then on id's created on that repository (under its authority) will
be 'globally' unique, allowing them to be imported elsewhere.
In fact: during that import the fingerprint-to-XYZ link will be
automatically registered in the receiving repository. Effectively
prohibiting imports coming from repositories that use the same suffix,
but have different fingerprints.
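The rules described above can be summed up in a toy model. This is plain Java written for this reply, not the repository's actual implementation: the names NamespaceRules, Repo and importDocument, the fingerprint strings, and the messages (apart from the quoted error) are all made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the namespace rules; all names here are hypothetical.
final class NamespaceRules {

    /** Stand-in for one repository's namespace bookkeeping. */
    static final class Repo {
        private final String ownNamespace;
        // namespace name -> fingerprint, like the daisy_namespaces table
        private final Map<String, String> fingerprints = new HashMap<>();

        Repo(String ownNamespace, String ownFingerprint) {
            this.ownNamespace = ownNamespace;
            fingerprints.put(ownNamespace, ownFingerprint);
        }

        /** Decides what happens to a document arriving with an ID like "24-DSY". */
        String importDocument(String documentId, String sourceFingerprint) {
            String ns = documentId.substring(documentId.indexOf('-') + 1);
            if (ns.equals(ownNamespace)) {
                // The repo is the authority here: nobody else assigns its numbers.
                return "rejected: a document ID can only be specified for "
                        + "foreign namespaces, which " + ns + " is not";
            }
            // First contact registers the namespace-to-fingerprint link.
            String known = fingerprints.putIfAbsent(ns, sourceFingerprint);
            if (known != null && !known.equals(sourceFingerprint)) {
                return "rejected: namespace " + ns
                        + " is registered with a different fingerprint";
            }
            return "accepted: " + documentId;
        }
    }

    public static void main(String[] args) {
        Repo dev = new Repo("DSY", "fp-dev");
        System.out.println(dev.importDocument("24-DSY", "fp-central"));
        System.out.println(dev.importDocument("24-XYZ", "fp-central"));
        System.out.println(dev.importDocument("25-XYZ", "fp-other"));
    }
}
```

The first call reproduces your situation (both repositories still use DSY), the second shows what happens once the central repository has switched to XYZ, and the third is the case of another repository claiming the same suffix with a different fingerprint.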
Hope this clarifies, don't hesitate to ask further questions if it doesn't.
Your line of questions shows you're thinking about Daisy in 'large
scale', 'reproducible-development', even 'industrial' mode, which I like
a lot.
I definitely look forward to any feedback or further discussions on this
and similar working scenarios.
regards,
-marc=
--
Marc Portier http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at http://blogs.cocoondev.org/mpo/
mpo at outerthought.org mpo at apache.org