[silva-dev] OpenOffice/ODF support for Silva?

Dave Kuhlman dkuhlman at rexx.com
Thu May 17 05:42:08 CEST 2007

On Tue, May 15, 2007 at 10:26:38AM +0200, eric casteleijn wrote:
> Dave Kuhlman wrote:
> > - Are our goals similar to those of the Docma server, minus the
> >   connection using COM?
> Well, partly. We would love to get rid of Docma, since it is a pain to 
> set up and maintain, and if we could do the same through OOo that would 
> be great. Replacing all of Docma is ambitious though, so I'd like to aim 
> for smaller, more realistic steps towards that goal.

And, better to do a small task well than a big task poorly.

> Aside from the renderers and the stuff you already found, not really. 
> There is some example code I could dig up we used for a customer 
> project, which does a *very* limited export of a tiny subset of silva 
> xml to ODF, but that may be tied too heavily to the specific content for 
> that customer to be of much use.

OK.  I'll look at that existing code more closely.  It seems quite

Some of the following is just "thinking out loud" in an attempt to
get my thoughts straight.

I've looked a little more closely at the task of exporting Silva
documents to ODF/.odt.

One approach I'm considering is to:

- Create a "style-sheet" and store it in the Silva site.  A
  style-sheet is actually an ODF/oowriter document (.odt file)
  containing style definitions.

- Implement a Python class that uses lxml to parse the exported Silva
  document and then performs a tree walk of the lxml element tree.

- During the tree walk, use lxml to build an element tree that
  will become the content.xml.

- Use the zipfile module in the Python standard library to create
  the .odt file, which is a compressed Zip file containing:

  + styles.xml             -- Copy this from the style-sheet.
  + content.xml            -- Produce this with etree.tostring()
  + Pictures/*             -- Contains image files
  + META-INF/manifest.xml
  + meta.xml               -- boilerplate
  + mimetype               -- boilerplate
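
The packaging step above could be sketched roughly as follows.  This
is only an illustration, not tested against Silva: build_odt() and
make_manifest() are hypothetical names, and the content.xml and
meta.xml bytes are assumed to have been produced elsewhere.  One detail
worth noting is that the ODF package format expects the "mimetype"
entry to come first in the archive and to be stored uncompressed.

```python
import io
import mimetypes
import zipfile
from xml.sax.saxutils import quoteattr

ODT_MIMETYPE = "application/vnd.oasis.opendocument.text"
MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"


def make_manifest(paths):
    """Build META-INF/manifest.xml listing every member of the package."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<manifest:manifest xmlns:manifest="%s">' % MANIFEST_NS,
        ' <manifest:file-entry manifest:media-type="%s"'
        ' manifest:full-path="/"/>' % ODT_MIMETYPE,
    ]
    for path in paths:
        if path.endswith(".xml"):
            media_type = "text/xml"
        else:
            # Guess image types from the file extension.
            media_type = mimetypes.guess_type(path)[0] or ""
        lines.append(
            ' <manifest:file-entry manifest:media-type=%s'
            ' manifest:full-path=%s/>'
            % (quoteattr(media_type), quoteattr(path)))
    lines.append("</manifest:manifest>")
    return "\n".join(lines).encode("utf-8")


def build_odt(stylesheet_odt, content_xml, meta_xml, pictures=None):
    """Return the bytes of a new .odt file.

    stylesheet_odt -- bytes of the style-sheet .odt stored in the site
    content_xml    -- bytes for content.xml
    meta_xml       -- bytes for meta.xml
    pictures       -- optional dict mapping file name to image bytes
    """
    # Copy styles.xml out of the style-sheet document.
    with zipfile.ZipFile(io.BytesIO(stylesheet_odt)) as src:
        styles_xml = src.read("styles.xml")

    members = {
        "content.xml": content_xml,
        "styles.xml": styles_xml,
        "meta.xml": meta_xml,
    }
    for name, data in (pictures or {}).items():
        members["Pictures/" + name] = data

    out = io.BytesIO()
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as odt:
        # mimetype goes first and is stored without compression.
        odt.writestr("mimetype", ODT_MIMETYPE,
                     compress_type=zipfile.ZIP_STORED)
        for path, data in members.items():
            odt.writestr(path, data)
        odt.writestr("META-INF/manifest.xml",
                     make_manifest(sorted(members)))
    return out.getvalue()
```

Working on bytes in memory rather than on files on disk is a choice
made for the sketch; it would also make it easy to hand the result
straight back as a download.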

A few comments on styles:

- Using an ODF/.odt document as a style-sheet enables the user to
  edit styles in oowriter using the oowriter styles editor.

- The style-sheet (styles.xml) will contain predefined styles such
  as silva-bold, silva-heading1, silva-heading2, silva-bulletlist,
  silva-enumlist.  We'll attach those styles as we generate the
  document (content.xml).
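
As a rough illustration of attaching those styles, the sketch below
builds a few content.xml elements and sets text:style-name on each.
The helper names (new_content, add_heading, add_bold, add_bullet_list)
are made up for this example, and the fallback to the standard
library's ElementTree is only there so the sketch runs where lxml is
not installed.

```python
try:
    from lxml import etree
except ImportError:
    # Fallback for the sketch only; this subset of the API is shared.
    import xml.etree.ElementTree as etree

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Ask the serializer to use the conventional ODF prefixes.
etree.register_namespace("office", OFFICE_NS)
etree.register_namespace("text", TEXT_NS)


def office_q(local):
    """Qualified name in the office: namespace."""
    return "{%s}%s" % (OFFICE_NS, local)


def text_q(local):
    """Qualified name in the text: namespace."""
    return "{%s}%s" % (TEXT_NS, local)


def new_content():
    """Return (root, text_body) for an empty content.xml tree."""
    root = etree.Element(office_q("document-content"))
    body = etree.SubElement(root, office_q("body"))
    return root, etree.SubElement(body, office_q("text"))


def add_heading(parent, content, level):
    """Append a heading that uses the silva-heading<level> style."""
    h = etree.SubElement(parent, text_q("h"))
    h.set(text_q("style-name"), "silva-heading%d" % level)
    h.set(text_q("outline-level"), str(level))
    h.text = content
    return h


def add_bold(paragraph, content):
    """Append a bold run to a paragraph using the silva-bold style."""
    span = etree.SubElement(paragraph, text_q("span"))
    span.set(text_q("style-name"), "silva-bold")
    span.text = content
    return span


def add_bullet_list(parent, items):
    """Append a bullet list that uses the silva-bulletlist style."""
    lst = etree.SubElement(parent, text_q("list"))
    lst.set(text_q("style-name"), "silva-bulletlist")
    for item in items:
        li = etree.SubElement(lst, text_q("list-item"))
        etree.SubElement(li, text_q("p")).text = item
    return lst
```

Because the generated content only refers to styles by name, editing
silva-heading1 in the style-sheet document would change the look of
every exported heading without touching this code.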

I noticed that the standard Silva export uses a more event-driven
approach.  (It's implemented in Silva/silvaxml/xmlexport.py and
Sprout/src/sprout/saxext/xmlexport.py.) I need to study that to
find out whether using that mechanism might be more suitable than
performing a tree walk of an lxml element tree.  So, we have at
least the following alternatives:

- Implement an lxml.etree document tree walk, which would be
  reasonably clean and straightforward.  I wrote a bit of this
  code as a trial.  After I wrote a few of the methods in the tree
  walk, the rest (at least the non-ODF-specific parts) were
  repetitive copy, paste, and edit.

- Or, it would be nice if we could re-use the existing
  event-driven framework.  This would also have the benefit of
  eliminating the need to export to XML before performing a
  transformation.  The style-sheet comments, above, apply to this
  approach as well.

- Or, XSLT -- This seems like a very complex task for XSLT, but I
  suppose an XSLT expert could make it fit.
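
To make the first alternative more concrete, here is a minimal sketch
of a tree walk that dispatches on the element name.  It is not the
trial code mentioned above.  The input element names (doc, heading, p,
strong) are placeholders, not necessarily what the Silva export
produces, and the class and method names are invented for the example.

```python
try:
    from lxml import etree
except ImportError:
    # Fallback for the sketch only; this subset of the API is shared.
    import xml.etree.ElementTree as etree

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def T(local):
    """Qualified name in the ODF text: namespace."""
    return "{%s}%s" % (TEXT_NS, local)


def local_name(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


class SilvaToOdfWalker(object):
    """Walk an exported document and build the body of content.xml."""

    def convert(self, silva_xml):
        source = etree.fromstring(silva_xml)
        out = etree.Element("{%s}text" % OFFICE_NS)
        self.walk_children(source, out)
        return out

    def walk_children(self, node, out):
        for child in node:
            if not isinstance(child.tag, str):
                continue  # skip comments and processing instructions
            method = getattr(self, "visit_" + local_name(child.tag),
                             self.visit_default)
            method(child, out)

    def visit_default(self, node, out):
        # Unknown element: drop the wrapper, keep walking its children.
        self.walk_children(node, out)

    def visit_heading(self, node, out):
        h = etree.SubElement(out, T("h"))
        h.set(T("style-name"), "silva-heading1")
        h.set(T("outline-level"), "1")
        h.text = node.text

    def visit_p(self, node, out):
        p = etree.SubElement(out, T("p"))
        p.text = node.text
        self.walk_children(node, p)

    def visit_strong(self, node, out):
        span = etree.SubElement(out, T("span"))
        span.set(T("style-name"), "silva-bold")
        span.text = node.text
        span.tail = node.tail  # text that follows the inline element
```

Each additional element type becomes one more visit_* method, which
matches the observation that most of the walk is repetitive once the
first few methods exist.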

I'll try to explore these and provide some evaluation.


Dave Kuhlman
