[silva-dev] OpenOffice/ODF support for Silva?

Dave Kuhlman dkuhlman at rexx.com
Thu May 17 05:42:08 CEST 2007


On Tue, May 15, 2007 at 10:26:38AM +0200, eric casteleijn wrote:
> Dave Kuhlman wrote:
> 
> > - Are our goals similar to those of the Docma server, minus the
> >   connection using COM?
> 
> Well, partly. We would love to get rid of Docma, since it is a pain to 
> set up and maintain, and if we could do the same through OOo that would 
> be great. Replacing all of Docma is ambitious though, so I'd like to aim 
> for smaller, more realistic steps towards that goal.
> 

And, better to do a small task well than a big task poorly.

> Aside from the renderers and the stuff you already found, not really. 
> There is some example code I could dig up we used for a customer 
> project, which does a *very* limited export of a tiny subset of silva 
> xml to ODF, but that may be tied too heavily to the specific content for 
> that customer to be of much use.
> 

OK.  I'll look at that existing code more closely.  It seems quite
elegant.

Some of the following is just "thinking out loud" in an attempt to
get my thoughts straight.

I've looked a little more closely at the task of exporting Silva
documents to ODF/.odt.

One approach I'm considering is to:

- Create a "style-sheet" and store it in the Silva site.  A
  style-sheet is actually an ODF/oowriter document (.odt file)
  containing style definitions.

- Implement a Python class that uses lxml to parse the exported Silva
  document and then performs a tree walk of the lxml element tree.

- During the tree walk, use lxml to build an element tree that
  will become the content.xml.

- Use the zipfile module in the Python standard library to create
  the .odt file, which is a compressed Zip file containing:

  + styles.xml             -- Copy this from the style-sheet.
  + content.xml            -- Produce this with etree.tostring()
  + Pictures/*             -- Contains image files
  + META-INF/manifest.xml
  + meta.xml               -- boilerplate
  + mimetype               -- boilerplate
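
A minimal sketch of the packaging step, using only the standard
library.  The function names (build_manifest, read_styles, write_odt)
and the placeholder XML strings are made up for illustration.  The
one ODF-specific detail it shows is that the mimetype entry is
written first and stored uncompressed.

```python
import io
import zipfile

ODT_MIMETYPE = "application/vnd.oasis.opendocument.text"
MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"


def build_manifest(member_names):
    """Return a minimal META-INF/manifest.xml for the given member names."""
    entry = (' <manifest:file-entry manifest:media-type="%s"'
             ' manifest:full-path="%s"/>')
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<manifest:manifest xmlns:manifest="%s">' % MANIFEST_NS,
        entry % (ODT_MIMETYPE, "/"),
    ]
    for name in member_names:
        # Assumption: XML members are text/xml; image media types are
        # left empty here, and names are not XML-escaped.
        media_type = "text/xml" if name.endswith(".xml") else ""
        lines.append(entry % (media_type, name))
    lines.append("</manifest:manifest>")
    return "\n".join(lines)


def read_styles(stylesheet_odt):
    """Copy styles.xml out of the style-sheet .odt (path or file object)."""
    with zipfile.ZipFile(stylesheet_odt) as source:
        return source.read("styles.xml")


def write_odt(target, content_xml, styles_xml, meta_xml, pictures=None):
    """Write an .odt package to target (path or file object)."""
    members = [
        ("content.xml", content_xml),
        ("styles.xml", styles_xml),
        ("meta.xml", meta_xml),
    ]
    for name, data in sorted((pictures or {}).items()):
        members.append(("Pictures/" + name, data))
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as odt:
        # The mimetype entry goes first and is stored, not deflated.
        odt.writestr("mimetype", ODT_MIMETYPE,
                     compress_type=zipfile.ZIP_STORED)
        for name, data in members:
            odt.writestr(name, data)
        odt.writestr("META-INF/manifest.xml",
                     build_manifest([name for name, _ in members]))


# Usage: build a package in memory.  The XML strings are placeholders,
# not valid ODF.
buffer = io.BytesIO()
write_odt(buffer, "<content/>", "<styles/>", "<meta/>",
          pictures={"logo.png": b"not-a-real-image"})
```

In real use, styles_xml would come from read_styles() applied to the
style-sheet document stored in the Silva site, and content_xml from
etree.tostring().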

A few comments on styles:

- Using an ODF/.odt document as a style-sheet enables the user to
  edit styles in oowriter using the oowriter styles editor.

- The style-sheet (styles.xml) will contain predefined styles such
  as silva-bold, silva-heading1, silva-heading2, silva-bulletlist,
  silva-enumlist.  We'll attach those styles as we generate the
  document (content.xml).
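
Since the generated content.xml only refers to these styles by name,
it may be worth checking that the style-sheet really defines them.  A
rough sketch follows.  It uses the standard library's
xml.etree.ElementTree as a stand-in for lxml.etree (the calls used
here are the same in both), and the helper names and the sample
styles.xml fragment are hypothetical.

```python
import xml.etree.ElementTree as etree  # stand-in for lxml.etree

STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
STYLE_NAME = "{%s}name" % STYLE_NS

# The predefined style names listed above.
REQUIRED_STYLES = (
    "silva-bold",
    "silva-heading1",
    "silva-heading2",
    "silva-bulletlist",
    "silva-enumlist",
)


def defined_style_names(styles_xml):
    """Collect every style:name attribute found in a styles.xml string."""
    root = etree.fromstring(styles_xml)
    return set(
        element.get(STYLE_NAME)
        for element in root.iter()
        if element.get(STYLE_NAME) is not None
    )


def missing_styles(styles_xml, required=REQUIRED_STYLES):
    """Return the required style names that styles.xml does not define."""
    defined = defined_style_names(styles_xml)
    return [name for name in required if name not in defined]


# Usage with a tiny hand-written fragment (not a complete styles.xml).
sample_styles = (
    '<office:document-styles'
    ' xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
    ' xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"'
    ' xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
    '<office:styles>'
    '<style:style style:name="silva-bold" style:family="text"/>'
    '<style:style style:name="silva-heading1" style:family="paragraph"/>'
    '<text:list-style style:name="silva-bulletlist"/>'
    '</office:styles>'
    '</office:document-styles>'
)
missing = missing_styles(sample_styles)
```

Running such a check before export would let us report a broken
style-sheet up front rather than produce a document with unstyled
headings or lists.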

I noticed that the standard Silva export uses a more event-driven
approach.  (It's implemented in Silva/silvaxml/xmlexport.py and
Sprout/src/sprout/saxext/xmlexport.py.) I need to study that to
find out whether using that mechanism might be more suitable than
performing a tree walk of an lxml element tree.  So, we have at
least the following alternatives:

- Implement an lxml.etree document tree walk, which would be
  reasonably clean and straightforward.  I wrote a bit of this
  code as a trial.  After I wrote a few of the methods in the tree
  walk, the rest (at least the non-ODF-specific parts) were
  repetitive copy, paste, and edit.

- Or, it would be nice if we could re-use the existing
  event-driven framework.  This would also have the benefit of
  eliminating the need to export to XML before performing a
  transformation.  The style-sheet comments, above, apply to this
  approach as well.

- Or, XSLT -- This seems like a very complex task for XSLT, but I
  suppose an XSLT expert could make it fit.
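
To make the first alternative concrete, here is a rough sketch of a
tree walk that dispatches on the element name.  It is not the trial
code mentioned above.  The Silva element names (doc, heading, p,
strong), the class and helper names, and the silva-paragraph style
are assumptions for illustration and have not been checked against
the real Silva XML export.  It uses xml.etree.ElementTree as a
stand-in for lxml.etree.

```python
import xml.etree.ElementTree as etree  # stand-in for lxml.etree

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
etree.register_namespace("office", OFFICE_NS)
etree.register_namespace("text", TEXT_NS)


def odf(namespace, local_name):
    """Build a namespace-qualified name in {uri}local notation."""
    return "{%s}%s" % (namespace, local_name)


class OdtTreeWalker(object):
    """Walk a (hypothetical) Silva tree and build a content.xml tree."""

    def convert(self, silva_root):
        document = etree.Element(odf(OFFICE_NS, "document-content"))
        body = etree.SubElement(document, odf(OFFICE_NS, "body"))
        text = etree.SubElement(body, odf(OFFICE_NS, "text"))
        self.walk_children(silva_root, text)
        return document

    def walk_children(self, source, target):
        for child in source:
            if not isinstance(child.tag, str):
                continue  # skip comments and processing instructions
            visit = getattr(self, "visit_" + child.tag, self.visit_unknown)
            visit(child, target)

    def visit_unknown(self, node, target):
        # Unknown elements are skipped (their own text is dropped),
        # but their children are still visited.
        self.walk_children(node, target)

    def add(self, target, local_name, style, node):
        """Append a styled text:* element, copying text and tail."""
        element = etree.SubElement(target, odf(TEXT_NS, local_name))
        element.set(odf(TEXT_NS, "style-name"), style)
        element.text = node.text
        element.tail = node.tail
        return element

    def visit_heading(self, node, target):
        heading = self.add(target, "h", "silva-heading1", node)
        heading.set(odf(TEXT_NS, "outline-level"), "1")

    def visit_p(self, node, target):
        paragraph = self.add(target, "p", "silva-paragraph", node)
        self.walk_children(node, paragraph)

    def visit_strong(self, node, target):
        span = self.add(target, "span", "silva-bold", node)
        self.walk_children(node, span)


# Usage with a made-up input document.
source = etree.fromstring(
    "<doc><heading>Title</heading>"
    "<p>Some <strong>bold</strong> text</p></doc>"
)
content = OdtTreeWalker().convert(source)
content_xml = etree.tostring(content)
```

Most visit_* methods reduce to one call to add(), which is one way
to keep the repetitive part small.  The same dispatch could probably
be driven from events instead of a parsed tree, but that depends on
how the existing export framework is structured.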

I'll try to explore these and provide some evaluation.

Dave

-- 
Dave Kuhlman
http://www.rexx.com/~dkuhlman


