[Fwd: Re: [formulator-dev] using other encodings]

Joachim Schmitz js at aixtraware.de
Thu Sep 18 22:16:28 CEST 2003


I think Martijn forgot to cc this to the list.


-------- Original Message --------
Subject: Re: [formulator-dev] using other encodings
Date: Thu, 18 Sep 2003 20:12:22 +0200
From: Martijn Faassen <faassen at vet.uu.nl>
To: Joachim Schmitz <js at aixtraware.de>
References: <3F69986C.3000301 at aixtraware.de>

Joachim Schmitz wrote:
> as far as I can tell this is "only" an issue if one uses the XML 
> definition for forms. As long as you don't use this, one can use at 
> least the iso-8859-xx encodings without problems.

A summary, perhaps superfluous, but others may be following this
discussion.

I think this will also be an issue if you set your output encoding
header to UTF-8 and use unicode content in your
site. Example of setting output encoding:

REQUEST.RESPONSE.setHeader('Content-Type', 'text/html;charset=utf-8')

This happens in Silva, for instance. Zope picks up the output
encoding. If unicode strings enter the page template, the whole page
template ends up being unicode (Zope 2.6 and up). Zope then uses
the output encoding to decide which 8-bit encoding to use for the
web page, along the lines of zpt_output.encode('UTF-8').
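As a rough sketch of that last step (this is not Zope's actual code; the
function names and the utf-8 fallback are assumptions made for illustration),
picking the charset out of the Content-Type header and encoding the rendered
unicode page with it could look like this:

```python
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset parameter of a Content-Type value.

    The fallback `default` is an assumption of this sketch, not
    necessarily what Zope falls back to.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset() or default


def encode_page(rendered, content_type):
    """Encode an already rendered unicode page for the wire."""
    return rendered.encode(charset_from_content_type(content_type))


# The header value used in the set_encoding example above.
body = encode_page("Gr\u00fc\u00dfe", "text/html;charset=utf-8")
```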

Unicode strings in Python can only safely be mixed with ascii strings,
i.e. strings that *only* contain ascii characters. Accented characters
for instance don't exist in ascii, so if you use those (by using
latin1 for instance) you have a problem as soon as you start mixing
with unicode.
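
To make the mixing rule concrete, here is a small sketch. It is written for
a current Python, where bytes and text are separate types, so the implicit
ASCII decode that Python 2 performs when an 8-bit string meets a unicode
string is spelled out by hand. The helper names are made up for this example.

```python
def py2_style_concat(text, raw):
    """Join text and an 8-bit string the way Python 2 does implicitly:
    the 8-bit string is decoded as ASCII before concatenation."""
    return text + raw.decode("ascii")


def try_concat(text, raw):
    """Return the joined string, or a short description of the failure."""
    try:
        return py2_style_concat(text, raw)
    except UnicodeDecodeError as exc:
        return "UnicodeDecodeError at byte %d" % exc.start


ok = try_concat("Foo", b" bar")      # pure ASCII mixes fine
broken = try_concat("Foo", b"\xf6")  # latin-1 o-umlaut does not
```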

This triggers the misbehavior:

<html tal:define="dummy here/set_encoding">
   <body>
     <p tal:content="python:u'Foo'" />
     <p tal:content="python:'ö'" />
   </body>
</html>

where the second 'p' contains an 'o-umlaut' in latin-1 encoding, and
set_encoding is a python script containing the Content-Type line
shown above.

Currently I suspect we'll have trouble in at least Silva if we use
strings that are in *any* other encoding besides ASCII, including UTF-8
and Latin1.
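
A quick check of why both encodings are affected: the o-umlaut contains
bytes above 127 whether it is encoded as Latin1 or as UTF-8, so neither
survives an implicit ASCII decode. Decoding explicitly with the known
encoding before the string reaches the template gives the same unicode
character in both cases. The helper names are hypothetical, and the
explicit decode is only one possible way out, not something settled here.

```python
O_UMLAUT = "\u00f6"  # o-umlaut


def is_ascii_only(raw):
    """True if every byte could pass through an implicit ASCII decode."""
    return all(byte < 128 for byte in raw)


def to_text(raw, encoding):
    """Decode an 8-bit string explicitly, using the encoding it was stored in."""
    return raw.decode(encoding)


latin1_bytes = O_UMLAUT.encode("iso-8859-1")  # one byte: 0xf6
utf8_bytes = O_UMLAUT.encode("utf-8")         # two bytes: 0xc3 0xb6

same_character = (
    to_text(latin1_bytes, "iso-8859-1") == to_text(utf8_bytes, "utf-8")
)
```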

> The xml-handling is done in XMLToForm.py and FormToXML.py. If you go to 
> the xml-tab of a form, FormToXML.py is invoked, which fills the 
> textarea; the encoding is hardcoded to be "iso-8859-1".

Yes, I remember doing this as a copout. I should really encode the
output as UTF-8 as that's really the sanest way to encode XML.
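
For illustration only, serializing form data as UTF-8 XML could look roughly
like the sketch below. It uses the standard library's ElementTree rather than
whatever FormToXML.py does internally, the element and attribute names are
invented for the example, and the xml_declaration argument needs Python 3.8
or later.

```python
import xml.etree.ElementTree as ET


def form_to_xml(fields):
    """Serialize a {field_id: title} mapping as UTF-8 encoded XML bytes.

    The <form>/<field> layout is hypothetical, not Formulator's real format.
    """
    root = ET.Element("form")
    for field_id, title in fields.items():
        ET.SubElement(root, "field", id=field_id, title=title)
    # Returns bytes, with an XML declaration naming the encoding.
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


xml_bytes = form_to_xml({"size": "Gr\u00f6\u00dfe"})
# Reading it back yields the original unicode title.
round_trip = ET.fromstring(xml_bytes).find("field").get("title")
```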

[snip]
> one can only use characters in that encoding AND to display them 
> correctly in your browser the Zope-default-encoding must be set to that
> encoding. Which is the problem with Silva, where the encoding is set to 
> "utf-8", so in Silva chars above 127 are not displayed.

Right, we should get rid of the latin-1-ness of Formulator. It's just
hard to do this without an upgrade script.

> So why not use utf-8 as the hardcoded charset? This does not work: the 
> xml-parser already breaks with an invalid token error when entering a 
> char like "ä".

I'm not entirely sure why this is happening in Formulator already.
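
One possible explanation, offered purely as an assumption and not confirmed
for Formulator: if the text is still latin-1 bytes while the XML declaration
says utf-8, an expat-based parser rejects the document with an "invalid
token" error. The sketch below reproduces that situation with the standard
library; the <field> element is made up for the example.

```python
import xml.etree.ElementTree as ET

# An a-umlaut attribute value, stored as latin-1 bytes (0xe4).
LATIN1_BODY = '<field title="\u00e4"/>'.encode("iso-8859-1")


def parse_with_declared(encoding):
    """Parse the latin-1 body under the given declared encoding.

    Returns the parsed title, or None if the parser rejects the document.
    """
    declaration = '<?xml version="1.0" encoding="%s"?>' % encoding
    document = declaration.encode("ascii") + LATIN1_BODY
    try:
        return ET.fromstring(document).get("title")
    except ET.ParseError:
        return None


matching = parse_with_declared("iso-8859-1")  # declaration matches the bytes
mismatched = parse_with_declared("utf-8")     # 0xe4 alone is not valid UTF-8
```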

The best strategy going forward, I suspect, is to store unicode strings
in Formulator natively for the field properties where it makes sense.
If Formulator then outputs unicode, for instance for 'title' or
'description', current code would break (just like my sample page
template does). There's also a Formulator upgrade issue. I am not
quite clear yet on how to proceed there.

I'm considering moving towards a Formulator 2.0 anyway at some point,
which means we might accept at least the upgrade issue and a bit of
code breakage. There's also a usability issue though: if you use *any*
latin-1 encoded text in a page template that uses Formulator, you'd get
unicode errors if I made title and description store their data as
unicode...

Suggestions?

Regards,

Martijn

-- 
Mit freundlichen Grüßen                                Joachim Schmitz
......................................................................
AixtraWare eK ..Joachim Schmitz ..www.aixtraware.de ..t: +49-2464-8851
Hüsgenstr. 33a .....d-52457 Aldenhoven .............f: +49-2464-905163



