[formulator-dev] Re: encoding solutions.

Stuart Bishop zen at shangri-la.dropbear.id.au
Thu Sep 25 04:53:44 CEST 2003


On Wednesday, September 24, 2003, at 10:51  PM, Martijn Faassen wrote:

>> It may be possible to make TAL.TALInterpreter.FasterStringIO handle
>> mixing Unicode strings and traditional strings sanely. It would involve
>> replacing the inherited getvalue() method with one that catches the
>> ASCII decoding exceptions and converts the traditional strings to
>> Unicode strings (by checking the RESPONSE headers for the expected
>> character encoding, or using a heuristic method to guess if you're
>> feeling brave). I haven't tried this yet, and fixing DTML is left as
>> an exercise to the reader :-)
>
> I'm not sure I understand what you're proposing here. Are you proposing
> changing TAL? That could have a pretty far reaching impact.

Yes, but if it worked, the only impact would be that pages that used
to fail with a UnicodeError ("ordinal not in range(128)") would now
probably work, and the only slowdown would occur when such an
exception is actually raised.

At the moment, FasterStringIO does:
def getvalue(self):
     return ''.join(self.data)

I'm proposing something like:
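def getvalue(self):
    try:
        # Fast path: all items are ASCII-compatible or all are Unicode
        return ''.join(self.data)
    except UnicodeError:
        # Hypothetical helper: would read the charset from the
        # RESPONSE Content-Type header (e.g. 'utf-8')
        encoding = get_encoding_from_response_content_type_header()
        data = self.data
        # Decode every traditional string so all items are Unicode,
        # then join again
        for i, item in enumerate(data):
            if isinstance(item, str):
                data[i] = item.decode(encoding)
        return u''.join(data)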
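The same fallback can be tried out as a self-contained Python 3 sketch. MixedStringIO is a hypothetical stand-in for FasterStringIO, not Zope's code: bytes chunks play the role of Python 2 traditional strings, and because Python 3 raises TypeError (rather than UnicodeError) when bytes and str are joined, that is the exception caught here. The encoding is assumed to have been read from the RESPONSE header already and is simply passed to the constructor.

```python
class MixedStringIO:
    """Hypothetical stand-in for TAL's FasterStringIO (Python 3 sketch).

    bytes chunks play the role of Python 2 'traditional' strings;
    str chunks play the role of Unicode strings.
    """

    def __init__(self, encoding):
        # Assumption: the caller has already read the charset from the
        # RESPONSE Content-Type header and passes it in here.
        self.encoding = encoding
        self.data = []

    def write(self, chunk):
        self.data.append(chunk)

    def getvalue(self):
        try:
            # Fast path: succeeds when every chunk is already text.
            return ''.join(self.data)
        except TypeError:
            # Slow path, taken only when bytes were mixed in: decode them
            # with the declared encoding, cache the result, and retry.
            self.data = [
                c.decode(self.encoding) if isinstance(c, bytes) else c
                for c in self.data
            ]
            return ''.join(self.data)


out = MixedStringIO('utf-8')
out.write('Price: ')
out.write('caf\u00e9 '.encode('utf-8'))  # pre-encoded bytes
out.write('\u2122')
value = out.getvalue()
```

Pages that never mix the two types only take the fast path; when they do, the decode loop runs once and the decoded list is kept in self.data, so later calls are fast again.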

> Right now Zope seems to be following Python's behavior; you can't
> mix traditional strings and unicode unless the traditional string only
> contains ascii (first 128 characters) text. If you put in unicode at all,
> all ZPT output will turn into unicode strings.

In Python, this works quite happily:
	print u'\u2122'.encode('utf-8'),
	print u'\u2122'
i.e. you can mix them *yourself* if you are careful. Zope doesn't give
you this luxury except in cases where you can use RESPONSE.write.
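For comparison, mixing them yourself means encoding every Unicode piece explicitly to the output encoding before it meets any byte string, much as you would before handing output to RESPONSE.write. A minimal Python 3 sketch, assuming UTF-8 output; render() is a hypothetical helper, not part of Zope:

```python
def render(parts, encoding='utf-8'):
    """Hypothetical helper: build a byte string from a mix of text (str)
    and already-encoded bytes, encoding the text pieces explicitly."""
    out = []
    for part in parts:
        if isinstance(part, str):
            # Encode Unicode ourselves instead of relying on implicit coercion.
            part = part.encode(encoding)
        out.append(part)
    return b''.join(out)


# The byte strings are assumed to be UTF-8 already.
page = render([b'Brand', '\u2122', b' caf\xc3\xa9'])
```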

When Python gets a traditional string with high-bit characters and
tries to concatenate it with a Unicode string, it has no choice but
to raise an exception, because it has no idea what the encoding scheme
was. Zope, however, *may* be able to recover in this situation, as
someone has specified the probable character set in RESPONSE.
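The recovery step can be illustrated in Python 3 with the standard library's email.message.Message, which parses a Content-Type value and exposes its charset parameter. charset_from_content_type() is a hypothetical helper, and falling back to ISO-8859-1 when no charset is declared is an assumption (it mirrors the HTTP/1.1 default for text types). Decoding as ASCII fails on the high-bit byte, while decoding with the declared charset succeeds.

```python
from email.message import Message


def charset_from_content_type(content_type, default='iso-8859-1'):
    """Hypothetical helper: return the charset parameter of a
    Content-Type header value (lowercased), or `default` if absent."""
    msg = Message()
    msg['Content-Type'] = content_type
    return msg.get_content_charset(default)


raw = b'caf\xe9'  # a "traditional" string with a high-bit byte

# Without knowing the encoding, ASCII is the only safe assumption,
# and it fails on the high-bit byte.
try:
    raw.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# With the charset declared in the response header, recovery works.
charset = charset_from_content_type('text/html; charset=ISO-8859-1')
text = raw.decode(charset)
```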

It might be possible to have this sort of behavior in Python too, by
using an environment variable (magic) or a 'default_encoding' argument
to ''.join(...) (less magic), but there is less need for it since you
can usually work around the problems in other ways. The Zope framework
is not that flexible, though, so a migration path might be necessary.
There is probably no hope of getting this into 2.7, and it might be too
much of a new feature to go into the 2.6 series, so it's most likely
just wishful thinking :-)

-- Stuart Bishop <zen at shangri-la.dropbear.id.au>
http://shangri-la.dropbear.id.au/


