[crossfire] Data File (Maps, Archetypes) Encodings

Mark Wedel mwedel at sonic.net
Tue Feb 6 02:03:49 CST 2007


Christian Hujer wrote:
> Hello dear co-devs.
> 
> 
> We have a common problem with the text encodings of data files.
> 
> Examples:
> * Daimonin used (until a few minutes ago) the ISO paragraph character 0xA7 for 
> separating a map's sound spec from its name.
> * Daimonin uses the ISO degree character 0xB0 for highlights in messages.
> * Crossfire uses the a-circumflex character (â) 0xE2 for the name of a wine in 
> map /maps/scorn/houses/house3.bas2.

  I'm not sure if it is still the case, but at one time there were some objects 
that also used special characters; Mjølnir comes to mind.

> 
> This leads to some problems.
> * Crossfire x11 client displays 0xE2 as an a-circumflex (â).
> * Crossfire gtk client displays 0xE2 as ? (tested by Ragnor).

  And it appears that the GTK2 client won't draw the line/message containing 
the bad character at all.

> 
> For both projects, it makes sense to rethink the file formats. I see three 
> possible solutions:
> 
> 1. Use US-ASCII text only.
> That means, only data files with bytes 0x13, 0x20-0x7E are valid.
> Pro: easy
> Pro: stable
> Pro: no changes required.
> Con: very limited solution

  And one that is not actually in effect right now: as demonstrated, some 
non-ASCII characters have already made their way in.
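
  To see how far the existing files are from option 1, something like the 
following rough sketch could be run over the maps and archetypes. It is Python 
rather than anything in the crossfire tree, the function name find_non_ascii is 
made up, and the sample data (including the wine name) is invented. I'm also 
assuming that LF and CR are the line terminator bytes that should be allowed 
alongside 0x20-0x7E; tabs would be reported.

```python
def find_non_ascii(data: bytes):
    """Return (line, column, byte value) for every byte outside the allowed set."""
    # Assumption: allowed bytes are LF, CR and printable ASCII 0x20-0x7E.
    allowed = {0x0A, 0x0D} | set(range(0x20, 0x7F))
    hits = []
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        for col, byte in enumerate(line, start=1):
            if byte not in allowed:
                hits.append((lineno, col, byte))
    return hits


# Invented sample: an object name containing the single byte 0xE2.
sample = b"name bottle of Ch\xe2teau wine\nmsg\nplain text\nendmsg\n"
for lineno, col, byte in find_non_ascii(sample):
    print(f"line {lineno}, column {col}: byte 0x{byte:02X}")
```

  An empty result for every file would mean option 1 really is the current 
state; anything else is a candidate for conversion.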

> 
> 2. Use ISO-8859-15 text.
> That means, bytes 0x13, 0x20-0x7E, 0xA0-0xFF are valid.
> Pro: easy
> Con: clients need special handling for non-ascii chars if they are UTF-8 aware 
> and run on UTF-8 systems (e.g. gtk client).
> Con: limited solution
> 
> 3. Use UTF-8 text.
> That means, only valid UTF-8 streams with Unicodes u0013, u0020-u007E, 
> u00A0-... are valid.
> Pro: future-proof
> Pro: Allows full unicode (e.g. Chinese chars if somebody likes, or even 
> klingon if the underlying system supports it).
> Con: clients need special handling.
> Con: Windows users or users of other ancient OS editions with no good UTF-8 
> support will have more problems than with ISO-8859-15.
> 
> I see two places, where the encoding needs to be specified:
> * Data files
> * Network protocol
> 
> My favorite solution would be 3. UTF-8, followed by 1. US-ASCII. I dislike 2. 
> ISO-8859-15 very much.

  #3 probably makes the most sense, and at least for the GTK2 client, it looks 
like it would actually be handled properly (the message the client generates for 
the wine bottle is about an invalid UTF-8 character).
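
  The reason a UTF-8 aware client complains is that a lone 0xE2 byte is not 
valid UTF-8: 0xE2 announces a three-byte sequence, and the bytes after it are 
not continuation bytes. A minimal illustration in Python (the helper name 
is_valid_utf8 and the sample strings are made up for this example, not taken 
from the map):

```python
def is_valid_utf8(data: bytes) -> bool:
    """True if data decodes as strict UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Invented sample text with an a-circumflex in it.
legacy = b"Ch\xe2teau"                     # 0xE2 as a single ISO-8859 byte
proper = "Ch\u00e2teau".encode("utf-8")    # same text encoded as UTF-8

print(is_valid_utf8(legacy))   # False: 0xE2 is followed by plain ASCII
print(is_valid_utf8(proper))   # True: the character is now the pair C3 A2
```

  So with option 3 the same check a client already does would also work as a 
validity test for data files.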

  Also, I'm not sure how easy #2 is. It is easy for a person writing the maps 
or archetypes, but as demonstrated, pretty much all clients would have to do 
special string handling.

  #3 does make it harder for people putting the strings in, though I'd think the 
map editors could try to do the right thing in those cases and convert 
ISO-8859-15 characters to Unicode.
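
  What "do the right thing" might look like, as a sketch only: leave data that 
is already valid UTF-8 alone, and otherwise treat it as ISO-8859-15 and 
re-encode it. This is Python, the function name to_utf8 is hypothetical, and no 
existing editor is claimed to work this way.

```python
def to_utf8(data: bytes) -> bytes:
    """Return data as UTF-8, assuming any non-UTF-8 input is ISO-8859-15."""
    try:
        data.decode("utf-8")
        return data  # already valid UTF-8 (plain ASCII included)
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is ISO-8859-15.
        return data.decode("iso8859_15").encode("utf-8")


old = b"Mj\xf8lnir"            # o-slash as the single byte 0xF8
new = to_utf8(old)
print(new)                     # the o-slash becomes the two bytes C3 B8
print(to_utf8(new) == new)     # converting twice changes nothing
```

  The heuristic is not watertight: a few ISO-8859-15 byte pairs happen to form 
valid UTF-8 and would be left untouched, so a one-time conversion of the 
existing files is probably worth checking by eye.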

  So I'd vote for Unicode (UTF-8).  I'd suspect that for clients that don't support 
UTF-8, things won't really be any more broken than right now: the client would 
display a funky character instead of the correct one.  But I don't believe that 
would break any portion of the clients or protocol.
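
  To illustrate the "funky character" case: a client that treats every byte as 
one ISO-8859-15 character would show each non-ASCII UTF-8 character as two or 
more unrelated glyphs, while the ASCII part of the string is unaffected. A small 
Python sketch (the function name is made up, and real clients may of course 
render unknown bytes differently):

```python
def as_seen_by_legacy_client(text: str) -> str:
    """Decode the UTF-8 bytes of text as if each byte were one ISO-8859-15 char."""
    return text.encode("utf-8").decode("iso8859_15")


shown = as_seen_by_legacy_client("Mj\u00f8lnir")
print(shown)        # the o-slash turns into two other characters
print(len(shown))   # one character longer than the original name
```

  The string gets longer and uglier, but every byte still maps to some 
character, which is why nothing in the protocol itself should break.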


