Omnis Technical Note TNSQ0013 September 2006
Mapping Character Sets
For Omnis Studio 4.1 or above.
By Dr Michael Hufschmidt
Problem
Since the days of the Tower of Babel there have been thousands of different
languages in the world. As long as you stay within your native country
or culture this isn't usually a problem. But as soon as you want to communicate
with foreigners, you either have to use a common language or you need
an interpreter. And to make it worse, the multitude of languages comes
with a multitude of different character sets.
The same is true for programming: As long as you develop stand-alone programs you won't find any problems, but when you develop client/server solutions that communicate with a remote database, either all clients must speak the language of the database server or you will need an interpreter.
The most common problems are concerning different character sets: Letters are in most cases saved in one byte. Therefore, for a program there are 256 different characters available (code 000 to 255). In ASCII code only the first 128 characters are fixed, whereas National special characters have got an ASCII code over 128. There are many different assignments of these codes over 128 (= character set tables), for example, refer to:
www.iana.org/assignments/character-sets
The ultimate solution to any problems regarding character sets is to use Unicode. In Unicode each character is saved in up to 4 bytes, so that there are 4,294,967,296 different letters/characters available. For further details about Unicode and the character sets available, refer to:
Since December 2005, Omnis Studio version 4.1 is available in a Unicode version so that Omnis can handle all characters in all languages correctly. When you are using the Non-Unicode version of Omnis Studio and want to store text with national special characters in a database you have to make sure that the clients and the server are using the same character set table. Otherwise national characters, for example, German umlauts such as ÄÖÜ and äöü won't be displayed correctly.
Omnis as an interpreter
But what can you do in the Non-Unicode version of Omnis-Studio and you
need to communicate with a database that has set up a different character
set? Don't worry, there is help available: The Data Access Modules (DAMs)
in Omnis Studio can re-code the character set in a way that it fits to
the character set of the server. For this the appropriate mapping tables
have to be created and must be assigned as a property to the session object
being used:
Do mySessionObj.$charmap.$assign(kSessionCharMapTable)
Do mySessionObj.$maptable.$assign('myMapTable')
There are always two map tables required: one for mapping the characters in communications from the client to the server (file name myMapTable.OUT), and a reciprocal table for the character mapping in communications from the server to the Omnis client (file name myMapTable.IN). Omnis provides you with a tool to create these tables, called the Charmap utility. You can find a detailed description of this mapping procedure in the "Omnis Programming" manual (edition November 2005), Chapter 9 "SQL Programming", section "Character Mapping", page 375. This manual is included as a PDF file on the Omnis product DVD.
A special case: Oracle
When using Oracle, the Omnis DAM is not communicating directly with the
database server. In fact, on the Omnis client you have to have the Oracle
clientware installed. The Omnis DAM communicates with the local Oracle
clientware and this in turn communicates with the Oracle server.
The mapping of the character sets as described above can be done either
by Omnis or in the Oracle clientware. The Oracle clientware however, stores
the character set used either in an environment variable NLS_LANG and/or
in the Windows Registry in the node:
HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\HOME0
The value of the environment variable will be preferred, and could contain values such as:
GERMAN_GERMANY.WE8ISO8859P1
or
GERMAN_GERMANY.WE8MSWIN1252
When you first install an Oracle server, by default, the first character
set is ISO-8859-1 (Alias Latin-1), and when you install an Oracle client
under Windows the second character set is usually set up as Windows-1252.
Therefore, the character set problems as described here will typically
happen.
Please note that you must map the character sets either by the Oracle
client or by Omnis, but not both! Otherwise the special characters would
be mapped twice, and that of course will lead to the unexpected results.
A special case: MySQL
With a normal installation of a MySQL server, by default, the character
set is ISO-8859-1, like Oracle, but you can set up the character set each
time you create a new table on the database, such as:
CREATE TABLE myTable ( ... ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
With the MySQL command
SHOW CHARACTER SET;
You will receive a list of all character sets supported by any particular server, so that you can edit the SQL string to create a server table if necessary as follows:
CREATE TABLE myTable ( ... )ENGINE=MyISAM DEFAULT CHARSET=cp1250;
When MySQL is using the default character set ISO-8859-1 you must set the property
Do MySQL_SessionObj.$charmap.$assign(kSessionCharMapNative)
in the Omnis session object. Then .IN and .OUT map tables are not required anymore.