ISO 10646


The ISO/IEC 10646 standard defines the Universal Character Set (UCS), which encodes characters (letters, digits, symbols, ideograms, etc.) from many of the world's languages, scripts, and traditions. The set contains approximately 100,000 abstract characters, each with a unique name and a character code.

In theory, the standard has room for more than 2 billion characters (a 31-bit code space, see UCS-4), but in common usage only the first 65,536 characters (UCS-2), those of the Basic Multilingual Plane (BMP), are used.
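The sizes mentioned above can be checked with a little arithmetic. The following is a minimal sketch; the names UCS4_CODE_SPACE, BMP_SIZE, and in_bmp are illustrative and do not come from the standard.

```python
# Sizes of the code spaces mentioned in the text (names are illustrative).
UCS4_CODE_SPACE = 2 ** 31  # 31-bit code space: 2,147,483,648 positions
BMP_SIZE = 2 ** 16         # Basic Multilingual Plane: 65,536 positions


def in_bmp(ch: str) -> bool:
    """Return True if the single character ch lies in the BMP (U+0000..U+FFFF)."""
    return ord(ch) < BMP_SIZE


print(UCS4_CODE_SPACE)         # more than 2 billion
print(in_bmp("\u017c"))        # U+017C, Latin small letter z with dot above
print(in_bmp("\U0001d11e"))    # U+1D11E, musical symbol G clef, outside the BMP
```

A character outside the BMP, such as U+1D11E, cannot be represented in a single 16-bit UCS-2 unit.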

UCS is still under development; Amendment 1 and Amendment 2 to ISO/IEC 10646:2003 are in progress.

Relationship to Unicode

In 1991, the ISO working group began cooperating with the Unicode Consortium to create a single standard for multilingual text. Unicode 1.1, published in 1993, was already compliant with ISO/IEC 10646-1:1993. Since then, Unicode has become the official implementation of that standard.

See §1 of The Unicode Standard for detailed information.

Conversion between ISO 10646 and other encodings

The character codes specified in ISO 10646 are already used in XML. Because of problems with the ISO-8859-2 encoding, it is suggested that non-ASCII characters be written as the corresponding numeric character references.

A sample XML file might contain: <test opis="Zażółć gęślą jaźń">

After conversion it looks like this: <test opis="Za&#380;&#243;&#322;&#263; g&#281;&#347;l&#261; ja&#378;&#324;">
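To check that the converted line really carries the same text, the references can be decoded back. This is a small sketch using Python's standard html.unescape, which resolves decimal references of the form &#N; to the character with code N; the variable names are illustrative.

```python
import html

# The converted line from the text, containing only ASCII characters.
encoded = '<test opis="Za&#380;&#243;&#322;&#263; g&#281;&#347;l&#261; ja&#378;&#324;">'

# Each &#N; is replaced by the character whose ISO 10646 code is N.
decoded = html.unescape(encoded)
print(decoded)
```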

AWK can be used to perform the conversion:

# Replace each Polish letter with its decimal character reference.
# In the replacement string, "\\&" yields a literal "&".
{
    gsub(/Ą/, "\\&#260;"); gsub(/Ć/, "\\&#262;"); gsub(/Ę/, "\\&#280;")
    gsub(/Ł/, "\\&#321;"); gsub(/Ń/, "\\&#323;"); gsub(/Ó/, "\\&#211;")
    gsub(/Ś/, "\\&#346;"); gsub(/Ź/, "\\&#377;"); gsub(/Ż/, "\\&#379;")
    gsub(/ą/, "\\&#261;"); gsub(/ć/, "\\&#263;"); gsub(/ę/, "\\&#281;")
    gsub(/ł/, "\\&#322;"); gsub(/ń/, "\\&#324;"); gsub(/ó/, "\\&#243;")
    gsub(/ś/, "\\&#347;"); gsub(/ź/, "\\&#378;"); gsub(/ż/, "\\&#380;")
    print
}
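The AWK script covers only the Polish letters it lists. As an alternative sketch, not taken from the original article, Python's built-in "xmlcharrefreplace" error handler replaces every non-ASCII character with its decimal reference; the function name to_numeric_references is hypothetical.

```python
def to_numeric_references(text: str) -> str:
    """Replace every non-ASCII character with a decimal reference such as &#380;.

    ASCII characters, including markup such as < and ", are left untouched.
    """
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


line = '<test opis="Za\u017c\u00f3\u0142\u0107 g\u0119\u015bl\u0105 ja\u017a\u0144">'
converted = to_numeric_references(line)
print(converted)
```

Unlike the per-letter substitutions, this approach needs no table of characters, because the reference number is simply the character's ISO 10646 code.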
