Systems and Means of Informatics

2014, Volume 24, Issue 4, pp 124-134


  • I. M. Adamovich
  • D. V. Zemskov


The article describes ACE (Adjustable Character Encoding), a variable-length character encoding scheme capable of encoding the full range of UCS (Universal Coded Character Set, ISO/IEC 10646) code points as sequences of one to four octets (8-bit code units). The main motivation for creating this encoding was to increase, in comparison with UTF-8 (Unicode Transformation Format, 8-bit), the number of code points encoded as a single-octet sequence. This allows a more compact representation of texts containing characters of a chosen national alphabet, and makes it possible to keep the binary representation of more of those characters identical to their values in a single-byte code table. The encoding retains the following properties of UTF-8: statelessness (the representation of an encoded character does not depend on the values of preceding characters), self-synchronization (no valid code sequence can occur inside another one, nor across the boundary between adjacent sequences), and the ability to locate the beginning or the end of a code sequence from any position in the encoded text.
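The UTF-8 properties used as the baseline here can be illustrated with a minimal Python sketch. It shows standard UTF-8 behaviour only, not ACE, whose octet layout is not given in this abstract. The helper name `utf8_sequence_start`, the sample word, and the choice of Windows-1251 as the single-byte code table are assumptions made for illustration.

```python
def utf8_sequence_start(data: bytes, pos: int) -> int:
    """Return the index of the first octet of the UTF-8 sequence containing data[pos].

    Hypothetical helper for illustration; assumes `data` is valid UTF-8.
    """
    # Continuation octets have the bit pattern 10xxxxxx, so stepping back
    # over them always ends on the lead octet of the sequence.
    while pos > 0 and (data[pos] & 0xC0) == 0x80:
        pos -= 1
    return pos


text = "Привет"                    # six Cyrillic letters
utf8 = text.encode("utf-8")        # two octets per letter in UTF-8
single = text.encode("cp1251")     # one octet per letter in a single-byte table

print(len(utf8))                      # 12
print(len(single))                    # 6
print(utf8_sequence_start(utf8, 3))   # 2
```

The size difference (12 octets against 6) is the overhead for national-alphabet text that the abstract refers to, and the backward scan shows how the start of a sequence can be found from an arbitrary position without decoding from the beginning.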

