Grammar detail: CJK compatibility ideographs

When the ability to display kanji on computer screens was first developed, different countries worked out their own approaches in parallel, so several ways of 'encoding' kanji, as well as several different 'character sets', were created. Today, these sets have largely been unified and subsumed into Unicode, a single standard that aims to cover the characters of all the world's writing systems.
In some cases, the same character was represented in multiple character sets in different ways, even though the different versions might look identical. This can cause problems - for example, pasting one version into a dictionary may not return the expected results, because the dictionary's version of a word uses another version of the character.
To address this issue, a special set of Unicode characters known as 'CJK Compatibility Ideographs' was developed. Characters in this set act as 'pointers' to the preferred version of a character with multiple conflicting versions.
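The 'pointer' idea can be inspected directly, because Unicode records it as a canonical decomposition on each compatibility ideograph. The following is a minimal sketch using Python's standard unicodedata module. The helper name preferred_form is hypothetical, and U+FA5B (which decomposes to U+8005) is chosen here only as an illustration.

```python
import unicodedata


def preferred_form(ch: str) -> str:
    """Return the character that `ch` points to, or `ch` itself.

    Hypothetical helper for illustration: it only follows canonical
    decompositions that consist of exactly one code point, which is
    the shape used by CJK compatibility ideographs.
    """
    mapping = unicodedata.decomposition(ch)  # "" when there is no mapping
    if not mapping or mapping.startswith("<"):
        # No mapping, or a tagged (non-canonical) mapping such as "<wide> ..."
        return ch
    parts = mapping.split()
    if len(parts) != 1:
        # Multi-code-point decomposition (e.g. letter + accent): not a pointer
        return ch
    return chr(int(parts[0], 16))


compat = "\uFA5B"   # a CJK compatibility ideograph
target = preferred_form(compat)

print(unicodedata.name(compat))
print(unicodedata.name(target))
print(f"U+{ord(compat):04X} points to U+{ord(target):04X}")
```

Characters without such a mapping, including ordinary unified kanji, are returned unchanged by this sketch.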
In some cases, though, the preferred version of the character is visually a little different. This can also cause confusing results, because applications that perform 'Unicode normalization' will automatically replace the 'compatibility' version with the 'preferred' version. This can mean that the character you enter in a search appears to change to another character.
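Both effects, the failed lookup and the silent replacement, can be reproduced in a few lines. This is a sketch only, using Python's standard unicodedata module. The one-entry dictionary and the lookup helper are hypothetical stand-ins for a real dictionary application, and U+FA5B / U+8005 are used purely as a sample compatibility/preferred pair.

```python
import unicodedata

compat = "\uFA5B"     # compatibility version of the character
preferred = "\u8005"  # preferred (unified) version

# Hypothetical dictionary keyed by the preferred version only.
dictionary = {preferred: "sample entry"}

# The two versions are different code points, so a raw lookup fails.
print(compat == preferred)       # False
print(compat in dictionary)      # False


def lookup(word: str):
    """Normalize the query before looking it up (hypothetical helper)."""
    return dictionary.get(unicodedata.normalize("NFC", word))


# Normalization replaces the compatibility version with the preferred one,
# so the lookup succeeds, but the character has changed in the process.
print(lookup(compat))            # sample entry
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    normalized = unicodedata.normalize(form, compat)
    print(form, f"U+{ord(normalized):04X}")
```

All four standard normalization forms perform this replacement, which is why an application that normalizes its input at all cannot preserve the compatibility version.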
For example, consider the character . This character was originally included in the Shift JIS encoding of Japanese, the most commonly used until Unicode. It is actually an earlier variant of the modern Joyo kanji : in the original Shift JIS variant, the left side, which is a form of , actually looks closer to it than the left side of the Joyo version, where one stroke is missing. However, if you paste into any text box in Safari, you will notice that what actually appears is !
Another commonly encountered example is 者 (U+FA5B), which is an older form of 者 (U+8005): an extra dot is present to the top right of the 日. 者 is a common component, used in 19 Joyo kanji, and in almost all of them the modern version (without the dot) is used. But not all! In 箸 and 賭 you can see the dot is still present.
