UniHan (統一漢字庫): Introduction

Unihan (統一漢字庫) is a database maintained by the Unicode Consortium. Begun around the turn of the millennium, the project has spent more than two decades compiling everything known about Chinese, Japanese, and Korean (CJK) ideographs, from readings to variant forms, and from stroke counts to their positions in various authoritative dictionaries. CJK ideographs carry within them the cultural history and geopolitical intrigue of more than a billion people over thousands of years, and UniHan is the most impressive attempt to document this systematically.

Three concepts come up again and again in UniHan: ideographs, fields, and variants.

Ideographs ("drawings that express ideas") are the basic unit of discourse, and they roughly map to what we call "a character" in everyday usage. As of version 15.0 (September 2022), the UniHan database contains information about _____ ideographs.
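In code, the mapping from ideograph to "character" is quite direct: each ideograph in UniHan is a single Unicode code point. The sketch below uses only Python's standard `unicodedata` module. The helper names `classify` and `describe` are our own, and testing the character-name prefix is a rough heuristic, not how UniHan itself decides what to cover. The result also depends on the Unicode version bundled with your Python, so very recent extension blocks may come back unnamed.

```python
import unicodedata


def classify(ch: str) -> str:
    """Classify one character by the prefix of its Unicode name (a heuristic)."""
    name = unicodedata.name(ch, "")  # "" if Python has no name for it
    if name.startswith("CJK UNIFIED IDEOGRAPH"):
        return "unified ideograph"
    if name.startswith("CJK COMPATIBILITY IDEOGRAPH"):
        return "compatibility ideograph"
    return "other"


def describe(text: str) -> list[tuple[str, str, str]]:
    """Return (character, code point, classification) for each code point in text."""
    return [(ch, f"U+{ord(ch):04X}", classify(ch)) for ch in text]


if __name__ == "__main__":
    # Each ideograph occupies exactly one code point in the string.
    for row in describe("廣東話 ok"):
        print(*row)
```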

Each ideograph has multiple fields. A field describes a particular kind or source of information. For example, kKangXi gives the page and position where the ideograph appears in the 18th-century Kangxi Dictionary (completed in 1716), and kCantonese gives the ideograph's most common pronunciation in Cantonese. There are over 90 unique fields in UniHan.
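The raw Unihan data files (for example, Unihan_Readings.txt inside Unihan.zip) store one field per line as a tab-separated code point, field name, and value, with `#` marking comment lines. The Python sketch below, which is independent of the Elixir library the exhibits use, folds such lines into a nested dictionary. The function names are hypothetical, and `parse_kangxi` assumes the kKangXi layout described in UAX #38: a four-digit page, a period, a two-digit position on the page, and a final digit that is 1 when the position is only virtual (the character is not actually in the dictionary). The sample values are illustrative.

```python
from collections import defaultdict


def parse_unihan(lines):
    """Parse Unihan-format lines into {character: {field: value}}.

    Assumes each data line is 'U+XXXX<TAB>kField<TAB>value';
    blank lines and lines starting with '#' are skipped.
    """
    table = defaultdict(dict)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        codepoint, field, value = line.split("\t", 2)
        char = chr(int(codepoint[2:], 16))  # drop the 'U+' prefix
        table[char][field] = value
    return dict(table)


def parse_kangxi(value):
    """Split a single kKangXi value like '0075.010' into (page, position, virtual).

    Assumed layout PPPP.NNV: page, position on the page, and a final
    digit that is '1' when the position is virtual.
    """
    page, rest = value.split(".")
    return int(page), int(rest[:2]), rest[2] == "1"


if __name__ == "__main__":
    sample = [
        "# illustrative lines in the Unihan data-file format",
        "U+4E00\tkCantonese\tjat1",
        "U+5EE3\tkCantonese\tgwong2",
    ]
    table = parse_unihan(sample)
    print(table["廣"]["kCantonese"])  # gwong2
    print(parse_kangxi("0075.010"))   # (75, 1, False)
```

Keeping everything in one nested dictionary is the simplest thing that works; because the full data set is large, you might prefer to load only the files or fields you need.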

It is surprisingly difficult to say whether two CJK ideographs are identical. Take the three shapes 广 広 廣: they share a meaning in Simplified Chinese, Japanese kanji, and Traditional Chinese, but are shaped differently. Variation occurs even within Traditional Chinese: Taiwan prefers 峰 whereas Hong Kong favors 峯, yet both populations recognize both glyphs. There is so much to say here that we will later devote an entire page to exploring it.
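One way to see the problem in code: the three shapes are three separate code points, and Unicode normalization (NFKC here) does not merge them. Deciding that they are "the same" therefore requires variant data such as UniHan's kTraditionalVariant, kSimplifiedVariant, kSemanticVariant, kSpecializedSemanticVariant, and kZVariant fields. The Python sketch below, with hypothetical helper names, links characters through those fields and collects a character's whole variant family. It assumes the tab-separated data-file layout and that variant values are space-separated code points, each optionally followed by `<` and a list of sources. The sample lines are illustrative: shaped like real entries, but not quoted from the actual data file.

```python
import unicodedata
from collections import defaultdict

# Unihan fields treated here as "variant" links (our selection).
VARIANT_FIELDS = {"kTraditionalVariant", "kSimplifiedVariant", "kSemanticVariant",
                  "kSpecializedSemanticVariant", "kZVariant"}


def variant_graph(lines):
    """Build an undirected graph linking characters via variant fields."""
    graph = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        codepoint, field, value = line.rstrip("\n").split("\t", 2)
        if field not in VARIANT_FIELDS:
            continue
        src = chr(int(codepoint[2:], 16))
        for token in value.split():
            # Tokens look like 'U+5EE3', optionally followed by '<source,...'.
            target = chr(int(token.split("<")[0][2:], 16))
            graph[src].add(target)
            graph[target].add(src)
    return graph


def variant_family(graph, char):
    """Return every character reachable from char through variant links."""
    seen, stack = {char}, [char]
    while stack:
        for nxt in graph[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


if __name__ == "__main__":
    forms = "广広廣"
    print([f"U+{ord(c):04X}" for c in forms])  # three distinct code points
    # Normalization leaves all three untouched, so they stay distinct.
    print(len({unicodedata.normalize("NFKC", c) for c in forms}))  # 3
    sample = [  # illustrative lines, not copied from Unihan_Variants.txt
        "U+5E7F\tkTraditionalVariant\tU+5EE3",
        "U+5E83\tkSemanticVariant\tU+5EE3<kMatthews",
    ]
    print(sorted(variant_family(variant_graph(sample), "广")))
```

Treating every link as undirected is a simplification: a simplified/traditional link and a semantic link mean different things, so a fuller exploration would probably keep the field name on each edge.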

In this series of articles, I will be your guide, but we will explore together. The exhibits are interactive in two senses:

  1. you are encouraged to change the defaults and see what you find, and
  2. you can be inspired by what others find.

Some of you will have so much fun exploring that you will run up against the limitations of the exhibits. Don't fret! The exhibits are interactive because we wrote an Elixir library to access UniHan, and Kip Cole and I have open-sourced this underlying code. The addendum teaches you (just a little) programming so that you can explore on your own.