Sunday, May 10, 2009

Chinese computing

I've been learning Mandarin Chinese for 2 years now. I've been attending a Mandarin language school with Tzu Chi http://en.tzuchi.ca/canada/home.nsf/home/index . My teacher is 傅老師 (Teacher Fu) and we've been learning from a book named 五百字說華語 (Speak Mandarin in Five Hundred Words).

Problem: I recently installed Ubuntu Linux on my computer at home. I had already figured out how to install and use the Chinese input method offered in Windows. There I used the Windows New Phonetic 2002a code table because it let me type pinyin and get traditional Chinese characters as output. Ubuntu has an input method (IM) framework called SCIM (Smart Common Input Method), which lets you add input methods for multiple languages and create your own easily. Unfortunately, although it offers several input methods, none of them suited me. It does have Pinyin, but that produces simplified characters.

Since I couldn't find the one I wanted, I went online to read about all of them. I found that 五筆字型 (wu bi zi xing) was the coolest. It's difficult, that's for sure, but it's also efficient once you've acquired the skill and memorized the character strokes. Since the system is built around how the characters are written with a brush, I kill 2 birds with one stroke (if you will): I can learn characters, character strokes and an input method all at once. And when I finally build up my repertoire of characters I won't (necessarily) be deficient in any particular area (that is the hope anyway).

It was a problem at first, because 五筆字型 produces simplified characters. Further research indicated that there is more than one wubi method. wubi86 supported the GB86 character set. There was an update called GBK which included traditional characters as well. However, GBK wasn't very popular because it wasn't backward compatible with wubi86. So wubi 2000 provided support for GB 18030-2000, which was backward compatible with GB86 but also contained traditional characters. I was looking for a wubi code table that supported GB 18030-2000, but I couldn't find one.

Lucky for me, after starting this blog entry I discovered a method for looking up the input codes for various Chinese characters (until I actually know some of these codes), and I discovered that the wubi input in Linux does contain traditional characters in its table. For example, the characters 謝 (traditional) and 谢 (simplified) both have the same wubi code, "ytmf". So I'm not limited to a particular character set after choosing a specific input method.
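To make the idea concrete, here is a minimal Python sketch of how one code can map to both forms of a character. It does not read SCIM's real table file; SAMPLE_TABLE and build_indexes are names I made up for illustration, and the only real entry is the "ytmf" pair mentioned above.

```python
from collections import defaultdict

# Hand-typed sample entries as (wubi code, character) pairs.
# Only the "ytmf" pair comes from this post; a real table would be
# loaded from the input method's own data file (not shown here).
SAMPLE_TABLE = [
    ("ytmf", "谢"),  # simplified
    ("ytmf", "謝"),  # traditional
]


def build_indexes(entries):
    """Build two lookups: code -> characters and character -> codes."""
    code_to_chars = defaultdict(list)
    char_to_codes = defaultdict(list)
    for code, char in entries:
        code_to_chars[code].append(char)
        char_to_codes[char].append(code)
    return code_to_chars, char_to_codes


code_to_chars, char_to_codes = build_indexes(SAMPLE_TABLE)

# Reverse lookup: which code do I type for a character I can't input yet?
print(char_to_codes["謝"])
# Forward lookup: which characters does a code offer?
print(code_to_chars["ytmf"])
```

The reverse index (character to code) is the one I need while I'm still learning the codes; the forward index is what the input method itself uses.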

My next step is to begin cataloguing the characters I've learned from my Chinese textbook and entering them into a table for quicker lookup. I'm also contemplating how to build a database to store the various methods of locating characters.
There are various systems available for looking up characters. Radicals, pinyin, ... I will probably begin cataloguing these methods to get a better picture of how to go about searching.
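
As a first rough idea of what such a database could look like, here is a small sketch using Python's built-in sqlite3 module. The table layout, the column names (glyph, pinyin, radical, wubi) and the helper functions are all my own assumptions, not a finished design. Pinyin is stored with tone numbers (xie4) to keep matching simple.

```python
import sqlite3

# Columns that may be searched; anything else is rejected.
SEARCHABLE = ("pinyin", "radical", "wubi")

# Sample rows: (glyph, pinyin with tone number, radical, wubi code).
ROWS = [
    ("謝", "xie4", "言", "ytmf"),
    ("谢", "xie4", "讠", "ytmf"),
]


def make_catalogue(rows):
    """Create an in-memory catalogue with one index per lookup method."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE characters ("
        " glyph TEXT PRIMARY KEY,"
        " pinyin TEXT, radical TEXT, wubi TEXT)"
    )
    conn.executemany("INSERT INTO characters VALUES (?, ?, ?, ?)", rows)
    for column in SEARCHABLE:
        conn.execute(f"CREATE INDEX idx_{column} ON characters ({column})")
    return conn


def find_by(conn, column, value):
    """Return all glyphs whose given column equals value."""
    if column not in SEARCHABLE:
        raise ValueError(f"cannot search by {column!r}")
    cur = conn.execute(
        f"SELECT glyph FROM characters WHERE {column} = ?", (value,)
    )
    return [row[0] for row in cur]


conn = make_catalogue(ROWS)
print(find_by(conn, "radical", "言"))   # look up by radical
print(find_by(conn, "pinyin", "xie4"))  # look up by pinyin
```

Each lookup method is just another column here, so adding a new system later (stroke count, for example) would mean adding a column and an index.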

More later.
