RubyConf: I18N, M17N, Unicode And All That

Tim Bray of Sun Microsystems gave an insightful presentation on internationalization (I18N), multilingualization (M17N), and Unicode. Even Tim, who has spent most of his career working on Unicode support, admitted that “for some this is not the most stimulating subject.” He asked the audience of programmers, “Why do we care about internationalization?” The answer is not obvious to native English speakers, but a look around the web makes it clear: English is no longer the predominant language of the internet. A software project should therefore plan for localization (L10N) and internationalization from the outset. According to Tim, it makes little sense to develop an application that does not support I18N, M17N, and L10N.

Tim stated that if your code contains the following regular expression, it is probably a bug:

/[a-zA-Z]+/


Many Rails applications contain code like this. Worse yet, Rails depends heavily on Ruby string methods such as capitalize, upcase, and downcase. Tim noted that not all Unicode characters can be capitalized, and these methods often choke when applied to Unicode strings. Besides capitalize, upcase, and downcase, Ruby has I18N issues with swapcase, match, size, strip, ==, =~, [], eql?, and more.
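To make the regular expression and string-method problems concrete, here is a small sketch. It assumes a modern Ruby (2.2 or later, with UTF-8 as the default source encoding), which postdates this talk, and the sample strings are made up for illustration. It shows /[a-zA-Z]+/ cutting accented words apart, and size and == disagreeing about two spellings of the same visible character.

```ruby
# Sketch assuming Ruby 2.2+ with UTF-8 source (the default since Ruby 2.0).

text = "Caf\u00E9 au lait in Z\u00FCrich"   # "Café au lait in Zürich"

# 1. ASCII-only character class: words are split at the accented letters.
ascii_words = text.scan(/[a-zA-Z]+/)
# \p{L} matches a letter in any script, so whole words survive.
unicode_words = text.scan(/\p{L}+/)

p ascii_words    # => ["Caf", "au", "lait", "in", "Z", "rich"]
p unicode_words  # => ["Café", "au", "lait", "in", "Zürich"]

# 2. The same visible "é" can be stored as one code point or as two.
precomposed = "\u00E9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed  = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT

p precomposed.size           # => 1
p decomposed.size            # => 2 (two code points, one visible character)
p decomposed.bytesize        # => 3 (UTF-8 bytes)
p precomposed == decomposed  # => false, although both render as "é"

# Normalizing both sides to NFC (String#unicode_normalize, Ruby 2.2+)
# makes the comparison behave the way a reader would expect.
p precomposed.unicode_normalize(:nfc) == decomposed.unicode_normalize(:nfc)  # => true
```

Normalizing at input boundaries is one common design choice; the point here is only that byte- or code-point-level answers from size and == are not the answers a human reader expects.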

Tim suggested that, to better support internationalization and Unicode, Ruby programmers should avoid case folding, that is, methods such as capitalize. According to Tim, case folding routinely produces the wrong answers.
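Case mapping is where the “wrong answer” problem is easiest to see. The sketch below assumes Ruby 2.4 or later, where String#downcase applies Unicode case mapping and accepts a :turkic option; the sample word is arbitrary. The correct lowercase form of “I” depends on the language of the text, which a program frequently does not know.

```ruby
# Sketch assuming Ruby 2.4+, where String#downcase uses Unicode case
# mapping and accepts options such as :turkic.
word = "ISTANBUL"

default_lower = word.downcase           # default Unicode mapping: "I" -> "i"
turkish_lower = word.downcase(:turkic)  # Turkish rules: "I" -> dotless "ı" (U+0131)

p default_lower   # => "istanbul"
p turkish_lower   # => "ıstanbul"

# Both results are "correct", but only for one language each, so a
# program that blindly case-maps user text will be wrong for someone.
```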

Tim closed this discussion by saying, “I want Ruby to be a good citizen of the world.” If you want to look through Tim’s presentation, it is available online.
