Sunday, June 14, 2009

Architecture FAQ for Localization and Globalization: Part I


Architecture discussions tend to focus on loose coupling, scalability, performance and so on. Many architectures overlook another important aspect of software: making the application globalized. Not every project needs a multi-language website, but I am sure many do. So in this article we will go through a series of FAQs that give you a quick start on making an application multi-language.

What is Unicode & Why was it introduced?

To understand Unicode we need to step back a little and look at ASCII first. ASCII (pronounced "ask-ee") stands for American Standard Code for Information Interchange. Strictly speaking, ASCII is a 7-bit code that defines 128 characters, but each character is normally stored in one byte (8 bits), and a byte can hold 256 different values (2^8). Before Unicode came into the picture, programmers used code pages to represent characters in different languages. A code page is an 8-bit extension of the ASCII set: it keeps the first 128 values for the ASCII (English) characters and tailors the remaining 128 to a specific language.

Below is a pictorial representation of the same.

Figure 14.1:- Code page in action
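
To make the idea concrete, here is a minimal sketch in Python that decodes the same byte under three code pages. The choice of `cp1252` (Western European), `cp1251` (Cyrillic) and `cp1253` (Greek), and the helper name `interpret`, are assumptions made for illustration only; they do not come from the original figure.

```python
import unicodedata

# Three Windows code pages, picked only for illustration.
CODE_PAGES = ("cp1252", "cp1251", "cp1253")


def interpret(raw):
    """Decode the same bytes under each code page and collect the results."""
    return {code_page: raw.decode(code_page) for code_page in CODE_PAGES}


# A byte in the upper half (128-255): its meaning depends on the code page.
high = interpret(bytes([0xE9]))

# A byte in the lower half (0-127): plain ASCII, identical in every code page.
low = interpret(b"A")

for code_page in CODE_PAGES:
    ch = high[code_page]
    # Print the code point and its Unicode name, so the output stays ASCII-only.
    print(code_page, "0xE9 ->", f"U+{ord(ch):04X}", unicodedata.name(ch))
    print(code_page, "0x41 ->", low[code_page])
```

The byte 0x41 is "A" everywhere, while 0xE9 turns into a Latin, a Cyrillic or a Greek letter depending on which code page the reader applies.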

The code page approach has the following disadvantages:

  • Some languages, such as Chinese, have more than 5000 characters, which cannot be represented in a 128-character set.
  • Only two languages can be supported at one time. As noted above, 128 characters go to English and the remaining 128 to one other language.
  • The end client must have the same code page available.
  • Character representation changes with the operating system and language in use. That means the same character can be represented by different numbers on different operating systems.
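
The disadvantages listed above can be reproduced in a few lines of Python. This is only a sketch: the helper `can_encode`, the sample strings, and the choice of `cp1252`, `cp1251`, `cp1253` and `cp437` (an old DOS code page) are assumptions picked for the demonstration.

```python
def can_encode(text, code_page):
    """Return True if every character of text exists in the given code page."""
    try:
        text.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


chinese = "中文"          # Chinese characters
mixed = "Привет, Γειά"    # Russian and Greek in one string

# 1. A 128-slot upper half has no room for Chinese characters.
print(can_encode(chinese, "cp1252"))

# 2. English plus ONE other language: no single page holds Russian and Greek.
print(can_encode(mixed, "cp1251"), can_encode(mixed, "cp1253"))

# 3. The client needs the same code page: Cyrillic bytes read as Western
#    European text decode without any error, but into the wrong letters.
sent = "Привет".encode("cp1251")
garbled = sent.decode("cp1252")
print(ascii(garbled))     # ascii() keeps the output printable on any console

# 4. The number for a character depends on the platform's code page.
dos_byte = "é".encode("cp437")       # DOS code page
windows_byte = "é".encode("cp1252")  # Windows Western European code page
print(dos_byte, windows_byte)
```

Note that case 3 fails silently: nothing raises an exception, the text is simply wrong, which is why mismatched code pages were so hard to diagnose.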
