Sorting Thai is rather more complex than sorting, say, English word lists. Two considerations are:
Whilst many software environments include an algorithm for sorting Thai words, some don't, most notably some databases and some programming environments. In such cases we may need to implement a Thai sorting algorithm.
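As a point of comparison, Java is one environment that usually does ship a Thai-aware sort, through java.text.Collator. The following is a minimal sketch, assuming the runtime includes Thai locale data. The class name ThaiCollatorSketch and the helper sortThai are hypothetical names used only for illustration.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

class ThaiCollatorSketch {
    // Hypothetical helper: returns a sorted copy of the input, ordered by
    // the JDK's locale-aware collator for Thai.
    static String[] sortThai(String[] words) {
        Collator collator = Collator.getInstance(Locale.forLanguageTag("th-TH"));
        String[] copy = words.clone();
        // Collator implements Comparator<Object>, so it can be passed directly.
        Arrays.sort(copy, collator);
        return copy;
    }

    public static void main(String[] args) {
        String[] words = { "ขาว", "กา", "งาน" };
        System.out.println(Arrays.toString(sortThai(words)));
    }
}
```

Where such a collator is available it is normally the simpler choice. The hand-written algorithm described below is for environments that lack one.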
Perhaps the earliest such sorting algorithm was described by Londe & Warotamasikkhadit (1969). It converts a Thai string into a comparison string which can be sorted character by character. The steps are:
Following is sample Java code that implements the algorithm.
First some static variables and a couple of utility methods.
static final char SARA_E = 0x0E40;
static final char SARA_AI_MAIMALAI = 0x0E44;
static final char MAITAIKHU = 0x0E47;
static final char THANTHAKHAT = 0x0E4C; // a.k.a. "garan"

static boolean isLeadingVowel(char c) {
    // Returns true if the character is in the range from SARA E to
    // SARA AI MAIMALAI, i.e. if the character is a leading vowel.
    return (c >= SARA_E && c <= SARA_AI_MAIMALAI);
}

static boolean isToneMark(char c) {
    // Returns true if the character is in the range from MAITAIKHU to
    // THANTHAKHAT, which includes the four tone marks, i.e. all "above" symbols.
    return (c >= MAITAIKHU && c <= THANTHAKHAT);
}
The comparison string method:
static String getThaiComparisonString(String s) {
    // Convert the String to a character array
    char[] chars = s.toCharArray();

    // Swap each leading vowel with the character that follows it.
    // The bounds check skips a leading vowel at the very end of the String,
    // which would otherwise cause an ArrayIndexOutOfBoundsException.
    for (int i = 0; i + 1 < chars.length; i++) {
        if (isLeadingVowel(chars[i])) {
            char c = chars[i];
            chars[i] = chars[i + 1];
            chars[i + 1] = c;
            i++;
        }
    }

    // The String for comparison is built in two parts, here referred to
    // as "head" and "tail". "tail" always begins with "00".
    String head = "";
    String tail = "00";

    // Add each character to "head" unless it is a tone mark, MAITAIKHU,
    // or THANTHAKHAT, in which case add a 2-digit String to "tail"
    // representing its position counted from the END of the original
    // String, then append the mark itself to "tail".
    for (int i = 0; i < chars.length; i++) {
        if (isToneMark(chars[i])) {
            int pos = chars.length - i;
            tail += (pos >= 10) ? "" + pos : "0" + pos;
            tail += chars[i];
        } else {
            head += chars[i];
        }
    }

    // Return the comparison String
    return head + tail;
}
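To make the transformation concrete, here is a small trace of what the comparison string looks like for two words that differ only by a tone mark. This is a sketch that condenses the same two passes into one hypothetical helper, comparisonString, with the code point ranges written inline. The class name and the two sample words are chosen for illustration only.

```java
class ComparisonStringTrace {
    // Hypothetical condensed version of the two passes described above.
    static String comparisonString(String s) {
        char[] chars = s.toCharArray();
        // Pass 1: swap each leading vowel (U+0E40..U+0E44) with the next character.
        for (int i = 0; i + 1 < chars.length; i++) {
            if (chars[i] >= 0x0E40 && chars[i] <= 0x0E44) {
                char c = chars[i];
                chars[i] = chars[i + 1];
                chars[i + 1] = c;
                i++; // do not re-examine the vowel that was just moved
            }
        }
        // Pass 2: marks in U+0E47..U+0E4C go to the tail, each prefixed by its
        // position counted from the end; everything else goes to the head.
        StringBuilder head = new StringBuilder();
        StringBuilder tail = new StringBuilder("00");
        for (int i = 0; i < chars.length; i++) {
            if (chars[i] >= 0x0E47 && chars[i] <= 0x0E4C) {
                int pos = chars.length - i;
                tail.append(pos >= 10 ? "" : "0").append(pos).append(chars[i]);
            } else {
                head.append(chars[i]);
            }
        }
        return head.append(tail).toString();
    }

    public static void main(String[] args) {
        // SARA E, KO KAI, MAI EK, SARA AA
        System.out.println(comparisonString("\u0E40\u0E01\u0E48\u0E32"));
        // SARA E, KO KAI, SARA AA (same word without the tone mark)
        System.out.println(comparisonString("\u0E40\u0E01\u0E32"));
    }
}
```

For the first word the head is KO KAI, SARA E, SARA AA and the tail is 00 followed by 02 and MAI EK, because the tone mark sits two positions from the end. For the second word the head is the same and the tail is just 00. The second comparison string is therefore a prefix of the first, so the unmarked word sorts first and the tone mark only acts as a tie-breaker.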
Comparison is now as simple as:
@Override
public int compare(String s1, String s2) {
    String cs1 = getThaiComparisonString(s1);
    String cs2 = getThaiComparisonString(s2);
    return cs1.compareTo(cs2);
}
The override is of the compare() method in the java.util.Comparator interface.
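Now, to sort a String[] array, all that's needed is the following, where comparator is an instance of a class implementing the compare() method above:
Arrays.sort(array, new Comparator<String>() {
    @Override
    public int compare(String s1, String s2) {
        return comparator.compare(s1, s2);
    }
});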
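One design consideration: a comparator that converts both strings on every call rebuilds each comparison string many times during a sort. For larger arrays it may be worth computing each key once and sorting on the stored keys. The following is a rough sketch of that idea. KeyedSortSketch and sortByKey are hypothetical names, and the demo passes String::toLowerCase purely as a placeholder key function. In practice the Thai comparison string method would be passed instead.

```java
import java.util.Arrays;
import java.util.function.Function;

class KeyedSortSketch {
    // Hypothetical helper: sorts by a derived key, calling keyFn exactly
    // once per element rather than twice per comparison.
    static String[] sortByKey(String[] words, Function<String, String> keyFn) {
        // Pair each word with its precomputed key: { key, word }.
        String[][] pairs = new String[words.length][];
        for (int i = 0; i < words.length; i++) {
            pairs[i] = new String[] { keyFn.apply(words[i]), words[i] };
        }
        // Sort the pairs by key only.
        Arrays.sort(pairs, (a, b) -> a[0].compareTo(b[0]));
        // Unpack the words in their new order.
        String[] sorted = new String[words.length];
        for (int i = 0; i < pairs.length; i++) {
            sorted[i] = pairs[i][1];
        }
        return sorted;
    }

    public static void main(String[] args) {
        // Placeholder key function, standing in for the Thai comparison string.
        String[] words = { "pear", "Apple", "fig" };
        System.out.println(Arrays.toString(sortByKey(words, String::toLowerCase)));
        // prints [Apple, fig, pear]
    }
}
```

The trade-off is extra memory for the stored keys in exchange for fewer conversions.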
References