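Sorting Thai is rather more complex than sorting, say, English word lists. Two considerations, both handled by the algorithm described here, are that a leading vowel is written before its consonant but ordered after it, and that tone marks and similar diacritics only affect the order of words that are otherwise identical.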
Whilst many software environments include an algorithm for sorting Thai words, some don't, most notably some databases and some programming environments. In such cases we may need to implement a Thai sorting algorithm.
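Java is one environment that does offer locale-aware comparison, through java.text.Collator. The following is a minimal sketch, assuming the runtime ships Thai collation data; if it does not, Collator silently falls back to its default rules. The class and method names are hypothetical and chosen only for illustration.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical demo class: sorts with the collator built into the Java runtime.
class BuiltInThaiSort {

    // Returns a sorted copy of the input, ordered by the runtime's Thai collator.
    static String[] sortWithCollator(String[] words) {
        // Collator implements Comparator<Object>, so it can be passed to Arrays.sort.
        Collator thai = Collator.getInstance(Locale.forLanguageTag("th-TH"));
        String[] sorted = words.clone();
        Arrays.sort(sorted, thai);
        return sorted;
    }

    public static void main(String[] args) {
        // KHO KHAI and KO KAI, written as Unicode escapes.
        String[] words = {"\u0E02", "\u0E01"};
        System.out.println(Arrays.toString(sortWithCollator(words)));
    }
}
```

Where no such collator is available, a hand-written comparison like the one described in the rest of this article is the fallback.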
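Perhaps the earliest such sorting algorithm was described by Londe & Warotamasikkhadit (1969). It converts a Thai string into a comparison string which can then be compared character by character. As implemented in the sample code below, the steps are: swap each leading vowel with the character that follows it; move MAITAIKHU, the tone marks and THANTHAKHAT out of the string into a separate tail, each preceded by its two-digit position counted from the end of the string; and append the tail, which always begins with "00", to the remaining characters.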
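To make the conversion concrete, the sketch below traces it for a single four-character word: SARA E, KO KAI, MAI EK, NGO NGU. It follows the sample code later in this article rather than the 1969 paper itself, and the class and method names are hypothetical. Characters are written as Unicode escapes so that the source does not depend on the file encoding.

```java
// Hypothetical demo class that exposes each stage of the conversion separately.
class ComparisonStringTrace {

    // Stage 1: swap each leading vowel (U+0E40..U+0E44) with the character after it.
    static String swapLeadingVowels(String s) {
        char[] chars = s.toCharArray();
        for (int i = 0; i + 1 < chars.length; i++) {
            if (chars[i] >= '\u0E40' && chars[i] <= '\u0E44') {
                char c = chars[i];
                chars[i] = chars[i + 1];
                chars[i + 1] = c;
                i++; // skip the vowel that was just moved
            }
        }
        return new String(chars);
    }

    // Stage 2: keep everything except the marks in U+0E47..U+0E4C.
    static String head(String swapped) {
        StringBuilder head = new StringBuilder();
        for (char c : swapped.toCharArray()) {
            if (c < '\u0E47' || c > '\u0E4C') {
                head.append(c);
            }
        }
        return head.toString();
    }

    // Stage 3: "00", then for each mark its two-digit distance from the end and the mark.
    static String tail(String swapped) {
        StringBuilder tail = new StringBuilder("00");
        for (int i = 0; i < swapped.length(); i++) {
            char c = swapped.charAt(i);
            if (c >= '\u0E47' && c <= '\u0E4C') {
                int pos = swapped.length() - i;
                if (pos < 10) {
                    tail.append('0');
                }
                tail.append(pos).append(c);
            }
        }
        return tail.toString();
    }

    public static void main(String[] args) {
        String word = "\u0E40\u0E01\u0E48\u0E07";
        String swapped = swapLeadingVowels(word);
        System.out.println(swapped);
        System.out.println(head(swapped) + tail(swapped));
    }
}
```

For this word the swap puts KO KAI ahead of SARA E, the head keeps KO KAI, SARA E and NGO NGU, and the tail is "00" followed by "02" and MAI EK, because the tone mark sits two characters from the end.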
Following is sample Java code that implements the algorithm.
First some static variables and a couple of utility methods.
static final char SARA_E = 0x0E40;
static final char SARA_AI_MAIMALAI = 0x0E44;
static final char MAITAIKHU = 0x0E47;
static final char THANTHAKHAT = 0x0E4C; // a.k.a. "garan"

static boolean isLeadingVowel(char c) {
    // Returns true if the character is in the range from SARA E to
    // SARA AI MAIMALAI, i.e. if it is one of the five leading vowels
    return (c >= SARA_E && c <= SARA_AI_MAIMALAI);
}

static boolean isToneMark(char c) {
    // Returns true if the character is in the range from MAITAIKHU to
    // THANTHAKHAT: MAITAIKHU, the four tone marks and THANTHAKHAT
    return (c >= MAITAIKHU && c <= THANTHAKHAT);
}
The comparison string method:
static String getThaiComparisonString(String s) {
    // Convert String to a character array
    char[] chars = s.toCharArray();

    // Swap each leading vowel with the next character. The loop stops one
    // short of the end so that a trailing leading vowel cannot cause an
    // ArrayIndexOutOfBoundsException.
    for (int i = 0; i + 1 < chars.length; i++) {
        if (isLeadingVowel(chars[i])) {
            char c = chars[i];
            chars[i] = chars[i + 1];
            chars[i + 1] = c;
            i++; // skip the vowel that was just moved
        }
    }

    // The String for comparison is built in two parts, here referred to
    // as "head" and "tail". "tail" always begins with "00".
    StringBuilder head = new StringBuilder();
    StringBuilder tail = new StringBuilder("00");

    // Add each character to the "head" unless it's a tone mark,
    // MAITAIKHU, or THANTHAKHAT, in which case add a 2 digit
    // String to "tail" representing its position from the END of
    // the swapped String, then append the mark itself to "tail".
    for (int i = 0; i < chars.length; i++) {
        if (isToneMark(chars[i])) {
            int pos = chars.length - i;
            if (pos < 10) {
                tail.append('0');
            }
            tail.append(pos);
            tail.append(chars[i]);
        } else {
            head.append(chars[i]);
        }
    }

    // Return the comparison string
    return head.toString() + tail.toString();
}
Comparison is now as simple as:
@Override
public int compare(String s1, String s2) {
    String cs1 = getThaiComparisonString(s1);
    String cs2 = getThaiComparisonString(s2);
    return cs1.compareTo(cs2);
}
The override is of the compare() method in the java.util.Comparator interface.
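Now, to sort a String[] array, all that's needed is an anonymous Comparator<String> (with java.util.Arrays and java.util.Comparator imported):
Arrays.sort(array, new Comparator<String>() {
    @Override
    public int compare(String s1, String s2) {
        return getThaiComparisonString(s1).compareTo(getThaiComparisonString(s2));
    }
});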
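Sorting this way rebuilds both comparison strings on every comparison. For larger arrays it may be worth computing each comparison string once and sorting on the cached values. The sketch below shows one way to do that; the class name, the key() method (a condensed restatement of the conversion above) and the sortThai() helper are all hypothetical names introduced for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class: sorts Thai strings using cached comparison strings.
class CachedKeySort {

    // Condensed restatement of the comparison-string conversion.
    static String key(String s) {
        char[] c = s.toCharArray();
        // Swap each leading vowel (U+0E40..U+0E44) with the following character.
        for (int i = 0; i + 1 < c.length; i++) {
            if (c[i] >= '\u0E40' && c[i] <= '\u0E44') {
                char t = c[i];
                c[i] = c[i + 1];
                c[i + 1] = t;
                i++;
            }
        }
        StringBuilder head = new StringBuilder();
        StringBuilder tail = new StringBuilder("00");
        // Marks in U+0E47..U+0E4C go to the tail with their position from the end.
        for (int i = 0; i < c.length; i++) {
            if (c[i] >= '\u0E47' && c[i] <= '\u0E4C') {
                int pos = c.length - i;
                if (pos < 10) {
                    tail.append('0');
                }
                tail.append(pos).append(c[i]);
            } else {
                head.append(c[i]);
            }
        }
        return head.append(tail).toString();
    }

    // Sorts the array in place, computing each comparison string only once.
    static void sortThai(String[] words) {
        Map<String, String> keys = new HashMap<>();
        for (String w : words) {
            keys.computeIfAbsent(w, CachedKeySort::key);
        }
        Arrays.sort(words, (a, b) -> keys.get(a).compareTo(keys.get(b)));
    }

    public static void main(String[] args) {
        // KHO KHAI; SARA E + KO KAI; KO KAI
        String[] words = {"\u0E02", "\u0E40\u0E01", "\u0E01"};
        sortThai(words);
        System.out.println(Arrays.toString(words));
    }
}
```

With this input the word beginning with SARA E ends up between KO KAI and KHO KHAI, whereas a plain code point sort would place it last.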
References