Sorting Thai is rather more complex than sorting, say, English word lists. Two considerations are:

  1. whilst leading vowels are written first, they are less significant than the initial consonant of a syllable.
  2. the four tone marks, máy tàykhúu, and thanthakhâat (kaaˑran) should be ignored unless all other parts are identical.
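
To make both considerations concrete, here is a minimal sketch of what happens when Thai words are sorted with Java's natural String ordering, which only compares UTF-16 code units. The class name NaiveThaiSort and the four sample words (เก, ขา, ป่า, ปาก) are chosen for illustration and are not from the original text; the words are written as Unicode escapes so the code points are unambiguous.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of the original article.
class NaiveThaiSort {

    // Sorts a copy of the input with String's natural ordering,
    // which compares UTF-16 code units and knows nothing about Thai.
    static String[] naiveSort(String[] words) {
        String[] copy = words.clone();
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        String[] words = {
            "\u0E40\u0E01",        // SARA E + KO KAI (leading vowel first)
            "\u0E02\u0E32",        // KHO KHAI + SARA AA
            "\u0E1B\u0E48\u0E32",  // PO PLA + MAI EK + SARA AA (has a tone mark)
            "\u0E1B\u0E32\u0E01"   // PO PLA + SARA AA + KO KAI
        };
        System.out.println(String.join(",", naiveSort(words)));
    }
}
```

The naive result is ขา, ปาก, ป่า, เก. Under the two considerations above one would instead expect เก, ขา, ป่า, ปาก: the initial consonant of เก sorts before that of ขา, and ป่า differs from the prefix ปา of ปาก only by a tone mark.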

Whilst many software environments include an algorithm for sorting Thai words, some don't, most notably some databases and some programming environments. In such cases we may need to implement a Thai sorting algorithm.
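
As one example of a built-in facility, Java provides java.text.Collator, which can be requested for the Thai locale. The sketch below is an assumption-laden illustration rather than a recommendation: the class and method names are hypothetical, and the ordering actually produced depends on whether the runtime ships Thai collation data, so no expected output is shown.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical demo class, not part of the original article.
class ThaiCollatorSketch {

    // Sorts a copy of the input using whatever collation rules the
    // runtime provides for the "th" locale. If no Thai data is
    // available, the runtime falls back to its default rules.
    static String[] sortWithCollator(String[] words) {
        Collator thai = Collator.getInstance(Locale.forLanguageTag("th"));
        String[] copy = words.clone();
        Arrays.sort(copy, thai); // Collator implements Comparator<Object>
        return copy;
    }

    public static void main(String[] args) {
        String[] words = {
            "\u0E02\u0E32",  // KHO KHAI + SARA AA
            "\u0E40\u0E01"   // SARA E + KO KAI
        };
        System.out.println(String.join(",", sortWithCollator(words)));
    }
}
```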

Perhaps the earliest such sorting algorithm was described by Londe & Warotamasikkhadit (1969). It converts a Thai string into a comparison string which can be sorted character by character. The steps are:

  1. Swap every leading vowel in the string with the next character.
  2. Append two zero digits to the string.
  3. Going from left to right, remove each tone mark, máy tàykhúu, and thanthakhâat from the string. For each one removed, append to the end of the comparison string a two-digit number giving the mark's original position counted from the end of the string, followed by the mark itself.
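
The three steps can be traced on a single word. The following is a small sketch, assuming the word เล่น and counting positions from the end of the string before the "00" is appended; the class and method names are illustrative only.

```java
// Hypothetical helper class that traces the steps on one word.
class ThaiKeyTrace {

    // Step 1: swap each leading vowel (U+0E40..U+0E44) with the next character.
    static String swapLeadingVowels(String s) {
        char[] chars = s.toCharArray();
        for (int i = 0; i < chars.length - 1; i++) {
            if (chars[i] >= '\u0E40' && chars[i] <= '\u0E44') {
                char c = chars[i];
                chars[i] = chars[i + 1];
                chars[i + 1] = c;
                i++; // skip the vowel that was just moved
            }
        }
        return new String(chars);
    }

    // Steps 2 and 3: move each mark (U+0E47..U+0E4C) into a tail that
    // starts with "00", prefixed by its two-digit position from the end.
    static String buildKey(String swapped) {
        StringBuilder head = new StringBuilder();
        StringBuilder tail = new StringBuilder("00");
        int n = swapped.length();
        for (int i = 0; i < n; i++) {
            char c = swapped.charAt(i);
            if (c >= '\u0E47' && c <= '\u0E4C') {
                int pos = n - i; // position counted from the end
                if (pos < 10) {
                    tail.append('0');
                }
                tail.append(pos).append(c);
            } else {
                head.append(c);
            }
        }
        return head.append(tail).toString();
    }

    public static void main(String[] args) {
        // SARA E, LO LING, MAI EK, NO NU
        String word = "\u0E40\u0E25\u0E48\u0E19";
        String swapped = swapLeadingVowels(word);
        System.out.println(swapped);
        System.out.println(buildKey(swapped));
    }
}
```

For เล่น, step 1 moves the leading vowel behind the consonant ล, and the tone mark, now two characters from the end, is removed and recorded in the tail. The resulting comparison string is the three letters ล, เ, น followed by "00", then "02" and the tone mark.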

Following is sample Java code that implements the algorithm.

First some static variables and a couple of utility methods.

static final char SARA_E = 0x0E40;
static final char SARA_AI_MAIMALAI = 0x0E44;
static final char MAITAIKHU = 0x0E47;
static final char THANTHAKHAT = 0x0E4C;  // a.k.a. "garan"  

static boolean isLeadingVowel(char c) {
	// Returns true if character is in the range from SARA E to SARA AI MAIMALAI, 
	// i.e. if the character is a leading vowel

	return (c >= SARA_E && c <= SARA_AI_MAIMALAI);
}

static boolean isToneMark(char c) {
	// Returns true if the character is in the range from MAITAIKHU to THANTHAKHAT,
	// which covers MAITAIKHU, the four tone marks, and THANTHAKHAT.

	return (c >= MAITAIKHU && c <= THANTHAKHAT);
}
	

The method that builds the comparison string:

static String getThaiComparisonString(String s) {

	// Convert String to a character array
	char[] chars = s.toCharArray();
	
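	// Swap each leading vowel with the character that follows it.
	// The loop stops one short of the end so that a leading vowel in
	// the last position cannot cause an out-of-bounds access.
	for (int i = 0; i < chars.length - 1; i++) {
		if (isLeadingVowel(chars[i])) {
			char c = chars[i];
			chars[i] = chars[i + 1];
			chars[i + 1] = c;
			i++;  // skip the vowel that was just moved
		}
	}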
	
	// The String for comparison is built in two parts, here referred to
	// as "head" and "tail".  "tail" always begins with "00".
	String head = "";
	String tail = "00";
	
	// Add each character to "head" unless it is a tone mark,
	// MAITAIKHU, or THANTHAKHAT, in which case add a 2 digit
	// String to "tail" representing its position counted from the
	// END of the character array, then append the mark itself to "tail".
	
	for (int i = 0; i < chars.length; i++) {
		if (isToneMark(chars[i])) {
			int pos = chars.length - i;
			tail += (pos >= 10) ? "" + pos : "0" + pos;
			tail += chars[i];                                
		}
		else {
			head += chars[i];
		}
	}
	
	// Return the Comparison string
	return head + tail;
}

Comparison is now as simple as:

@Override
public int compare(String s1, String s2) {
	String cs1 = getThaiComparisonString(s1);
	String cs2 = getThaiComparisonString(s2);

	return cs1.compareTo(cs2);
}

This method overrides compare() from the java.util.Comparator interface, so it belongs in a class that implements Comparator<String>.

Now, to sort a String[] array, all that's needed is

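// Requires java.util.Arrays and java.util.Comparator.
Arrays.sort(array, new Comparator<String>() {
	@Override
	public int compare(String s1, String s2) {
		return getThaiComparisonString(s1).compareTo(getThaiComparisonString(s2));
	}
});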
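
For readers on Java 8 or later, the same sort can be written more compactly with Comparator.comparing. The sketch below is one possible arrangement, not the original author's code: ThaiSortDemo, sortKey, and sortThai are hypothetical names, and sortKey is a condensed restatement of the comparison-string method so that the example runs on its own. The seven sample words are taken from the word list on this page and are written as Unicode escapes.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical demo class, not part of the original article.
class ThaiSortDemo {

    // Condensed version of the comparison-string construction.
    static String sortKey(String s) {
        char[] chars = s.toCharArray();
        // Swap each leading vowel (U+0E40..U+0E44) with the next character.
        for (int i = 0; i < chars.length - 1; i++) {
            if (chars[i] >= '\u0E40' && chars[i] <= '\u0E44') {
                char c = chars[i];
                chars[i] = chars[i + 1];
                chars[i + 1] = c;
                i++;
            }
        }
        StringBuilder head = new StringBuilder();
        StringBuilder tail = new StringBuilder("00");
        // Move marks (U+0E47..U+0E4C) to the tail with a two-digit position.
        for (int i = 0; i < chars.length; i++) {
            char c = chars[i];
            if (c >= '\u0E47' && c <= '\u0E4C') {
                int pos = chars.length - i;
                if (pos < 10) {
                    tail.append('0');
                }
                tail.append(pos).append(c);
            } else {
                head.append(c);
            }
        }
        return head.append(tail).toString();
    }

    // Returns a sorted copy, ordering words by their comparison strings.
    static String[] sortThai(String[] words) {
        String[] copy = words.clone();
        Comparator<String> byKey = Comparator.comparing(ThaiSortDemo::sortKey);
        Arrays.sort(copy, byKey);
        return copy;
    }

    public static void main(String[] args) {
        String[] words = {
            "\u0E40\u0E25\u0E48\u0E19",  // SARA E, LO LING, MAI EK, NO NU
            "\u0E1B\u0E49\u0E32",        // PO PLA, MAI THO, SARA AA
            "\u0E42\u0E01",              // SARA O, KO KAI
            "\u0E40\u0E25\u0E19",        // SARA E, LO LING, NO NU
            "\u0E1B\u0E32",              // PO PLA, SARA AA
            "\u0E40\u0E25\u0E47\u0E19",  // SARA E, LO LING, MAITAIKHU, NO NU
            "\u0E1B\u0E48\u0E32"         // PO PLA, MAI EK, SARA AA
        };
        System.out.println(String.join(",", sortThai(words)));
    }
}
```

With this input the words come out as โก, ปา, ป่า, ป้า, เลน, เล็น, เล่น. Note that the comparison string is recomputed on every comparison; for large arrays it may be worth computing each string once and sorting on the stored values.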

References

Index
โก,โก่,โก้,โก๋,เคียว,เคี่ยว,เคี้ยว,จุน,จุ่น,จุ้น,ปา,ป่า,ป้า,ป๋า,ผา,ผ่า,ผ้า,เลน,เล็น,เล่น,วาว,ว่าว,ว้าว,โอ้ก,โอ๊ก,กวยจั๊บ,ก๋วยจับ,ก๊วยเตี๋ยว,ก๋วยเตี๋ยว,ขมขื่น,ข่มขืน,ลงทอง,ลงท้อง,โล่งโต้ง,โล้งโต้ง