Friday, July 17, 2009

String Permutations

Apparently the "Challenge" in the COMP102 assignment was to write some code to print out all the permutations of a string. (Say for ABC, that'd be ABC, ACB, BAC, BCA, CAB, CBA). However that challenge was apparently left over from last year, and due to the number of points of the paper changing, they haven't covered recursion yet. So I got challenged by a friend to write a program that did this without using recursion (which is the natural way I would do it).

I puzzled over this for a while, trying a couple of ways, with little success. Then I went to a tutorial for another paper, but didn't really pay much attention, more trying to figure out a solution for this problem. I made a little bit of progress, but the solution I implemented when I got back to the lab didn't work. But then I had a bit of a brain wave as I was working and managed to do it. Here's the code:
  1: public static List<String> getAllPermutationsIteratively(String s) {
  2:    int length = s.length();

  4:    List<Character> chars = new ArrayList<Character>(length);
  5:    for(char c : s.toCharArray()) {
  6:       chars.add(c);
  7:    }

  9:    int numPermutations = factorial(length);

 11:    List<String> permutations = new ArrayList<String>(numPermutations);

 13:    for(int permNo = 0; permNo < numPermutations; permNo++) {
 14:       List<Character> charsLeft = new ArrayList<Character>(chars);

 16:       char[] currentString = new char[length];

 18:       for(int i = 0; i < length; i++) {
 19:          int index = (permNo % factorial(charsLeft.size())) / factorial(charsLeft.size() - 1);
 20:          currentString[i] = charsLeft.remove(index);
 21:       }

 23:       permutations.add(new String(currentString));
 24:    }
 25:    return permutations;
 26: }
Line 19 contains the magic statement. I commented this as, "The division means that the index only increases after so many iterations. The remainder reduces the number range to the number of iterations the index stayed the same for in the previous division." I don't know if that helps all that much, but if you write down a table (which I can't be bothered doing) it makes a lot more sense what's happening. (Or get it to process "0123" or "01234" and print out each string in the resulting list)

I then wondered how this would compare to a recursive implementation, so did one of those too:
  1: public static List<String> getAllPermutationsRecursively(String s) {
  2:    int length = s.length();

  4:    List<String> permutations = new ArrayList<String>(factorial(length));

  6:    for(char[] c : getAllPermutationsRecursively(s.toCharArray(), length, length - 1)) {
  7:       permutations.add(new String(c));
  8:    }

 10:    return permutations;
 11: }

 13: private static char[][] getAllPermutationsRecursively(char[] chars, int length, int pos) {
 14:    char[][] toReturn = new char[factorial(pos + 1)][];

 16:    if(chars.length == 2) {
 17:       toReturn[0] = new char[length];
 18:       toReturn[1] = new char[length];

 20:       toReturn[0][0] = toReturn[1][1] = chars[1];
 21:       toReturn[0][1] = toReturn[1][0] = chars[0];
 22:    } else {
 23:       int toReturnPos = 0;
 24:       for(int i = 0; i < chars.length; i++) {
 25:          char c = chars[i];

 27:          char[] newChars = new char[chars.length - 1];
 28:          System.arraycopy(chars, 0, newChars, 0, i);
 29:          System.arraycopy(chars, i + 1, newChars, i, chars.length - i - 1);

 31:          for(char[] charArray : getAllPermutationsRecursively(newChars, length, pos - 1)) {
 32:             charArray[pos] = c;
 33:             toReturn[toReturnPos++] = charArray;
 34:          }
 35:       }
 36:    }
 37:    return toReturn;
 38: }
Despite this being a little longer, conceptually I think it's much simpler. (Yes, I did strip out the comments to be annoying). It's not the first version I wrote, but rather as optimised as I could be bothered doing, making it much faster than my original naive implementation.

This version turns out to be more than twice as fast as the iterative version. The two main reasons for this are that the iterative version copies, then removes the elements of a List copy of the chars in the string for every string returned, and it also does a whole lot of remainder and division operations.

Oh yeah, if you're wondering this is my factorial function:
  1: private static int factorial(int n) {
  2:    int factorial = 1;
  3:    for(int i = 2; i <= n; i++) {
  4:       factorial *= i;
  5:    }
  6:    return factorial;
  7: }
I tried a few versions, including saving each value to a list, then an array so they wouldn't have to be calculated again, and having the first 10 values precalculated, but these were slower, so I went back to this implementation. It's probably got something to do with the way the compiler optimises it.

The other thing to note with this implementation is that there is a maximum length for the string. Say the length of the input string is n, then n! must be less than 2^31 (the maximum value of an int) otherwise the arrays cannot be made big enough to fit all the permuatations. This means you can use words at most 12 letters long. In practice, I only got it to work for strings of length 10 or less. I ran out of stack space, even when allocating 2500 MB for a 11 letter word. It wouldn't let me set the memory to 3000 MB or higher for some reason.

This shows I'm up to the COMP102 challenge. Boy is that satisfying.

No comments:

Post a Comment