Thursday, August 23, 2012

Reversing a string in C

I was asked today if I could write a C function that reversed a string without knowing the string size. What a boring question! I don't think they were particularly interested in whether I could reverse strings. I had told them I could write C, and I think they were trying to verify this by handing me some rope. Here's what I whiteboarded.

Let's assume the string is HELLO BURGER!. Why hello burger? Because "Hello World!" is overdone and I like burgers. Burgers from Grill'd, to be specific. Grill'd is a faux-90's Australian hamburger chain aimed at Sydney hipsters with too much disposable money. To be clear, I'm not a hipster and I don't have much disposable income unless you count pocket lint as disposable income, I just like their burgers.

HELLO BURGER in a table with corresponding index numbers
I recommend the Zen Hen.

I've placed an index grid above the phrase. Keep in mind that in C, we start counting at 0. So even though there are 13 letters in "HELLO BURGER!", the final character (an exclamation mark) is at index 12. 

If we were to print the string with the statement printf("%s", input)then "HELLO BURGER!" would appear. The statement printf("%c", input[0]) would print "H" and printf("%c", input[6]) would print "B". Pretty straight forward. You might have noticed there's a \0 in the end: in C, this is a "end of string" terminator. In C, a string is defined as a...err..string of letters followed by a \0. The \0 tells C that the string has ended. If a string consists only of \0 then it's empty.

The empty string is a \0 in string position 0
This is an empty string. If input[0] is \0, it's empty.

The following is not a string. And because it's not a string, functions like strlen() that iterate until they see the terminator won't work.

BURGER as a string without \0
Not a banana either.

Let's go line by line through this application.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

First of all, let's get the basics out of the way: include string.h along with your usual libraries: you'll need it to calculate the length of an input string. The reason why this is required will become obvious later.

char* reverseString(char* input) 

In our function signature, we'll be accepting string input. Since we're accepting a string, the type of the input will be a character pointer (char*). We'll also be returning a reversed string, which is why we return the same type. Remember, there's no string datatype in C, which is a reason why the language is unsuited for any quick and dirty string manipulation.

int length = strlen(input);
int i = 0;
int j = 0;

To reverse a string in C, we'll be using a for loop. We'll need to iterate through each character in the string which will require us to know how long the string is. We'll use the variable i to iterate from the end of the string, and j to iterate from the beginning. This will make sense later.

char* newString = malloc(sizeof(char)*length+1);

This line allocates some memory for the string. malloc() allocates memory by giving you a pointer to a block of memory you can play with. To do this, malloc() needs to know how much memory you want: do you want 1kb? 2TB? I can't think of a string that would require 2TB of memory (maybe an e-mail from my mother, if she knew how to send e-mails).

We know how many characters we want: this in the variable length. However, malloc() needs to know how much memory you need, not the string length. To figure this, we use the function sizeof() which returns the amount of memory used by a type. On most computers, a char is 1 byte. But this isn't guaranteed, so we need to use sizeof(). If we multiply the size of the unit we need (a char) by the length (and add one char for the \0), we'll get the size of memory we need.

for (i=length-1, j=0; i>=0; i--, j++) {}

To reverse the string, the end is a good place to start! The final character in the string (an exclamation mark) is at input[12]. Using input[length] results in position 13 (which is the \0 and definitely isn't what we want), so our start position i should be length-1. We want to iterate until we reach index 0: the H, so we continue while the statement i>= 0 is true. After every iteration, we decrease i. In a normal for loop, you increase your loop control variable, but in our case we're iterating through the string backwards so we decrease the loop variable.

HELLO BURGER in a table with corresponding index numbers
Did I mention the fries at Grill'd? Do yourself a favour and upsize.

We'll use j as the loop control variable to write the reversed string. When writing the new string, j will start at index 0 and increment for each new character written.

newString() is ready to be written to...
newString[] is ready to be written to...

newString[j] = input[i];
This takes the current position of input[i] and writes it to the position of newString[j]. Because i starts from the end of the input string and j starts from the beginning of the new string, we are writing the reverse of the input string!
return newString;
We return a character pointer to the reversed string. This pointer points to a block of memory containing the reversed string. Because it was allocated by malloc(), the memory needs to be returned with free() once you're done with it.

Done! You can give it a try by pasting the code into Eclipse.

Why didn't you just use an array to store the reversed string?
Variable length arrays like char newString[strlen(input)] are illegal in C89 standard C. In C89, the memory structure required to store an array is allocated at compile time and not runtime. This means we needed to use a function that allocates memory at runtime, like malloc().

Why didn't you use a large array like char newString[50000]?
Because the string could have 50001 elements! Plus, that's a waste of memory: you'd be allocating sizeof(char)*50000 bytes instead of just sizeof(char)*strlen(input)+1 bytes. If you were running this function once, nobody would care. If this function were being called 20,000 times a minute (say, if you were writing some sort of function to parse every song lyric even written to see if there were any hidden backward messages), you'd want it to be efficient. In fact, there are more efficient ways of writing this function.

Are there any other ways to reverse a string?
Heaps, this isn't Python!

No comments:

Post a Comment