Legends of the C Pointers

When people ask me what my first language is, I sometimes playfully answer, “C”. C taught me what it meant to be a programmer. Few days back when I decided to start hacking the linux kernel again, I realized how much I miss programming in C. Playing with the pointers was undoubtedly the best part! So, I thought I would write this post.

If you have irrational fear of pointers or preparing for a C interview or just want to have fun, you should read this post! I am assuming you know the basics of pointers, but just cannot get a total hold on them.

I will start by stating a quote by Joel Spolsky:

I don’t care how much you know about continuations and closures and exception handling: if you can’t explain why while(*s++ = *t++); copies a string, or if that isn’t the most natural thing in the world to you, well, you’re programming based on superstition, as far as I’m concerned: a medical doctor who doesn’t know basic anatomy, passing out prescriptions based on what the pharma sales babe said would work.

What is a pointer? In layman’s term, it is a variable that stores a memory address. If all pointers stores memory addresses, then why do we need different pointers like a integer pointer or a character pointer? We need different type pointers for pointer arithmetic. Since pointers are variables they can be incremented or decremented. Incrementing a character pointer will increment the address by 1 byte as a character is 1 byte. Similarly incrementing a integer pointer will increment the address by 2 bytes (we assume that int is 2 bytes for the compiler).

What does the following function do?

void swap(int x, int y){
	int t;
	t=x;x=y;y=t;
}

Of course, ‘swap(a,b);’ does nothing! Remember, C always calls functions by value. So, here x and y will have copies of the original variables a,b and will be destroyed as soon as the program returns from the function. In fact, there is no direct way to alter any variable within a function! Here is a fix for the ‘swap’ function:

void swap(int* x, int* y){
	int t;
	t=*x;*x=*y;*y=t;
}

You must understand that calling ‘swap(&a,&b);’ is still by value, x,y will still have the copies of the addresses of a,b, and those copies will still be destroyed as soon as the function returns. So, why does it work now? Let’s assume that addresses of a and b are 1008 and 1020. So, x and y are 1008 and 1020, which are indeed copies. In the function, whatever is in the address 1008 is swapped with the content of address 1020. So, it does not matter if x=1008 and y=1020 are destroyed because the addresses of a and b are still 1008 and 1020 and whatever were in those addresses are now swapped! This simple function provides the fundamental understanding of pointers.

Let’s look at another function that copies a source string to a destination string:

//assuming dest has enough memory for src
void strcpy(char *dest, const char* src){
	while(*dest++=*src++);
}

To understand this there are couple of concepts that you will find handy.

  • *p = *p + 1; increments the value pointed by p and stores it in the same address.
  • p = p + 1; increments the address, so if p is an integer pointer then p now 2 bytes incremented.
  • Unary operators ++, *, etc., are associate right meaning if brackets are not provided then they will be executed from right to left.
  • a=b++; saves b to a then increments b by 1 and a=++b; increments b by 1 and then saves b to a.

With these concepts, let’s work out an example. Suppose, p is an integer pointer that points to address 1008, which stores value 5.

  • x=*p++; saves the value to x and then increments the address [x=5, *p=5, p=1010].
  • x=++*p; or x=++(*p); increments the value and save it to x (address remain unchanged!) [x=6, *p=6, p=1008].
  • x=(*p)++; saves the value to x and then increments the value (address remain unchanged!) [x=5, *p=6, p=1008].
  • x=p++; saves the address to x and then increments the address [x=1008, p=1010, *p=unknown].
  • x=++p; increments the address and then saves the address to x [x=1010, p=1010, *p=unknown].
  • If put alone both p++; and *p++; statements are equivalent, but ++p; and ++*p; are not!

Back to the strcpy function. The value is extracted from the address pointed by current src, then it is saved to the address pointed by current dest, and finally both addresses are incremented. This process is repeated until the extracted value is '' which is the loop termination point.

The next concepts you need to revisit are memory stack and memory heap. Every time a function (including the main() function) declares a new variable, it is “pushed” onto the stack memory and when a function exits, all of its variables are “poped” from the stack and deleted (therefore lost forever). Once a stack variable is freed, that region of memory becomes available for other stack variables. You can also say that stack variables are local variables of a function. The stack is managed by the CPU. Heap, on the other hand, is managed manually by the programmer. You can allocate memory in the heap by malloc or calloc method and when you do, the allocated memory will not be available to any other process. So, it is your responsibility to deallocate the memory from heap using the free method and if you fail to do so then there will be a “memory leak”. Consider the following program:

int* getAnArray(){
	int arr[3]={1,2,3}; // stack memory
	int* pt = &arr[0];
	while(*pt) printf("%d",*pt++); //print 123
	pt = &arr[0];
	return pt;
}
/**
* This function can be optimized. 
* Written like this to maintain the similarity with the previous function.
**/
int* getAPointer(){ 
	int* arr=malloc(3*sizeof(int)); //heap memory
	*arr=1;*(arr+1)=2;*(arr+2)=3;
	int* pt = arr;
	while(*pt) printf("%d",*pt++); //print 123
	pt = arr;
	return pt;
}
int main(){
	int *pt=getAPointer(); //get the heap address
	int *t=pt; // a temporary variable to go through the array
	while(*t) printf("%d",*t++); //print 123
	free(pt); //free for the malloc in the getAPointer() function

	int *arr=getAnArray(); 
	while(*arr) printf("%d",*arr++); //garbage
	return 0;
}

Since you studied pointers before, you know that whatever you can do with arrays you can do with pointers, except pointers are more powerful! One reason is that when arrays are declared, stack memory is allocated for them, therefore you simply cannot return an array from a function! Because before returning the stack memory will be destroyed. However, with pointers you can allocate memory in the heap and you can have the memory as long as you want.

The advantage of using a array of pointers or pointer to pointers over a 2D array is that the 2nd dimension can have different size. That is why, the command line argument of main has a array of pointers rather than a 2D array since each of the argument will probably have different length. You need to allocate memory for each of the pointers in the array. Consider the following program:

/**
* If less than 2 command line argument provided, print default bc def. Otherwise, print the arguments
* Example: ./a.out xyx ttyu     will output xyx ttyu
*          ./a.out     will output bc def
**/
int main(int argc, char *argv[]){
	if(argc<3){
		char ** my_argv= malloc(3*sizeof(char*));
		for(int i=0;i<3;i++)
			*(my_argv+i)= malloc((i+1)*sizeof(char));
		**my_argv='a'; // the name of the program
		**(my_argv+1)='b';
		*(*(my_argv+1)+1)='c';
		**(my_argv+2)='d';
		*(*(my_argv+2)+1)='e';
		*(*(my_argv+2)+2)='f';
		return main(3,my_argv);
	}
        for(int i=1;i<argc;i++)
		printf("%s ",argv[i]);
	return 0;
}

Last few words to make sure that you and I, are on the same page.

  • Multiple pointers can point to the same address and when they do changing a content using one pointer will change the content of the other pointer (obviously).
  • Assigning a pointer with another pointer will not replace the content of the first pointer with the content of the second pointer, the first pointer will now simply point to the address pointed by the 2nd pointer.
  • You must use the address return by malloc as the parameter of free that means you would normally use another temporary pointer to increment the address.
  • After you free, the allocated memory is released, however the content may still be there if the memory is not assigned to other processes. It may appear that free is doing nothing! (Some people think that assigning the pointer to NULL after free is a good programming practice, I have neutral feeling about this)

To summarize these points consider the following program.

int main(){
	char *s1=malloc(3*sizeof(char));
	*s1='a';*(s1+1)='b';*(s1+2)='c';
	char *s2=s1;
	const char *s3="wxyz";
	*s2='p'; //changes s1
	s2=s3; //does not change s1
	while(*s1) printf("%c",*s1++); //prints: pbc
	while(*s2) printf("%c",*s2++); //prints: wxyz
	while(*s3) printf("%c",*s3++); //prints: wxyz
	s1-=3; // go back to original address to free
	free(s1);
        while(*s1) printf("%c",*s1++); //may or may not print: pbc, don't be fooled, you should not do this. 
	return 0;
}

Let’s now write few functions using pointers that you may encounter in a C interview. The following function returns the length of a string:

int strlen(const char* s){
	const char *end=s;
	while(*end) end++;
	return end - s;
}

The following function concatenate a string with a source string and return the answer string:

/**
* Given enough memory available after dest.
* both dest and return have the answer.
**/
char *strcat(char *dest, const char *src)
{
    char *ret = dest;
    while (*dest) dest++;
    while (*dest++ = *src++);
    return ret;
}

The following function reverse a string:

void strrev(char* str){
	char* end=str;
	while(*end) end++;
	end--;
	char* start=str;
	while(start<end){
		char temp=*start;
		*start++=*end;
		*end--=temp;
	}
}

The following function returns the first duplicate character of a string:

/**
* Works for 0-127 ASCII string. Assuming, int is 16 bit.
**/
char firstduplicate(const char* str){
	int hash[8]={0,0,0,0,0,0,0,0};
	while(*str){ 
		int pos=*str;
		if(hash[pos/16]&(1<<(pos%16)))
			return *str;
		hash[pos/16]|=(1<<(pos%16)); 
		str++;
	}
	return '';
}

The following function removes duplicate characters from a string:

/** 
* Works for 0-127 ASCII string. In C, int is 16 bit. 
**/
void removedup(char* str){
	char* s=str;
	char* s1=str;
	int hash[8]={0,0,0,0,0,0,0,0};
	while(*s){
		int pos=*s;
		if(!(hash[pos/16]&(1<<(pos%16)))){
			*s1++=*s;
			hash[pos/16]|=(1<<(pos%16));	
		}
		s++;
	}
	*s1='';
}

The following program replaces every space of a string with %20:

char* encode(char * str){
    int count=0;
    int length=0;
    char* s=str;
    while(*s){
        if(*s==' ') count++;
        else length++;
        s++;
    }
    char * ret=malloc((3*count+length+1)*sizeof(char));
    s=str;
    char *r=ret;
    while(*s){
        if(*s==' '){ *r++='%';*r++='2';*r++='0';}
        else *r++=*s;
        s++;
    }
    *r='';
    return ret;
}

The following function checks if the one string is a sub-string of another:

int issubstring(const char* str,const char* sub){
	const char* s1=str;
	const char*  s2=sub;
	int count=0;
	while(1){		
		if(*s2=='') return 1;
		else if(*s1=='') return 0;
		else if(*s1==*s2){
			count++;
			s2++;
		}	
		else{
			if(count!=0){
				s1-=count;
				count=0;
				s2=sub;
			}
		}
		s1++;
	}
	return 0;
}

The following function checks if 2 strings are rotations of each other or not:

int isrotation(const char* s1,const char* s2){
	int l1=strlen(s1);
	if(l1!=strlen(s2)) return 0;
	char* t=malloc(2*l1*sizeof(char));
	strcpy(t,s1);
	strcat(t,s1);
	int r=issubstring(t,s2);
	free(t);
	return r;
}

The following program checks if 2 strings are anagrams or not:

/**
* Works for 0-127 ASCII string
**/
int isanagram(const char* s1,const char* s2){
	int hash[128];
	int i;
	for(i=0;i<128;i++)
		hash[i]=0;
	while(*s1) hash[*s1++]++;
	while(*s2) hash[*s2++]--;
	for(i=0;i<128;i++)
		if(hash[i]) return 0;
	return 1;
}

The final concepts I am going to cover are void pointers and pointer to function. A void pointer can be assigned with any type and vice versa. You have to be careful though, void pointers are normally fixed size (depending on the machine), so if you are assigning a type that is not same as the void pointers you may have unexpected results! However, if you assign any other pointers to void pointers and vice versa, you will be fine. Consider the following program:

void swap(void *v[], int i, int j){
	void *t;
	t=v[i];v[i]=v[j];v[j]=t;
}
int main(){
        int a[4]={4,6,1,3};
        char* b[4]={"4","6","1","3"};
        swap(a,0,1); // a is now {1,3,4,6}.
        swap(b,0,1); // b is now {"6","4","1","3"}, which is correct
        return 1;
}

When I call the ‘swap’ function with an int array in my machine, it swaps 2 items at a time, because void pointers are twice as big as int in my machine. In C, you can write pointers to functions, which can be assigned, passed to a function, return by a function and so on! If ‘comp’ is a function with 2 void pointers as parameters and returns an int, then you would declare it like ‘int (*comp)(void *, void*)’ and use it like ‘if((*comp)(2,3)<0){…}’. Remember that ‘int *comp(void *, void*)’ is a function that returns a integer pointer whereas ‘int (*comp)(void *, void*)’ is the pointer to a function! As an example, let’s write a quick sort implementation that takes the comparator function as its parameter:

int acomp(char* a, char* b){
	return a[0]-b[0];
}
int dcomp(char* a, char* b){
	return b[0]-a[0];
}
void quicksort(void *v[], int left, int right, int (*comp)(void *, void*)){
	int i, last;
	if(left>=right) return;
	swap(v,left, (left+right)/2);
	last=left;
	for(i=left+1;i<=right;i++)
		if((*comp)(v[i],v[left])&lt;0)
			swap(v,++last,i);
	swap(v,left,last);
	quicksort(v,left,last-1,comp);
	quicksort(v,last+1, right, comp);
}
int main(int argc, char *argv[]){
	int i;
	quicksort(argv,1,argc-1,acomp); //ascending
	for(i=1;i<argc;i++)
		printf("%sn",argv[i]);
	return 0;
}

Notice that I also write 2 comparators here. If the quicksort is called with acomp on some array of strings then the array will be sorted by the first character in an ascending order, however calling it with dcomp function will sort it in a descending order!

[Ref: Dennis Ritchie, Brian Kernighan, Gayle Laakmann McDowell]

Tagged: , , ,

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: