Notes on Coding Style for C Programming

A well-written computer program is more than just a debugged and tested program, it should also be readable to yourself and to others. Doing this requires a certain amount of discipline, but fortunately this discipline is not difficult to acquire. The purpose of paying attention to coding style is to make reading and fixing code easier to do, and the effort of writing using a good style makes it more likely your program will be correct in the first place.

Computer code doesn't need to look pretty, it just needs to look simple and easy to understand. A lot of students spend a lot of effort trying to make their code look pretty, but creativity in typography is often better redirected at making the code actually work well.

This set of notes aims to describe a simple, neat programming style which you should try to follow in your own coding for this course. You don't have to follow this style, as long as the other people in your programming team follow a common style as you; otherwise the code handed in by your team will be mismatched and more difficult to follow.

The C Programming Language (2nd edition) by Brian Kernighan and Dennis Ritchie, defines a neat coding style which we will follow here. We'll call this the Kernighan and Ritchie (or K&R) style.

The rest of this document will discuss the rules of this coding style, and the reasons why those rules help in writing code which is correct.

Names

A variable or function name should be sufficient to understand the meaning, while remaining concise. The shorter the lifespan or scope of a variable, the shorter its name should be; conversely, the more important a variable, and the more places it is used, the more descriptive that name should be.

Two important aims of good programming style are brevity and clarity. Poor choice of names can severely affect both of these. Consider the following code:

	void init (int elem[], int nelems)
	{
		int i;

		for (i=0; i < nelems; i++)
			elem[i] = i;
	}

Now compare to:

	void initialiseElementArray (int elementArray[],
		int numberOfElements)
	{
		int theElementIndex;

		for (theElementIndex = 0;
			theElementIndex < numberOfElements;
			theElementIndex++)
		    elementArray[theElementIndex] = theElementIndex;
	}

Loop index names such as i are so common that it makes little sense to change them. Because it is a local variable, there is no need for a more descriptive name. The longer a name, the less meaningful it can become, and the code will become harder to understand.

Similarly, function names can be used to benefit comprehension, or hinder it. Functions typically take the place of verbs in a programming language; they are "doing" words. As such, a good function name should suggest its function, and be unambiguous about its returned result:

	if (string_is_int("1234")) ...

Clearly, the above function returns a boolean result, because of the use of the verb "is" within its name. The string either is or is not a representation of an integer. Compare with this more ambiguous name:

	if (checkString("1234")) ...

Or this function name, which does not contain a verb:

	if (stringint("1234")) ...

Some texts use the strangeCapitalisation way of writing function names, variables, constants and class names. However, long before computers were invented, writing styles had evolved in the English language (and many others) to use capital letters to signify importance, and spacing to separate words. (Of course, pictographic languages such as Chinese use single characters to represent words, and hence have no capitals and little need for spacing to separate words). We cannot use spaces within variable or function names in most computer languages, but the next best thing is the underscore character:

	if (next_entry == maximum_value)
		return 1;

As with all things, underscores can be overused. Often, names will be short enough that there is no need to use spacing to separate words, as in many ANSI C library functions:

	str[i] = toupper(ch);

A single consistent style of naming will greatly help when others need to understand or maintain your code, so mixing styles should be avoided.

Indentation and Spacing

Tabs should be used to indent code, rather than spaces. Tabs usually represent eight spaces, which some programmers find inconvenient because it leads to code which is squashed on the right hand side of the page. Such code can be dramatically simplified by the good use of functions, or other improvements to the design. If you find yourself worried about tabs being too big, chances are your code needs to be redesigned to avoid nesting so many statements inside one another. Consider the following:

	for (i=0; str[i] != '\0'; i++) {
		if (str[i] == ' ') {
			if (strcmp(&str[i+1], "static") == 0) {
				...
			}
			else {
				if (strcmp(&str[i+1], "final") == 0) {
					...
				}
			}
		}
	}

This code is complicated and prone to error. A better approach would be to use a simpler structure for the if-statement:

	for (i=0; str[i] != '\0'; i++) {
		if (strcmp(&str[i], " static") == 0) {
			...
		}
		else if (strcmp(&str[i], " final") == 0) {
			...
		}
	}

Spaces should be used to separate components of expressions, to reveal structure and make clear intention. Compare

	for(wn++;wn<100;field='\0');
	 *i='\0'; return('\n');

with

	for (wn++; wn < 100; field[wn++] = '\0')
		;
	*i = '\0';
	return '\n';

Good spacing reveals and documents your intentions. Poor spacing can lead to confusion. This is another reason to use tabs to indent each line, because spaces can often be out by one, and lead to misunderstandings about which lines are controlled by if-statements or loops. An incorrect number of tabs at the start of line is easier to spot.

Comments

Comments, if used improperly, can make code very difficult to understand. Use comments sparingly, to document important parts of the code, and let the code speak for itself at other times. A comment at the beginning of each function is a good idea, and within the code comments are best kept short and used to document unusual things.

Consider the following switch-statement:

	switch (ch) {
		case 'd':
		case 'i':
			/* print the int */
			print_int(next_arg());
			break;
		case 's':
			/* print the string */
			print_string(next_arg());
			break;
		/*
		 *  Default:
		 */
		default:
			break;
	}
	/* return number converted */
	return num_converted;

All of the comments in the above code are redundant. The code is quite self-explanatory, and the comments just make it harder to follow what is happening. Make your comments count.

A good idea is to place one-line comments within a function slightly to the right of the code, using a single tab-stop. This keeps them out of the way if someone is just reading the code. Avoid trying to right-align comments however, since tab stops may vary from editor to editor, and your code will be hopelessly messy if you rely on a tab stop being a certain size.

Similarly, you should not rely on the width of your text editor as a guide to how many characters might be visible in other person's editor. While you may be able to resize your editor so you can write lines of code which span more then 80 characters, the default in many editors is only 80, so sticking to that limit will help others to read it. For this reason, using long comment lines of asterisks, hyphens or forward slashes to separate functions from each other can lead to unintentional formatting problems when a different editor views the code.

	/**********************************************
	****/
	/*  Sorting algorithm: based on qsort          
	   */
	/**********************************************
	****/

A suggested approach is to use multi-line comments to describe each function like so:

	/*
	 *  Sorting algorithm: based on qsort
	 */

and single line comments within the code:

	if (! isdigit(ch))
		return 0;	/* test failed */

A note about using the // comment sequence: The double-slash comment sequence is used in C++ and Java to write a comment which lasts until the end of the line. This comment sequence is not part of the ANSI C standard, although many C compilers allow its use. If you wish to write C code which may be easily ported to C++ or Java, the /* ... */ comment sequence is preferable since it is valid in each of those languages. That being said, there is a new ANSI C standard in the works, and the chances are good that // will become part of the standard soon.

Function Declarations

Declare function names, return types, parameter types and names all on the one line, if you can. This applies both to function prototypes within a header file, or within the code itself:

	static void print_int (int i)
	{
		printf("%d", i);
	}

If the line is too long, fold it in an appropriate place, and space the next line so the start of the function name can be seen:

	static int dispatch_instruction (Instruction *i,
			long pc, Process *p)
	{
		...
	}

Loops, Loop Tests and Parentheses

In general, you should use while or for loops in preference to the do ... while construct. For-loops are best for manipulating arrays and lists, but while-loops are almost interchangeable with them. The do ... while construct allows you to test at the end of one pass through the loop, but often this is too late:

	do {
		c = getchar();
		putchar(c);
	} while (c != EOF);	/* bug */

What happens if c == EOF ? A bad character is written to the output before the loop has a chance to test for this problem. Using a typical C idiom is better:

	while ((c = getchar()) != EOF)
		putchar(c);

Here, the assignment within the loop-test fetches the next character from the input, and a test is done to see if the end of file has been reached before the character is written to output. Note that in the above code the parentheses around the assignment are necessary since the following code differs dramatically in meaning:

	while (c = getchar() != EOF)
		putchar(c);

Since the inequality operator != has a higher precedence than assignment = in C, the second piece of code is the same as this:

	while (c = (getchar() != EOF))
		putchar(c);

That was probably not the intention. For this reason, it is a good idea to use parantheses around all expressions which mix assignment with equality operators, and indeed any bitwise or logical operators too. For instance, what is the following code meant to do? What does it actually do?

	long src1, word;

	src1 = word & 0xFF0000 >> 16 & 0xE0 >> 5;

Since the right-shift operator >> has higher precedence than the bitwise-and operator & the grouping of the above assignment is actually:

	src1 = word & (0xFF0000 >> 16) & (0xE0 >> 5);

To prevent such odd results, parentheses around each binary or unary operation in an expression will make the intention unambiguous. Parentheses also have another benefit. Java operators have a different set of precedence rules to C or C++, so if you want to write code which will be portable between these languages, parentheses within expressions can not only make your meaning clear, but prevent porting problems by ensuring the same interpretation in all languages.

Data Types and Typedef

The typedef instruction in C allows you to define new words in the language. For instance:

	typedef long  Register;

This makes the word Register mean a long integer, everywhere it is used in your code. Such definitions are best placed in a header file so that all source code files which rely on that header file have access to the same set of definitions. Using typedef is a good idea in case you need to modify your code later, since it is simply a matter of changing one line in the header file and all code which uses that definition will work differently after being recompiled. This makes modifications much easier than having to use an editor to replace all instances of a type throughout your code.

Structures are best defined using the typedef mechanism. It is possible to define a name using typedef before the structure is actually defined, which allows recursive definitions:

	typedef struct Employer  Employer;
	typedef struct Worker    Worker;

	struct Employer {
		int worker_count;
		Worker *worker_list;
	};

	struct Worker {
		int employer_count;
		Employer *employer_list;
	};

A structure is similar to a class in C++ or Java, except that it stores only data, not method functions. Structures should be used for all compound data types where related information needs to be kept together. Structures can be passed to functions or returned from functions, and even assigned to each other:

	Employer e1, e2;

	e1 = e2;

Global Declarations

Declare global variables at the top of a file of source code, so that they can be easily seen. Constants declared using const can also be placed there. A short comment next to each global variable may help readers understand your code.

It is often a good idea to avoid the use of global variables, if you can. Globals have two undesirable properties: they can cause side-effects in your code; and they can make your code difficult to run in multi-threaded applications, because simultaneous access to a single global variable may have unpredictable results. But sometimes, globals are simply the best way to solve a problem of sharing information between disparate parts of your code, so in these cases keep the definitions together in one place to make it easy to maintain them.

Enum and Constants

The enum construct is a neat way of keeping a set of integer constants bound together in the one definition:

	enum {
		MAX_SYMBOLS = 1023,
		TABLE_SIZE,
		BLOCK_SIZE = 4096
	};

In the above, TABLE_SIZE will be equal to the previous integer plus one since it is not assigned a value. It's a good idea to have some naming convention to distinguish constants from variables. One common naming technique is to use all capital letters; another is to just capitalise the first letters of each word. Decide on a scheme and stick to it.

Some constants are not integers, some are floating-point numbers, some are long integers. The only way to define them is to either declare them using the const keyword, or use the #define mechanism in the C preprocessor:

	#define WHITE 0x00FFFFFF
	#define PI    3.1415926

String constants should not be declared using #define, because the C preprocessor works by textual substitution, which means that you might end up with many copies of the string in your executable program after compilation. (Having many copies of an integer or floating point number is usually not a problem, since there are ways of encoding those numbers into the instruction stream in most executables, but strings can be much larger objects and will take up memory when the program is running). So, to declare a string, make it a variable inside one source file, but declare it as extern in a header file:

	/*
	 *  Header .h file.
	 */

	extern char * program_name;
	extern const long size;

The extern keyword means that the actual data value is defined somewhere other than in the header file, i.e. in a source code file somewhere. The linker will resolve the problem when linking all the compiled source code together into an executable program. So, in one of the source code files only, we write:

	/*
	 *  Source .c file
	 */

	char * program_name = "Gadget 1.0";
	const long size = 256;

Only #define , enum and extern data definitons should appear in header files, along with typedef, struct and function prototypes. Data such as strings, or global variables, should all be defined in source code files. Otherwise, every file which #includes a header file will end up with their own separate copies of those variables.

In the above example, the long int size was declared as const, and hence cannot be modified by the program at a later stage. In practise, you can use const just about anywhere you might want to use #define, and doing that is also in-keeping with the way C++ and Java define constants.

Static Objects

The keyword static has many meanings and uses in the C language. When applied to a variable, it means the variable is only visible within the current source code file being compiled. When applied to a function, it means that function can only be used within the current source code file. When applied to a variable inside a function, it means that the variable persists from one call of that function to next: it is global data masquerading as a local variable (a thing to be avoided).

The first two uses of the static keyword are quite useful and valid things to do:

	static int buffer_size = 1024;
	static char * buffer = NULL;

	static char * read_line(FILE *f)
	{
		if (buffer == NULL)
			buffer = malloc(buffer_size);
		...
	}

The above defines a function which can only be used within the current file, and which automatically creates a global character array called buffer on the first use of the function. If another file also uses a global variable called buffer there is no problem, because this variable was declared static, so there will be no linker conflict with the other variable. If neither were declared static there would be a conflict.

Because of this kind of linking problem, the Java language uses the static keyword to allow global data only within a class object. Thus, each global variable or constant exists within a certain class, and each class is uniquely identified, so linkage problems with variables or functions of the same name do not occur.

Error Checking

Check every return value from a system call which could cause problems. In particular, don't ignore the possibility of finding EOF when reading files, and assume that a filename might have been mistyped, so the file might seem to be nonexistent. The C function fopen returns NULL if it cannot open a file. The malloc function returns NULL if it cannot supply any more memory to your program.

If printf fails to work properly, there is not much you can do and it's something which almost never occurs. Ignoring the return value is quite commonplace in that circumstance. Similarly, ignoring the return value from fclose is also common, since there is nothing else which can be done to recover from that error, whereas the likelihood of fopen failing is quite high, and so this needs to be checked, and error messages given if there is a failure.

Avoiding Casts

Casting a value to another type can often conceal an error. For instance, the following code is valid but its behaviour is undefined:

	char * str = "1234"
	int i = (int) str;

The code casts str, which is a pointer to an array of characters, into an int. Some students of C erroneously assume this cast will fill the integer i with the number 1234, but this is not so. Instead, the integer will become equal to the memory address of the first character of the string, in other words, the pointer s converted into an integer. However, although this is an error, the compiler will not give a warning, since the cast is assumed to be intentional and correct.

There are almost no circumstances in which you need to use a cast expression. Avoiding casts has the practical advantage of allowing you to much more easily find errors such as the one just mentioned. Why bother to do this:

	(void) printf("Printing something.\n");

when the cast is redundant, and the following code works just as well but is less complicated:

	printf("Printing something.\n");

Idioms

Every language has its own ways of saying things, and this includes programming languages. Each language uses its own set of these "idioms", which arise because often there is one good way of achieving an effect.

For instance, reading characters can be achieved using this idiom:

	int ch;

	while ((ch = getc(stdin)) != EOF) {
		/* do something with ch */
	}

Or visiting every element of an array with this idiom:

	Element array[NUM_ELEM];
	Element e;
	int i;

	for (i=0; i < NUM_ELEM; i++) {
		e = array[i];
		/* do something with e */
	}

Idioms exist for using other data structures such as lists, queues, trees and so forth, and also for other activities such as allocating memory, handling command-line options, or manipulating strings. Learn to use such idioms, and recognise them in the code others write. When learning a new language, find out the techniques others use by finding well-written example programs.

Your code will become easier to understand, and you will build a repertoire of coding patterns which will help you solve problems in future.

Macros

Macro functions are C preprocessor "functions" which use textual substitution to achieve an effect similar to function calls. They have all kinds of potential side-effects and are generally a bad idea. They were one of the reasons why inline functions were invented in C++ to provide the same capabilities but with proper syntax. Consider the following macro function:

	#define lowbyte(a)  a & 0x00FF 

The intention is to return the least-significant byte from a. This might work in some cases, but not all cases. What happens if an expression is passed as a parameter to this macro?

	int x = 0x1200;
	int i = lowbyte(x | 0x0034) << 8;

The bitwise-or expression (x | 0x0034) should yield 0x1234 in hex. This is passed to the macro function which uses bitwise-and & to return the lowest byte, which should be 0x0034. This should then be left-shifted by 8 bits, to yield the result 0x3400. Is that what actually happens? Not at all.

Remember operator precedence? In C, the highest precedence of the above operators is left-shift, followed by bitwise-and, then followed by bitwise-or. So the expression above is actually expanded by the preprocessor at compile-time to mean:

	int x = 0x1200;
	int i = x | (0x0034 & (0x00FF << 8));

This means i is assigned the value 0x1200, which is just x. The problem could have been fixed by appropriate use of parentheses:

	#define lowbyte(a)  ((a) & 0x00FF) 

Note the need to parenthesise the parameter a. Even with parentheses, macro functions are a problem. If a macro uses a parameter more than once, textual substitution means that paremeter will be evaluated more than once at run-time:

	#define is_lowercase(c)  (((c) >= 'a') && ((c) <= 'z'))

	if (is_lowercase(getchar())) ...

The above macro uses the variable c twice, so the getchar function will be called twice every time the is_lowercase macro is used. These are dangers inherent in textual substitution. What is more, there already exists an ANSI C function to do this, called islower, which is available from the ctype.h header file, and is hence also a part of the C++ language! Why reinvent the wheel, and then get it wrong? (There is also another problem with the above code, in that the return value of getchar was not checked for EOF.)

The only reason people use macros is because they believe they will gain run-time speed, by avoiding the overhead in time of calling a function, but even this is misguided. In a recent test of the speed of a computationally intensive program, replacing a number of key functions with macros to determine if this would increase the speed of the machine actually decreased its speed. Why?

There are many possible reasons for the speed decrease. Here's one: CPUs these days have "on-board caches" which are small amounts of memory on the CPU chip, used to store data and instructions. Because it is physically located on the chip, accessing these portions of memory are very fast. Using macro functions means the macro code is expanded inside every line which uses that macro, at compile-time. This means the compiled program is larger, and fewer instructions can fit inside the on-board cache. Hence, the processor must fetch more instructions from slower RAM memory as it is running.

The bottom line is: don't bother using macro functions. Even inline functions in C++ may slow down your program. In general, allow the compiler to make decisions about how to speed up the code. Once you have your algorithm and data structures optimised, turn on compiler optimisations, and function calls will become very fast. The compiler knows the intimate details of how to obtain speed from a given chip-set. Let it do its job.

Another reason to avoid macros, and the C preprocessor as much as possible? Designers of new languages such as Java have realised the inherent dangers of some of these powerful but dangerous features, and created languages without them. Writing code using them is thus highly language-specific, and more difficult to port to other languages. Function calls and the const keyword can essentially replace all uses of the C preprocessor, and make your code less prone to errors.

Better language design can help you avoid some common pitfalls, but there are many cases where the quality of your coding style will ultimately decide how bug-free and maintainable your programs will be. A neat style, writing concise code, using consitent naming and typical idioms can go a long way towards achieving those goals.