A Bounds Checking C Compiler
By
Richard W.M. Jones
(rj3@doc.ic.ac.uk)
Supervisor
Paul Kelly
(phjk@doc.ic.ac.uk)
Second marker
Nuranker Dulay
(nd@doc.ic.ac.uk)



Abstract

This report describes in detail how array bounds and pointer checking were added to the GNU C compiler.
[...expand to 1/2 a page...]

Acknowledgements



Contents

	1	Introduction
	2	Background
	3	Implementation
	4	Performance
	5	Conclusions and Future Enhancements
	B	Bibliography
	A.?	Appendices
	U	User Manual



1. Introduction

The C language is famous for allowing the programmer freedom and speed at the expense of safety. While Pascal goes to great lengths to check your program at compile time and at run time, C makes only minimal compile time checks (which you may override) and provides no run time checking at all. Naturally, then, C compilers are smaller and faster than Pascal compilers, and produce faster code. However, it's clear that the extra checking that Pascal compilers provide would be nice in C: developers could then have more confidence that their code was correct, and could later turn off checking to gain the extra speed. With C becoming the most popular language in modern operating systems, correct C code is becoming more and more necessary.

This project adds the safety features of Pascal to the GNU C compiler. In addition, it will allow programmers to mix their own checked code with libraries that were not compiled with checking. This is a particularly important feature, since programmers often do not have the source code to commercial libraries, and so cannot recompile them with checking. The changes made to the GNU C compiler to do this were quite minimal (only a few thousand lines of code were added to the 500,000 lines of source).

[...]

2. Background

Operating systems that run in protected mode, such as Unix, Windows NT and OS/2, already provide a very crude form of checking. Such operating systems, for security and reliability reasons, separate processes from each other. So if one process accidentally or deliberately tries to read or write the memory of another process, it is usually prevented from doing so - often it is killed outright. In a sense, then, there is a primitive form of checking going on. A process that has crashed, that is, has stopped behaving as it is supposed to behave, will usually start to read or write a location in memory which is beyond the space allocated for it, and as a result will be killed by the operating system. When this happens, Unix, for instance, gives the message
	Segmentation fault
and the program is killed. Other operating systems, Windows NT for instance, give slightly more information: register contents or the violation address. This message tells the programmer that the program definitely is faulty, but gives no guidance as to where the fault might lie. The programmer needs to know which line the program failed at, and what object the program was trying to manipulate at the time. In addition, such checking is extremely crude, and only finds the worst errors. For instance, the following C program runs without giving any error message on at least one Unix system.

	1	#include <stdio.h>
	2	#include <stdlib.h>
	3
	4	int p[10];
	5
	6	main ()
	7	{
	8	  int i;
	9
	10	  for (i = 0; i < 100; ++i)
	11	    printf ("%d\n", p[i]);
	12	}

The program should, of course, fail at line 11 when it tries to read element p[10], the first element beyond the end of the array. If we look at the memory layout of this program on the particular operating system that it is running on, we see at once why it didn't fail.

	+--------------+----------------------------------+
	| program code | p[0..9] ///// spare memory ///// |
	+--------------+----------------------------------+
	0	      8192                              16384

This particular operating system allocates program sections in multiples of 8K. So all addresses between 0 and 16383 are allowable. The expression 'p[99]' corresponds to the 4-byte integer at address 8588 (8192 + 99 x 4). The operating system considers this to be a valid address. On another operating system, this program might or might not fail.
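
The arithmetic behind this example can be sketched in a few lines of C. This is only an illustration: the helper names 'element_address' and 'os_accepts' are hypothetical, and it assumes that 'p' starts at address 8192, that integers are 4 bytes long, and that the operating system checks nothing more than an upper limit on the address.

```c
#include <assert.h>

/* Hypothetical helper: byte address of element 'index' of an array that
   starts at 'base' and whose elements are 'elem_size' bytes long. */
unsigned long
element_address (unsigned long base, unsigned long index,
                 unsigned long elem_size)
{
  assert (elem_size > 0);
  return base + index * elem_size;
}

/* Hypothetical model of the operating system's check: an access of 'size'
   bytes at 'addr' is accepted if it ends at or below 'limit'. */
int
os_accepts (unsigned long addr, unsigned long size, unsigned long limit)
{
  return size <= limit && addr <= limit - size;
}
```

With these helpers, 'os_accepts (element_address (8192, 99, 4), 4, 16384)' is true, even though index 99 is far outside the ten-element array.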

The following program is also wrong, but will run on just about every version of Unix without a hitch.

	1	#include <stdio.h>
	2	#include <stdlib.h>
	3
	4	int p[10], q[10];
	5
	6	main ()
	7	{
	8	  int i;
	9
	10	  for (i = 0; i < 20; ++i)
	11	    printf ("%d\n", p[i]);
	12	}

Again, it should fail at line 11, trying to read p[10]. But in fact, it will happily read the next array in memory, q.

These unsafe behaviours stem from the fact that C does not
	. check that the address it is about to read from or write to lies within the program's allocated memory, or
	. enforce a strong separation between separate entities in memory (such as arrays p and q in the example above).

Hardware is available which will accomplish the first of these safety goals. For instance, the Intel 386 family is able to check that a pointer lies between addresses 0 and N where N is any arbitrary byte address. This support is not universal though. Other processors are only able to check pointers to the nearest page boundary. The second goal is much more difficult to achieve in hardware. The Intel 386 can be persuaded to do it provided that you have fewer than 16K memory objects, and the compiler is prepared to reload segment selectors every time you reference a different object. Reloading a segment selector is very slow on a 386, and only 4 data selectors can be loaded at once. The 16K limit is too restrictive for programs of any significant size. In addition, objects would have to be placed in separate 4K pages, so small objects would waste huge amounts of memory. Other processor families are not even capable of this level of checking.

With the hardware needed not widely available, we are forced to do the checking in software.

There are several possible strategies for adding bounds checking in software. We chose one particular method, which I'll explain more fully in the main part of this report. First, though, I'll describe the other methods we considered, all of which we rejected for one reason or another. I'll also consider performance.

A simple analysis of the problem seems to conclude that we must, somehow, keep track of:
	. all memory entities, which from now on I will call objects, and/or
	. all pointers to objects.

Keeping track of all pointers to objects is an attractive idea, but in the end we had to reject it. There are several ways this might work, which I will now describe.

2.1 Keep boundary information along with the pointer.

In this scheme, we replace the simple 4-byte [footnote: we are assuming a 32-bit machine here] pointer with a structure like this one (in C++ notation):

	template <class T> struct _pointer_t {
	  T *pointer;		/* The original pointer value. */
	  T *base;		/* The base of the object it points to. */
	  T *extent;		/* The limit of the object it points to. */
	  /* Other information ... */
	};

We now alter the compiler, so that pointer types are substituted with _pointer_t wherever they occur. For instance, the following program:

	1	#include <stdio.h>
	2	#include <stdlib.h>
	3
	4	main ()
	5	{
	6	  int i[10], *p;
	7
	8	  for (p = &i[0]; p < &i[10]; ++p)
	9	    *p = 0;
	10	}

would be compiled as if it were:

	1	#include <stdio.h>
	2	#include <stdlib.h>
	3
	4	main ()
	5	{
	6	  int i[10];
	7	  _pointer_t<int> p;
	8
	9	  p.pointer = &i[0];
	10	  p.base    = &i[0];
	11	  p.extent  = &i[10];
	12	  while (p.pointer < &i[10]) {
	13	    if (p.pointer >= p.base && p.pointer < p.extent)
	14	      *(p.pointer) = 0;
	15	    else
	16	      bounds_error (...);
	17	    ++(p.pointer);
	18	  }
	19	}

Briefly, at line 7, we declare a large checked pointer-to-int p. The assignment 'p = &i[0]' is replaced with a more complex C++-like constructor (lines 9-11), so that the pointer becomes intelligent and knows the base and size of what it points to. Accesses through the pointer, such as '*p = 0' are replaced by checking code (lines 13-16). Simple pointer operations, such as '++p', which don't access memory stay essentially unchanged (line 17).
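
The same idea can be tried out in plain C without touching the compiler. The following is a minimal sketch, not the code that a modified GCC would generate: the names 'checked_int_ptr', 'make_checked' and 'checked_store' are hypothetical, and a simple counter stands in for the call to 'bounds_error'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical checked pointer-to-int, modelled on _pointer_t. */
struct checked_int_ptr {
  int *pointer;                 /* the current pointer value */
  int *base;                    /* first element of the object */
  int *extent;                  /* one past the last element */
};

/* Stands in for a real error report. */
static int bounds_errors = 0;

/* The 'constructor': the pointer learns the base and size of its object. */
struct checked_int_ptr
make_checked (int *array, size_t n_elems)
{
  struct checked_int_ptr p;

  assert (array != NULL);
  p.pointer = array;
  p.base = array;
  p.extent = array + n_elems;
  return p;
}

/* Checked equivalent of '*p = value'.  Returns 1 if the store went ahead,
   0 if it was rejected. */
int
checked_store (struct checked_int_ptr p, int value)
{
  if (p.pointer >= p.base && p.pointer < p.extent)
    {
      *p.pointer = value;
      return 1;
    }
  ++bounds_errors;
  return 0;
}
```

Note that the structure is three machine words long, which is the root of the first objection below.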

There are several reasons why we rejected this approach. Firstly, GCC likes pointers to fit into registers. These large checked pointers are 12 or more bytes long, and so don't fit into a register. Making GCC understand such a large pointer would be very difficult - requiring extensive changes to all parts of the compiler.

Secondly, the size of static objects is not always known at compile time, particularly if those objects are declared as external. For instance, suppose that we declared 'int i[10]' in the program above as 'extern int i[]', ie. referring to a global array called 'i' in another file. The program would still be correct, but GCC would have no way of knowing how many elements were in 'i', and could not build the assignment to 'p.extent' (line 11) correctly. (A way to overcome this problem, incidentally, is to store the size of the static object along with the object, say as the first word of the object.)

Thirdly, these pointers are incompatible with code compiled by other compilers for the same target machine, and with code compiled by GCC with bounds checking switched off. For instance, on a typical 32-bit target machine, the following structure is 12 bytes long with bounds checking off, but 28 or more bytes long with the above bounds checking scheme:

	struct tree {
	  struct tree *left;
	  struct tree *right;
	  int datum;
	};

Checked and unchecked code cannot be mixed with this scheme. So on a Sun, for example, the supplied C library could not be used, since it is compiled using Sun's own version of the Portable C Compiler ('cc') and the source code is not available.
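
The size figures quoted above can be reproduced on any host by modelling the 32-bit target with fixed-width integers. This is a sketch only: the type names are hypothetical, and the checked pointer is assumed to hold exactly the three fields shown in section 2.1, with no 'other information'.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t addr32;        /* models a plain 4-byte pointer */

struct fat_ptr32 {              /* models the checked pointer of section 2.1 */
  addr32 pointer;
  addr32 base;
  addr32 extent;
};

struct tree_unchecked {         /* 'struct tree' with bounds checking off */
  addr32 left;
  addr32 right;
  int32_t datum;
};

struct tree_checked {           /* 'struct tree' with checked pointers */
  struct fat_ptr32 left;
  struct fat_ptr32 right;
  int32_t datum;
};

/* These hold on hosts where 32-bit integers need no padding. */
static_assert (sizeof (struct tree_unchecked) == 12, "unchecked layout");
static_assert (sizeof (struct tree_checked) == 28, "checked layout");
```

Any unchecked library that allocates or walks a 'struct tree' assumes the 12-byte layout, so it cannot share such structures with checked code.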

2.2 Replace the pointer with a 4-byte identifier.

The second scheme we considered is a modification of the first one, designed to overcome some of the difficulties with the first. Instead of keeping the bounds information along with the pointer itself, we keep the same information in a global table, and replace the pointer with an offset into this table.

Suppose, then, that we have the following array and pointer into that array:

	int i[10], *p = i;

Suppose that array 'i' is located at address 1000 in memory. Normally in C, pointer p would start with value 1000. Instead, we build a global table of 'known pointers'. This table might look like this:

			+-----------------------+
	offset 0	| reserved for 'NULL'   |
			| pointers              |
			|                       |
			+-----------------------+
	offset 1	| pointer value    1000	|
			| base of object   1000	|
			| extent of object 1040 |
			+-----------------------+
	offset 2	| pointer value    1020 |
			| base of object   1000 |
			| extent of object 1040 |
			+-----------------------+
	offset 3	|	etc.		|

We now give pointer 'p' the value 1. References to pointer 'p' go via the global table of pointers, so 'p++' would increment the pointer value in the table, not the pointer 'p' itself. If we copy a pointer, say 'int *q = p;', we would have to create a new entry in the table, rather than just copy the representation of 'p' itself.
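
A minimal sketch of such a table follows, assuming a fixed-size table and using plain integers in place of real addresses. All of the names ('ptr_table', 'ptr_new', 'ptr_copy' and so on) are hypothetical.

```c
#include <assert.h>

#define MAX_PTRS 64

/* One entry per live pointer. */
struct ptr_entry {
  unsigned long value;          /* pointer value */
  unsigned long base;           /* base of object */
  unsigned long extent;         /* extent of object */
};

typedef unsigned ptr_id;        /* what the program sees instead of a pointer */

static struct ptr_entry ptr_table[MAX_PTRS];
static ptr_id next_id = 1;      /* entry 0 is reserved for NULL pointers */

/* Create a table entry for a new pointer and return its identifier. */
ptr_id
ptr_new (unsigned long value, unsigned long base, unsigned long extent)
{
  assert (next_id < MAX_PTRS);
  ptr_table[next_id].value = value;
  ptr_table[next_id].base = base;
  ptr_table[next_id].extent = extent;
  return next_id++;
}

/* 'q = p': copying a pointer must create a fresh entry. */
ptr_id
ptr_copy (ptr_id p)
{
  return ptr_new (ptr_table[p].value, ptr_table[p].base, ptr_table[p].extent);
}

/* 'p++' on a pointer to elements of 'elem_size' bytes: only the table
   entry changes, the identifier stays the same. */
void
ptr_increment (ptr_id p, unsigned long elem_size)
{
  ptr_table[p].value += elem_size;
}

unsigned long
ptr_value (ptr_id p)
{
  return ptr_table[p].value;
}
```

For example, after 'p = ptr_new (1000, 1000, 1040)', 'q = ptr_copy (p)' and 'ptr_increment (p, 4)', entry 1 holds 1004 while entry 2 still holds 1000.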

GCC would be more amenable to such a scheme, since the size of the offset is the same as the size of an ordinary pointer. These 'pointer identifiers' easily fit into a register, and the size of structures doesn't change.

There are, unfortunately, many problems, which is why we rejected this idea.

Firstly, checked and unchecked code still cannot be mixed. We would have to ensure that unchecked code never saw a 'pointer identifier' but only saw a true pointer. The following code, compiled in checking mode, would fail:

	1	#include <stdio.h>
	2	#include <stdlib.h>
	3
	4	extern char *p;
	5	char *q;
	6	extern void unchecked_fn (char *);
	7
	8	main ()
	9	{
	10	  unchecked_fn (q);
	11	  p[0] = 0;
	12	}

Assume that 'unchecked_fn' is a function in an unchecked part of the program, and 'extern char *p' (line 4) refers to a pointer object that is also unchecked. Line 10 fails, since 'q' is an offset into a table, but 'unchecked_fn' is expecting a true pointer. Line 11 fails, since 'p' is a true pointer, but we are expecting an offset into the global table.

We might overcome this first problem by the following scheme, but it is tricky to implement in practice, and we did not pursue it:

	i. Create a list of all global functions in all unchecked libraries before you start.
	ii. Create a list of all static objects in all unchecked libraries.
	iii. Before compiling a checked file, load in both these lists, so we know where all the unchecked code is.
	iv. When calling an unchecked function, replace pointer identifiers in the arguments with actual pointer values.
	v. If a function returns a pointer, make a new entry for it in the global table, and mark it as 'cannot be checked'.
	vi. References to unchecked objects through pointers are compiled normally, without pointer checking.
	vii. Make all global checked pointer variables private, so they cannot be seen by any unchecked code. [footnote: This is not quite as restrictive as it may sound. In a well structured program, libraries ought not to alter the value of global variables in the application.]

A second, and more serious, problem affects all schemes that try to track all the known pointers in a C program. Clearly, every single pointer in the program's store must have a separate entry in the global table. If this were not so, then altering the value of one pointer would affect the apparent value of another pointer, as in this fragment:

	1	int i[10];
	2	int *p = i;
	3	int *q = p;
	4	p++;

At line 2, a new global table entry is created for 'p', say entry no. 1. At line 3, 'p' is copied into 'q', and we do this by copying table entry no. 1 to a new entry, say entry no. 2. Now, line 4 increments table entry no. 1, but doesn't change entry no. 2. (Otherwise 'q' would appear to be incremented, which is not what we want).

The problem comes about because we copied a pointer (line 3). C lets us copy pointers in many different ways. For instance, we could replace line 3 with the following:

	1	int i[10];
	2	int *p = i;
	3a	int *q;
	3b	memcpy (&q, &p, sizeof (int *));
	4	p++;

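Now, at line 3b, 'q' has the value 1, the same as 'p', so both identifiers share table entry no. 1. Line 4 increments that entry, and 'q' appears to be incremented as well, which is not what we want.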

In general, these schemes would not allow us to use 'memcpy' on, for instance, arrays of pointers, or arrays of structures containing any pointers, which is a frequent technique that C programmers use.
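
The aliasing caused by a raw byte copy can be demonstrated with a deliberately tiny model of the global table. This is a sketch under the assumption that a pointer identifier is an unsigned integer and that only the pointer value is stored; the names are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define TABLE_SIZE 8

typedef unsigned ptr_id;

/* Pointer values only; entry 0 is reserved for NULL. */
static unsigned long table_value[TABLE_SIZE];

/* Returns the value 'q' appears to have after 'p++', when 'q' was made by
   copying the bytes of 'p' rather than by creating a new table entry. */
unsigned long
alias_after_memcpy (void)
{
  ptr_id p = 1;
  ptr_id q;

  table_value[p] = 1000;              /* 'int *p = i;' with i at 1000 */
  memcpy (&q, &p, sizeof (ptr_id));   /* copies the identifier only */
  assert (q < TABLE_SIZE);
  table_value[p] += 4;                /* 'p++' on a pointer to int */
  return table_value[q];              /* 'q' has moved as well */
}
```

The function returns 1004 rather than 1000, because 'memcpy' gave the checking code no opportunity to create a second entry.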

2.3 Other schemes that keep track of all pointers in the program at run time.

This last problem certainly affects all schemes which attempt to track pointers at run time. Since C lets us copy pointers with 'memcpy' or even save them onto disk and reload them (software VM schemes do this), we cannot hope to track all pointers. The benefits of such a scheme would be immense. We could, for instance, implement efficient garbage collection and memory defragmentation if we knew where all the pointers were.

We abandoned schemes to track pointers in favour of schemes to track objects.

2.4 Keep track of all memory objects.

The scheme that we finally settled on tracks objects in memory, not pointers to those objects. Pointers, in fact, stay the same, but when we come to use them, we look up what object they point to.

For example, given the following declaration:

	int i[10], *p = i;

and supposing that array 'i' is at locations 1000-1039 in memory, 'p' will start off with the value 1000. When we come to use 'p', as in the following statement:

	p[23] = 5;

we look up 'p' in our table of objects, and find that it points somewhere into the object 'i':

		    1000            1040
	- - - - - --+---------------+-- - - - - -
	            | i[0] ... i[9] |
	- - - - - --+---------------+-- - - - - -
		     ^
		     |
		     p

Now, we know the limits of 'p', so when we come to add offset 23*4 to 'p', we notice that 'p' has gone out of bounds, and we can flag the error. [footnote: In fact, the error is signalled some time later, but I will discuss this in more detail in the main chapter of this report.]
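
As an illustration of the lookup and check, here is a minimal sketch that uses a linear scan over an array of known objects, with integers standing in for addresses. The names 'find_object' and 'index_in_bounds' are hypothetical, and the real lookup structure is described in the main body of the report.

```c
#include <assert.h>
#include <stddef.h>

struct object {
  long base;                    /* address of the first byte */
  long extent;                  /* address one past the last byte */
};

/* Find the object that contains 'addr', or NULL if there is none. */
const struct object *
find_object (const struct object *objs, size_t n, long addr)
{
  size_t k;

  for (k = 0; k < n; ++k)
    if (addr >= objs[k].base && addr < objs[k].extent)
      return &objs[k];
  return NULL;
}

/* Would 'ptr[index]' stay inside the object that 'ptr' points into? */
int
index_in_bounds (const struct object *objs, size_t n,
                 long ptr, long index, long elem_size)
{
  const struct object *obj = find_object (objs, n, ptr);
  long target;

  assert (elem_size > 0);
  if (obj == NULL)
    return 0;                   /* 'ptr' points at no known object */
  target = ptr + index * elem_size;
  return target >= obj->base && target + elem_size <= obj->extent;
}
```

With a single object covering addresses 1000 to 1039, 'index_in_bounds (objs, 1, 1000, 9, 4)' succeeds, while an index of 23 is rejected.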

In fact, the way GCC is written makes it quite simple to alter array references and pointer arithmetic in this way. The main difficulties lay in finding out where all the objects are to start with, and efficiently looking up a pointer to find what it points to. How these were done I will describe in the main body of the report. First, however, I will quickly review some other bounds checking programs, and how they work.

2.5 Alternative approaches to altering GCC.

There are a number of completely different approaches to the problem that I have not followed up in any detail, but I will describe them quickly here. Whereas in the preceding sections I dealt with different ways to alter the program at the C language level, here I will discuss methods that work at lower levels: the intermediate code level and the assembler level.

2.5.1 Altering GCC's intermediate level code.

Instead of fiddling with the abstract syntax tree to rewrite C expressions as other C expressions, we might just as well fiddle with GCC's intermediate representation (RTL). There are certain advantages to this approach, and some disadvantages too.

Advantages:
	. RTL is language independent. You would effectively be adding bounds checking to all language front-ends supported by GCC.
	. By delaying inserting bounds checking until late in the optimization process, unnecessary reads and writes would have disappeared, and so not be checked. Alternatively, by inserting bounds checking early, the optimizer could do a good job of reducing the work of bounds checking.
	. RTL is extensively documented in Info pages, unlike the rest of GCC.

Disadvantages:
	. Static object information is not present in the RTL. You may be able to capture this in the body of GCC (eg. in 'assemble_variable' in varasm.c), or you may still need to alter the language front end to find this information.
	. Stack object names disappear, along with direct information about stack lifetimes.
	. Type information disappears: pointers turn into 32-bit integers, for instance.

2.5.2 Altering the object files ('*.o') containing assembler code.

Two programs, Checker and Purify, go a step beyond RTL and modify the assembler code directly. They do not provide strong separation between objects, and so the technique could not be considered for this project. Nor are they readily portable between processor architectures. In sections 2.6 and 2.7 below, I will describe both of these programs.

2.6 How 'Checker' (C) 1993, 1994 Tristan Gingold <gingold@amoco.saclay.cea.fr> works.

Checker, at version 0.6 at time of writing, also starts with a modified GCC, but works in quite a different way from any of those described above. It provides very fine-grained checking (for instance, it allows you to check that you don't read from an uninitialized array element down to the level of individual bytes in that array). It has a very advanced malloc/free library, which, for instance, checks that you don't free a pointer that has already been freed, and that you don't read or write to memory that has been freed. It also works with C++.

The main disadvantage is that it doesn't provide strong separation between adjacent memory objects: you can still happily increment a pointer beyond the edge of one memory object into another. There is also a 25% memory overhead for all static and dynamic data, and it is limited to 386 family systems running Linux. Why this is so will become clear.

It works like this: The compiler part of GCC ('cc1' and 'cc1plus') stays the same, but the GNU assembler has been altered. Whenever the assembler builds a memory access, it inserts a call to a checking function in front of that instruction. For instance, the following example is from the Checker manual:

	movl -8(%ebp),%edx

is replaced by:

	pushl %eax		; Save a register
	leal -8(%ebp),%eax	; EAX := Address to check
	call chkr_1_6_1_4_chkr	; Call a function to check this access
	movl -8(%ebp),%edx	; The original instruction
	popl %eax		; Restore EAX register

The function 'chkr_1_6_1_4_chkr' is an assembler stub to a C function. About half of all the i386 instructions assembled are loads and stores and are rewritten like this. The processor overhead running Checker-compiled programs is quite large.

Each memory address has two bits in a bitmap associated with it. The bits are used to determine whether reading and writing may go ahead, and detect reads from uninitialized memory.
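
A bitmap of this kind can be sketched as follows. The encoding is an assumption made for illustration (one 'may read' bit and one 'may write' bit per byte, with a byte becoming readable once it has been written); the exact meaning Checker gives to its two bits is not stated here. Two bits for every eight bits of data would account for a 25% memory overhead.

```c
#include <assert.h>
#include <stddef.h>

#define ARENA_SIZE 64           /* bytes of memory being tracked */
#define MAY_READ   1u           /* assumed meaning of the first bit */
#define MAY_WRITE  2u           /* assumed meaning of the second bit */

/* Two bits per tracked byte, so four tracked bytes per bitmap byte. */
static unsigned char bitmap[ARENA_SIZE / 4];

void
set_rights (size_t addr, unsigned rights)
{
  unsigned shift = (unsigned) (addr % 4) * 2;
  unsigned old;

  assert (addr < ARENA_SIZE);
  old = bitmap[addr / 4];
  bitmap[addr / 4] =
    (unsigned char) ((old & ~(3u << shift)) | ((rights & 3u) << shift));
}

unsigned
get_rights (size_t addr)
{
  assert (addr < ARENA_SIZE);
  return (bitmap[addr / 4] >> ((addr % 4) * 2)) & 3u;
}

/* A write is allowed to allocated memory, and initializes the byte. */
int
check_write (size_t addr)
{
  if (!(get_rights (addr) & MAY_WRITE))
    return 0;
  set_rights (addr, MAY_READ | MAY_WRITE);
  return 1;
}

/* A read is allowed only from a byte that has been written. */
int
check_read (size_t addr)
{
  return (get_rights (addr) & MAY_READ) != 0;
}
```

In this model an allocator would call 'set_rights (addr, MAY_WRITE)' for each byte it hands out, and 'set_rights (addr, 0)' for each byte that is freed.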

The malloc/free library replaces the ones in the C library. Freed memory has its bits changed in the bitmap so that attempts to access it fail with appropriate error messages. In addition, freed memory ages, and eventually can be recovered and reused by the system.

A simple garbage detector is also included. When you call the garbage detector, it searches through the current registers and all memory, looking for anything at all that looks like a pointer. (Integers, pointers and general rubbish are not distinguished by this search, so a stray integer can make a leaked block look referenced.) Conversely, pointers that have been saved to disk, or sent over the Internet and so on will not be found, and the memory they point to will be marked as garbage. Nevertheless, you can be pretty sure that if Checker's garbage detector spots memory that is allocated but not referenced, then you do have a memory leak, though Checker by no means guarantees to find every leak.

Checker requires that you recompile your C and X libraries, but this is not a problem under Linux, since all software for this operating system is free.

Overall, Checker is an interesting and effective tool, but unfortunately does not fulfil two important criteria for this project, namely
	. portability to all GCC platforms, and
	. ability to work with existing C and X libraries.

2.7 Purify (R) from Pure Software Inc.

Purify works to all intents and purposes like the free program Checker described in the previous section. As I mentioned, such programs are very architecture specific. Purify has been implemented for Sparc and HP-PA architectures, but no others.

The manual for Purify mentions that Purify'd programs run 2 to 5 times more slowly.

2.8 DOS checking programs.

MS-DOS has no memory protection at all, so buggy C programs can - and do - overwrite the operating system, interrupt vectors, and so on. One particular DOS checking program [who, where?] provides the same hardware protection as Unix, ie. it ensures that the program will not access memory outside its own memory, but does nothing more. I have not seen any DOS programs that do more than this.

3. Implementation

3.1 Overview

I have already described in this report the many decisions that my supervisor, Paul Kelly, and I made as to how to proceed. By an early stage, we had decided that the project ought to accomplish the following goals:
	. require no change to the program's source code,
	. work with mixtures of checked and unchecked code, including commercial libraries (which are impossible to recompile), and
	. warn of all violations in checked code, but not give "false positives".

We chose to track memory objects on the stack, heap and in static areas. We store the objects in address order in a binary search tree. Given a pointer, we look up the corresponding object in the tree, and decide whether arithmetic on the pointer is valid. To minimise the time spent searching the tree, several techniques are used, which I will describe later.
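
The tree of objects can be pictured with the following minimal sketch. It is an unbalanced binary search tree keyed on the base address and assumes that objects never overlap. The names are hypothetical, and none of the techniques for speeding up the search are shown.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
  unsigned long base;           /* address of the first byte */
  unsigned long extent;         /* address one past the last byte */
  struct node *left;            /* objects at lower addresses */
  struct node *right;           /* objects at higher addresses */
};

/* Insert the object [base, extent) and return the (possibly new) root. */
struct node *
tree_insert (struct node *root, unsigned long base, unsigned long extent)
{
  assert (base < extent);
  if (root == NULL)
    {
      struct node *n = malloc (sizeof *n);

      if (n == NULL)
        abort ();
      n->base = base;
      n->extent = extent;
      n->left = n->right = NULL;
      return n;
    }
  if (base < root->base)
    root->left = tree_insert (root->left, base, extent);
  else
    root->right = tree_insert (root->right, base, extent);
  return root;
}

/* Return the object that contains 'addr', or NULL if there is none. */
const struct node *
tree_find (const struct node *root, unsigned long addr)
{
  while (root != NULL)
    {
      if (addr < root->base)
        root = root->left;
      else if (addr >= root->extent)
        root = root->right;
      else
        return root;
    }
  return NULL;
}
```

Because the objects are kept in address order, a lookup only has to compare the pointer against one object at each level of the tree.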

To track pointer arithmetic, we modified pointer operations in GCC to include extra checking code. For instance, when the parser meets the following expression in a C program:

	char *p, *q;
	int i;
	q = p + i;
	    -----

instead of building the bit of abstract syntax tree describing 'pointer plus integer', it builds a call to a function '__bounds_check_pointer_plus_int' (which may later be inlined) that checks the operation first.

The second change made to GCC, which turned out to be far more difficult, was to track the creation and destruction of all sorts of memory objects. Easiest are heap objects, which are created with malloc and destroyed with free. I rewrote the respective library functions to change the object tree as appropriate. Stack objects proved somewhat more difficult. Eventually, I found that using the C++ constructor/destructor mechanism (built into GCC, though not used) allowed me to gain control when a stack object enters or leaves scope. For instance, in the following diagram, I show the creation and destruction of some stack objects:

	f (int *i)
	{
	  int n;			| i, n
	  {				|
	    char *p = "hello, world!";	|	| p
	    puts (p);			|	|
	  }				|
	  for (n = 0; n < 10; ++n)	|
	  {				|
	    int m = 9 - n;		|	| m
	    printf ("%d\n", i[m]);	|	|
	  }				|
	}

The situation is complicated further by use of 'goto' or 'break' or 'return' which allow a scope to be entered or left arbitrarily.

Static objects are the third storage class that we track. They may occur in a number of different places, and tracking them proved to be quite difficult. They can also appear as a side-effect of initialization. For instance, the declaration in a function of:
	char *p = "hello, world!";
declares a stack object (p) and a static string.

A number of other minor changes to GCC were made. For instance, I changed the driver program ('gcc') so that it recognized the flag '-fbounds-checking' and linked with the correct library as a result.

3.2 Alterations to GCC pointer arithmetic

I altered the way the C front end parsed several pointer operations. For example, ordinarily a 'pointer to double + integer' encountered in the source file will generate the following abstract syntax tree:

		PLUS_EXPR
		/	\
	 POINTER	 MULT_EXPR
			 /	 \
		  INTEGER	  INT_CONST 8

In other words, at run time, the pointer is added to the integer times 8 (the size of doubles). In bounds checking mode, we instead construct the expression:

	(double *) __bounds_check_pointer_plus_int (pointer, integer, 8, 1, "filename", line)

This function (which may later be inlined) looks up 'pointer' in the table of objects. Having found it, it will check that adding 8 x integer is a valid thing to do, ie. that the result still points into the object, or at most one byte past its end. If all is well, it returns the new pointer. A number of possible errors may be generated:
	. pointer is NULL or ILLEGAL, so pointer arithmetic is undefined
	. the pointer points to a freed heap object
	. the pointer points to a stale stack object
	. the pointer may end up pointing before the object or after the last byte + 1 of the object

In the last case, the function doesn't stop with an error, but returns 'ILLEGAL' (normally defined as -1). It is valid to generate such illegal pointers, but you may not use them later.

The parameters 'filename' and 'line' passed to __bounds_check_pointer_plus_int reflect the current source file and source line, and let us print meaningful error messages.
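
To make the behaviour concrete, here is a minimal sketch of such a check. It is not the library function itself: 'sketch_pointer_plus_int' and 'register_object' are hypothetical names, the table of objects is reduced to a single registered object, and the error cases merely increment a counter instead of printing a message and stopping. The argument order follows the call shown above, with the fourth argument assumed to be 1 for addition and 0 for subtraction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ILLEGAL ((void *) -1)

/* Hypothetical stand-in for the table of objects: one registered object. */
static uintptr_t object_base, object_extent;

/* A real checker would report the error and stop. */
static int bounds_errors = 0;

void
register_object (void *base, size_t size)
{
  object_base = (uintptr_t) base;
  object_extent = object_base + size;
}

void *
sketch_pointer_plus_int (void *pointer, int integer, size_t size,
                         int is_plus, const char *filename, int line)
{
  uintptr_t addr = (uintptr_t) pointer;
  ptrdiff_t offset;

  (void) filename;              /* would appear in the error message */
  (void) line;
  assert (size > 0);

  if (pointer == NULL || pointer == ILLEGAL
      || addr < object_base || addr > object_extent)
    {
      ++bounds_errors;          /* NULL, ILLEGAL or unknown pointer */
      return ILLEGAL;
    }

  offset = (ptrdiff_t) (addr - object_base)
           + (is_plus ? 1 : -1) * (ptrdiff_t) integer * (ptrdiff_t) size;
  if (offset < 0 || offset > (ptrdiff_t) (object_extent - object_base))
    return ILLEGAL;             /* may be created, but not used later */

  return (void *) (object_base + (uintptr_t) offset);
}
```

Given 'double d[10]' registered with 'register_object (d, sizeof d)', adding 10 yields the address one past the end of the array, while adding 11 or subtracting 1 yields ILLEGAL.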

The following operations are intercepted in this way:

operation		function substituted
pointer + integer	__bounds_check_pointer_plus_int
pointer - integer	__bounds_check_pointer_plus_int (4th argument 0)
*pointer (dereference)	__bounds_check_reference	(1)
array[index]		__bounds_check_array_reference
pointer - pointer	__bounds_check_ptr_diff		(2)
pointer < pointer	__bounds_check_ptr_lt_ptr*	(2)
pointer > pointer	__bounds_check_ptr_gt_ptr*	(2)
pointer <= pointer	__bounds_check_ptr_le_ptr*	(2)
pointer >= pointer	__bounds_check_ptr_ge_ptr*	(2)
++pointer		__bounds_check_ptr_preinc*	(3)
--pointer		__bounds_check_ptr_predec*	(3)
pointer++		__bounds_check_ptr_postinc*	(3)
pointer--		__bounds_check_ptr_postdec*	(3)
truthvalue of pointer	__bounds_check_ptr_true*	(4)
!pointer		__bounds_check_ptr_false*	(4)
pointer == pointer	__bounds_check_ptr_eq_ptr*	(5)
pointer != pointer	__bounds_check_ptr_ne_ptr*	(5)

Notes:

* These functions have not yet been implemented (19/2/95).
(1) References to NULL and ILLEGAL pointers are not allowed. Nor are references which are not aligned to the fundamental granularity of the type of object.
(2) Pointers in pointer difference and comparison must point at the same object.
(3) Pointers may not be incremented or decremented outside the object. Such pointers become ILLEGAL.
(4) You may not determine the truthvalue (or falsity with '!') of an ILLEGAL pointer, or one that points to a stale stack object or a freed memory object.
(5) These check for ILLEGAL pointers and pointers to stale objects.

3.3 Making GCC track memory objects

3.3.1 Heap

All heap objects are allocated with malloc or realloc and destroyed with free, and so tracking them proved no problem. I wrote enhanced malloc/realloc/free functions that operate in two modes. In one mode, the functions always allocate from a fresh area of memory, and never free memory up as they go along. This mode allows us to reliably check for stale pointers - that is, use of pointers to memory that has been freed. There is a danger, if we reuse memory, that a pointer to an old area of memory might be used without warning on newly allocated memory. In the second mode, we reallocate memory as per the old malloc, but we can optionally 'age' blocks of memory. I borrowed this idea from Checker. A block which has been freed ages until it is actually reused. This makes checking for stale pointers slightly more robust.

The following fragment, which uses a stale pointer, will always flag an error in the first mode (at line 16), but may or may not in the second mode:

	1	struct _list {
	2	  struct _list *next;
	3	  int datum;
	4	};
	5
	6	int *
	7	move_list_to_array (struct _list *p)
	8	{
	9	  int *a = NULL, c = 0, d;
	10
	11	  while (p) {
	12	    d = p->datum;		/* extract data item */
	13	    free (p);			/* free list element */
	14	    a = realloc (a, (c + 1) * sizeof (int)); /* grow array */
	15	    a[c++] = d;			/* add data item */
	16	    p = p->next;		/* next list element: p is stale */
	17	  }
	18	  return a;
	19	}
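
The 'ageing' of freed blocks in the second mode might be sketched as a small queue of blocks that have been freed by the program but not yet handed back to the allocator. This is an illustration only: the names and the queue length are hypothetical, and the way blocks are actually aged in the library may differ.

```c
#include <assert.h>
#include <stdlib.h>

#define QUARANTINE 4            /* hypothetical number of blocks kept ageing */

static void *aging[QUARANTINE]; /* ring buffer of freed blocks */
static size_t aging_count = 0;  /* number of blocks currently ageing */
static size_t aging_head = 0;   /* index of the oldest block */

/* Replacement for 'free'.  The block starts to age; once the queue is
   full, the oldest block is really released.  Returns the number of
   blocks released (0 or 1). */
int
aging_free (void *block)
{
  assert (aging_count <= QUARANTINE);
  if (block == NULL)
    return 0;
  if (aging_count < QUARANTINE)
    {
      aging[(aging_head + aging_count) % QUARANTINE] = block;
      ++aging_count;
      return 0;
    }
  free (aging[aging_head]);     /* the oldest block may now be reused */
  aging[aging_head] = block;    /* its slot holds the newest block */
  aging_head = (aging_head + 1) % QUARANTINE;
  return 1;
}

/* Has 'block' been freed by the program but not yet released? */
int
is_aging (const void *block)
{
  size_t k;

  for (k = 0; k < aging_count; ++k)
    if (aging[(aging_head + k) % QUARANTINE] == block)
      return 1;
  return 0;
}
```

A stale pointer to a block that is still ageing can be recognised reliably. Once the block has left the queue and been reused, the check can no longer tell the stale pointer from a valid one, which is why this mode is only 'slightly more robust'.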

3.3.2 Stack

Stack objects present the greatest potential difficulties to tracking memory in C, but in fact, GCC made this rather easy. The difficulty arises when programmers start to use 'goto', 'break' or 'return', all of which upset the flow of execution sufficiently to make tracking difficult. For instance, consider the lifetime of the stack object 't' in the following fragment:

	1	switch (i) {
	2	  int t;
	3	case 0: case 1:
	4	  t = i; i = j; j = t;		/* swap i, j */
	5	  break;
	6	case 2:
	7	  j = i;
	8	  break;
	9	default:			/* i is invalid => error */
	10	  return -1;
	11	}

Whilst 't' is live between lines 2 and 10, normal flow of execution enters and leaves this block at no fewer than six different points. We have to add 't' to our tree of objects at lines 3, 6 and 9, and at lines 5 and 8 the object must be deleted. At line 10, we delete not only 't', but 'i', 'j' and any other variables declared so far in the function.

Luckily, GCC takes care of most of the complexity, since the C++ constructor/destructor mechanism has similar problems. Regrettably, the stricter C++ rules concerning the use of 'goto' will now apply to bounds checked C programs. In Tk3.6, a 65,000 line program, two uses of 'goto' violated the stricter rules, and a couple of simple changes needed to be made.

For stack objects, we build a C++ constructor and destructor. The constructor is executed when the variable comes into scope. It adds the object to the tree of objects. When the variable leaves scope, the destructor is executed and it deletes our record of the object.

Destructors presented no problem. When a variable goes out of scope, we tell GCC to call '__bounds_delete_stack_object', passing a pointer to the object. This function may later be inlined.

Constructors are implemented in the same manner as C initializers. Thus the 'constructor' of:
	int i = 5;
is the expression 'INTEGER_CONST 5', and the same mechanism is used for C++ constructors. We aimed to add an initializer (constructor) to uninitialized variables, and change the initializers for initialized variables so they evaluated to the same value, with the side effect of calling our own function.

For example:

Original declaration	Equivalent declaration after bounds checking
int i;			int i = (__bounds_add_stack_object (&i, ...), 0);
int i = 5;		int i = (__bounds_add_stack_object (&i, ...), 5);
int i[10];		int i[10] = {(__bounds_add_stack_object (i, ...), 0)};
int i[10] = {5, 6, 7};	int i[10] =
				{ (__bounds_add_stack_object (i, ...), 5),
				  6, 7 };
struct _t i[10];	struct _t i[10] =
				{ { (__bounds_add_stack_object (i, ...), 0) } };
etc.

As you can see from the examples, in aggregate types (arrays, structures and unions), we search iteratively down to the first non-aggregate member and we add or replace that initializer.
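The rewritten declarations can be tried out by hand. The sketch below assumes a hypothetical stand-in, 'add_stack_object', in place of '__bounds_add_stack_object' (the real arguments are elided in the table above), and it relies on the compiler accepting non-constant initializers for automatic arrays, as GCC does. The helper variables 'k' and 'sum' are left unregistered to keep the example short.

```c
#include <stddef.h>

static int objects_added = 0;

/* Hypothetical stand-in for '__bounds_add_stack_object'. */
static void add_stack_object(void *base, size_t size)
{
  (void) base; (void) size;
  ++objects_added;
}

static int sum_checked_locals(void)
{
  /* Each initializer registers the object as a side effect and then
     evaluates to the value the programmer wrote (or to 0 if none). */
  int a = (add_stack_object(&a, sizeof a), 0);
  int b = (add_stack_object(&b, sizeof b), 5);
  int c[10] = { (add_stack_object(c, sizeof c), 5), 6, 7 };
  int k, sum = a + b;

  for (k = 0; k < 10; ++k)
    sum += c[k];                /* elements 3..9 are zero-filled */
  return sum;
}
```

The comma operator discards the result of the call, so the variables receive exactly the values of the original initializers.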

There are two points to note. Firstly, all uninitialized stack declarations become initialized. This has several unintended side effects [that will need to be resolved ...]:
	. '-Wuninitialized', the GCC option that warns if an uninitialized variable is used, will have no effect,
	. all variables will contain zero, even if they aren't initialized,
	. C++ rules for 'goto' don't allow jumps into a binding contour after an initialized variable.*

*Mostly this restriction has no effect. However, there is one case which is fairly common and which is now disallowed:

	1	switch (c) {
	2	  int t;	/* local variable here, becomes initialized */
	3	case 0:		/* jump into scope after initialized variable */
	4	  ...
	5	}

Such cases may be rewritten as follows:

	1	switch (c) {
	2	case 0: {	/* OK: jump doesn't violate any rules */
	3	  int t;	/* temporary variable used only in this case */
	4	  ...
	5	}
	6	/* other cases here ... */
	7	}

A second issue to note is that all initializers become non-constant, therefore usually non-static. For instance, the declaration:

	int i[10] = { 5, 6, 7 };

is normally implemented by GCC by writing a three-word array (5,6,7) into static data, then copying this array into 'i' each time the variable is initialized. This is relatively efficient. GCC may not be able to make such optimizations with bounds checking switched on, since the initializer expression becomes non-constant. GCC may resort to another, less efficient way of initializing this variable. This hardly matters in performance terms (the overhead of adding an object to the tree outweighs other considerations), but may come as a surprise if you were expecting to see arrays reproduced verbatim in the 'a.out' file.

3.3.3 Static

Static objects proved to be the hardest objects to find. They can occur in many different contexts. The following examples all contain static data in one form or another:

(1)	1	extern int errno;	Static, in unchecked external library.

(2)	2	f ()
	3	{
	4	  static int init = 0;	Static, in function.
	5	  ...
	6	}

(3)	7	int data[] = {0,1,2};	Single file-scope static object.

(4)	8	g ()
	9	{
	10	  char *p = "hello!";	String is static data, p is stack object.
	11	  ...
	12	}

(5)	13	h ()
	14	{
	15	  char s[] = "hi";	String is copied onto the stack from static space each time.
	16	  ...
	17	}

(6)	18	char *names[] = {	Seven separate static objects in all.
	19	  "fred",
	20	  "john",
	21	  "peter",
	22	  "jane",
	23	  "mary",
	24	  "judith",
	25	  NULL
	26	};

(7)	27	char *duplicates[] = {	Duplicates are optimized out in C: only two objects here.
	28	  "mark", "mark",
	29	  "mark", "mark",
	30	  NULL
	31	};

(8)	32	char *p = "peter";	These two declarations contain three objects, one (the string) shared.
	33	char *q = "peter";

etc.

We can classify these different variations into three distinct problem areas. Firstly, at load time, we need to identify all the static objects (somehow) and build them into the initial object tree. For example, in the following code:

	1	#include <stdio.h>
	2
	3	int i;
	4	char *p = "inefficient hello, world\n";
	5
	6	main ()
	7	{
	8	  for (i = 0; p[i]; ++i)
	9	    putchar (p[i]);
	10	}

before main () is called, we need to find the location of the three static objects declared at lines 3 and 4 and build them into the initial tree. By 'before main ()', I mean at load time, at link time or perhaps even during compilation.

Unfortunately, we can't know for sure the exact location of each static object until after the program has loaded, especially in the case of link-loading as done under, for example, SunOS.

The second problem is that initializers on static and dynamic objects usually imply static data, and this static data lacks an explicit pointer. For instance, a GCC declaration like:

	char *strings[] = { "tom", "dick", "harry", NULL };

will be translated into assembler code like this:

	.data		; strings array goes into writable data segment
	.globl _strings
	_strings:
	  .long LC5	; pointers to the initializers
	  .long LC6
	  .long LC7
	  .long 0
	.text		; initializers go into read-only text segment
	LC5:		; 'LC5' etc. are unknowable identifiers
	  .ascii "tom\0"
	LC6:
	  .ascii "dick\0"
	LC7:
	  .ascii "harry\0"

Not only are there four separate objects that need to be notified, but the objects fall into two segments - the static object and initializers are not even in a contiguous piece of memory (the GCC option '-fwritable-strings' forces the initializers into the data segment, but this is undesirable for other reasons). We cannot take the address of the compiler generated labels 'LC5' etc. very easily. Consider another case where statics end up in the text segment:

	1	main ()
	2	{
	3	  int i[5] = {0, 1, 2, 3, 4};
	4	  ...
	5	}

When assembled, this becomes (in essence, I have abbreviated it for clarity):

	1	.text
	2	LC0:		; initializer starts here in r-o text segment
	3	  .long 0
	4	  .long 1
	5	  .long 2
	6	  .long 3
	7	  .long 4
	8	.globl _main
	9	_main:
	10	  ; ...
	11	  ; copy 20 bytes from LC0 into i
	12	  ; ... body of main follows ...

In this case, I would argue that we don't know or care about the object at 'LC0'. The programmer will never access this object directly, only a copy of it. Since we have already found the location of 'i', we need know nothing more.

The third and final problem, which affects, I believe, only string constants in the text segment, is that the assembler and linker tend to freely merge similar strings into one. This optimization may reduce the size of the executable considerably, and since strings in the text segment are read-only, doesn't change the semantics of the program. For instance, the declaration:

	char *p = "peter and the wolf";
	char *q = "wolf";

may end up being assembled like this:

	.data
	.globl _p			; declaration of p
	_p:
	  .long LC0
	.globl _q			; declaration of q
	_q:
	  .long LC1
	.text
	LC0:				; overlapping strings in text segment
	  .ascii "peter and the "
	LC1:
	  .ascii "wolf\0"

The current implementation of the object tree does not allow overlapping objects, and changing it to fit this example fundamentally alters the assumptions made in the last two chapters of this report.

--

Solving the static data problem, then, involves solving many disconnected problems all at once. Broadly, we must

	. identify named static objects in file scope and within functions, and
	. identify unnamed initializers,
	. except when the initializer doesn't matter - we'll always be using a copy.

The different cases we consider are:

(1) Any uninitialized static object.

Make a global tree constructor call* for the object, using a pointer to the declaration and the real declaration size field.
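As a rough illustration of what such a per-file constructor might look like, the sketch below uses GCC's '__attribute__((constructor))' extension to run a registration function before main (). Whether the real implementation uses this attribute is not stated here, so treat it as an assumption. 'note_static_object', the 'statics' table and the two example variables are likewise hypothetical.

```c
#include <stddef.h>

#define MAX_STATICS 16

/* Hypothetical record of a static object: base address and size. */
struct static_record { const void *base; size_t size; };

static struct static_record statics[MAX_STATICS];
static int n_statics = 0;

/* Hypothetical stand-in for the library's registration function. */
static void note_static_object(const void *base, size_t size)
{
  if (n_statics < MAX_STATICS) {
    statics[n_statics].base = base;
    statics[n_statics].size = size;
    ++n_statics;
  }
}

int counter;                    /* uninitialized file-scope static */
static int table[4];            /* uninitialized, file-local */

/* Runs before main (): one call per named static object, passing a
   pointer to the object and the size taken from its declaration. */
__attribute__((constructor))
static void register_file_statics(void)
{
  note_static_object(&counter, sizeof counter);
  note_static_object(table, sizeof table);
}
```

By the time main () starts, both objects are already in the table, which is the property the initial object tree needs.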

(2) An initialized static object. No strings are present in the initializer.

The static object will be wholly initialized by the constant initializer. It won't contain pointers to unnamed statics in the text segment. Solution is the same as for case (1).

(3) An initialized static object. Strings are present in the initializer.

Find the named part of the object as in (1).

[.. what do we do to find the initialized strings, which are marked by 'LC??' symbols in the text segment? ..]

(4) An initialized stack object. Strings are present in the initializer.

Iterate over the initializer. When a string is encountered, replace it with a call to '__bounds_note_constructed_string'. For instance, in the following:

	1	f ()
	2	{
	3	  char *p[] = {"tom", "dick", "harry", NULL};
	4	  ...
	5	}

the declaration of 'p' will be replaced by:

	char *p[] = {
	  __bounds_note_constructed_string ("tom", ...),
	  __bounds_note_constructed_string ("dick", ...),
	  __bounds_note_constructed_string ("harry", ...),
	  NULL
	};

The function returns type 'const char *'. In the case of strings which overlap, but do not intersect exactly, a warning is printed and neither of the strings is checked. Strings which overlap exactly (i.e. are the same) will still be checked.
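The distinction between exact and partial overlap can be sketched as a comparison of two address ranges. The function and enumeration names below are hypothetical and are not taken from the checking library; the sketch only shows the classification, not the warning or the tree update.

```c
#include <stddef.h>
#include <stdint.h>

enum overlap { DISJOINT, IDENTICAL, PARTIAL };

/* Classify how the byte ranges [a, a + a_size) and [b, b + b_size)
   relate to each other. */
static enum overlap classify_overlap(const void *a, size_t a_size,
                                     const void *b, size_t b_size)
{
  uintptr_t a_lo = (uintptr_t) a, a_hi = a_lo + a_size;
  uintptr_t b_lo = (uintptr_t) b, b_hi = b_lo + b_size;

  if (a_hi <= b_lo || b_hi <= a_lo)
    return DISJOINT;            /* no byte in common */
  if (a_lo == b_lo && a_hi == b_hi)
    return IDENTICAL;           /* same string: can still be checked */
  return PARTIAL;               /* e.g. "wolf" inside a longer string */
}
```

With the merged strings from the earlier example, "wolf" occupies the last five bytes of the 19-byte "peter and the wolf", so the pair is classified as PARTIAL.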

(5) A bare string present somewhere in code.

Replace the string with a call to '__bounds_note_constructed_string'. For example:
	printf ("hello world.\n");
is replaced with:
	printf (__bounds_note_constructed_string ("hello world.\n", ...));
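A minimal sketch of a function with this shape follows. 'note_string' is a hypothetical stand-in for '__bounds_note_constructed_string', whose additional arguments are not shown above; it records the extent of the string (including the terminating NUL) the first time it sees it and always returns its argument, so it can be dropped into any expression that expected the string.

```c
#include <stddef.h>
#include <string.h>

#define MAX_STRINGS 32

struct string_record { const char *base; size_t size; };

static struct string_record strings[MAX_STRINGS];
static int n_strings = 0;

/* Hypothetical stand-in for '__bounds_note_constructed_string'. */
static const char *note_string(const char *s)
{
  size_t size = strlen(s) + 1;          /* include the trailing NUL */
  int k;

  for (k = 0; k < n_strings; ++k)
    if (strings[k].base == s && strings[k].size == size)
      return s;                         /* already known: nothing to do */

  if (n_strings < MAX_STRINGS) {
    strings[n_strings].base = s;
    strings[n_strings].size = size;
    ++n_strings;
  }
  return s;                             /* the caller sees the same pointer */
}
```

Because the same statement may execute many times, the sketch tolerates repeated calls with the same string.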

3.4 Checking library vs. inline checking functions

Normally GCC builds external calls to functions like '__bounds_add_stack_object' and '__bounds_check_array_reference'. These calls are resolved by linking the checked program with libcheck.a (this is done automatically). By inlining the functions instead, we avoid the costly function call and we allow the back end to make more intelligent optimizations. The cost is a considerable increase in code size (none of the checking functions are small).

As GCC stands, inlining the functions would take only a minimal amount of work (perhaps one day to implement, and a couple of days to test). A better approach might be to profile the checking library with typical programs to find out which pointer functions are performed most frequently, then to inline those functions in simpler cases.

3.5 Implementation of the checking library

The checking library (libcheck.a) contains code to manipulate the object tree, new malloc, realloc and free functions, and code to check pointer arithmetic.

The file 'objects.c' contains functions to maintain a single splay tree of objects. Splay trees are described in an article in Dr. Dobb's Journal (see bibliography). They are a form of dynamic tree that tends to keep frequently referenced objects near the root of the tree, and the tree otherwise balanced. All operations on the tree (including look up) alter the shape of the tree and bring the node of interest to the root.
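For readers unfamiliar with splay trees, the following is a compact sketch of one keyed on object base addresses. It follows the well-known top-down splaying scheme and is not the code in 'objects.c'; the structure layout and the function names 'splay', 'insert' and 'find_object' are assumptions made for illustration, and objects are assumed not to overlap or share a base address.

```c
#include <stddef.h>
#include <stdint.h>

struct node {
  uintptr_t base;               /* first byte of the object */
  size_t size;                  /* size of the object in bytes */
  struct node *left, *right;
};

/* Top-down splay: bring the node with this key, or one of its
   neighbours if the key is absent, to the root. */
static struct node *splay(struct node *t, uintptr_t key)
{
  struct node header, *l, *r, *y;

  if (t == NULL) return NULL;
  header.left = header.right = NULL;
  l = r = &header;
  for (;;) {
    if (key < t->base) {
      if (t->left == NULL) break;
      if (key < t->left->base) {        /* rotate right */
        y = t->left; t->left = y->right; y->right = t; t = y;
        if (t->left == NULL) break;
      }
      r->left = t; r = t; t = t->left;  /* link right */
    } else if (key > t->base) {
      if (t->right == NULL) break;
      if (key > t->right->base) {       /* rotate left */
        y = t->right; t->right = y->left; y->left = t; t = y;
        if (t->right == NULL) break;
      }
      l->right = t; l = t; t = t->right; /* link left */
    } else
      break;
  }
  l->right = t->left;                   /* reassemble */
  r->left = t->right;
  t->left = header.right;
  t->right = header.left;
  return t;
}

/* Insert n and return the new root (n itself). */
static struct node *insert(struct node *t, struct node *n)
{
  n->left = n->right = NULL;
  if (t == NULL) return n;
  t = splay(t, n->base);
  if (n->base < t->base) {
    n->left = t->left; n->right = t; t->left = NULL;
  } else {
    n->right = t->right; n->left = t; t->right = NULL;
  }
  return n;
}

/* Return the object containing p, or NULL.  The lookup itself
   reshapes the tree, leaving the nearest object at the root. */
static struct node *find_object(struct node **root, const void *p)
{
  uintptr_t addr = (uintptr_t) p;
  struct node *t = *root = splay(*root, addr);

  if (t == NULL) return NULL;
  if (addr < t->base) {                 /* root lies above addr */
    struct node *l;
    if (t->left == NULL) return NULL;
    l = splay(t->left, addr);           /* largest base below addr */
    l->right = t; t->left = NULL;       /* make it the new root */
    t = *root = l;
  }
  return (addr - t->base < t->size) ? t : NULL;
}
```

Note that 'find_object' takes a pointer to the root, because even a failed look up changes the shape of the tree.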

'objects.c' exports the following functions:
	__bounds_add_heap_object		(1)
	__bounds_is_heap_object
	__bounds_delete_heap_object
	__bounds_add_stack_object		(2)
	__bounds_delete_stack_object
	__bounds_find_object			(3)
	__bounds_find_object_by_base
	__bounds_note_constructed_object	(4)
	__bounds_note_main_args			(5)

(1) These functions are used by the replacement malloc library exclusively, tracing the use of malloc and free by the programmer.
(2) The compiler alone generates calls to these two functions using the C++ constructor/destructor mechanism.
(3) These two functions are mostly used by the checking code to take a pointer and return the corresponding object.
(4) Each static object will be turned into a call to this function using a global constructor function (described previously).
(5) In main (), this function locates the program's argument arrays.

The file 'init.c' is responsible for initializing the bounds checking library and reading parameters from the environment variable GCC_BOUNDS_OPTS. Every object file compiled with bounds checking includes a single global call to '__bounds_initialize_library', which is the only entry point to 'init.c'. Thus if one or more bounds checked modules are present in the final program, this function will be called at least once. The first time it is called, it sets a flag '__bounds_checking_on' to 1, initializes the library and attempts to parse the environment variable. On subsequent calls, it sees that '__bounds_checking_on' has already been set and does nothing.

A program that was compiled without bounds checking and not linked to libcheck.a may still consult the flag '__bounds_checking_on' (it will find that it contains 0). The flag is declared in libgcc.a.
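The call-once behaviour can be sketched as follows. The names 'initialize_library', 'checking_on', 'init_count' and 'options' are hypothetical stand-ins, and because the syntax of GCC_BOUNDS_OPTS is not described here, the sketch only fetches the variable and does not attempt to parse it.

```c
#include <stdlib.h>

static int checking_on = 0;             /* stand-in for the library's flag */
static int init_count = 0;              /* how often real work was done */
static const char *options = NULL;      /* raw text of the variable, if set */

/* Safe to call once per checked object file: only the first call
   does any work. */
static void initialize_library(void)
{
  if (checking_on)
    return;                             /* already initialized */
  checking_on = 1;
  ++init_count;
  options = getenv("GCC_BOUNDS_OPTS");  /* NULL if the variable is unset */
}
```

A program built from three checked object files would make three such calls, but the library would be set up only once.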

'check.c' contains the checking functions (described in section 3.2). By and large, each function is passed a pointer to check. It first makes sure that the pointer isn't NULL or ILLEGAL; if it is, this will usually result in an error message. Then it attempts to look up the pointer in the tree (using the function '__bounds_find_object' from 'objects.c'). If the pointer -> object conversion doesn't succeed, it checks to see if the pointer might plausibly be an unknown static or stack object. If neither, the pointer is 'wild', which is to say that it points to random memory outside the program or below the stack pointer. This results in an error message.

If a pointer is successfully converted to its corresponding object, then the checking function is able to see if the outcome of the proposed pointer operation is valid. Often invalid pointer arithmetic results in returning an 'ILLEGAL' pointer which will be caught if the pointer is used later. Sometimes, invalid operations will result in an error there and then. The checking library pays particular attention to object boundaries and pointer alignment. You may not be able to read an array of doubles using a char pointer, for instance.
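The core of an arithmetic check might look like the sketch below. It assumes that the object has already been found, that ILLEGAL is represented by the sentinel value '(char *) -1' (an assumption; the real representation is not given here), and that offsets are measured in bytes. 'checked_ptr_add' is a hypothetical name. A result one byte past the end of the object is accepted, since such a pointer may be formed but not dereferenced.

```c
#include <stddef.h>

/* Assumed sentinel for an illegal pointer. */
#define ILLEGAL_PTR ((char *) -1)

/* Add delta bytes to p, which must point into (or one past the end
   of) the object [base, base + size).  Returns the new pointer, or
   ILLEGAL_PTR if the result would leave the object. */
static char *checked_ptr_add(char *base, size_t size,
                             char *p, ptrdiff_t delta)
{
  ptrdiff_t target;

  if (p == ILLEGAL_PTR)
    return ILLEGAL_PTR;                 /* a real library would report this */
  target = (p - base) + delta;
  if (target < 0 || (size_t) target > size)
    return ILLEGAL_PTR;                 /* outside the object */
  return base + target;
}
```

Once a result has become illegal it stays illegal, so a later adjustment that would bring the address back inside the object is still rejected.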

The file 'malloc.c' contains a new malloc, realloc and free library that replaces the existing functions. It uses 'sbrk' directly, and so is not compatible with other sbrk allocators (nor is the original malloc). Calls to realloc and free are checked strictly. You may only free a pointer to the base of a region allocated with malloc or realloc which has not been freed already. Similarly you must only reallocate a heap region which has not been freed. The following peculiarities are reproduced to allow compatibility with existing implementations:

	. realloc (NULL, size) is equivalent to malloc (size),
	. malloc (0) is equivalent to malloc (1), and
	. free (0) is permitted, but generates a warning.
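These rules can be illustrated with a small wrapper around the standard allocator. This is only a sketch of the checks: the real 'malloc.c' uses 'sbrk' and the object tree, whereas the version below keeps a fixed table of live base pointers, and the names 'checked_malloc', 'checked_realloc', 'checked_free' and the result codes are hypothetical.

```c
#include <stdlib.h>

#define MAX_LIVE 64

static void *live[MAX_LIVE];    /* base pointers of live heap regions */

enum free_result { FREE_OK, FREE_NULL_WARNING, FREE_ERROR };

static void *checked_malloc(size_t size)
{
  int k;

  if (size == 0)
    size = 1;                   /* malloc (0) behaves like malloc (1) */
  for (k = 0; k < MAX_LIVE; ++k)
    if (live[k] == NULL) {
      live[k] = malloc(size);
      return live[k];
    }
  return NULL;                  /* table full */
}

static void *checked_realloc(void *p, size_t size)
{
  int k;
  void *q;

  if (p == NULL)
    return checked_malloc(size);        /* realloc (NULL, size) */
  for (k = 0; k < MAX_LIVE; ++k)
    if (live[k] == p) {
      q = realloc(p, size ? size : 1);
      if (q != NULL)
        live[k] = q;
      return q;
    }
  return NULL;                  /* not a live heap region: refuse */
}

static enum free_result checked_free(void *p)
{
  int k;

  if (p == NULL)
    return FREE_NULL_WARNING;   /* permitted, but worth a warning */
  for (k = 0; k < MAX_LIVE; ++k)
    if (live[k] == p) {
      live[k] = NULL;
      free(p);
      return FREE_OK;
    }
  return FREE_ERROR;            /* not a base pointer, or already freed */
}
```

A pointer into the middle of a region is rejected by 'checked_free' because only base addresses are recorded in the table.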

Two 'back-door' functions are also provided in 'malloc.c'. The functions:

	__bounds_malloc (size, 1);
	__bounds_free (ptr, 1);

allocate and deallocate memory silently, without creating a new object description. The bounds library itself uses these functions to create unchecked heap objects to store the tree and for other temporary purposes. It is not generally recommended that applications should use these functions directly.

Two other support files are used in the library. 'error.c' prints error messages and warnings in a standard way. 'print.c' provides printf, fprintf and vprintf functions that are guaranteed not to use malloc. The standard versions generally call malloc, causing uncontrolled recursion in the malloc library when debugging.

4. Performance

Inevitably, performing bounds checking in hardware or software has a performance penalty. In hardware, checking that a process doesn't write beyond its allocated memory has to be done once for each memory access, lengthening the processor cycle. Checking individual arbitrarily-sized objects in hardware makes processor design harder. The processors become more complex, less orthogonal and slower.

[...]

5. Conclusions and Future Enhancements

[...]

B. Bibliography

Pure Software Inc.	(1992-3)	Purify User's Guide
Splay trees ...

U. User Manual

U.1 Installation

GCC-CK comes as a set of patches to the current version of GCC. You may find that your copy of GCC has already been patched and compiled for bounds checking, in which case, you can skip this section about installation.

You will need to obtain a clean copy of the latest version of GCC and the latest GCC-CK patches. Make sure that the GCC and patch versions agree. You can download the latest GCC source from:

	ftp://src.doc.ic.ac.uk/pub/gnu/gcc-2.x.x.tar.Z

Uncompress the GCC source and create the source tree using the following command:

	zcat gcc-2.x.x.tar.Z | tar xvf -

Now apply the GCC-CK patches to the source tree, as follows:

	zcat gcc-ck-2.x.x.tar.Z | tar xvf -
	cd gcc-2.x.x
	patch <gcc-ck.cdiff

You will now need to read the file 'INSTALL' which comes with GCC to find out how to compile GCC for your machine.

U.2 Compiling a program with bounds checking.

GCC works as normal when the bounds checking patches have been applied, unless you supply the flag '-fbounds-checking' on the command line. For example, to compile C source files into object files with bounds checking, you might do:

	gcc -fbounds-checking -O -c file1.c -o file1.o

When you link a program consisting either of all checked object files, or a mixture of checked and unchecked object files, you must also give the '-fbounds-checking' flag:

	gcc -fbounds-checking -O file1.o file2.o file3.o -o program

If you have a Makefile, then all you may need to do to add bounds checking to your program is to add '-fbounds-checking' to the line which starts 'CFLAGS='.

The bounds checking extensions to GCC record information separately from the normal debugging information. Thus, you will not get any more or any less bounds checking information by specifying '-g' on the GCC command line.

U.3 Customizing bounds checking.

[... information on reusing heap, etc. ]

U.4 Mixing checked and unchecked code at the level of the object file or library.

GCC will allow you to freely mix checked and unchecked object files and libraries. Pointers and boundaries will only be checked when the following two conditions are fulfilled:
	. the currently executing code was compiled with '-fbounds-checking', and
	. the object that is being referred to was declared in a file compiled with '-fbounds-checking'.

For example, if we have the following two input files:

	-------- file1.c (checked) -----------------------------------------
	1	#include <stdio.h>
	2	#include <stdlib.h>
	3
	4	int array[10];
	5	extern void f (int *);
	6
	7	main ()
	8	{
	9	  int i;
	10
	11	  for (i = 0; i < 10; ++i)
	12	    array[i] = i;
	13	  f (array);
	14	}

	-------- file2.c (unchecked) ---------------------------------------
	15	#include <stdio.h>
	16	#include <stdlib.h>
	17
	18	void f (int *p)
	19	{
	20	  int i;
	21
	22	  for (i = 0; i < 10; ++i)
	23	    p[i] ++;
	24	}

and we compile and link these input files as follows:

	gcc -fbounds-checking -c file1.c -o file1.o
	gcc -c file2.c -o file2.o
	gcc -fbounds-checking file1.o file2.o -o program

then:

	. the assignment to 'array[i]' at line 12 will be checked (satisfies both conditions above), and
	. the increment of 'p[i]' at line 23 will not be checked (doesn't satisfy the first condition above).

In general, you will wish to compile your whole application with bounds checking, but then assume that the libraries you are linking against are correct. If you want to check your C libraries, you will need to find the source code and compile it with '-fbounds-checking' too.

U.5 Specifying that individual pointers and objects not be checked.

[...]

U.6 Calls to malloc and free are checked too.

[...]

U.7 Using GDB to debug bounds checked programs.

Bounds checked GCC programs can be easily and efficiently debugged with GDB. I suggest placing a breakpoint at the symbol '__bounds_breakpoint' which will cause GDB to stop when a bounds error happens. Then use 'where' to print a stack trace, which will locate the error, where it happened, and what operation was being done. For instance:

GDB is free software and you are welcome to distribute copies of it
 under certain conditions; type "show copying" to see the conditions.
There is absolutely no warranty for GDB; type "show warranty" for details.
GDB 4.12 (i486-unknown-linux), Copyright 1994 Free Software Foundation, Inc...
(gdb) break __bounds_breakpoint
Breakpoint 1 at 0x128a4f: file error.c, line 74.
(gdb) run
Starting program: /home/rich/c/bc-tests/tk3.6/./wish 
Program compiled with bounds checking features by Richard W.M. Jones.
  Bounds warning: free called with a NULL pointer.
  Bounds warning: free called with a NULL pointer.
In file tkConfig.c, line 138,
  Bounds error: attempt to reference memory overrunning the end of an object.
  Pointer value: 0x12f340
  Object commands:
    Address in memory:  0x12f238 .. 0x12f33f
    Size:               264 bytes
    Element size:       1 bytes
    Number of elements: 264
    Created at:         tkWindow.c, line 122
    Storage class:      static

Breakpoint 3, __bounds_breakpoint () at error.c:74
74      }
(gdb) where
#0  __bounds_breakpoint () at error.c:74
#1  0x128ae6 in __bounds_error (
    message=0x1286b3 "attempt to reference memory overrunning the end of an object", filename=0xee305 "tkConfig.c", line=138, pointer=0x12f340, obj=0x50b8e8)
    at error.c:90
#2  0x12887e in __bounds_check_reference (pointer=0x12f340, size=4, 
    filename=0xee305 "tkConfig.c", line=138) at check.c:360
#3  0xee66d in Tk_ConfigureWidget (interp=0x50c168, tkwin=0x514958, 
    specs=0x12fdfc, argc=4, argv=0x12f340, widgRec=0x531698 "XIQ", flags=0)
    at tkConfig.c:138
#4  0x2e8be in ConfigureFrame (interp=0x50c168, framePtr=0x531698, argc=4, 
    argv=0x12f340, flags=0) at tkFrame.c:391
#5  0x2e29c in TkInitFrame (interp=0x50c168, tkwin=0x514958, toplevel=1, 
    argc=4, argv=0x12f340) at tkFrame.c:264
#6  0x3d8c in Tk_CreateMainWindow (interp=0x50c168, screenName=0x0, 
    baseName=0xbffffaba "wish", className=0x263 "Tk") at tkWindow.c:712
#7  0x535 in main (argc=1, argv=0xbffffa38) at tkMain.c:187
(gdb) 

To print a list of all the objects known to the bounds checking library, use the GDB command 'print __bounds_debug_memory(0)'.

To single step a function, displaying calls to the library as you go, first use the GDB command 'print __bounds_debug_print_calls=1' then single step through the code of interest using GDB's 'next' command.

U.8 Common problems when running GCC with bounds checking.

Symptom:
	GCC declares a pointer to be illegal during pointer arithmetic that appears to be correct.
Example code:
	1	#include <stdio.h>
	2	#include <stdlib.h>
	3
	4	int i[10], *p, *q;
	5
	6	main ()
	7	{
	8	  p = i;
	9	  q = p+15-10;		/* fails here */
	10	}
Reason:
	The ANSI C standard declares that a pointer may become undefined if it is incremented more than one element beyond the end of an object, or before the beginning of an object. In the example above, 'p+15' (line 9) points far beyond the end of the object 'i', and so is undefined. When you subsequently try to subtract 10 from this undefined pointer, the checking code reports a bounds checking error at run time. Such expressions cannot be accommodated because of the way the bounds checking works, and are not part of the ANSI C standard anyway.
Solution:
	. Rewrite the code to avoid such expressions, or
	. declare the pointers in the expression to be unchecked, or
	. use the unchecked pointer macros.
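One possible rewrite for the first solution, shown as a sketch for the example above: combine the integer offsets before applying them to the pointer, so that no intermediate pointer ever leaves the object. The function name 'compute' is invented for the illustration.

```c
static int i[10], *p, *q;

static void compute(void)
{
  p = i;
  /* The integer arithmetic is done first, so the only pointer ever
     formed is p+5, which lies inside 'i'. */
  q = p + (15 - 10);
}
```

The parentheses matter: without them, C evaluates '(p+15)-10' from left to right and the illegal intermediate pointer reappears.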

[...expand and explain more...]

Symptom:
	Bounds checked code runs too slowly.
Reason:
	For each pointer operation, GCC compiles a considerable amount of extra checking code in its place. Inevitably, this extra code will slow the program down, especially where pointer operations are done frequently, in a loop for instance.
Solution:
	. Compile your program with '-O' or '-O2', or
	. profile the program to isolate frequently used functions, then move these to an unchecked object file, or
	. live with the program running slowly, but when you come to use it for real, turn bounds checking off.
