Lecture 10

Administrivia

The graders ask that you:

  1. print sample output/results of tests,
  2. use a monospace font when printing results,
  3. not use indents that are so large that lines of code spill across multiple lines when printed,
  4. write your names on all sheets when you hand in more than one, and that you staple multiple-sheet submissions together, and
  5. if you wish to attempt extra credit homework, put the original homework at the beginning of your work, and clearly label the extra-credit component.

An Introduction to C, continued...

The programming model of C comes very close to the programming model of assembly language, i.e., the von Neumann architecture. Each process has its own storage (read-write memory), registers, and a thread of control based on the sequential execution of in-memory instructions. [In less naïve models, some of that storage may be shared with other processes, and there may be multiple threads of control active at the same time. But you don't wander into these models accidentally.] Instructions typically consist of memory moves, e.g., loading a register with in-memory bytes, or storing the contents of a register in memory; operations on quantities contained in registers; various kinds of tests (which set status bits in control registers); and flow control, e.g., jumps, conditional jumps, etc.

Process memory is viewed as a (partial) array of 2^k bytes, for current values of k = 32 or 64. [Note that operating systems go to a lot of work to present to the process the illusion of 2^k bytes of memory.]

Various types have different memory requirements, and different alignment requirements. Here's a small, and slightly opaque, C program that computes the sizes of various data structures.

    // sizeof
    // compute the size of various standard types.

    #include <stdio.h>
    #include <stdlib.h>

    #define sz(t) printf("sizeof(%s) = %ld\n", #t, sizeof(t))

    typedef struct {
        char a;
        double d;
    } cd_t;

    int main(int argc, char **argv)
    {
        sz(char);
        sz(wchar_t);
        sz(short);
        sz(int);
        sz(long int);
        sz(long long);
        sz(float);
        sz(double);
        sz(long double);
        sz(void *);
        sz(cd_t);
        exit(0);
    }

along with a sample run from my (Mac OS X 10.10) system:

    $ ./sizeof
    sizeof(char) = 1
    sizeof(wchar_t) = 4
    sizeof(short) = 2
    sizeof(int) = 4
    sizeof(long int) = 8
    sizeof(long long) = 8
    sizeof(float) = 4
    sizeof(double) = 8
    sizeof(long double) = 16
    sizeof(void *) = 8
    sizeof(cd_t) = 16
    $

Note here the use of a C-preprocessor definition, along with the seldom-used "stringification" capability of the preprocessor.

Let's work through this. C has a basic (ASCII) character type char, which is 1 byte. This is a definition so far as C is concerned, which is regrettable: with the benefit of hindsight, byte would have been a better choice. There is also a wchar_t type, which is compiler-specific, but here is 4 bytes. Although the standard doesn't say what a wchar_t is, in practice UTF-32 is a good guess, although not a particularly good design decision (UTF-16 would have been better). Note again the existence of the iconv(3) family of functions if you have to deal with character-set conversions. The new (but not yet widely supported) C11 standard finally includes explicit recognition of Unicode, and new types to support it, via the char16_t and char32_t types (for storing UTF-16/32 encoded characters, respectively), Python-like "u8", "u", and "U" prefixes for string literals to indicate UTF-8/16/32 encoding, and functions in the new standard library <uchar.h> for conversions.

C has a basic integer type, int, but also variations short == short int, int, long == long int, and long long. On my system, using the default compilation model, short variables are 2 bytes, int variables are 4 bytes, and long (as well as long long) variables are 8 bytes. Once processors have native support for 128-bit arithmetic operations, I expect that long long will come to represent 16-byte integers.

C also has floating point types: float, double, and long double, and on my system, these are 4, 8, and 16 bytes respectively. As a practical matter, most floating point work in graphics uses float, and most other floating point work uses double. Generally speaking, contemporary CPUs have highly optimized hardware for dealing with double (and sometimes long double), whereas GPUs have highly optimized hardware for dealing with float, but not the other floating point types. Note that while long double takes 16 bytes, you don't actually get so-called "quad" precision on most contemporary hardware, but instead the "extended precision" (80-bit) doubles of the old Intel x87 floating-point co-processor, i.e., 64 bits of precision, rather than the 112 bits of IEEE quad. Sometimes it matters.

C has pointers, i.e., data types that contain memory addresses, and so “point” to other data objects, which on a 64-bit system unsurprisingly are 8 bytes large.

Finally, and perhaps surprisingly, the user-defined cd_t type, consisting of a char and a double, takes 16 bytes, rather than 9 as you might expect. This is due to alignment restrictions for double on the x86_64 architecture—they have to be aligned to 64-bit word boundaries. So the 0-byte of a cd_t is going to be allocated to the a field, bytes 1-7 will be unused, and bytes 8-15 will be allocated to the d field.

Now let's consider a simple function for a moment:

    // fib.c
    // The fibonacci function

    #include "fib.h"

    int fib(int n)
    {
        int a = 0;
        int b = 1;
        int k = 0;
        // a = fib(k)
        // b = fib(k-1)
        while (k++ != n) {
            int t = a + b;
            b = a;
            a = t;
        }
        return a;
    }

Things to understand:

Note that in older versions of C (notably ANSI C, which is essentially the default compilation model for gcc at present), declarations (possibly with initialization) must precede all statements within a block. C99 relaxes this. Note that there is also a more recent C standard: C11, although compiler support remains iffy, and the changes comparatively modest.

Now, the compiler's job is to translate code like this into machine instructions. The output from the compiler (relocatable machine code) is pretty opaque, but there is a symbolic form (assembly language) which verges on comprehensible, which we can get at by

$ cc -S fib.c

This produces (modulo a bit of cleanup)

    _fib:
            pushq   %rbp
            movq    %rsp, %rbp
            movl    %edi, -20(%rbp)
            movl    $0, -4(%rbp)
            movl    $1, -8(%rbp)
            movl    $0, -12(%rbp)
            jmp     L2
    L3:
            movl    -8(%rbp), %eax
            addl    -4(%rbp), %eax
            movl    %eax, -16(%rbp)
            movl    -4(%rbp), %eax
            movl    %eax, -8(%rbp)
            movl    -16(%rbp), %eax
            movl    %eax, -4(%rbp)
    L2:
            movl    -12(%rbp), %eax
            cmpl    -20(%rbp), %eax
            setne   %al
            incl    -12(%rbp)
            testb   %al, %al
            jne     L3
            movl    -4(%rbp), %eax
            leave
            ret

There's a fair bit we can figure out here. (NB, clang produces slightly different results, but the same style of analysis works there, too.) In particular, where the local variables are located:

    a    -4(%rbp)
    b    -8(%rbp)
    k   -12(%rbp)
    t   -16(%rbp)
    n   -20(%rbp)

This is in AT&T syntax, which the GNU and LLVM folks prefer to Intel syntax. The lines that are left-justified, don't begin with a period, and end with a colon are labels—they indicate locations within the instruction stream.

We can see the initializations just before the jmp L2. There's a typical assembly language idiom here—body above, test below, so we have to jump into the test. The test involves a comparison between k and n, as well as a post-test increment of k. We can reasonably suspect that the x86_64 ISA permits register-memory comparisons, but not memory-memory comparisons. At this point, the x86 is parading its history in front of us. An orthogonal CISC (like the MC 680x0) would have permitted direct memory-memory comparisons. A RISC (like the PPCs) would have required register-register comparisons. I'll be honest here. We live (for now) in an Intel architecture world, but if I have to write assembly code, give me a RISC (reduced instruction set computer) any day.

Anyway, we can look at the body of the loop, and see a move, a memory-to-register add, and various data moves. Again, we can do register-to-memory or memory-to-register moves, but unlike an orthogonal CISC, we can't do memory-memory moves.

Higher levels of optimization would have produced much tighter (but much stranger) code, presumably making much heavier use of the processor's registers.

But the important thing to note here is that the variables map to storage locations, either registers or in memory—and the “typing”, which is to say the interpretation of these bits, is determined by the instructions which operate on these locations, e.g., the addl interprets its arguments (a register and 4 bytes in memory, addressed via the frame pointer %rbp) as 4-byte signed integers.

Let's use the fibonacci function we've just written. First, we have to implement main()...

    // fib.h
    // interface file for the fibonacci function

    int fib(int);

    // fib-main.c
    // top level fib function

    #include <stdio.h>
    #include <stdlib.h>

    #include "fib.h"

    int main(int argc, char **argv)
    {
        for (int i = 1; i < argc; ++i) {
            printf("%d\n", fib(atoi(argv[i])));
        }
        exit(0);
    }

and then compile and run the program:

    $ make
    cc -std=c99 -c -o fib.o fib.c
    cc -std=c99 -c -o fib-main.o fib-main.c
    cc fib.o fib-main.o -o fib
    $ ./fib 10 20 30
    55
    6765
    832040
    $

Things to understand:

There's a big problem with this code—the fib function silently overflows for n = 47.

    $ ./fib 47
    -1323752223

This is clearly ridiculous, but consider

    $ ./fib 48
    512559680

This looks plausible, but is completely wrong. We can gain a little breathing room by going from int to long int, which doesn't overflow until fib(93) = -6246583658587674878, but this clearly isn't a long-term solution. The basic problem here is that finite-precision arithmetic is inappropriate for dealing with fast-growing functions like fib, and that C does not provide a native infinite-precision integral type. There are good infinite-precision arithmetic packages available for C, e.g., GNU's GMP (GNU Multiple Precision) library, but using them takes us a bit further down the rabbit hole (which is to say, this is really a 15400 problem), so we'll pass for now.

Exercise 10.1 Hailstone. Hailstone sequences begin with a positive integer n. Hailstone sequences end with 1. If n != 1, the next element of the hailstone sequence is defined as follows: n/2 if n is even, and 3n+1 if n is odd.

Write a C program that takes a single integer n on the command line, and produces (on one line) the hailstone sequence beginning with n, e.g.

    $ hailstone 10
    10 5 16 8 4 2 1
    $