Lecture 9

Administrivia

Just a quick reminder—keep up with the wiki. Don't forget that it is the most important part of the course.

A call has come up for volunteers to tutor and teach at a code camp directed to middle-school girls. If you're interested: Femmes Code Camp.

The C Programming Language, I

The next few days are a lead-in to Chapter 6 of Kernighan and Pike. Kernighan and Pike assume you already know C. If you don't know C already, you should grab a book and work your way through it. I recommend the 2nd (ANSI C) edition of “The C Programming Language” by Kernighan and Ritchie. This describes an earlier version of C, but the changes are slight, and mostly center on a few productive C++-isms.

C is often referred to as a “high level assembler,” because the programming model that C presents maps very directly onto the programming model supported by typical computer hardware, and much of the power common to assembly-language programming models is available directly in C.

The translation from C code to machine code is unusually straightforward, and programmers are usually secure in making inferences about the nature of that code (e.g., how quickly will it run, how large will it be) that are difficult or impossible to make with higher-level languages.

C is by far the lowest-level language we will consider this quarter, and it has the most complicated tool-chain (i.e., the process of getting from C source code to a working program is more complicated).

C is a compiled language. This means that C programs are translated into machine language, and it is the machine language version of the program that is ultimately run. C is also an explicitly typed language: a language in which variables (not objects) carry type. Its type system is weaker than Haskell's (in the sense that it is less expressive and less secure), and more burdensome (in the sense that C relies entirely on programmer type annotations, and does no type inference itself).

Let's start with a simple example:

    // hello.c
    // hello -- greet the user.

    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        printf("Hello, world!\n");
        exit(0);
    }

To compile and run this, we do

    $ cc hello.c -o hello
    $ ./hello
    Hello, world!
    $

Back in the old days, cc meant Dennis Ritchie's AT&T C compiler. These days, it's usually a link to gcc (the GNU Compiler Collection, originally the GNU C Compiler) or clang.

Things to understand --

Question: Isn't the type of main a lie? After all, we exit from main, rather than returning a value. Why does main have a type at all?

Let's make this a bit more complex:

    // hello.c
    // hello -- greet the user by name.

    #include <stdio.h>
    #include <stdlib.h>

    void greet(char *user);

    int main(int argc, char **argv) {
        greet(getenv("USER"));
        exit(0);
    }

    void greet(char *user) {
        printf("Hello, %s!\n", user == NULL ? "world" : user);
        return;
    }

First, we're changing the functionality of the program to look up and use the USER environment variable. Second, we've moved the greeting into a subroutine. This is ridiculous in this case, but the idea of controlling complexity by abstracting subtasks is fundamental to good software engineering.

Things to understand

Note that this could have been much more easily done in the shell:

    #!/bin/bash
    echo "Hello, ${USER:-world}!"

Remember this!! Mastering C is the programming language equivalent of learning to drive a stick-shift. It's useful, and a bit arcane. You can do things with it that lesser drivers with automatic transmission cars can't do, understand, or appreciate. That doesn't necessarily mean that it's the easiest/quickest way to get from here to the grocery store and back!

Let's do one more tweak on the hello program, though. We will often break a complicated program into multiple source files. This has a number of advantages, all of which center around modularity, reusability, and compilation efficiency. Let's say, for the sake of argument, that we wanted to break this into two files, one of which contains main, and the other of which contains the greet function. By the time we're done, we'll need four(!) files to accomplish this...

First, we have the obvious implementation files:

    // hello.c
    // hello -- greet the user by name.

    #include <stdlib.h>
    #include "greet.h"

    int main(int argc, char **argv) {
        greet(getenv("USER"));
        exit(0);
    }

    // greet.c
    // The implementation of the greet module.
    // Greet a user by name.

    #include <stdio.h>
    #include "greet.h"

    void greet(char *user) {
        printf("Hello, %s!\n", user == NULL ? "world" : user);
        return;
    }

And then a header (interface) file, which declares the type of greet.

    // greet.h
    // The greet interface.

    #ifndef GREET_H
    #define GREET_H

    void greet(char *user);

    #endif

Note that the header file greet.h is included in both the implementation file (greet.c) and its client files (hello.c). Header (.h) files should only contain declarations. The use of a preprocessor wrapper (an include guard) around the contents of greet.h is common, and a good discipline. It guarantees that it is safe to include greet.h multiple times, since the contents are expanded only the first time. Avoiding multiple inclusion may seem a simple condition to meet without the guard, but as we'll see, it is not unusual for header files to include other header files, and this makes avoiding multiple inclusion difficult.

It may seem unproblematic to have a header file included multiple times, but we will soon see declarations that cannot be repeated.

Once we've made the move to multiple source files, managing compilation becomes trickier. We can pretend it's simple:

    $ gcc hello.c greet.c -o hello
    $ ./hello
    Hello, stuart!
    $

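But a simple ls(1) may show a difference. A traditional cc leaves object files behind when it is given several source files, whereas gcc and clang normally compile through temporary files and delete them: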

    $ ls
    greet.c  greet.h  greet.o  hello  hello.c  hello.o

What are those .o files? (Relocatable) object files. These are compiled versions of the corresponding .c files, which then have to be linked together (along with the basic C runtime library) to create a valid program. For example, when we compile hello.c, it contains a reference to the function void greet(char *user), but it doesn't have a definition. The linker combines the object files, and resolves the definitions so that the use of void greet(char *) in hello.c refers to the location in the combined file that contains the definition of void greet(char *).

What's happening behind the scenes when there's more than one source file is a bit more complicated. The gcc driver actually does separate compilation of the source files, and then a link step as follows:

    $ gcc -c hello.c
    $ gcc -c greet.c
    $ gcc hello.o greet.o -o hello

The last gcc actually hides a call to ld(1) ...

There is good news and bad news here. If we decide to change the greet function, we need only recompile greet.c and relink...

    $ gcc -c greet.c
    $ gcc hello.o greet.o -o hello

This doesn't seem like much, but it can be a big savings if there's a lot to compile. Remember here that C is a systems programming language, and we may be recompiling our operating system. Unfortunately, remembering what needs to be compiled and what doesn't can be difficult, and humans aren't good at remembering. This is especially true if, e.g., an interface (.h) file changes, and we need to remember to recompile the implementation and all its clients.

This is the problem solved by the make tool. We describe a collection of dependencies, e.g., the file greet.o depends on both greet.c and greet.h. A change to either file means that greet.o is invalid, and should be reconstructed.

    # Makefile

    CFLAGS= -std=c11

    hello: greet.o hello.o

    greet.o: greet.h

    hello.o: greet.h

    clean:
    	rm -f greet.o hello.o hello

    install: hello
    	cp hello ~/bin

Things to note:

The build process is now very simple:

    $ make
    cc -std=c11 -c -o hello.o hello.c
    cc -std=c11 -c -o greet.o greet.c
    cc hello.o greet.o -o hello
    $

A subsequent make has nothing to do:

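    $ make
    make: `hello' is up to date.
    $

But if we edit a file (here simulated with touch, which just updates the modification time), everything that depends on it is rebuilt:

    $ touch greet.h
    $ make
    cc -std=c11 -c -o hello.o hello.c
    cc -std=c11 -c -o greet.o greet.c
    cc hello.o greet.o -o hello
    $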

There is one final build process tool, and you're in business. The task of maintaining make files can become daunting after a while, especially if you have deep include nesting. There is a simple tool called makedepend that modifies a Makefile so that it includes dependencies for all of the files introduced via include. The -Y flag keeps makedepend from traversing the system include directories.

    # Makefile for the multiple source file version of hello.

    CFLAGS= -std=c11
    SRC=greet.c hello.c
    OBJ=greet.o hello.o

    hello: ${OBJ}

    clean:
    	rm -f ${OBJ} hello Makefile.bak

    install: hello
    	cp hello ~/bin

    depend:
    	makedepend -Y ${SRC} > /dev/null 2>&1

    # DO NOT DELETE

    greet.o: greet.h
    hello.o: greet.h

Subsequent runs of make depend will replace the lines below “# DO NOT DELETE.”

Note also both the definition and use of make variables (SRC and OBJ) within the Makefile to improve maintainability. This is clearly overkill in the present case, but the technique is an important way to reduce overall Makefile complexity.

It's useful to remember make in other contexts where you have source/target file dependencies. It can be very useful in document preparation contexts, e.g., $ make book.

Exercise 9.1 Write, compile, and run a C program that has the following attributes:

  1. It should have multiple source files.
  2. Header (.h) files should be correctly used.
  3. It should produce output.
  4. You should include a working and complete Makefile.

You should hand in all of your source files, including the Makefile, along with a sample run of your program illustrating its output.