Lecture 4

The content of this lecture is intended to augment your reading of Chapter 3 of Kernighan and Pike, and to shed light on useful ways modern Bourne shells extend the classic Bourne shell.

Interpreter files

An interpreter file is a simple text file that can function as an executable. The way this works is that when the kernel is passed a file to execute, it looks at the first few bytes. If they have the right “magic” values, then the file is a processor-specific binary file; otherwise, it is an “interpreter file.” The default interpreter is /bin/sh (these days often a hard link to /bin/bash), but other interpreters can be specified by using the “hash-bang” convention on the first line, and you should follow this convention even when /bin/bash is the intended interpreter.

#!/the/path/of/the/interpreter -any -flags

To evaluate an interpreter file, the kernel evaluates the designated interpreter, providing the following command-line arguments in the following order: any flags from the hash-bang line, the path of the interpreter file itself, and finally any remaining arguments from the original call.

Thus, for example, suppose we had the following interpreter file, saved in our home directory as ls, with the world execute bit set:

#!/usr/bin/tail +2
Please make sure that . is not on your path!

If one of our friends is silly enough to have . on their path before /bin, and they run ls in our home directory, they're in for a bit of a surprise:

$ ls
Please make sure that . is not on your path!
$

It could have been a lot worse. What's happening here is that the shell executes the first instance of a command found on its path, and so prefers our ls to /bin/ls. When this file is evaluated, the kernel (not the shell!) does a de facto rewrite of the call ls to

/usr/bin/tail +2 /home/me/ls

which writes the contents of /home/me/ls to standard output, beginning with the second line.

It is good practice to ignore the existence of the default interpreter (after all, it may differ from system to system), and to code in the intended interpreter via hash-bang. The interpreter itself must be a binary, although it is possible to work around this limitation via env(1).
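
For example, the common idiom for locating bash through the user's PATH, rather than hard-coding its location, is:

#!/usr/bin/env bash

Here env, which is a binary, looks up bash on the PATH and runs it, so the same script works on systems where bash lives somewhere other than /bin.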

Typically, the interpreter is one of the shells, but as the previous example shows, it need not be. E.g.,

#!/bin/cat

is a simple self-replicating interpreter file.
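
Suppose, for illustration, that this one-line file is saved as quine (a hypothetical name), with the execute bit set. Then:

$ ./quine
#!/bin/cat
$

The kernel rewrites the call to /bin/cat ./quine, and cat copies the file, which consists of nothing but the hash-bang line, to standard output.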

The Shells

The shells are really programming languages, with variables, conditionals, iteration, abstraction, etc. I'm going to assume that you're using /bin/bash, although much of the following will apply to any modern, Bourne-compatible shell with minor modification.

When shells start up, they interpret a sequence of configuration files. The last configuration file interpreted, and generally speaking the most important, is a user-defined file. For bash, login shells evaluate .profile, while non-login interactive shells evaluate .bashrc. It is not unusual for the .profile file to invoke the .bashrc file:

. .bashrc

Note that the built-in . command causes the referenced file to be executed within the current shell, as opposed to within a subshell.

A best practice is to keep .bashrc light. Use it to set and export a PATH, and perhaps a few other shell variables, but leave more complicated customizations to the .profile.
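
A minimal .bashrc in this spirit (the particular directories are only an illustration) might contain nothing more than:

PATH=/bin:/usr/bin:/usr/local/bin:${HOME}/bin
export PATH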

Here is where you can set various user options to make your shell your home. For example, bash defaults to using standard emacs key bindings for command line editing. If you prefer vi, you'll want to put

set -o vi

in your .profile.

Command line arguments

The command line arguments are accessible to the shell via $0, $1, $2, etc.
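
For example, here is a toy script (call it args, a hypothetical name) that reports its own name and its first two arguments:

#!/bin/bash
# args -- show the script name and first two arguments
echo "name: $0"
echo "first: $1"
echo "second: $2"

Running it:

$ ./args foo bar
name: ./args
first: foo
second: bar
$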

Functions

Bash supports POSIX function definitions, e.g.,

lsl () { ls -l; }

This defines the command lsl in the shell, which simply runs ls with the -l (long) flag in the current directory (any arguments are ignored). Functions take their arguments via positional parameters, although $0 does not get remapped. So

$ recho () { echo $2 $1; }
$ recho a b
b a
$

is a not particularly useful example. Because the original Bourne shell did not support functions (and so K&P doesn't mention them), a culture developed in which functions are not used in shell scripting. This is unfortunate, because organizing a program into functions can result in programs that are both smaller and more comprehensible. This is one of those cases where you should feel free to improve on your elders.

Compositions

When a process (for now, think “program == process,” but recognize that this is an oversimplification) terminates, it has an exit status. The exit status is a one-byte quantity.

As Leo Tolstoy wrote at the beginning of Anna Karenina, “Happy families are all alike; every unhappy family is unhappy in its own way.” So it is with exit codes. There is only one exit code that represents success: 0. All of the other 255 exit codes are used to indicate various kinds of failure, each program in its own way.

It is an oversimplification, but a useful one, to think of Unix programs as stream-processors, i.e., each program takes a sequence of bytes (usually representing ASCII characters) from standard input, and produces a sequence of bytes (again, usually ASCII characters) on standard output. Because the “type” of the input and output is the same (byte streams), it is meaningful to compose programs, e.g., bolting the standard output of one program to the standard input of another.
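
The pipe operator | is the canonical example of such a composition:

$ ls | wc -l
12
$

This bolts the standard output of ls to the standard input of wc, counting the entries in the current directory, one per line (the count shown is, of course, illustrative).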

The shell provides various kinds of compositional operators, and the notion of success/failure is crucial to understanding how these compositions work.

Basic compositional operators:

(...) — Run the pipeline in a subshell. For example,

$ exit 0

exits the current shell, whereas

$ (exit 0)

runs a subshell in which exit 0 runs. Note that the shell variable $? contains the exit status of the most recently evaluated child process. Thus

$ (exit -1)
$ echo $?
255
$

Does everyone understand why?

; — Often described as a pipeline terminator, it actually serializes computations acting on the same pipes, i.e., the composed programs run sequentially from left to right (irrespective of exit codes), with the same input and output pipes. Thus

$ echo foo; echo bar
foo
bar
$

& — This is similar to ;, but causes the programs to run concurrently, which in practice means that input and output are interleaved in unpredictable ways, e.g.,

$ (echo foo & echo bar & echo baz) > test.out
$ cat test.out
foo
baz
bar

Exercise 4.1 Formulate an explanation for the output above.

As a practical matter, & is often used as a unary post-fix operator, i.e., with an empty second argument, which has the effect of running it “in the background,” i.e., control returns immediately to the shell, with the child running concurrently.
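
For example (sleep simply stands in for a long-running job; the job number and PID shown are illustrative):

$ sleep 60 &
[1] 12345
$

The shell reports the job number and process ID of the child, and immediately issues a new prompt.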

We now get to pipeline combinators whose behavior relies on exit codes.

|| — “or.” Combine two processes into a pipeline, running the second only if the first fails. The exit code will be that of the first command if it is successful; otherwise, it will be the exit status of the second. For example:

$ (exit 1) || (exit 0)
$ echo $?
0
$

&& — “and.” The second process is run only if the first succeeds.

For example:

$ (exit 1) && (exit 0)
$ echo $?
1
$

Unlike most programming languages, || and && have equal precedence, which is tighter than that of ; or &, both of which have equal precedence. This might seem like a problem, but in practice, it is unusual to see these operators mixed. Indeed, it is perhaps more useful to think of them idiomatically, with || used for failure handling, and && used as a “guard,” i.e., a means to protect downstream processes from upstream failure.
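
Two illustrative one-liners (the file and directory names are hypothetical):

# guard: only cd if the mkdir succeeded
mkdir -p ~/projects && cd ~/projects

# failure handling: complain on standard error if the copy fails
cp notes.txt backup/ || echo "backup failed" >&2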

Variables

We assign variables like this:

PATH=${PATH}:~/bin

Note that there cannot be a space between the variable name and the equality symbol. Also note that variables take $ on the RHS (when we're asking for their values), and not on the LHS (when we're binding them). The {} are optional, but good practice. Note also that the shells don't support indirect assignment, i.e.,

foo=bar
${foo}=baz

is an error, and does not result in setting bar to the value baz.

There are some special variables:

Variable Value
$@,$* command line parameters (these forms differ in how they behave in quoted contexts).
$# number of command line parameters
$$ PID of the current process
$? exit code of last executed command
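
As a quick illustration (showme is a hypothetical script name, and the PID shown is arbitrary):

#!/bin/bash
# showme -- display some of the special parameters
echo "arguments: $@"
echo "count: $#"
echo "pid: $$"

Running it:

$ ./showme a b c
arguments: a b c
count: 3
pid: 4711
$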

And some special editing forms:

Form Value
${var:-value} default value: ${var} if ${var} is defined, value otherwise
${var%suffix} delete a suffix: e.g., ${var%.*} gives the file name, with any extension deleted. Note that '.' has no special meaning here: these are not quite regular expressions.
${var/pattern/string} substitute: substitute string for first occurrence of pattern in the expansion of var
${var//pattern/string} global substitute: substitute string for all occurrences of pattern in the expansion of var
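
A quick transcript exercising each of these forms (the variable values are just examples):

$ unset var
$ echo ${var:-default}
default
$ file=paper.tex
$ echo ${file%.*}
paper
$ path=/usr/local/bin
$ echo ${path/usr/opt}
/opt/local/bin
$ echo ${path//l/L}
/usr/LocaL/bin
$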

An example: pman

This is a MacOS X-specific command that is similar to man, but does fancier typesetting, caches its results, and displays them through Preview.app rather than through the terminal. Note that this requires installation of the ghostscript package (e.g., via MacPorts) to get ps2pdf. Similar scripts should work on other OSs.

#!/bin/bash
# pman -- view a typeset man page
PATH=/bin:/usr/bin:/usr/local/bin/
cachedir=/tmp/${USER}/pman-cache/
# create cache directory if necessary
if [[ ! -d ${cachedir} ]]
then
    mkdir -p ${cachedir}
fi
# name cache file
command="$*"
source=$(man -w ${command})
target=${cachedir}/${command// /-}.pdf
# typeset & cache man page, if it doesn't yet exist, or is out of date.
if [[ ! -f ${target} || ${source} -nt ${target} ]]
then
    man -t -c "$@" | ps2pdf - ${target}
fi
# view cached man page
# "open" is a MacOS X-ism.
open -a Preview.app ${target}
exit 0

Some things to note:

The script sets its own PATH, so its behavior does not depend on the caller's environment.
The expansion ${command// /-} uses the global-substitute form from the table above to turn a multi-word command into a legal file name.
man -w locates the source file for a page rather than formatting it; the -nt (“newer than”) test then regenerates the cached PDF only when that source has changed.

Quoting

A remarkable aspect of the shell is the extent to which it is really a language driven by the creation and interpretation of strings. E.g.,

$ cmd="echo foo"
$ ${cmd}
foo
$

There are limits to this, e.g., you can't store shell syntax within the string value of a variable, expand it, and run it (we've already seen this with the “no indirect assignment” example). But still, strings rule in the shell.

One complication of this is in the interpretation of command line arguments. Let's do a small example. Let's suppose we want a version of the echo command that prints one argument per line, rather than separating each argument by a space:

#!/bin/bash
# echoo -- print one argument per line
for arg in $@
do
    echo ${arg}
done

Yes, I know, we haven't seen for yet, but bear with me.

$ echoo foo bar
foo
bar
$

until we try to create an argument that contains an embedded space...

$ echoo "foo bar" baz foo bar baz $

The problem here is that our quoted argument to echoo gets expanded without the quotes in $@, and so the for sees two arguments, foo and bar rather than a single foo bar. The solution to this is to put the $@ variable itself into double quotes, which has the effect of putting each command line argument in double quotes.

#!/bin/bash
for arg in "$@"
do
    echo ${arg}
done

This seems to work

$ echoo "foo bar" baz foo bar baz $

until we put extra spaces within the quoted string...

$ echoo "foo bar" baz foo bar baz $

Note carefully how a double space in the argument to echoo became a single space in the output. Why? There were no quotes around the argument to echo! So, one more adjustment:

#!/bin/bash
for arg in "$@"
do
    echo "${arg}"
done

Exercise 4.2 Try an experiment, in which "$@" in the last script above is replaced by "$*", and run it using

$ echoo "foo bar" baz

Explain the output, accounting for every space in it. You might find it helpful to read the “Special Parameters” section of the bash man page.

Some morals. Variables within double quotes are expanded (variables within single quotes are not). Quoting (double or single) is used to marshal command line arguments, and to control variable expansion and the interpretation of white space. There is a lot of trickiness here, and a variety of idioms have emerged, which is pretty much the way of shell programming. It's a good idea to be attentive to shell idioms as you encounter them, and to figure out what they're doing, and how they might be useful.
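
A quick transcript summarizing the expansion rules (the variable is just an example):

$ who=world
$ echo "hello, ${who}"
hello, world
$ echo 'hello, ${who}'
hello, ${who}
$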

Exercise 4.3 Write a shell script words that takes a pattern argument, and returns all words in your dictionary (/usr/share/dict/words or /usr/dict/words) that match that pattern completely (i.e., not just in part). Thus, e.g.,

$ words foo.
food
fool
foot
$

This program is very useful when attempting to solve crossword puzzles. The trickiness is in building a pattern on the command line for egrep that controls the beginning and end of the word, while working around the quoting and syntax rules for the shell.