Correctness

In sequential programs, rerunning a program with the same input will always give the same result, so it makes sense to debug a program: run and rerun the program with breakpoints until a problem is diagnosed; fix the source code; rerun to check if the output is not correct. In a concurrent program, some scenarios may give the correct output while others do not. You cannot debug a concurrent program in the normal way, because each time you run the program, you will likely get a different scenario. The fact that you obtain the correct answer may just be a fluke of the particular scenario and not the result of fixing a bug.

In concurrent programming, we are interested in problems—like the problem with Algorithm 2.4—that occur as a result of interleaving. Of course, concurrent programs can have ordinary bugs like incorrect bounds of loops or indices of arrays, but these present no difficulties that were not already present in sequential programming. The computations in examples are typically trivial such as incrementing a single variable, and in many cases, we will not even specify the actual computations that are done and simply abstract away their details, leaving just the name of a procedure, such as critical section. Do not let this fool you into thinking that concurrent programs are toy programs; we will use these simple programs to develop algorithms and techniques that can be added to otherwise correct programs (of any complexity) to ensure that correctness properties are fulfilled in the presence of interleaving.

For sequential programs, the concept of correctness is so familiar that the formal definition is often neglected. (For reference, the definition is given in Appendix B.) Correctness of (non-terminating) concurrent programs is defined in terms of properties of computations, rather than in terms of computing a functional result. There are two types of correctness properties:

Safety properties The property must always be true.

Liveness properties The property must eventually become true.

More precisely, for a safety property P to hold, it must be true that in every state of every computation, P is true. For example, we might require as a safety property of the user interface of an operating system: Always, a mouse cursor is displayed. If we can prove this property, we can be assured that no customer will ever complain that the mouse cursor disappears, no matter what programs are running on the system.

For a liveness property P to hold, it must be true that in every computation there is some state in which P is true. For example, a liveness property of an operating

system might be: If you click on a mouse button, eventually the mouse cursor will change shape. This specification allows the system not to respond immediately to the click, but it does ensure that the click will not be ignored indefinitely.

It is very easy to write a program that will satisfy a safety property. For example, the following program for an operating system satisfies the safety property Always, a mouse cursor is displayed:

while true
    display the mouse cursor

I seriously doubt if you would find users for an operating system whose only feature is to display a mouse cursor. Furthermore, safety properties often take the form of Always, something “bad” is not true, and this property is trivially satisfied by an empty program that does nothing at all. The challenge is to write concurrent programs that do useful things—thus satisfying the liveness properties—without violating the safety properties.

Safety and liveness properties are duals of each other. This means that the negation of a safety property is a liveness property and vice versa. Suppose that we want to prove the safety property Always, a mouse cursor is displayed. The negation of this property is a liveness property: Eventually, no mouse cursor will be dis- played. The safety property will be true if and only if the liveness property is false. Similarly, the negation of the liveness property if you click on a mouse button, eventually the cursor will change shape, can be expressed as Once a button has been clicked, always, the cursor will not change its shape. The liveness property is true if this safety property is false. One of the forms will be more natural to use in a specification, though the dual form may be easier to prove.

Because correctness of concurrent programs is defined on all scenarios, it is impossible to demonstrate the correctness of a program by testing it. Formal methods have been developed to verify concurrent programs, and these are extensively used in critical systems.

Linear and branching temporal logics

The formalism we use in this book for expressing correctness properties and veri- fying programs is called linear temporal logic (LTL). LTL expresses properties that must be true at a state in a single (but arbitrary) scenario. Branching temporal logic is an alternate formalism; in these logics, you can express that for a property to be true at state, it must be true in some or all scenarios starting from the state. CTL [24] is a branching temporal logic that is widely used especially in the verification of computer hardware. Model checkers for CTL are SMV and NuSMV. LTL is used in this book both because it is simpler and because it is more appropriate for reasoning about software algorithms.