GotW #61


CHALLENGE EDITION: ACID Programming 
Difficulty: 10 / 10

What are the similarities and differences in reasoning about exception safety in particular, and program safety in general? Is it possible to generalize an approach that encompasses general program correctness analysis, both in C++ and in other languages? And what guidelines can we extrapolate that will help improve our day-to-day programming? For answers, see this first-ever "10/10 difficult" issue of GotW.

Problem

Preface

In this issue of GotW, I want to suggest a new approach to program correctness analysis that covers both "normal" correctness (i.e., in the absence of exceptions) and exception safety analysis in C++ and other languages. Hopefully it is useful, and if so then it will doubtless be subject to refinement; no one ever bothers to refine unuseful ideas.

The purpose of this GotW is to define a taxonomy to better describe exception safety, and to see whether it can provide us with new ways of reasoning about program correctness. If successful, the method should meet the following goals:

- It should be inclusive of existing work, covering existing exception safety analysis methods.

- It should help to fill holes in existing methods, such as a situation where given sample code was known to support some exception-safety guarantee but that guarantee could not be adequately described using the existing models alone.

- It should help us reason about and analyze the safety of code, making it less a craft and more a methodical engineering task.

This brings us to the GotW questions:

JG Questions

1. Consider again the Abrahams exception safety guarantees, as frequently described in GotW, and any others that you may know of. Then, review the technique described in GotW #59 (Question #4 and Conclusion #2). Which Abrahams (or other) guarantee does that technique provide? Discuss.

2. In the context of databases, what do the following terms mean:

a) transaction

b) ACID

Guru Questions

3. The key insight is to notice that ACID transactional analysis can be applied to program correctness, including C++ exception safety. Demonstrate a parallel between the two by showing a programming analog for each aspect of ACID.

4. Treat the analysis from Question #3 as a taxonomy and use it to describe the Abrahams and other prior guarantees. Discuss.

5. Does the answer from Question #4 equip us to give a better answer to Question #1? Explain.

Solution

INTRODUCTION

Exception Safety Recap

 

1. Consider again the Abrahams exception safety guarantees, as frequently described in GotW, and any others that you may know of.

Recapping from GotW #59, Dave Abrahams' guarantees are usually stated as follows.

A1. Basic Guarantee: If an exception is thrown, no resources are leaked, and objects remain in a destructible and usable -- but not necessarily predictable -- state. This is considered by many to be the weakest usable exception safety guarantee, and is appropriate where client code can cope with failed operations that have already made changes to objects' state.

A point I want to make here is that this essentially means "there are no bugs." It's not much different from saying: "If an error occurs, no resources are leaked, and objects remain in a destructible and usable -- but not necessarily predictable -- state."

For example, consider a Container::AppendElements() operation that appends multiple elements onto the end of an existing container. If called to append 10 elements, and it successfully adds 6 elements before encountering an error and stopping, then the operation provides only the basic guarantee: The container is still in a consistent state (it still meets the Container invariants, and further operations on the container will work fine), but the state is not necessarily predictable (the caller probably has no way to know in advance what the final state will be if an error is encountered; it may or may not be the initial state).
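The basic guarantee described above can be sketched in code. This is a made-up minimal Container (the class, its members, and the failure condition are invented for illustration, not taken from any GotW code): if an append fails partway through, the container is still valid and usable, but its final contents are not predictable in advance.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical container offering only the basic guarantee for
// AppendElements(): on failure, nothing leaks and the container still
// meets its invariants, but elements appended before the failure remain.
class Container {
public:
    void AppendElements(const std::vector<int>& elems) {
        for (int e : elems) {
            if (e < 0)                  // simulated failure condition
                throw std::runtime_error("append failed");
            data_.push_back(e);         // earlier appends are not rolled back
        }
    }
    std::size_t Size() const { return data_.size(); }
private:
    std::vector<int> data_;             // invariant: always a valid vector
};
```

A caller that asks for ten elements and hits a failure after three is left with a consistent container holding three extra elements, which is exactly the "usable but not necessarily predictable" state the guarantee describes.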

A2. Strong Guarantee: If an exception is thrown, program state remains unchanged. This guarantee always implies global commit-or-rollback semantics, including that no references or iterators into a container be invalidated if an operation fails.

Note that here we are already heading toward transactional semantics.

In addition, certain functions must provide an even stricter guarantee in order to make the above exception safety guarantees possible in higher functions:

A3. Nothrow Guarantee: The function will not emit an exception under any circumstances. It turns out that it is sometimes impossible to implement the strong or even the basic guarantee unless certain functions are guaranteed not to throw (e.g., destructors, deallocation functions). For example, an important feature of the standard auto_ptr is that no auto_ptr operation will throw.
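The classic way to honor the nothrow guarantee is to restrict an operation to steps that cannot fail, such as plain pointer copies. Here is a minimal sketch (PtrHolder is an invented class; the code uses noexcept, the modern spelling of the article's throw()):

```cpp
#include <cassert>

// A pointer swap involves only pointer copies, which cannot fail, so
// Swap() can honestly promise the nothrow guarantee.
class PtrHolder {
public:
    explicit PtrHolder(int* p = nullptr) : p_(p) {}
    void Swap(PtrHolder& other) noexcept {  // nothrow: no allocation, no copies of T
        int* temp = p_;
        p_ = other.p_;
        other.p_ = temp;
    }
    int* Get() const noexcept { return p_; }
private:
    int* p_;
};
```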

Other experts have also proposed exception safety guarantees. Building on earlier work by Harald Mueller[2], Jack Reeves[3] suggested three guarantees regarding the state of an object after an exception had occurred:

R1. Good: The object meets its invariants.

R2. Bad: The object has been corrupted in that it no longer meets its invariants, and the only thing that you are guaranteed to be able to safely do with it is destroy it.

R3. Undefined: The object is thoroughly corrupted and cannot be safely used or even destroyed.

Over the years, other possible guarantees have been suggested, notably the "destructible guarantee," which is another way of saying that if an exception occurs then an object is in Reeves' "Bad" state.

Analyzing a Motivating Example

Then, review the technique described in GotW #59 (Question #4 and Conclusion #2). Which Abrahams (or other) guarantee does that technique provide? Discuss.

In GotW #59, the main technique presented was to have a class hold its member objects through an auto_ptr<Impl> (the Pimpl idiom) rather than directly by value:

// The general solution to
// Cargill's Widget Example
//
class Widget
{
  // ...
private:
  class Impl;
  auto_ptr<Impl> pimpl_;
  // ... provide destruction, copy construction
  // and assignment that work correctly, or
  // suppress them ...
};

class Widget::Impl
{
public:
  // ...
  T1 t1_;
  T2 t2_;
};

void Widget::Swap( Widget& other ) throw()
{
  auto_ptr<Impl> temp( pimpl_ );
  pimpl_ = other.pimpl_;
  other.pimpl_ = temp;
}

Widget& Widget::operator=( const Widget& other )
{
  Widget temp( other ); // do all work off to the side

  Swap( temp ); // then "commit" the work using
  return *this; // nonthrowing operations only
}

See GotW #59 for further details.

Now we get to the crux of the matter: That GotW then made the claim that the above Widget::operator=() provides the strong guarantee without any exception-safety requirements on T1 and T2. As Dave Abrahams has pointed out, this isn't quite true: For example, if constructing a T1 or T2 object causes side effects (such as changing a global variable or launching a rocket) and then throws an exception, and that side effect is not cancelled by the corresponding T1 or T2 destructor, then Widget::operator=() has caused a side effect and so doesn't fully meet the requirements of the strong guarantee. (It does fully meet the requirements of the strong guarantee in all other ways besides the possibility of side effects.)
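Abrahams' side-effect loophole can be made concrete with a toy type (the global counter and this T1 are invented for illustration): even if assignment leaves the target object untouched on failure, a constructor side effect that its destructor never undoes means overall program state has changed, so the strong guarantee's "program state remains unchanged" clause is not met.

```cpp
#include <stdexcept>

int g_rocketsLaunched = 0;   // hypothetical global state touched as a side effect

// A type whose constructor performs a side effect and then fails,
// without any destructor running to cancel the side effect
// (the destructor of a never-completed object is never invoked).
struct T1 {
    T1() {
        ++g_rocketsLaunched;                              // side effect first...
        throw std::runtime_error("T1 construction failed"); // ...then failure
    }
};

// Attempting construction has no local effects on the caller's objects,
// but the global counter has still advanced.
bool TryConstructT1() {
    try { T1 t; return true; }
    catch (const std::exception&) { return false; }
}
```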

So here's the rub: The above solution to Cargill's Widget Example is an important technique, it does provide a safety guarantee, and that guarantee appears to be a fundamental concept -- it is the strongest exception safety guarantee one can make for assignment without requiring any assumptions at all about the used type (more on this below). Better still, it is a useful guarantee to be able to make; for example, it is useful to know that upon failure a change to a container object will produce no local effects on the container, even though it might invalidate existing iterators into the container. More on this in the final section.

So, since it is a guarantee, what guarantee is it? It doesn't meet the requirements of the Abrahams strong or nothrow guarantees, but at the same time it seems to give more than just the Abrahams basic guarantee or the Reeves Good guarantee.

There's got to be a name for a concept as useful and as fundamental as "the strongest guarantee you can make without requiring assumptions about the used type" -- and that's not a name, that's just an explanation. It bothered me greatly that we had no succinct name for it.

TOWARD TRANSACTIONAL PROGRAMMING 

Ideas From the Database World

Could another area of computer science yield any insights that would help? To introduce the answer, let's take a quick trip into the database world and its existing methods of analyzing program correctness with respect to database manipulation:

2. In the context of databases, what do the following terms mean:

a) transaction

A transaction is a unit of work. That is, it accomplishes some task (typically involving multiple database changes) in such a way that it meets the ACID requirements to a specified degree appropriate for its particular task:

b) ACID

ACID is an acronym for a set of four requirements:

Requirement / Description (Databases)

Atomicity

The operation is indivisible, all-or-nothing. A transaction can be "committed" or "rolled back"; if it is committed all of its effects must complete fully, otherwise if it is rolled back it must have no effect.

Consistency

The transaction transforms the database from one consistent state to another consistent state. Here "consistent" implies obeying not just relational integrity constraints but also business rules and semantics, such as that a given Customer record's Address and ZIP fields must not specify inconsistent information (e.g., an address in New York and a ZIP code in Alabama).

Isolation

Two transactions are fully isolated if they can be started simultaneously and the result is guaranteed to be the same as though one was run entirely before the other ("Serializable"). In practice, databases support various levels of isolation, each of which trades off better isolation at the expense of concurrency or vice versa. Some typical isolation levels, in order from best to worst concurrency, are: Read Uncommitted (a.k.a. Dirty Read), Read Committed, Repeatable Read, and Serializable (highest isolation); each level describes what kinds of intermediate states of other transactions might be visible.

Durability

If a transaction is committed, its effects are not lost even if there is a subsequent system crash. This requirement is usually implemented using a transaction log which is used to "roll forward" the state of the database on system restart.

Note 1: Two transactions operating on nonoverlapping parts of the database are by definition always fully isolated.

The following notes show why isolation is disproportionately important:

Note 2: Isolation interacts with atomicity. For example, consider three transactions T1, T2, and T3 that run concurrently. All three are atomic. T1 and T2 run at Serializable isolation and do not interact with each other; that is, because of their isolation levels, T1 and T2 are also guaranteed to be atomic with respect to each other. T3, however, runs at Read Uncommitted isolation, and it is possible for T3 to see some but not all of the changes made by T1 and T2 before those other transactions commit; that is, because of its isolation level, T1 and T2 are NOT atomic as seen by T3.

Note 3: Because isolation interacts with atomicity, it also transitively interacts with consistency. This is because lower isolation levels may permit states to be seen by other transactions that would not otherwise be visible, and relying carelessly on such intermediate states can create inconsistencies in the database. For example, consider what could happen if transaction T4 observes some intermediate state of transaction T5, makes a change based on that observed state, but T5 is subsequently rolled back. Now T4 has made further changes based on phantom data that, as it turns out, never really existed.

For more information, see a standard database text like Date's An Introduction to Database Systems.[4]

In the database world, atomicity, consistency, and durability are fully required in most real-world database systems. It is the isolation aspect that offers tradeoffs for concurrency (and, as noted, those tradeoffs can indirectly affect atomicity and consistency too). In practice, running at anything less than Serializable isolation opens windows wherein the database could get into a state that violates business rules (and hence the consistency requirement), but it's done all the time. In short, we live with this quite happily today, even in some financial database systems with stringent integrity requirements, because we can measure the exposure and we accept the tradeoff in order to achieve better concurrency. Nearly all production databases in the world run at less than the strongest isolation levels at least some of the time.

Mapping ACID Concepts to Non-SQL Programming

Note that in the following discussion, for an operation to "fail" can mean fail in any way, so that the analysis is the same whether the failure is reported by an error return code or by an exception.

3. The key insight is to notice that ACID transactional analysis can be applied to program correctness, including C++ exception safety. Demonstrate a parallel between the two by showing a programming analog for each aspect of ACID.

The elements of ACID were developed to describe requirements in database environments, but each has analogs that make sense in programming environments. Note that these analogs apply to reasoning about program correctness in general, both with and without exceptions and including exception support in other languages, not just to C++ exception handling.

The following analysis distinguishes between local effects and side effects:

1. Local effects include all effects on the visible state of the immediate object(s) being manipulated. Here the visible state is defined as the state visible through member functions as well as free functions (e.g., operator<<()) that form part of the interface. For more on what constitutes the interface of a type, see the Interface Principle as described in my original IP article[5] and in Items 31-34 of Exceptional C++[6].

2. Side effects include all other changes, specifically changes to other program states and structures as well as any changes caused outside the program (such as invalidating an iterator into a manipulated container, writing to a disk file, or launching a rocket), that affect the system's business rules, as they are expressed in object and system invariants. So, for example, a counter of the number of times a certain function is called would normally not be considered a side effect on the state of the system if the counter is simply a debugging aid; it would be a side effect on the state of the system if it recorded something like the number of transactions made in a bank account, because that affects business rules or program invariants. Note: In this article I use the terms "business rules" and "invariants" interchangeably.

Of course, if an effect is visible as a side effect but also through the manipulated object's interface (such as via ReadByte() or WasLaunched() functions), it is both a local effect and a side effect.
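The local-effect/side-effect distinction can be illustrated with a toy type (BankAccount and the audit-log counter below are invented for this sketch): the balance change is a local effect visible through the interface, while the system-wide audit entry is a side effect that business rules care about.

```cpp
#include <cassert>

int g_auditLogEntries = 0;   // invented system-wide state; an invariant of the
                             // hypothetical system is that every deposit is logged

// Deposit() changes the account's visible state (a local effect) and also
// appends to a system-wide audit log (a side effect, because program
// invariants depend on it rather than it being a mere debugging aid).
class BankAccount {
public:
    void Deposit(int amount) {
        balance_ += amount;   // local effect: visible through Balance()
        ++g_auditLogEntries;  // side effect: state outside the object
    }
    int Balance() const { return balance_; }
private:
    int balance_ = 0;
};
```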

I propose that the following descriptions of the ACID requirements may be suitable for describing program correctness in general and exception safety in particular. They form four fundamental axes of ACID exception safety analysis:

Requirement / Description (Programming)

Atomicity

The operation is indivisible, all-or-nothing, with respect to local effects.

Consistency

The operation transforms the system from one consistent state to another consistent state. Here "consistent" implies obeying not just object integrity constraints (i.e., objects continue to fulfill their invariants) but also wider system invariants. Note that "no [memory or resource] leaks" is usually a required program invariant.

Before tackling Isolation, a note: In the database world, isolation is often all about concurrency -- that is, access and manipulation of the same data elements by different transactions operating concurrently. The same concepts we use to describe database transaction isolation map directly onto multithreaded and multiprocess access and manipulation of shared resources; the same issues apply -- read locks, write locks, read/write locks, deadlocks, livelocks, and similar issues.

There is a difference, however, between relational databases and programs. Standard relational database systems provide direct support in the database engine to enforce isolation options such as "Read Committed" by automatically creating, maintaining, and freeing locks on individual data objects. They can do this because the engine "owns," or acts as an arbiter and access point for, all the data. No such support exists in most programming languages (including C++ and Java; Java has some thread safety primitives, but they are not very useful and do not actually solve the thread safety problem even in simple cases). In languages like C++ and Java, if you want locking you have to do it yourself in an application-specific way by manually writing calls to acquire and release locks on program resources like shared variables.

Requirement / Description (Programming)

Isolation

Two operations are fully isolated if they can be started simultaneously and the result is guaranteed to be the same as though one was run entirely before the other ("Serializable"). In practice, programs implement various levels of isolation, each of which trades off better isolation at the expense of concurrency or vice versa. Some typical isolation levels, in order from best to worst concurrency, are: Read Uncommitted (a.k.a. Dirty Read), Read Committed, Repeatable Read, and Serializable (highest isolation); each level describes what kinds of intermediate states of other operations might be visible.

Interestingly, however, note that concurrency isolation issues only arise between transactions manipulating the same data; so, as already noted earlier, isolation is fundamentally concerned with not only concurrency, but interaction of side effects. As Note 1 above stated: two transactions operating on nonoverlapping parts of the database are by definition always fully isolated.

In the interests of space, this article does not pursue concurrency issues in depth, except to note the following: The ACID program analysis method proposed in this article can be directly extended to address concurrency issues in the same way that they are addressed in database environments, with the caveat that most programming languages do not provide the native support for basic concurrency control that relational database management systems provide inherently, so the application must implement it itself. A later article will expand on this wider aspect of isolation.

This article then focuses specifically on the aspects of isolation of interest in a nonshared/serialized environment, namely the interaction of side effects:

Requirement / Description (Programming)

Isolation (continued)

Regardless of threading and other concurrency issues, an operation on one or more objects is fully isolated if it does not cause side effects (i.e., it causes only local effects on those objects).

Durability

If an operation completes successfully, all its effects are preserved even if there is a subsequent system crash. For example, a rocket once fired stays in flight even if the launching software in the submarine crashes afterwards.

Here's a difference: In the database world, atomicity, consistency, and durability are always fully required and it is usually isolation that is traded off (with some indirect effects on atomicity and consistency). In the programming world, atomicity is normally assumed (but in the presence of asynchronous interrupts or exceptions, atomicity has to be explicitly programmed for), consistency in terms of maintaining invariants is likewise normally assumed, and durability is not often considered except in database-like applications, where durability is made the responsibility of an underlying database or file system.
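In program terms, seeking durability usually means pushing a committed effect toward stable storage before reporting success. The sketch below (CommitDurably and the file name are invented for illustration) shows the idea; note that a real system would need fsync or a transaction log, since std::ofstream::flush only hands the data to the operating system.

```cpp
#include <fstream>
#include <string>

// A durability-seeking commit: write the new state and flush it before
// reporting success, so that a later process crash does not lose the
// effect. This is a sketch of the idea, not production-grade durability.
bool CommitDurably(const std::string& path, const std::string& state) {
    std::ofstream out(path, std::ios::trunc);
    if (!out) return false;
    out << state;
    out.flush();               // push the effect toward stable storage
    return static_cast<bool>(out);
}

// Read the committed state back, as a restarted program would.
std::string ReadBack(const std::string& path) {
    std::ifstream in(path);
    std::string s;
    std::getline(in, s);
    return s;
}
```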

Really, it looks like what we are building up to is a transactional approach to program correctness, including C++ exception safety, arrived at by adapting concepts that are already well-understood in another area of computer science.

A Notation for Safety Guarantees

Next, let's analyze the possible statements we can make about each axis. For each axis, we can make one of three statements:

a) that there is no guarantee at all (designated by a dash, '-', or simply omitting mention of the guarantee);

b) a statement about effects if an operation may fail (designated by a lowercase letter, such as 'i'); or

c) in addition to b), a statement about effects if an operation does not or cannot fail (designated by an uppercase letter, such as 'I').

Note that these are true "levels" in that each subsumes the one before it. I omit a fourth possible case ("a statement about effects if an operation succeeds [but with no statement about what happens on failure]") because it is no better than a) for analyzing failure cases.

Here's what we get:

Level / Description

Atomicity

-

No guarantee.

a

If the operation fails, the manipulated object(s) are in their initial state, otherwise the manipulated object(s) are in their final state.

A

The operation is guaranteed to succeed and the manipulated object(s) are in their final state. (For exception-related failures, this is spelled "throw()" in C++; but an operation could still fail in ways other than throwing an exception.)

Note that atomicity as defined here concerns itself only with local effects, that is, with the state of a directly manipulated object.

Consistency

-

No guarantee.

c

If the operation fails, the system is in a consistent state. (Depending on whether or not the operation also makes some atomicity guarantee, the consistent state may be the initial state, the final state, or possibly an intermediate state, depending on whether and how a failure occurred.)

C

If the operation succeeds or fails, the system is in a consistent state.

Any operation that can result in objects no longer meeting their invariants should be considered buggy. No program safety, let alone exception safety, is possible without a guarantee of consistency on success, so the rest of this article will generally rely on C. Further, we require that at the beginning of the operation the system is in a consistent state, else no reasoning about consistency is possible or useful.
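A minimal sketch of an operation at level 'C' (Rect and its failure rule are invented for illustration): whether Widen() succeeds or fails, the object's invariant continues to hold, so the system remains consistent either way.

```cpp
#include <stdexcept>

// Rect maintains the invariant width_ > 0 across both success and
// failure of Widen(), so Widen() provides the 'C' consistency level.
class Rect {
public:
    explicit Rect(int w) : width_(w > 0 ? w : 1) {}
    void Widen(int by) {
        if (by < 0)
            throw std::invalid_argument("negative widen");  // fail early,
        width_ += by;           // invariant preserved on either path
    }
    int Width() const { return width_; }
private:
    int width_;                 // invariant: width_ > 0
};
```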

Isolation

-

No guarantee.

i

If the operation fails, there will be no side effects.

I

If the operation either succeeds or fails, there will be no side effects.

Note that isolation as defined here concerns itself only with side effects. In many ways it is a mirror of atomicity.

Note that 'a'+'i' => 'c', because we require that the objects and the system were initially in a consistent state.

Durability

-

No guarantee.

d

If the operation fails, whatever state was achieved will persist.

D

If the operation succeeds or fails, the state persists.

Interlude: FAQs and Objections

At this point, several questions or objections arise, and it makes sense to pause briefly and handle them now:

Q: "I think 'c' and 'd' may not be useful, on the grounds that they seem backward: Shouldn't operations first specify whether they produce consistent and/or durable results if they succeed, and then optionally what they do if they fail?"

A: An interesting outgrowth of the above ACID view, then, is that it is actually more important first to state what operations do if they fail, and only then to state in addition what they do if they succeed. (I grant that programmers who find it annoying that their companies force them to document their functions' failure modes may also dislike the previous statement.)

Q: "The database version of 'atomicity' includes all effects, whereas the above doesn't. Is that right?"

A: Yes. It is useful in programming to distinguish between local effects and side effects, and the global atomicity guarantee still exists and is expressed as "both 'a' and 'i'."

This last point leads nicely into the question of combining guarantees:

A Notation for Combining Guarantees

The four axes of correctness analysis are not entirely independent; one can affect the other, as for example above 'a'+'i' => 'c'.

For notational convenience I will designate each combination of guarantee levels as a 4-tuple (such as "-CID" or "aC-D"), where each position specifies the level of the atomicity, consistency, isolation, and durability axis (respectively) that is guaranteed, if any. For convenience, the '-'s could also be omitted.

This article focuses primarily on the first three axes, so in the following discussion the durability guarantee level will often be shown as - or omitted. Because the durability axis is independent of the others, this can be done without loss of generality.

Note that, if the operation guarantees that it will always succeed (A), then a lowercase guarantee about what happens only on failure (c, i, or d) is not very meaningful, because the failure case can never arise. So, for example, in every case where ACi is guaranteed, AC is equivalent and should be specified instead.

(RE)VIEWING EXISTING APPROACHES THROUGH THE LENS OF TRANSACTIONAL PROGRAMMING

Describing Existing Guarantees as ACID Guarantees

The first goal of this taxonomy was that "[i]t should be inclusive of existing work, covering existing exception safety analysis methods."

Hence Question #4:

4. Treat the analysis from Question #3 as a taxonomy and use it to describe the Abrahams and other prior guarantees. Discuss.

The ACID model does indeed provide a concise and explicit notation to describe the Abrahams guarantees, Reeves' Good, Bad, and Undefined states, and the destructible guarantee:

----    Reeves Undefined state guarantee
        Reeves Bad state guarantee
        Destructible guarantee

-C--    Abrahams basic guarantee
        Reeves Good state guarantee

aCi-    Abrahams strong guarantee

AC--    Abrahams nothrow guarantee

Note that this analysis points out at least two weaknesses in prior models:

First, the prior models generally ignored the issue of isolation. Note that none of the above-listed prior guarantees included I. When using the prior models we frequently found that we had to add amplifications like, "it meets the strong guarantee except there could be side effects." Descriptions like that implicitly point to missing guarantees, which are now accounted for above.

Second, the above ACID analysis makes it clear why Mueller's and Reeves' "Bad" and "Undefined" states, and the sometimes-cited "destructible" guarantee, are all the same thing from a program correctness point of view.

I have always thought that the "Bad," "Undefined," and "destructible" ideas had problems. There's nothing magical about exiting a function via a C++ exception compared to returning an error value indicating the operation failed, so consider: Say that a programmer on your team wrote a function that could never throw and just returned a code to indicate success or failure, and that could leave an object in a state where it no longer met its invariants (or, even worse, couldn't even be destroyed). Would you accept the work, or would you send the programmer back to his desk to fix his bug? I think most of us would agree that the function just described is plainly buggy (and so is its specification if the specification says it can render objects invariant-breaking). So why should the answer be any different in the case of failure reported by throwing an exception? Clearly it shouldn't, and in fact none of the Bad, Undefined, or destructible "guarantees" really gives any kind of useful program correctness or safety guarantee, because all useful program correctness guarantees include at least the concept of preserving object and system invariants.

[Aside: Consider "Bad" and "destructible" a little further. If a function f() can leave any object t in a state wherein all you can ever do is call t's destructor, then f() is a sink in all but name, and might as well destroy t too.]

Filling Holes: ACID and GotW #59

The second goal of this taxonomy was that "[i]t should help to fill holes in existing methods, such as a situation where given sample code was known to support some exception-safety guarantee but that guarantee could not be adequately described using the existing models alone."

This brings us to Question #5:

5. Does the answer from Question #4 equip us to give a better answer to Question #1? Explain.

When we considered Question #1, I said that it seems like there ought to be a name for a concept as fundamental as "the strongest guarantee you can make (for assignment) without requiring assumptions about the used type." In the ACID model, it turns out that there is:

aC--

The guarantee provided by the technique shown in GotW #59, if T construction and destruction provide at least -C-- (which should be a given for correct programs).

Without making any assumptions about the used type(s), is it possible to make a stronger guarantee than aC for a generalized assignment operator? I believe it is possible to show fairly rigorously that the answer is No, as follows:

1. Consider a simplified version of the T::operator=() code.

// (Simplified) solution to Cargill's Widget Example
//
class Widget { /*...*/ struct Impl* pimpl_; };

class Widget::Impl { /*...*/ T1 t1_; T2 t2_; };

Widget& Widget::operator=( const Widget& other )
{
  Impl* temp = new Impl( *other.pimpl_ );
  // construction could fail

  delete pimpl_;
  pimpl_ = temp;
  return *this;
}

As before, the above code invokes the Widget::Impl constructor to create a new object from the existing one. If the construction fails, C++'s language rules automatically destroy any fully-constructed subobjects (such as t1_, if the t2_ construction fails). So we could end up getting any of the following function call sequences:

Case 1 (Failure): T1::T1() fails

Case 2 (Failure): T1::T1(), T2::T2() fails, T1::~T1()

Case 3 (Success): T1::T1(), T2::T2()
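The three call sequences can be observed directly with instrumented members (the trace string and this throwing T2 are invented for illustration): when a later member's constructor throws, the language automatically destroys the already-constructed earlier members before the exception leaves the enclosing constructor, which is exactly Case 2.

```cpp
#include <stdexcept>
#include <string>

std::string g_trace;   // records the construction/destruction sequence

struct T1 {
    T1()  { g_trace += "T1(); "; }
    ~T1() { g_trace += "~T1(); "; }
};

struct T2 {
    T2() { g_trace += "T2() fails; "; throw std::runtime_error("T2 failed"); }
};

// If t2_'s constructor throws, the fully-constructed t1_ is destroyed
// automatically before the exception propagates out of Impl's constructor.
struct Impl {
    T1 t1_;
    T2 t2_;
};

bool TryConstructImpl() {
    try { Impl i; return true; }
    catch (const std::exception&) { return false; }
}
```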

2. Consider the C++ language rules. The C++ language rules strictly define the meaning of construction and destruction (in both cases we assume the Consistency guarantee because, for example, any constructor that does not yield a valid object of its type is just buggy):

C++ Constructor Guarantee: aC--
A constructor, if successful, must create a valid object of the indicated type; if it fails, it has not created anything.

C++ Destructor Guarantee: AC--
A destructor destroys a valid object of the indicated type, and so is an inverse of any constructor of the same type. Further, a destructor is guaranteed to succeed, at least in the sense that the object it operates on is considered fully-destroyed, or as-destroyed-as-it-can-ever-be because you can't re-run a failed destructor. (Further, the standard requires that any destructor used in the standard library may not throw; this includes the destructor of any type you can legally instantiate a container with.)

At any rate, the above solution assumes implicitly that at least T1::~T1() guarantees A---, else there is no way to write a correct Widget::operator=() anyway. For more on these issues, see Item 16 in Exceptional C++[6].

3. Analyze Atomicity (i.e., local effects). In each of the two failure cases, the only local effects are the possible construction or destruction of the t1_ and t2_ member objects, because these are the only objects being directly manipulated.

Case 1 (Failure): T1::T1() fails

The only attempted operation promises at least 'a' atomicity, so no local effects occur and the entire assignment has no local effects.

Case 2 (Failure): T1::T1(), T2::T2() fails, T1::~T1()

T2::T2() promises at least 'a' atomicity, so it does not have any local effects. T1::~T1() is guaranteed to succeed and undoes all local effects of T1::T1(), and so the entire assignment has no local effects.

Case 3 (Success): T1::T1(), T2::T2()

Success: all local effects have been successfully produced.

Failure is possible, but all failure cases have no local effects. Therefore the assignment's best possible atomicity guarantee level is 'a' -- thanks to the guarantees inherent in the C++ language.

4. Analyze Consistency. Failure is possible, but in each case each suboperation guarantees the progression from one consistent state to another regardless of success or failure, so the entire assignment ends in a consistent state. Therefore the assignment's best possible consistency guarantee level is 'C' -- again thanks to the guarantees inherent in the C++ language.

5. Analyze Isolation. No guarantee is possible here, because no suboperation makes any guarantee about side effects. For example, in Case 1, T1::T1() may fail in a way that has already affected global state.

6. Analyze Durability. No guarantee is possible here either, because no suboperation makes any guarantee about durability. For example, in Case 1, T1::T1() may fail in a way that does not achieve persistence of the final state.

So aC is the strongest exception safety guarantee one can make for assignment without requiring any assumptions at all about the type being used. That's a useful piece of information, especially in generic programming using language facilities like C++ templates, where we cannot always assume knowledge of the guarantees provided by arbitrary, even unknowable, types.

Better still, we can make an even stronger statement with a little knowledge about the types we're using:

aCI-

The guarantee provided by the technique shown in GotW #59, if T1 and T2 construction and destruction provide at least -CI-.

Interestingly, note that this guarantee (aCI-) promises more than the Abrahams strong guarantee (aCi-).

In short:

- With no knowledge of the used types T1 and T2, it is always possible to write an assignment operator for T that gives "better than the basic guarantee" -- specifically, such an assignment operator is aC-safe.

- With the additional knowledge that T1 and T2 construction and destruction are CI-safe, it is always possible to write an assignment operator for T that gives "better than the strong guarantee" -- specifically, such an assignment operator is aCI-safe.

Neither of these guarantees is concisely expressible using prior exception and program safety models.

TRANSACTIONAL PROGRAM CORRECTNESS ANALYSIS

The third goal of this taxonomy was that "[i]t should help us reason about and analyze the safety of code, making it less a craft and more a methodical engineering task."

Let's see how well we can do.

ACID Safety Analysis

It's important that the ACID program correctness taxonomy precisely describes the prior Abrahams and other guarantees, but perhaps the best thing about the ACID model is that it simplifies our analysis: the compound guarantees are no longer the important thing at all.

Consider first the following principle, which is perhaps the most important result of this article:

The Transactional Principle:
All programming operations should be considered as transactions, with associated ACID guarantees. This applies regardless of whether exceptions are possible or absent.

All programming operations are amenable to this new style of ACID transactional analysis. After years of differing with Dave Abrahams on this point, I can now in good conscience come around to his point of view that "exceptions are not magical": they are simply another failure case, albeit one with complex programming semantics in C++ that come into play when you actually use them, but conceptually no different from an error return code. The reason I only now feel I can agree with Dave is that, after writing about and advancing this topic for several years, I only now think I understand exception safety well enough to reason about it confidently as simply another form of "early return."

By defining distinct and largely independent axes, the ACID model gives us a simpler and systematic -- even mechanical -- way to analyze an operation's correctness: Analyze it according to the individual safety guarantees, one axis at a time, not according to aggregated "compound guarantees." The way to analyze a function, then, is to examine each axis' guarantee level in turn:

1. If the axis has no guarantee, do nothing.

2. If the axis has a guarantee about a failure case, then for each possible EARLY return (each possible failure return, or each possible throw or interrupt point, which means each non-A-safe function call we perform as a suboperation), ensure the guarantee is met if the early return is taken.

3. If the axis makes a guarantee about a success case, then for each possible NORMAL return, ensure the guarantee is met if the return is taken.

Let's consider examples to show why this can make reasoning about exception safety simpler, more systematic, more mechanical -- which is to say more like engineering and less like craft:

Example: Analyzing for aCi-Safety (the Strong Guarantee)

Say you are writing a function that must guarantee aCi-safety (the Abrahams strong guarantee). How do you analyze whether it meets the guarantee? By considering each axis independently as follows:

1. [a] For each possible EARLY return, ensure the immediate object is returned to the initial state if the early return is taken.

2. [C] For each NORMAL return, ensure the immediate object's state is consistent.

3. [i] For each possible EARLY return, ensure that side effects are undone if the early return is taken.

Note that we didn't need to consider the c-safe aspect separately, because ai-safe implies aci-safe: we require that the initial state of the system be consistent.

Another Example: Analyzing for aC-Safety

Now say you are writing a function that must guarantee aC-safety. How do you analyze whether it meets the guarantee? By considering each axis independently as follows:

1. [a] For each possible EARLY return, ensure the immediate object is returned to the initial state if the early return is taken.

2. [c] For each possible EARLY return, ensure that the system is in a consistent state if the early return is taken.

3. [C] For each NORMAL return, ensure the immediate object's state is consistent.

OTHER RESULTS AND CONCLUSIONS

The Importance of "Big-A" Atomicity

As pointed out by Abrahams and Colvin and others, to guarantee any kind of exception safety it is essential that certain operations -- notably destructors and deallocation functions -- be required not to throw. That is, they must be at least A-safe. We rely on this in all sorts of situations, notably:

- The "do all the work off to the side, and commit using only nonthrowing operations" method is not possible if there are no suitable nonthrowing (A-safe) operations.

- The related "create a temporary and Swap()" method, which is the standard form of exception-safe copy assignment, is not possible without a nonthrowing (A-safe) Swap().

An interesting point raised by Greg Colvin is that for this reason it does not appear to be possible to write exception-safe code in Java. The Java specification currently allows the JVM to throw an exception at any time at arbitrary points in the program, for example asynchronously when stop is called on a Thread object. In practice it isn't that bad, but it is worrisome that the Java specification can apparently be shown to preclude writing exception-safe code because there is no way to absolutely guarantee A-safe operations in Java.

Side Effects Last

It seems best to try to perform I-safe operations last, in order to put off side effects. That way, if a failure occurs earlier in the operation, no side effects will yet have occurred; this limits the scope for stray side effects after failures.

Future Directions

An important area for further study is a rigorous treatment of analyzing compound operations. For example, as illustrated in one of the analysis examples above, the consistency guarantees operate like a chain; an operation is c-safe only if all suboperations are c-safe (any one that isn't becomes the weak link that breaks the chain), and an operation is C-safe only if all suboperations are C-safe. That's fine for consistency... now, is there anything we can conclude similarly about atomicity, isolation, durability, or combinations thereof in suboperations?

Summary

When I started this taxonomy, I had in mind applying database concepts to better analyze and describe C++ exception safety. In the process of doing so, however, I kept on coming back to the simple terms "successful operation" and "failed operation," and finally realized (strongly influenced, I am sure, by papers like Dave Abrahams' "Exception Safety in Generic Components"[1]) that the latter applied completely and equally to ALL forms of failure -- C++ exceptions, Java exceptions, C return codes, asynchronous interrupts, and any other imaginable failure mode and reporting mechanism. That made me realize that what I was developing might not really be just a taxonomy for analyzing C++ exception safety at all, but rather something much more general: A taxonomy for describing program correctness in all its aspects in all languages. Further use, critique, and development will show whether this approach is indeed that widely applicable or not.

Finally, consider two fundamental implications:

1. The Transactional Principle: All programming operations should be considered as transactions, with associated guarantees, and are amenable to this new style of ACID transactional analysis. This applies regardless of whether exceptions are possible or absent.

2. As pointed out by Dave Abrahams in his "Exception-Safety in Generic Components" paper[1] and other places, "exception safety" is a bit of a misnomer, and the ACID model bears this out by not distinguishing between different kinds of "early returns," whether they be through error codes, through exceptions, or through some other method. Exception-related "safety" is not some magical entity or unknowable phantom; "safety" is just another name for "correctness." When we want to talk about "exception safety," we should instead just talk about "program correctness" without arbitrarily highlighting one form of correctness. Unfortunately, this only underscores that languages that have exceptions but do not allow exception-safe programming (such as by not supporting the strongest no-failure atomicity guarantee, or A-safety) actually have a worse problem, namely insufficient support for general program correctness.

If this taxonomy and its related analysis methods are at all useful, they will be subject to refinement. I invite readers to analyze and expand upon these ideas.

Acknowledgments

Thanks to Dave Abrahams, Greg Colvin, and Scott Meyers for incisive reviews of this article, along with much other fun and productive discussion of C++ topics over the years.

 

Notes

1. D. Abrahams. "Exception-Safety in Generic Components," in Generic Programming: Proceedings of a Dagstuhl Seminar, M. Jazayeri, R. Loos, and D. Musser, eds. (Springer Verlag, 1999).

2. H. Mueller. "10 Rules for Handling Exception Handling Successfully" (C++ Report, 8(1), January 1996).

3. J. Reeves. "Coping with Exceptions" (C++ Report, 8(3), March 1996).

4. C.J. Date. An Introduction to Database Systems, 7th Edition (Addison-Wesley, 1999).

5. H. Sutter. "Namespaces and the Interface Principle" (C++ Report, 11(3), March 1999).

6. H. Sutter. Exceptional C++ (Addison-Wesley, 2000).

Copyright 2009 Herb Sutter