GotW #44

Home Blog Talks Books & Articles Training & Consulting

On the
blog
RSS feed November 4: Other Concurrency Sessions at PDC
November 3
: PDC'09: Tutorial & Panel
October 26: Hoare on Testing
October 23
: Deprecating export Considered for ISO C++0x

This is the original GotW problem and solution substantially as posted to Usenet. See the book More Exceptional C++ (Addison-Wesley, 2002) for the most current solution to this GotW issue. The solutions in the book have been revised and expanded since their initial appearance in GotW. The book versions also incorporate corrections, new material, and conformance to the final ANSI/ISO C++ standard.

Reference Counting - Part II
Difficulty: 6 / 10

In the second of this three-part miniseries, we examine the effect of references and iterators into a reference-counted string. Can you spot the issues?

Problem

Consider the reference-counted Optimized::String class from GotW #43, but with two new functions: Length() and operator[].

  namespace Optimized {
    struct StringBuf {
        StringBuf();              // start off empty
       ~StringBuf();              // delete the buffer
        void Reserve( size_t n ); // ensure len >= n

        char*    buf;             // allocated buffer
        size_t   len;             // length of buffer
        size_t   used;            // # chars actually used
        unsigned refs;            // reference count
    };

    class String {
    public:
        String();                 // start off empty
       ~String();                 // decrement reference count
                                  //  (delete buffer if refs==0)
        String( const String& );  // point at same buffer and
                                  //  increment reference count
        void   Append( char );    // append one character
        size_t Length() const;    // string length
        char&  operator[](size_t);// element access
    private:
        void AboutToModify( size_t n );
                                  // lazy copy, ensure len>=n
        StringBuf* data_;
    };
  }

This allows code like the following:

  if( s.Length() > 0 ) {
    cout << s[0];
    s[0] = 'a';
  }

Guru Questions

1. Implement the new members of Optimized::String.

2. Do any of the other members need to be changed because of the new member functions? Explain.

Solution

Consider the reference-counted Optimized::String class from GotW #43, but with two new functions: Length() and operator[].

The point of this GotW is to demonstrate why adding operator[] changes Optimized::String's semantics enough to impact other parts of the class. But, first things first:

1. Implement the new members of Optimized::String.

The Length() function is simple:

  namespace Optimized {

    size_t String::Length() const {
      return data_->used;
    }

There's more to operator[], however, than meets the casual eye. In the following, what I want you to notice is that what operator[] does (it returns a reference into the string) is really no different from what begin() and end() do for standard strings (they return iterators that "point into" the string). Any copy-on-write implementation of std::basic_string will run into the same considerations that we do below.

Writing operator[] for Shareable Strings

Here's a naive first attempt:

    //  BAD: Naive attempt #1 at operator[]
    //
    char& String::operator[]( size_t n ) {
      return *(data_->buf+n);
    }

This isn't good enough by a long shot. Consider:

    //  Example 1a: Why attempt #1 doesn't work
    //
    void f( Optimized::String& s ) {
      Optimized::String s2( s ); // take a copy of the string
      s[0] = 'x';                // oops: also modifies s2!
    }

You might be thinking that the poor programmer of Example 1a might be a little unhappy about this side effect. You would be right.

So, at the very least, we'd better make sure that the string isn't being shared, else the caller might inadvertently be modifying what look to him like two unrelated strings. "Aha," thinks the once-naive programmer, "I'll call AboutToModify() to ensure I'm using an unshared representation:"

    //  BAD: Inadequate attempt #2 at operator[]
    //
    char& String::operator[]( size_t n ) {
      AboutToModify( data_->len );
      return *(data_->buf+n);
    }

This looks good, but it's still not enough. The problem is that we only need to rearrange the Example 1a problem code slightly to get back into the same situation as before:

    //  Example 2a: Why attempt #2 doesn't work either
    //
    void f( Optimized::String& s ) {
      char& rc = s[0];  // take a reference to the first char
      Optimized::String s2( s ); // take a copy of the string
      rc = 'x';                  // oops: also modifies s2!
    }

You might be thinking that the poor programmer of Example 2a might be a little perturbed about this surprise, too. You would be right, but as of this writing certain popular implementations of basic_string have precisely this copy-on-write-related bug.

The problem is that the reference was taken while the original string was unshared, but then the string became shared and the single update through the reference modified both String objects' visible state.

Key Notion: An "Unshareable" String

When operator[] is called, besides taking an unshared copy of the StringBuf, what also need to mark the string "unshareable," just in case the user remembers the reference (and later tries to use it).

Now, marking the string "unshareable for all time" will work, but that's actually a little excessive: It turns out that all we really need to do is mark the string "unshareable for a while." To see what I mean, consider that it's already true that references returned by operator[] into the string must be considered invalid after the next mutating operation. That is, code like this:

    //  Example 3: Why references are invalidated
    //             by mutating operations
    //
    void f( Optimized::String& s ) {
      char& rc = s[0];
      s.Append( 'i' );
      rc = 'x';   // error: oops, buffer may have moved
    }             //        if s did a reallocation

should already be documented as invalid, whether or not the string uses copy-on-write. In short, calling a mutator clearly invalidates references into the string because you never know if the underlying memory may move (invisibly, from the point of view of the calling code).

Given that fact, in Example 2a, rc would be invalidated anyway by the next mutating operation on s. So, instead of marking s as "unshareable for all time" just because someone might have remembered a reference into it, we could just mark it "unshareable until after the next mutating operation," when any such remembered reference would be invalidated anyway. To the user, the documented behaviour is the same.

2. Do any of the other members need to be changed because of the new member functions? Explain.

As we can now see, the answer is: Yes.

First, we need to be able to remember whether a given string is currently unshareable (so that we won't use reference counting when copying it). We could throw in a bool flag, but to avoid even that overhead let's just encode the "unshareable" state directly in the refs count by agreeing that "refs == the biggest unsigned int there can possibly be" means "unshareable." We'll also add a flag to AboutToModify() that says whether to mark the string unshareable.

    //  GOOD: Correct attempt #3 at operator[]
    //
    //  Add a new static member for convenience, and
    //  change AboutToModify appropriately. Because now
    //  we'll need to clone a StringBuf in more than
    //  one function (see the String copy constructor,
    //  below), we'll also move that logic into a
    //  single function... it was time for StringBuf to
    //  have its own copy constructor, anyway.
    //
    size_t String::Unshareable = numeric_limits<unsigned>::max();

    StringBuf::StringBuf( const StringBuf& other, size_t n )
      : buf(0), len(0), used(0), refs(1)
    {
        Reserve( max( other.len, n ) );
        copy( other.buf, other.buf+other.used, buf );
        used = other.used;
    }

    void String::AboutToModify(
      size_t n,
      bool   bMarkUnshareable /* = false */
    ) {
      if( data_->refs > 1 && data_->refs != Unshareable ) {
        StringBuf* newdata = new StringBuf( *data_, n );
        --data_->refs;   // now all the real work is
        data_ = newdata; //  done, so take ownership
      } else {
        data_->Reserve( n );
      }
      data_->refs = bMarkUnshareable ? Unshareable : 1;
    }

    char& String::operator[]( size_t n ) {
      AboutToModify( data_->len, true );
      return *(data_->buf+n);
    }

Note that all of the other calls to AboutToModify() continue to work as originally written.

Now all we need to do is make String's copy constructor respect the unshareable state, if it's set:

    String::String( const String& other )
    {
      //  If possible, use copy-on-write.
      //  Otherwise, take a deep copy immediately.
      //
      if( other.data_->refs != Unshareable ) {
        data_ = other.data_;
        ++data_->refs;
      } else {
        data_ = new StringBuf( *other.data_ );
      }
    }

The destructor needs a small tweak:

    String::~String() {
      if( data_->refs == Unshareable || --data_->refs < 1 ) {
        delete data_;
      }
    }

All other functions (both of them!) work as originally written:

    String::String() : data_(new StringBuf) { }

    void String::Append( char c ) {
      AboutToModify( data_->used+1 );
      data_->buf[data_->used++] = c;
    }

  }

And that's all(!) there is to it. In the final GotW of this reference-counting miniseries, we'll consider how multithreading affects our reference-counted string. See GotW #45 for the juicy details.

Here's all the code together.

Note that I've also taken this opportunity to implement a slight change to StringBuf::Reserve(). It now rounds up the chosen "new buffer size" to the next multiple of four, in order to ensure that the size of the memory buffer size is always a multiple of four bytes.

This is in the name of efficiency: Many popular operating systems don't allocate memory in chunks smaller than this, anyway, and this code is faster than the original one particularly for small strings. (The original code would allocate a 1-byte buffer, then a 2-byte buffer, then a 3-byte buffer, then a 4-byte buffer, and then a 6-byte buffer before the exponential-growth strategy would really kick in. The code below goes straight to a 4-byte buffer, then an 8-byte buffer, and so on.)

  namespace Optimized {

    struct StringBuf {
        StringBuf();              // start off empty
       ~StringBuf();              // delete the buffer
        StringBuf( const StringBuf& other, size_t n = 0 );
                                  // initialize to copy of other,
                                  //  and ensure len >= n

        void Reserve( size_t n ); // ensure len >= n

        char*    buf;             // allocated buffer
        size_t   len;             // length of buffer
        size_t   used;            // # chars actually used
        unsigned refs;            // reference count
    };

    class String {
    public:
        String();                 // start off empty
       ~String();                 // decrement reference count
                                  //  (delete buffer if refs==0)
        String( const String& );  // point at same buffer and
                                  //  increment reference count
        void   Append( char );    // append one character
        size_t Length() const;    // string length
        char&  operator[](size_t);// element access
    private:
        void AboutToModify( size_t n, bool bUnshareable = false );
                                  // lazy copy, ensure len>=n
                                  //  and mark if unshareable
        static size_t Unshareable;// ref-count flag for "unshareable"
        StringBuf* data_;
    };


    StringBuf::StringBuf() : buf(0), len(0), used(0), refs(1) { }

    StringBuf::~StringBuf() { delete[] buf; }

    StringBuf::StringBuf( const StringBuf& other, size_t n )
      : buf(0), len(0), used(0), refs(1)
    {
        Reserve( max( other.len, n ) );
        copy( other.buf, other.buf+other.used, buf );
        used = other.used;
    }

    void StringBuf::Reserve( size_t n ) {
      if( len < n ) {
        //  Same growth code as in GotW #43, except now we round
        //  the new size up to the nearest multiple of 4 bytes.
        size_t needed = max<size_t>( len*1.5, n );
        size_t newlen = needed ? 4 * ((needed-1)/4 + 1) : 0;
        char*  newbuf = newlen ? new char[ newlen ] : 0;
        if( buf ) {
          copy( buf, buf+used, newbuf );
        }

        delete[] buf;   // now all the real work is
        buf = newbuf;   //  done, so take ownership
        len = newlen;
      }
    }


    size_t String::Unshareable = numeric_limits<unsigned>::max();

    String::String() : data_(new StringBuf) { }

    String::~String() {
      if( data_->refs == Unshareable || --data_->refs < 1 ) {
        delete data_;
      }
    }

    String::String( const String& other )
    {
      //  If possible, use copy-on-write.
      //  Otherwise, take a deep copy immediately.
      //
      if( other.data_->refs != Unshareable ) {
        data_ = other.data_;
        ++data_->refs;
      } else {
        data_ = new StringBuf( *other.data_ );
      }
    }

    void String::AboutToModify(
      size_t n,
      bool   bMarkUnshareable /* = false */
    ) {
      if( data_->refs > 1 && data_->refs != Unshareable ) {
        StringBuf* newdata = new StringBuf( *data_, n );
        --data_->refs;   // now all the real work is
        data_ = newdata; //  done, so take ownership
      } else {
        data_->Reserve( n );
      }
      data_->refs = bMarkUnshareable ? Unshareable : 1;
    }

    void String::Append( char c ) {
      AboutToModify( data_->used+1 );
      data_->buf[data_->used++] = c;
    }

    size_t String::Length() const {
      return data_->used;
    }

    char& String::operator[]( size_t n ) {
      AboutToModify( data_->len, true );
      return *(data_->buf+n);
    }

  }

Copyright © 2009 Herb Sutter