GotW #28: The Fast Pimpl Idiom

GotW #28

On the
blog

November 4: Other Concurrency Sessions at PDC
November 3: PDC'09: Tutorial & Panel

October 26: Hoare on Testing
October 23: Deprecating export Considered for ISO C++0x

This is the original GotW problem and solution substantially as posted to Usenet. See the book Exceptional C++ (Addison-Wesley, 2000) for the most current solutions to GotW issues #1-30. The solutions in the book have been revised and expanded since their initial appearance in GotW. The book versions also incorporate corrections, new material, and conformance to the final ANSI/ISO C++ standard.

The Fast Pimpl Idiom
Difficulty: 6 / 10

It's sometimes tempting to cut corners in the name of "reducing dependencies" or in the name of "efficiency," but it may not always be a good idea. Here's an excellent idiom to accomplish both objectives simultaneously and safely.

Problem

Standard malloc() and new() calls are very expensive. In the code below, the programmer originally has a data member of type X in class Y:

    // file y.h
    #include "x.h"

    class Y {
        /*...*/
        X x_;
    };

    // file y.cpp
    Y::Y() {}

This declaration of class Y requires the declaration of class X to be visible (from x.h). To avoid this, the programmer first tries to write:

    // file y.h
    class X;

    class Y {
        /*...*/
        X* px_;
    };

    // file y.cpp
    #include "x.h"

    Y::Y() : px_( new X ) {}
    Y::~Y() { delete px_; px_ = 0; }

This nicely hides X, but it turns out that Y is used very widely and the dynamic allocation overhead is degrading performance. Finally, our fearless programmer hits on the "perfect" solution that requires neither including x.h in y.h nor the inefficiency of dynamic allocation (and not even a forward declaration!):

    // file y.h
    class Y {
        /*...*/
        static const size_t sizeofx = /*some value*/;
        char x_[sizeofx];
    };

    // file y.cpp
    #include "x.h"

    Y::Y() {
        assert( sizeofx >= sizeof(X) );
        new (&x_[0]) X;
    }

    Y::~Y() {
        (reinterpret_cast<X*>(&x_[0]))->~X();
    }

Discuss.

Solution

The Short Answer

Don't do this. Bottom line, C++ doesn't support opaque types directly, and this is a brittle attempt (some people would even say "hack"^[1]) to work around that limitation.

What the programmer almost certainly wants is something else, namely the "Fast Pimpl" idiom, which I'll present right after the "Why Attempt #3 is Deplorable" section below.

Why Attempt #3 is Deplorable

First, let's consider a few reasons why Attempt #3 above truly is deplorable form:

1. Alignment. Unlike dynamic memory acquired by ::operator new(), the x_ character buffer isn't guaranteed to be properly aligned for objects of type X. To make this work more reliably, the programmer could use a "max_align" union, which is usually implemented as something like:

    union max_align {
      short       dummy0;
      long        dummy1;
      double      dummy2;
      long double dummy3;
      void*       dummy4;
      /*...and pointers to functions, pointers to
           member functions, pointers to member data,
           pointers to classes, eye of newt, ...*/
    };

It would be used like this:

    union {
      max_align m;
      char x_[sizeofx];
    };

This isn't guaranteed to be fully portable, but in practice it's close enough because there are few or no systems on which this won't work as expected.

I know that some experts will accept and even recommend this technique. In good conscience I still have to call it a hack and strongly discourage it.

2. Brittleness. The author of Y has to be inordinately careful with otherwise-ordinary Y functions. For example, Y must not use the default assignment operator, but must either suppress assignment or supply its own.

Writing a safe Y::operator=() isn't hard, but I'll leave it as an exercise for the reader. Remember to account for exception safety in that and in Y::~Y(). Once you're finished, I think you'll agree that this is a lot more trouble than it's worth.

3. Maintenance Cost. When sizeof(X) grows beyond sizeofx, the programmer must bump up sizeofx. This can be an unattractive maintenance burden. Choosing a larger value for sizeofx mitigates this, but at the expense of trading off efficiency (see #4).

4. Inefficiency. Whenever sizeofx > sizeof(X), space is being wasted. This can be minimized, but at the expense of maintenance effort (see #3).

5. Just Plain Wrongheadedness. I include this one last, but not least: In short, it's obvious that the programmer is trying to do "something unusual." Frankly, in my experience, "unusual" is just about always a synonym for "hack." Whenever you see this kind of subversion -- whether it's allocating objects inside character arrays like this programmer is doing, or implementing assignment using explicit destruction and placement new as discussed in GotW #23 -- you should Just Say No.

I really mean that. Make it a reflex.

A Better Solution: The "Fast Pimpl"

The motivation for hiding X is to avoid forcing clients of Y to know about (and therefore depend upon) X. The usual C++ workaround for eliminating this kind of implementation dependency is to use the pimpl idiom,^[2] which is what our fearless programmer first tried to do.

The only issue in this case was the pimpl method's performance because of allocating space for X objects in the free store. The usual way to address allocation performance for a specific class is to simply overload operator new for that specific class, because fixed-size allocators can be made much more efficient than general-purpose allocators.

Unfortunately, this assumes that the author of Y is also the author of X. That's not likely to be true in general. The real solution is to use an Efficient Pimpl, which is a true pimpl class with its own class-specific operator new:

    // file y.h
    class YImpl;

    class Y {
      /*...*/
      YImpl* pimpl_;
    };

    // file y.cpp
    #include "x.h"

    struct YImpl {  // yes, 'struct' is allowed :-)
      /*...private stuff here...*/

      void* operator new( size_t )   { /*...*/ }
      void  operator delete( void* ) { /*...*/ }
    };

    Y::Y() : pimpl_( new YImpl ) {}
    Y::~Y() { delete pimpl_; pimpl_ = 0; }

"Aha!" you say. "We've found the holy grail -- the Fast Pimpl!" you say. Well, yes, but hold on a minute and think about how this will work and what it will cost you.

Your favourite advanced C++ or general-purpose programming textbook has the details about how to write efficient fixed-size [de]allocation functions, so I won't cover that again here. I will talk about usability: One technique is to put them in a generic fixed-size allocator template, perhaps something like:

    template<size_t S>
    class FixedAllocator {
    public:
      void* Allocate( /*requested size is always S*/ );
      void  Deallocate( void* );
    private:
      /*...implemented using statics?...*/
    };

Because the private details are likely to use statics, however, there could be problems if Deallocate() is ever called from a static object's dtor. Probably safer is a singleton that manages a separate free list for each request size (or, as an efficiency tradeoff, a separate free list for each request size "bucket"; e.g., one list for blocks of size 0-8, another for blocks of size 9-16, etc.):

    class FixedAllocator {
    public:
      static FixedAllocator* Instance();

      void* Allocate( size_t );
      void  Deallocate( void* );
    private:
        /*...singleton implementation, typically
             with easier-to-manage statics than
             the templated alternative above...*/
    };

Let's throw in a helper base class to encapsulate the calls:

    struct FastPimpl {
      void* operator new( size_t s ) {
        return FixedAllocator::Instance()->Allocate(s);
      }
      void operator delete( void* p ) {
        FixedAllocator::Instance()->Deallocate(p);
      }
    };

Now, you can easily write as many Fast Pimpls as you like:

    //  Want this one to be a Fast Pimpl?
    //  Easy, then just inherit...
    struct YImpl : FastPimpl {
      /*...private stuff here...*/
    };

But Beware!

This is nice and all, but don't just use the Fast Pimpl willy-nilly. You're getting extra allocation speed, but as usual you should never forget the cost: Managing separate free lists for objects of specific sizes usually means incurring a space efficiency penalty because any free space is fragmented (more than usual) across several lists.

As with any other optimization, use this one only after profiling and experience prove that the extra performance boost is really needed in your situation.

Notes

1. I'm one of them. :-)

2. See GotW #24.

The Fast Pimpl IdiomDifficulty: 6 / 10