GotW #75

This is the original GotW problem and solution substantially as posted to Usenet. See the book Exceptional C++ Style (Addison-Wesley, 2004) for the most current solution to this GotW issue. The solutions in the book have been revised and expanded since their initial appearance in GotW. The book versions also incorporate corrections, new material, and conformance to the final ANSI/ISO C++ standard (1998) and its Technical Corrigendum (2003).

Istream Initialization? 
Difficulty: 3 / 10

Most people know the famous quote: "What if they gave a war and no one came?" This time, we consider the question: "What if we initialized an object and nothing happened?" As Scarlett might say in such a situation: "This isn't right, I do declare!"

Problem

Assume that the relevant standard headers are included and that the appropriate using-declarations are made.

JG Question

1. What does the following code do?

deque<string> coll1;

copy( istream_iterator<string>( cin ),
      istream_iterator<string>(),
      back_inserter( coll1 ) );

Guru Questions

2. What does the following code do?

deque<string> coll2( coll1.begin(),
                     coll1.end() );

deque<string> coll3( istream_iterator<string>( cin ),
                     istream_iterator<string>() );

3. What must be changed to make the code do what the programmer probably expected?

Solution

1. What does the following code do?

// Example 1(a)
//
deque<string> coll1;

copy( istream_iterator<string>( cin ),
      istream_iterator<string>(),
      back_inserter( coll1 ) );

This code declares a deque of strings called coll1 that is initially empty. It then copies every whitespace-delimited string in the standard input stream (cin) into the deque using deque::push_back(), until there is no more input available.

The above code is equivalent to:

// Example 1(b): Equivalent to 1(a)
//
deque<string> coll1;

istream_iterator<string> first( cin ), last;
copy( first, last, back_inserter( coll1 ) );

The only difference is that in Example 1(a) the istream_iterator objects are created on the fly as unnamed temporary objects, and so they are destroyed at the end of the copy() statement. In Example 1(b), the istream_iterator objects are named variables, and survive the copy() statement; they won't be destroyed until the end of whatever scope surrounds the above code.
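If you'd like to try Example 1(a) without typing at a console, here is a minimal sketch, not part of the original code. It assumes a std::istringstream standing in for cin, and the function name words_from_text is hypothetical.

```cpp
#include <algorithm>
#include <deque>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical helper: same copy() call as Example 1(a), but the
// input comes from a string stream instead of cin.
std::deque<std::string> words_from_text(const std::string& text) {
    std::istringstream in(text);          // stands in for cin
    std::deque<std::string> coll1;

    std::copy(std::istream_iterator<std::string>(in),
              std::istream_iterator<std::string>(),
              std::back_inserter(coll1));
    return coll1;
}
```

Calling words_from_text("to be  or\nnot") should yield the four strings to, be, or and not, because operator>> skips any run of whitespace between words.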

2. What does the following code do?

// Example 2(a): Declaring another deque
//
deque<string> coll2( coll1.begin(),
                     coll1.end() );

This code declares a second deque of strings called coll2, and initializes it using the deque constructor that takes a pair of iterators corresponding to a range from which the contents should be copied. In this case, we're initializing coll2 from an iterator range that happens to correspond to "everything that's in coll1."

The code so far in Example 2(a) is nearly equivalent to:

// Example 2(b): Almost the same as Example 2(a)
//

// extra step: call default constructor
deque<string> coll2;

// append elements using push_back()
copy( coll1.begin(), coll1.end(), 
      back_inserter( coll2 ) );

The (minor) difference is that coll2's default constructor is called first and then the elements are pushed into the collection as a separate step, using push_back(). The original code simply did it all using the constructor that takes an iterator pair, which probably (though not necessarily) does exactly the same thing under the covers.
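To check that the two spellings really produce the same contents, here is a small sketch under my own assumptions. The function names copy_by_constructor and copy_by_push_back are hypothetical, and std:: is written out instead of relying on using-declarations.

```cpp
#include <algorithm>
#include <deque>
#include <iterator>
#include <string>

// Example 2(a) style: the iterator-range constructor does all the work.
std::deque<std::string> copy_by_constructor(const std::deque<std::string>& coll1) {
    std::deque<std::string> coll2(coll1.begin(), coll1.end());
    return coll2;
}

// Example 2(b) style: default-construct, then append with push_back().
std::deque<std::string> copy_by_push_back(const std::deque<std::string>& coll1) {
    std::deque<std::string> coll2;
    std::copy(coll1.begin(), coll1.end(), std::back_inserter(coll2));
    return coll2;
}
```

For any input deque, the two functions should return containers that compare equal with operator==.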

You might wonder why I've belabored this syntax. The reason will become clear as we take a look at the last part of the code, which is unfortunately much more benign than some might think:

// Example 2(c): Declaring yet another deque?
//
deque<string> coll3( istream_iterator<string>( cin ),
                     istream_iterator<string>() );

The above code looks at first blush like it's trying to do the same thing as Example 1(a), namely create a deque of strings populated from the standard input, except that it's trying to do it using the syntax of Example 2(a), namely using the iterator range constructor. This has one potential problem and one actual problem. The potential problem is that cin has already been exhausted by Example 1(a), so there's no input left to read as was probably intended, which may be a logic error.

The big problem, though, is that the code doesn't actually do anything at all. Why not? Because it doesn't actually declare a deque<string> object named coll3. What it actually declares is (take a deep breath here):

        a function named coll3
            that returns a deque<string> by value
            and takes two parameters:
                an istream_iterator<string> with a formal parameter name of cin,
                and a function with no formal parameter name
                    that returns an istream_iterator<string>
                    and takes no parameters.

(Say that three times fast.)
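You don't have to take my word for it; you can ask the compiler. The following sketch assumes a C++11 (or later) compiler for static_assert, decltype and <type_traits>, none of which appear in the original code, and it spells out std:: instead of relying on using-declarations.

```cpp
#include <deque>
#include <iterator>
#include <string>
#include <type_traits>

// Same tokens as Example 2(c). Note that <iostream> is not even
// included: "cin" here is only a formal parameter name.
std::deque<std::string> coll3(std::istream_iterator<std::string>(cin),
                              std::istream_iterator<std::string>());

// This compiles only if coll3 names a function rather than an object.
static_assert(std::is_function<decltype(coll3)>::value,
              "coll3 is a function declaration, not a deque");
```

If coll3 were a deque<string> object, the static_assert would fail at compile time.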

What's going on here? Basically, we're running across one of the painful rules that C++ inherited from C, to maintain C compatibility: If a piece of code can be interpreted as a declaration, it will be. In the words of the C++ standard, clause 6.8:

There is an ambiguity in the grammar involving expression-statements and declarations: An expression-statement with a function-style explicit type conversion (_expr.type.conv_) as its leftmost subexpression can be indistinguishable from a declaration where the first declarator starts with a (. In those cases the statement is a declaration.

Without going into the gory details, the reason why this is the way that it is comes down to helping compilers deal with C's horrid declaration syntax, which can be ambiguous -- and so to make things manageable the compiler resolves such ambiguities by universally assuming that "if in doubt, it's a declaration." 'Nuff said.
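The same rule can be seen in a much smaller setting. This is a sketch of my own, with the hypothetical function name declaration_wins. The statement int(n); could be read as a function-style cast of n, but because it can also be read as a declaration, that is what it is.

```cpp
// Hypothetical demo of the "if it can be a declaration, it is" rule.
int declaration_wins() {
    int n = 1;
    {
        int(n);    // not a cast of n to int: this declares a NEW inner n
        n = 42;    // assigns to the inner n only
    }
    return n;      // the outer n was never touched
}
```

declaration_wins() returns 1, not 42. Some compilers warn about the redundant parentheses, which is a useful hint that something unintended may be going on.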

If you haven't already, take a quick look at GotW #1 (Exceptional C++ Item 42)[1], which contains a similar but simpler example. Let's dissect the declaration step by step to see what's going on:

// Example 2(d): Identical to Example 2(c), removing
// redundant parentheses and adding a typedef
//
typedef istream_iterator<string> (Func)();

deque<string> coll3( istream_iterator<string> cin, 
                     Func );

Does that look more like a function declaration? Maybe so, maybe not, so let's take another step and remove the formal parameter name "cin" which is ignored anyway, and change the name "coll3" to something that we usually expect to see as a function name:

// Example 2(e): Still identical to Example 2(c)
//
typedef istream_iterator<string> (Func)();

deque<string> f( istream_iterator<string>, Func );

Now it's pretty clear: The above "could be" a function declaration, and so according to the C and C++ syntax rules, it is one. What makes it confusing is that it looks a lot like constructor syntax. What makes it downright obscure is that the formal parameter name, cin, happens to match the name of a variable that is indeed in scope and is even defined by the standard -- because that's what it was in fact intended to be. Misleading as it is, that doesn't matter: the formal parameter name and std::cin have nothing in common other than the accident that they happen to be spelled the same way.
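If the step-by-step rewriting still feels like sleight of hand, the compiler can confirm that the first and last spellings declare functions of exactly the same type. This is a sketch assuming C++11 or later for static_assert, decltype and std::is_same; the name coll3_as_written is hypothetical and is used only so that both declarations can coexist.

```cpp
#include <deque>
#include <iterator>
#include <string>
#include <type_traits>

typedef std::istream_iterator<std::string> (Func)();

// The tokens of Example 2(c), under a hypothetical name.
std::deque<std::string> coll3_as_written(
    std::istream_iterator<std::string>(cin),
    std::istream_iterator<std::string>());

// The cleaned-up spelling of Example 2(e).
std::deque<std::string> f(std::istream_iterator<std::string>, Func);

// Both parameter lists mean the same thing: an istream_iterator by value,
// and a function parameter that is adjusted to a pointer to function.
static_assert(std::is_same<decltype(coll3_as_written), decltype(f)>::value,
              "both declarations have the same function type");
```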

People still run across this problem from time to time in real-world coding, and that's the reason why this problem deserves its own article. Because the code is (probably surprisingly) just a function declaration, it doesn't actually do anything -- no code gets generated by the compiler, no actions are performed, no deque constructors are called, no objects are created.

It wouldn't be fair to throw up an example like this, however, without also showing how you can fix it. This brings us to the final question:

3. What must be changed to make the code do what the programmer probably expected?

All we need is something that makes it impossible for the compiler to treat the code as a function declaration. There are two easy ways to do it. Here's the seductive way:

// Example 3(a): Disambiguate the syntax,
// say by adding extra parens
// (okay solution, score 7/10)
//
deque<string> coll3( (istream_iterator<string>(cin)),
                     (istream_iterator<string>()) );

Here just adding the extra parentheses makes it clear to the compiler that what we intend to be constructor arguments can't be parameter declarations. This is because although "istream_iterator<string>(cin)" can be a variable (or parameter) declaration, as noted above, "(istream_iterator<string>(cin))" can't -- the code in Example 3(a) can't be a function declaration for the same reason that "void f( (int i) )" can't be, namely because of the extra parentheses, which are illegal around a whole parameter declaration.
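Here is a runnable sketch of the parenthesized form, under a few assumptions of my own: a std::istringstream stands in for cin, the function name words_via_parens is hypothetical, and the static_assert needs C++11 or later.

```cpp
#include <deque>
#include <iterator>
#include <sstream>
#include <string>
#include <type_traits>

// Hypothetical helper: builds the deque with the extra-parentheses form.
std::deque<std::string> words_via_parens(const std::string& text) {
    std::istringstream in(text);          // stands in for cin

    // The extra parentheses rule out a parameter declaration, so this
    // defines an object and calls the iterator-range constructor.
    std::deque<std::string> coll3((std::istream_iterator<std::string>(in)),
                                  (std::istream_iterator<std::string>()));

    static_assert(!std::is_function<decltype(coll3)>::value,
                  "coll3 is an object here, not a function");
    return coll3;
}
```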

There are other ways to try to disambiguate this by forcing the statement out of the declaration syntax, but I won't present them for a simple reason: They only work if both you and your compiler understand this corner case of the standard very well.

This declaration-vs.-constructor syntax ambiguity is by its nature such a thorny edge case that the best thing to do is just avoid the ambiguity altogether, and not rely on methods that essentially amount to coaxing and wheedling a surly three-year-old compiler into not treating it as a function declaration. Put another way, if you were talking to someone, would you purposely say something ambiguous and then change it slightly by adding, "well, what I really meant was..."? Hardly.

It's far better to avoid the ambiguity in the first place. I prefer and recommend the following alternative because it's much easier to get right, it's utterly understandable to even the weakest compilers, and it makes the code clearer to read to boot:

// Example 3(b): Use named variables
// (recommended solution, score 10/10)
//
istream_iterator<string> first( cin ), last;
deque<string> coll3( first, last );

Actually, in both Example 3(a) and 3(b), making the suggested change to just one of the parameters would have been sufficient, but for consistency I've treated both parameters the same way.
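Both points can be tried out in a short sketch. The assumptions are mine: a std::istringstream stands in for cin, and the function names words_via_named and words_via_one_named are hypothetical.

```cpp
#include <deque>
#include <iterator>
#include <sstream>
#include <string>

// Example 3(b) style: both iterators are named variables.
std::deque<std::string> words_via_named(const std::string& text) {
    std::istringstream in(text);          // stands in for cin
    std::istream_iterator<std::string> first(in), last;
    std::deque<std::string> coll3(first, last);
    return coll3;
}

// Naming just one of the two is already enough: "first" is an object,
// not a type, so the parenthesized list cannot be a parameter list.
std::deque<std::string> words_via_one_named(const std::string& text) {
    std::istringstream in(text);
    std::istream_iterator<std::string> first(in);
    std::deque<std::string> coll3(first, std::istream_iterator<std::string>());
    return coll3;
}
```

Given the same text, both functions should return the same sequence of whitespace-delimited words.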

Guideline: Prefer using named variables as constructor parameters. This avoids possible declaration ambiguities.

Notes

1. H. Sutter. Exceptional C++ (Addison-Wesley, 2000).

Copyright © 2009 Herb Sutter