|
Style Case Study #1: Index Tables |
3.6.1/2: portable code must define | |
7/7 footnote 78, and 7.1.5/2 footnote 80: implicit | |
Annex C (Compatibility), comment on 7.1.5/4: explicitly notes that " |
cout << "#################" << endl;
3. Always #include
the headers for the types whose definitions you need: The program uses cout
and endl
, but fails to #include <iostream>
. Why did this probably work on the original developer's system? Because C++ standard headers can #include
each other, but unlike C, C++ does not specify which standard headers #include
which other standard headers.
In this case, the program does #include <vector>
and <algorithm>
, and on the original system it probably just so happened that one of those headers happened to indirectly #include <iostream>
too. That may work on the library implementation used to develop the original code, and it happens to work on mine too, but it's not portable and not good style.
4. Follow the guidelines in my Migrating to Namespaces article[3] about using namespaces: Speaking of cout
and endl
, the program must also qualify them with std::
, or write "using std::cout; using std::endl;
". Unfortunately it's still common for authors to forget namespace scope qualifiers -- I hasten to point out that this author did correctly scope vector
and sort
.
b) Stylistic improvements that would improve code clarity, reusability, and maintainability.
Beyond the above mechanical errors, there were several things I personally would have done differently in the code example. First, a couple of comments about the helper struct:
template <class RAIter>
struct sort_idxtbl_pair
{
RAIter it;
int i;
bool operator<( const sort_idxtbl_pair& s )
{ return (*it) < (*(s.it)); }
void set( const RAIter& _it, int _i ) { it=_it; i=_i; }
sort_idxtbl_pair() {}
};
5. Be const-correct: In particular, sort_idxtbl_pair::operator<()
doesn't modify *this
, so it ought to be declared as a const
member function.
6. Remove redundant code: The program explicitly writes class sort_idxtbl_pair
's default constructor even though it's no different from the implicitly generated version. There doesn't seem to be much point to this. Also, as long as sort_idxbl_pair
is a struct
with public data, having a distinct set()
operation adds a little syntactic sugar but since it's only called in one place the minor extra complexity doesn't gain much.
Next, we come to the core function, sort_idxtbl()
:
template <class RAIter>
void sort_idxtbl( RAIter first, RAIter last, int* pidxtbl )
{
int iDst = last-first;
typedef std::vector< sort_idxtbl_pair<RAIter> > V;
V v( iDst );
int i=0;
RAIter it = first;
V::iterator vit = v.begin();
for( i=0; it<last; it++, vit++, i++ )
(*vit).set(it,i);
std::sort(v.begin(), v.end());
int *pi = pidxtbl;
vit = v.begin();
for( ; vit<v.end(); pi++, vit++ )
*pi = (*vit).i;
}
7. Choose meaningful and appropriate names: In this case, sort_idxtbl
is a misleading name because the function doesn't sort an index table... it creates one! On the other hand, the code gets good marks for using the template parameter name RAIter
to indicate a random access iterator, which is what's required in this version of the code and so naming the parameter to indicate this is a good reminder.
8. Be consistent: In the function above, sometimes variables are initialized (or set) in for-init-statements, and sometimes they aren't. This just makes things harder to read, at least for me. Your mileage may vary on this one.
9. Remove gratuitous complexity: This function adores gratuitous local variables! It contains three examples. First, the variable iDst
is initialized to last-first
, and then used only once; why not just write last-first
where it's used and get rid of clutter? Second, the vector
iterator vit
is created where a subscript was already available and could have been used just as well, and the code would have been clearer. Third, the local variable it
gets initialized to the value of a function parameter, after which the function parameter is never used; my personal preference in that case is just to use the function parameter (even if you change its value -- that's okay!) instead of introducing another name.
10. Reuse Part 1: Reuse more of the standard library. Now, the original program gets good marks for reusing std::sort()
-- that's good. But why handcraft that final loop to perform a copy when std::copy()
does the same thing? Why reinvent a special-purpose sort_idxtbl_pair
class when the only thing it has that std::pair
doesn't is a comparison function? Besides being easier, reuse also makes our own code more readable. Humble thyself and reuse!
11. Reuse Part 2: Kill two birds with one stone by making the implementation itself more reusable. Of the original implementation, nothing is directly reusable other than the function itself. The helper sort_idxtbl_pair
class is hardwired for its purpose and is not independently reusable.
12. Reuse Part 3: Improve the signature. The original signature
template <class RAIter>
void sort_idxtbl( RAIter first, RAIter last, int* pidxtbl )
takes a bald int*
pointer to the output area, which I generally try to avoid in favor of managed storage (say, a vector
). But at the end of the day the user ought to be able to call sort_idxtbl
and put the output into a plain array, or a vector
, or anything else. Well, the requirement "be able to put the output into any container" simply cries out for an iterator, doesn't it?
template< class RAIn, class Out >
void sort_idxtbl( RAIn first, RAIn last, Out result )
13. Reuse Part 4, or Prefer comparing iterators using !=
: When comparing iterators, always use !=
(which works for all kinds of iterators) instead of <
(which works only for random-access iterators), unless of course you really need to use <
and only intend to support random-access iterators. The original program uses <
to compare the iterators it's given to work on, which is fine for random access iterators, which was the program's initial intent -- to create indexes into vector
s and arrays, both of which support random-access iteration. But there's no reason we may not want to do exactly the same thing for other kinds of containers, like list
s and set
s, that don't support random-access iteration, and the only reason the original code won't work for such containers is that it uses <
instead of !=
to compare iterators.
As Scott Meyers puts it eloquently, "program in the future tense." He elaborates:
"Good software adapts well to change. It accommodates new features, it ports to new platforms, it adjusts to new demands, it handles new inputs. Software this flexible, this robust, and this reliable does not come about by accident. It is designed and implemented by programmers who conform to the constraints of today while keeping in mind the probable needs of tomorrow. This kind of software -- software that accepts change gracefully -- is written by people who program in the future tense."[4]
14. Prefer preincrement unless you really need the old value: Here, for the iterators, writing preincrement (++i
) should habitually be preferred over writing postincrement (i++
).[5] True, that probably doesn't make a material difference in the original code because vector<T>::iterator
does not have to be a cheap-to-copy T*
(although it almost always will be); but if we implement point 13 we may no longer be limited to just vector<T>::iterator
s, and most other iterators are of class type -- perhaps often still not too expensive to copy, but why introduce this possible inefficiency needlessly?
That covers most the important issues. There are other things we could critique but that I didn't consider important enough to call attention to here; for example, production code should have comments that document each class's and function's purpose and semantics, but that doesn't apply to code accompanying magazine articles where the explanation is typically written in better English and in greater detail than code comments have any right to expect.
I'm deliberately not critiquing the mainline for style (as opposed to the mechanical errors already noted above that would cause the mainline to fail to compile) because after all this particular mainline is only meant to be a demonstration harness to help readers of the magazine article see how the index table apparatus is meant to work, and it's the index table apparatus that's the intended focus.
Let's preserve the original code's basic interface choice instead of straying far afield and proposing alternate design choices.[6] Limiting our critique just to correcting the code for mechanical errors and basic style, then, consider the three alternative improved versions below. Each has its own benefits, drawbacks, and style preferences as explained in the accompanying comments. What all three versions have in common is that they are clearer, more understandable, and more portable code -- and that ought to count for something, in your company and in mine.
// An improved version of the code originally
// published in [1].
//
#include <vector>
#include <map>
#include <algorithm>// Solution 1 does some basic cleanup but still
// preserves the general structure of the
// original's approach. We're down to 17 lines
// (even if you count "public:" and "private:"
// as lines), where the original had 23.
//
namespace Solution1
{
template<class Iter>
class sort_idxtbl_pair
{
public:
void set( const Iter& it, int i ) { it_ = it; i_ = i; }bool operator<( const sort_idxtbl_pair& other ) const
{ return *it_ < *other.it_; }
operator int() const { return i_; }private:
Iter it_;
int i_;
};// This function is where most of the clarity
// savings came from; it has 5 lines, where
// the original had 13. After each code line,
// I'll show the corresponding original code
// for comparison. Prefer to write code that
// is clear and concise, not unnecessarily
// complex or obscure!
//
template<class IterIn, class IterOut>
void sort_idxtbl( IterIn first, IterIn last, IterOut out )
{
std::vector<sort_idxtbl_pair<IterIn> > v(last-first);
// int iDst = last-first;
// typedef std::vector< sort_idxtbl_pair<RAIter> > V;
// V v(iDst);
for( int i=0; i < last-first; ++i )
v[i].set( first+i, i );
// int i=0;
// RAIter it = first;
// V::iterator vit = v.begin();
// for (i=0; it<last; it++, vit++, i++)
// (*vit).set(it,i);
std::sort( v.begin(), v.end() );
// std::sort(v.begin(), v.end());
std::copy( v.begin(), v.end(), out );
// int *pi = pidxtbl;
// vit = v.begin();
// for (; vit<v.end(); pi++, vit++)
// *pi = (*vit).i;
}
}// Solution 2 uses a pair instead of reinventing
// a pair-like helper class. Now we're down to
// 13 lines, from the original 23. Of the 14
// lines, 9 are purpose-specific, and 5 are
// directly reusable in other contexts.
//
namespace Solution2
{
template<class T, class U>
struct ComparePair1stDeref
{
bool operator()( const std::pair<T,U>& a,
const std::pair<T,U>& b ) const
{ return *a.first < *b.first; }
};template<class IterIn, class IterOut>
void sort_idxtbl( IterIn first, IterIn last, IterOut out )
{
std::vector< std::pair<IterIn,int> > s( last-first );
for( int i=0; i < s.size(); ++i )
s[i] = std::make_pair( first+i, i );
std::sort( s.begin(), s.end(),
ComparePair1stDeref<IterIn,int>() );
for( int i=0; i < s.size(); ++i, ++out )
*out = s[i].second;
}
}// Solution 3 just shows a couple of alternative
// details -- it uses a map to avoid a separate
// sorting step, and it uses std::transform()
// instead of a handcrafted loop. Here we still
// have 15 lines, but more are reusable. This
// version uses more space overhead and probably
// more time overhead too, so I prefer Solution 2,
// but this is an example of finding alternative
// approaches to a problem.
//
namespace Solution3
{
template<class T>
struct CompareDeref
{
bool operator()( const T& a, const T& b ) const
{ return *a < *b; }
};
template<class T, class U>
struct Pair2nd
{
const U& operator()( const std::pair<T,U>& a ) const
{ return a.second; }
};template<class IterIn, class IterOut>
void sort_idxtbl( IterIn first, IterIn last, IterOut out )
{
std::multimap<IterIn, int, CompareDeref<IterIn> > v;
for( int i=0; first != last; ++i, ++first )
v.insert( std::make_pair( first, i ) );
std::transform( v.begin(), v.end(), out,
Pair2nd<IterIn const,int>() );
}
}// I left the test harness essentially unchanged,
// except to demonstrate putting the output in an// output iterator (instead of necessarily an
// int*) and using the source array directly as a
// container.
//
#include <iostream>int main()
{
int ai[10] = { 15,12,13,14,18,11,10,17,16,19 };std::cout << "#################" << std::endl;
std::vector<int> aidxtbl( 10 );
// use another namespace name to test a different solution
Solution3::sort_idxtbl( ai, ai+10, aidxtbl.begin() );
for( int i=0; i<10; ++i )
std::cout << "i=" << i
<< ", aidxtbl[i]=" << aidxtbl[i]
<< ", ai[aidxtbl[i]]=" << ai[aidxtbl[i]]
<< std::endl;
std::cout << "#################" << std::endl;
}
1. C. Hicks. "Creating an Index Table in STL" (C/C++ Users Journal, 18(8), August 2000).
2. Available at http://www.gotw.ca/gotw/072.htm.
3. H. Sutter. "Migrating to Namespaces" (Dr. Dobb's Journal, 25(10), October 2000).
4. S. Meyers. More Effective C++, Item 32 (Addison-Wesley, 1996).
5. H. Sutter. Exceptional C++ (Addison-Wesley, 2000).
6. The original author also reports separate feedback from Steve Simpson demonstrating another elegant, but substantially different, approach: Simpson creates a containerlike object that wraps the original container, including its iterators, and allows iteration using the alternative ordering.
Copyright © 2009 Herb Sutter |