GotW #33



This is the original GotW problem and solution substantially as posted to Usenet. See the book More Exceptional C++ (Addison-Wesley, 2002) for the most current solution to this GotW issue. The solutions in the book have been revised and expanded since their initial appearance in GotW. The book versions also incorporate corrections, new material, and conformance to the final ANSI/ISO C++ standard.

Inline
Difficulty: 4 / 10

Contrary to popular opinion, the keyword inline is not some sort of magic bullet. It is, however, a useful tool when employed properly. The question is, When should you use it?

Problem

1. Does inlining a function increase efficiency?

2. When and how should you decide to inline a function?

Solution

Write What You Know, and Know What You Write

1. Does inlining a function increase efficiency?

Not necessarily.

First off, if you tried to answer this question without first asking what you want to optimize, you fell into a classic trap. The first question has to be: "What do you mean by efficiency?" Does it mean program size, memory footprint, execution time, development speed, build time, or something else?

Second, contrary to popular opinion, inlining can improve OR worsen any of these:

a) Program Size. Many programmers assume that inlining increases program size, because instead of having one copy of a function's code the compiler creates a copy in every place that function is used. This is often true, but not always: If the function size is smaller than the code the compiler has to generate to perform the function call, then inlining will reduce program size.

b) Memory Footprint. Inlining usually has little or no effect on a program's memory usage, apart from the basic program size (above).

c) Execution Time. Many programmers assume that inlining a function will improve execution time, because it avoids the function call overhead, and because "seeing through the veil" of the function call gives the compiler's optimizer more opportunities to work its craft. This can be true, but often isn't: If the function is not called extremely frequently, there will usually be no visible improvement in overall program execution time. In fact, just the opposite can happen: If inlining increases a calling function's size it will reduce that caller's locality of reference, which means that overall program speed can actually worsen if the caller's inner loop no longer fits in the processor's cache.

To put this point in perspective, don't forget that most programs are not CPU-bound. Probably the most common bottleneck is I/O, which can include anything from network bandwidth or latency to file or database access.

d) Development Speed, Build Time. To be most useful, inlined code has to be visible to the caller, which means that the caller has to depend on the internals of the inlined code. Depending on another module's internal implementation details increases the practical coupling of modules (it does not, however, increase their theoretical coupling, because the caller doesn't actually use any of the callee's internals). Usually when functions change, callers do not need to be recompiled (only relinked, and often not even that). When inlined functions change, callers are forced to recompile.
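Points (a) and (d) can be seen side by side in a small sketch. The class and file names here (`Counter`, `counter.h`, `counter.cpp`, `caller.cpp`) are hypothetical, and the file boundaries appear only as comments so that the example compiles as a single translation unit. Whether a given compiler actually emits less code for the inlined call is something to confirm in its generated output, not to assume.

```cpp
// Sketch only; all names are hypothetical.

// ---- counter.h: included by every caller ----
class Counter {
public:
    Counter() : count_(0) {}

    // Defined in the class body, so implicitly inline. The body is one
    // increment, which may well be smaller than the code for a call (a).
    // Because the body lives in the header, changing it forces every
    // file that includes counter.h to recompile (d).
    void increment() { ++count_; }

    // Declared only. Callers see the signature, not the body, so a
    // change to the body below requires recompiling counter.cpp alone.
    int value() const;

private:
    int count_;
};

// ---- counter.cpp: compiled once, then linked ----
int Counter::value() const { return count_; }

// ---- caller.cpp ----
int count_twice() {
    Counter c;
    c.increment();
    c.increment();
    return c.value();
}
```

Moving `increment()` out of line later, or moving `value()` into the header, changes nothing for callers at the source level; only the set of files that must be rebuilt changes.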

Finally, if you're looking to improve efficiency in some way, always look to your algorithms and data structures first... they will give you the order-of-magnitude overall improvements, whereas process optimizations like inlining generally (note, "generally") yield less dramatic results.

Just Say 'No For Now'

2. When and how should you decide to inline a function?

Just like any other optimization: After a profiler tells you to, and not a minute sooner. The only time you'd inline right away is when it's an empty function that's likely to stay empty, or you're absolutely forced to, i.e., when writing a non-exported template.
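The "absolutely forced" case might look like the following sketch, where `larger` is a hypothetical function template. Without `export`, its definition has to be visible at every point of instantiation, so in practice it goes in the header whether or not anyone wanted it there.

```cpp
// ---- larger.h (hypothetical header) ----
// The full definition, not just a declaration, must appear here so that
// each including file can instantiate the template for its own types.
template <typename T>
const T& larger(const T& a, const T& b) {
    return (a < b) ? b : a;
}
```

A caller writing `larger(3, 7)` or `larger(2.5, 1.5)` instantiates the template in its own translation unit, which is only possible because the body is visible.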

Bottom line, inlining always costs something, if only increased coupling, and you should never pay for something until you know you're going to turn a profit -- that is, get something better in return.

"But I can always tell where the bottlenecks are," you may think. Don't worry, you're not alone: most programmers think this at one time or another. But you're still wrong. Dead wrong. Programmers are notoriously poor guessers about where their code's true bottlenecks lie.

Usually only experimental evidence (a.k.a. profiling output) helps to tell you where the true hot spots are. Nine times out of ten, a programmer cannot identify the number-one hot-spot bottleneck in his code without some sort of profiling tool. After more than a decade in this business, I have yet to see a consistent exception in any programmer I've ever worked with or heard about... even though everyone and their kid brother may claim until they're blue in the face that this doesn't apply to them. :-)

[Note another practical reason for this: Profilers aren't as good at identifying which inlined functions should NOT be inlined.]
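Where no profiler is at hand, a crude timing harness can at least replace a guess with a measurement. The sketch below assumes C++11 and uses hypothetical names (`time_us`, `sum_to`); it times a single call with `std::chrono::steady_clock` and is not a substitute for a real profiler, which attributes time to individual functions.

```cpp
#include <chrono>

// Hypothetical helper: runs f once and returns the elapsed
// wall-clock time in microseconds.
template <typename F>
long long time_us(F f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}

// Hypothetical workload standing in for the code under suspicion.
long long sum_to(int n) {
    long long total = 0;
    for (int i = 1; i <= n; ++i) {
        total += i;
    }
    return total;
}
```

A call such as `time_us([] { sum_to(1000000); })` yields one sample. An optimizer may remove work whose result is unused, and single samples are noisy, so repeat the measurement and treat the numbers with suspicion.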

What About Computation-Intensive Tasks (e.g., Numeric Libraries)?

Some people write small, tight library code, such as advanced scientific and engineering numerical libraries, and can sometimes do reasonably well with seat-of-the-pants inlining. Even those developers, however, tend to inline judiciously and tend to tune later rather than earlier. Note that writing a module and then comparing performance with "inlining on" and "inlining off" is generally an unsound idea, because "all on" or "all off" is a coarse measure that tells you only about the average case... it doesn't tell you WHICH functions benefited (and how much each one did). Even in these cases, you're usually better off using a profiler and optimizing based on its advice.

What About Accessors?

There are people who will argue that one-line accessor functions (like "X& Y::f() { return myX_; }") are a reasonable exception, and could/should be automatically inlined. I understand the reasoning, but be careful: At the very least, all inlined code increases coupling, so unless you're certain in advance that inlining will help, there's no harm in deferring that decision to profiling time. Later, when the code is stable, a profiler might point out that inlining will help, and at that point: a) you know that what you're doing is worth doing; and b) you'll have avoided all the coupling and possible build overhead until the end of the project development cycle. Not a bad deal, that.

In Summary

From the GotW coding standards:

- avoid inlining or detailed tuning until performance profiles prove the need (Cline95: 139-140, 348-351; Meyers92: 107-110; Murray93: 234-235, 242-244)

- corollary: in general, avoid inlining (Lakos96: 631-632; Murray93: 242-244)

[Note that this deliberately says "avoid" (not "never") and that these same coding standards do encourage prior inlining in one or two very restricted situations.]

Cline95: Marshall Cline and Greg Lomow. "C++ FAQs." Addison-Wesley, 1995.

Lakos96: John Lakos. "Large-Scale C++ Software Design." Addison-Wesley, 1996.

Meyers92: Scott Meyers. "Effective C++." Addison-Wesley, 1992.

Murray93: Robert Murray. "C++ Strategies and Tactics." Addison-Wesley, 1993.

 

Copyright © 2009 Herb Sutter