Reducing C++ Code Bloat
These are some (now rather old) notes I've made on things you can do to reduce code
bloat in C++ programs, geared especially towards:
- The gcc compiler (I've mostly been using 2.95.2)
- Embedded systems (in my case ARM, using the a.out object format)
- Reducing the size of the code - because FLASH is more expensive than DRAM
Turn off exceptions
-fno-exceptions
Exceptions are a great way to handle errors, but sadly in my opinion
you would need to be very brave to use them in an embedded environment.
In theory, much of the extra code required for these to work appears in
a separate segment which is only paged in when an exception is thrown. That's
great for systems with a hard drive, but not much help if your program is
booting from FLASH and you care about FLASH size.
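Note that if you do this, then you really want to use the nothrow version
of operator new. Otherwise, when memory allocation fails, the compiler (which
assumes a plain new never returns null) may still call the constructor on the
null pointer. The nothrow form returns null instead and skips the constructor: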
Foo *f = new (std::nothrow) Foo;  // needs #include <new>; f is null on failure
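A minimal sketch of the calling side, assuming a hypothetical helper make_foo() and a trivial Foo. The point is that every nothrow allocation has to be checked for null, because nothing else will report the failure.

```cpp
#include <cassert>
#include <new>

struct Foo {
    int value;
    Foo() : value(42) {}
};

// Hypothetical helper: allocate a Foo without relying on exceptions.
// Returns nullptr on failure instead of constructing at a null address.
Foo *make_foo() {
    Foo *f = new (std::nothrow) Foo;
    if (f == nullptr) {
        return nullptr;  // allocation failed; the caller must handle it
    }
    return f;
}
```

This compiles the same way with or without -fno-exceptions, so it is safe to adopt before turning exceptions off.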
Use vtable thunks
-fvtable-thunks
To quote from the gcc man page:
Use `thunks' to implement the virtual function dispatch table (`vtable'). The traditional (cfront-style) approach to implementing vtables was to store a pointer to the function and two offsets for adjusting the `this' pointer at the call site. Newer implementations store a single pointer to a `thunk' function which does any necessary adjustment and then calls the target function.
Like all options that change the ABI, all C++ code, *including libgcc.a* must be built with the same setting of this option.
Thunks make virtual function calls smaller and faster. Chances are that if you
are using gcc3 you have them anyway, since its new C++ ABI uses thunks.
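To make the difference concrete, here is a hand-written toy model of the two vtable entry layouts described in the quote. It is not what the compiler actually emits: the names are hypothetical, an int field stands in for a base-class subobject, and the cfront entry is simplified to a single offset. What it shows is that in the cfront style every call site does the `this' adjustment, while in the thunk style the call site is a plain indirect call and only the (rare) entries that need an adjustment point at a small thunk.

```cpp
#include <cassert>
#include <cstddef>

// Toy object: 'second' plays the role of a second base subobject.
struct Object {
    int first;
    int second;
};

// The "real" virtual function expects a pointer to the whole object.
int read_second(void *whole) {
    return static_cast<Object *>(whole)->second;
}

// cfront-style entry: function pointer plus an offset, applied at each call site.
struct CfrontEntry {
    int (*fn)(void *);
    std::ptrdiff_t delta;
};

int call_cfront(const CfrontEntry &e, char *subobject) {
    return e.fn(subobject + e.delta);  // adjustment code repeated at every call
}

// Thunk-style entry: a single pointer. This thunk does the adjustment once.
int read_second_thunk(void *subobject) {
    char *whole = static_cast<char *>(subobject) - offsetof(Object, second);
    return read_second(whole);
}

struct ThunkEntry {
    int (*fn)(void *);
};

int call_thunk(const ThunkEntry &e, char *subobject) {
    return e.fn(subobject);  // call site is just an indirect call
}
```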
Avoid inline virtual destructors
If you have classes with virtual members, then the compiler will output
the vtable and the rtti information wherever it sees the implementation
of the first non-pure virtual member function. If this function is inline,
then the compiler will see it everywhere the header file is included, and
you will get the vtable (etc.) in every compilation unit.
Note:
- This only applies with gcc2.95 - with gcc3 the algorithm ignores
inline virtual member functions when determining when to output the vtable
and rtti information.
- If you're using ELF then I think that the linker will ensure
you actually only get one of them, but if you're using a.out or COFF then
you will certainly end up with duplication of the vtables, rtti functions
and rtti type_info nodes.
To avoid this, make the first virtual member function out-of-line. Since
most classes which are derived from also need a virtual destructor, you might
as well make it the destructor.
struct Base {
    virtual ~Base() {} // bad! code bloat!
};
struct AnotherBase {
    virtual ~AnotherBase(); // good!
};
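A sketch of how the "good" version is typically split between a header and a single source file. The file names, the kind() member and the Derived class are hypothetical. With the first virtual member defined out of line, gcc2.95 should emit the vtable and rtti for each class only in the one file that defines it.

```cpp
#include <cassert>

// ---- another_base.h (hypothetical header) ----
struct AnotherBase {
    virtual ~AnotherBase();                  // first virtual member: declared only
    virtual int kind() const { return 0; }   // later inline virtuals are fine
};

struct Derived : AnotherBase {
    ~Derived() override;                     // again the first virtual, out of line
    int kind() const override { return 1; }
};

// ---- another_base.cc (exactly one translation unit) ----
// Defining the first virtual members here means the vtables and rtti
// for AnotherBase and Derived are emitted in this file only.
AnotherBase::~AnotherBase() {}
Derived::~Derived() {}
```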
Be careful with anonymous namespaces
Anonymous namespaces seem like a good idea. But under gcc2.95 they're
implemented by constructing a regular namespace with a name based on the
path to the file. Take the following small program:
#include <iostream>
#include <typeinfo>
namespace {
    struct X {
        virtual ~X() {}
    };
    X an_x;
}
int main()
{
    X *x = &an_x;
    std::cout << typeid(x).name() << "\n";
}
Try compiling it with something like the following, and then run it:
gcc /home/lgd/foo.cc -lstdc++
The string it prints contains the generated namespace name, which is based on
the path to the file, and a string like that is present for every class. If
you have a lot of code inside anonymous namespaces, and a long path, it can
start to add up.
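If you want to see how much space these names take with your own compiler, a small sketch like the following compares the length of the type_info name of a class in an anonymous namespace with the same class in a short named namespace. The namespace f1 and both helper functions are hypothetical. How long the names are, and whether they still contain the path, depends on the compiler version, so treat the numbers as something to measure rather than a fixed rule.

```cpp
#include <cassert>
#include <cstring>
#include <typeinfo>

namespace {
    struct X { virtual ~X() {} };
}

namespace f1 {  // hypothetical short, named namespace for comparison
    struct X { virtual ~X() {} };
}

// Length of the name string the compiler stores for each class's type_info.
std::size_t anon_name_length() { return std::strlen(typeid(X).name()); }
std::size_t named_name_length() { return std::strlen(typeid(f1::X).name()); }
```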
Be careful with dynamic_cast
Each use of the dynamic_cast operator seems to consume about 15 words of
code. Nuff said.
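One common way to avoid dynamic_cast when you only ever need to downcast to a few known types is a virtual "as_" function. This is a swapped-in technique, not something measured in these notes, and Shape, Circle and as_circle() are hypothetical names. Each such function adds a vtable slot to every class in the hierarchy, so whether it actually comes out smaller depends on how many casts you have.

```cpp
#include <cassert>

struct Circle;  // forward declaration

struct Shape {
    virtual ~Shape();
    // Replacement for dynamic_cast<Circle *>(this): each class answers
    // for itself, so no rtti lookup is needed at the call site.
    virtual Circle *as_circle() { return nullptr; }
};
Shape::~Shape() {}

struct Circle : Shape {
    int radius = 3;
    Circle *as_circle() override { return this; }
};
```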
Avoid virtual inheritance
If you use virtual inheritance:
class Base { /* ... */ };
class D1 : public virtual Base { /* ... */ };
class D2 : public virtual Base { /* ... */ };
class E : public D1, public D2 { /* ... */ };
then you will find that the ctors and dtors of class D1 and
D2 no longer call the ctors and dtor of class Base. Instead,
class E's ctors and dtors call them. This has the result that your
most derived class ctors and dtors get a lot bigger.
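The following sketch makes that visible. The counters are hypothetical instrumentation: when an E is built, Base is constructed exactly once, by E's constructor, and the Base initializers written in D1 and D2 are skipped. When a D1 is built on its own, D1 is the most derived class and its initializer is used.

```cpp
#include <cassert>

static int base_ctor_calls = 0;  // how many times Base was constructed
static int base_ctor_arg = 0;    // which initializer constructed it last

struct Base {
    explicit Base(int v) { ++base_ctor_calls; base_ctor_arg = v; }
};

struct D1 : public virtual Base {
    D1() : Base(1) {}  // skipped when D1 is a base of E
};

struct D2 : public virtual Base {
    D2() : Base(2) {}  // skipped when D2 is a base of E
};

struct E : public D1, public D2 {
    E() : Base(3) {}   // the most derived class constructs the virtual base
};
```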
Avoid deep class hierarchies
If you have a deep class hierarchy (with virtual methods) then at each
level, the type_info function (the function that does dynamic casting for
you) gets bigger (by about 40 bytes as far as I can tell).
Try compiling the following piece of code:
struct W {
    virtual ~W();
};
W::~W() {}
struct X : public W {
    virtual ~X();
};
X::~X() {}
struct Y : public X {
    virtual ~Y();
};
Y::~Y() {}
struct Z : public Y {
    virtual ~Z();
};
Z::~Z() {}
Now run nm --size-sort --demangle -t d on the object file, and see how each
class's type_info function is larger than the one for the class it derives from.
Inline Functions With Static Data
Watch out for inline functions with static data in gcc2.
For example:
struct Bar {
    static const char *func() { return "hello"; }
};
int main() {} // Bar not used anywhere!
If you compile and link this code with gcc2.95 (a.out or ELF) you end
up with the string "hello" in your image, even though it's not being used.
At least in a.out it appears in each and every translation unit! This losing
behaviour seems to have been fixed in gcc3.
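A sketch of the obvious workaround, assuming hypothetical files bar.h and bar.cc: declare the function in the header and define it in exactly one source file, so the literal lives in one object file instead of every translation unit. Whether the linker can then drop it entirely when Bar is unused depends on your toolchain (see the section on functions in sections).

```cpp
#include <cassert>
#include <cstring>

// ---- bar.h (hypothetical) ----
struct Bar {
    static const char *func();  // declaration only: no inline body, no literal
};

// ---- bar.cc (one translation unit) ----
// The "hello" literal now appears only in bar.o.
const char *Bar::func() { return "hello"; }
```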
Templates
- If used carefully, templates can reduce code size, because duplicated
instantiations end up in common sections that the linker merges. You need
linker support for this, though (it won't happen with a.out but will with ELF).
- Think about using template hoisting: move all the complicated stuff out of
the template into a non-template base, leaving the templates just doing type
conversions. Also see the discussion of template specialization in The C++
Programming Language.
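A minimal sketch of template hoisting, with hypothetical names PtrStackBase and PtrStack and an arbitrary fixed capacity. All the real work happens on void * in a non-template base whose members are defined out of line, so that code exists once; each instantiation of the template only adds trivial inline casts.

```cpp
#include <cassert>
#include <cstddef>

// Untyped core: defined once, shared by every instantiation below.
class PtrStackBase {
protected:
    PtrStackBase() : size_(0) {}
    bool push_raw(void *p);
    void *pop_raw();
    std::size_t size_raw() const { return size_; }
private:
    static constexpr std::size_t kCapacity = 16;  // hypothetical fixed capacity
    void *items_[kCapacity];
    std::size_t size_;
};

// Out-of-line definitions: the only real code, emitted once.
bool PtrStackBase::push_raw(void *p) {
    if (size_ == kCapacity) return false;  // full
    items_[size_++] = p;
    return true;
}

void *PtrStackBase::pop_raw() {
    return size_ ? items_[--size_] : nullptr;
}

// Thin typed wrapper: each instantiation only does type conversions.
template <typename T>
class PtrStack : private PtrStackBase {
public:
    bool push(T *p) { return push_raw(p); }
    T *pop() { return static_cast<T *>(pop_raw()); }
    std::size_t size() const { return size_raw(); }
};
```

PtrStack<int> and PtrStack<char> then share the same push_raw and pop_raw code.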
Functions in sections
Add -ffunction-sections to the compiler options, and --gc-sections to the
linker options (-Wl,--gc-sections if you link through gcc). This puts each
function in a separate section (this only works for ELF), and the linker then
throws away any sections that are never referenced. Code that you need but that
is not directly referenced, such as trap vectors, has to be kept by hand.
This saves a good deal of space if you are in the habit of writing
code that is not actually used.
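A small sketch of what the linker sees, assuming a hypothetical file app.cc and leaving out the program's entry point. With -ffunction-sections each function gets its own section named after its mangled symbol, so the section holding unused_helper has no references and --gc-sections can discard it. Comparing nm output with and without the flags is an easy way to check.

```cpp
#include <cassert>

// Hypothetical build:
//   g++ -Os -ffunction-sections -c app.cc
//   g++ -Wl,--gc-sections app.o -o app
// Each function below is placed in its own .text.<mangled name> section.

int used_helper(int x) { return x * 2; }    // referenced by run(): kept

int unused_helper(int x) { return x * 3; }  // never referenced: can be discarded

int run() { return used_helper(21); }
```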
Using gcc3.4.x
gcc3.4 seems to produce code that is about 10% smaller overall. However, in my
single experiment, the image ended up larger once compressed (with gzip, at
least). Also, at least for me, gcc3.4 seems to generate broken code (I'm not
sure why). gcc3.4 might also be one way to make use of Thumb mode.
The RVCT compiler from ARM produces much smaller code. But you have to pay
for it.
Useful Shell Commands
Find the biggest functions:
nm --size-sort -t d --demangle file.o
Look for duplicated strings, often a sign of duplicated code (e.g.
due to failure to inline as expected):
strings file.o | sort | uniq -D
Luke Diamand, luke at diamand dot org.