Reducing C++ Code Bloat


These are some (now rather old) notes I've made on things you can do to reduce code bloat in C++ programs, geared especially towards:

Turn off exceptions

    -fno-exceptions
  
Exceptions are a great way to handle errors, but sadly in my opinion you would need to be very brave to use them in an embedded environment. In theory, much of the extra code required for these to work appears in a separate segment which is only paged in when an exception is thrown. That's great for systems with a hard drive, but not much help if your program is booting from FLASH and you care about FLASH size.

Note that if you do this, then you really want to use the nothrow version of operator new (declared in <new>), which reports failure by returning a null pointer. Otherwise you may find that when memory allocation fails, the constructor will still be called:

    Foo *f = new (std::nothrow) Foo;
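
As a minimal sketch of how the nothrow form might be used, the following checks the returned pointer before touching the object. The type Foo and the helper foo_value_or are hypothetical names chosen for illustration.

```cpp
#include <new>   // std::nothrow

// Hypothetical payload type, used only for illustration.
struct Foo {
    int value;
    Foo() : value(42) {}
};

// Allocates a Foo with the nothrow form of operator new.
// If allocation fails, the pointer is null and the constructor is
// not run, so the caller-supplied fallback is returned instead.
int foo_value_or(int fallback)
{
    Foo *f = new (std::nothrow) Foo;
    if (!f)
        return fallback;      // allocation failed, nothing to delete
    int v = f->value;
    delete f;
    return v;
}
```

A caller would write something like foo_value_or(-1) and treat -1 as "out of memory".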

Use vtable thunks

     -fvtable-thunks

To quote from the gcc man page:

 

Use `thunks' to implement the virtual function dispatch table
(`vtable').  The traditional (cfront-style) approach to
implementing vtables was to store a pointer to the function and two
offsets for adjusting the `this' pointer at the call site.  Newer
implementations store a single pointer to a `thunk' function which
does any necessary adjustment and then calls the target function.

Like all options that change the ABI, all C++ code, *including
libgcc.a* must be built with the same setting of this option.

Their effect is to make virtual function calls smaller and faster. Chances are if you are using gcc3 then you have this enabled anyway.

Avoid inline virtual destructors

If you have classes with virtual members then the compiler will output the vtable and the RTTI information whenever it sees the implementation of the first non-pure virtual member function. If this function is inline then the compiler will see it everywhere the header file is included and you will get the vtable (etc) in every compilation unit.

To avoid this, make the first virtual member out-of-line. Since most classes which are derived-from also need a virtual destructor you might as well make it the destructor.

    struct Base {
        virtual ~Base() {} // bad! code bloat!
    };

    struct AnotherBase {
        virtual ~AnotherBase(); // good!
    };
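
The split between header and source file might look like the sketch below. The file names shape.h and shape.cc and the classes Shape and Triangle are hypothetical; both parts are shown in one listing for brevity.

```cpp
// --- shape.h (hypothetical header) ---
struct Shape {
    virtual ~Shape();                // declared only, defined in shape.cc
    virtual int sides() const = 0;   // pure virtual, so it does not count
};

struct Triangle : public Shape {
    virtual ~Triangle();             // first non-pure virtual, out-of-line
    virtual int sides() const;
};

// --- shape.cc (hypothetical source file) ---
// The destructors are defined in exactly one compilation unit, so the
// vtables and RTTI data should be emitted only here.
Shape::~Shape() {}
Triangle::~Triangle() {}
int Triangle::sides() const { return 3; }

// Small helper that goes through the vtable.
int count_sides(const Shape &s) { return s.sides(); }
```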


Be careful with anonymous namespaces

Anonymous namespaces seem like a good idea. But under gcc2.95 they're implemented by constructing a regular namespace with a name based on the path to the file. Take the following small program:

    #include <iostream>
    #include <typeinfo>

    namespace {
        struct X {
            virtual ~X() {}
        };

        X an_x;
    }

    int main()
    {
        X *x = &an_x;
        std::cout << typeid(x).name() << "\n";
    }

Try running it, after compiling it something like:

    gcc /home/lgd/foo.cc -lstdc++

A string like the one printed is stored in the image for every class in an anonymous namespace. If you have a lot of code inside anonymous namespaces, and a long path, it can start to add up.

Be careful with dynamic_cast

Each use of the dynamic_cast operator seems to consume about 15 words. Nuff said.
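
One possible way to avoid dynamic_cast at a call site is to ask the object itself through a virtual function. This is only a sketch of that idea, not something measured here: each such function costs a vtable slot per class, so whether it saves space overall depends on how many casts it replaces. The names Widget, Button, Panel and as_button are hypothetical.

```cpp
struct Button;

struct Widget {
    virtual ~Widget();
    // Returns this object as a Button, or null if it is not one.
    virtual Button *as_button();
};

struct Button : public Widget {
    int width;
    explicit Button(int w);
    virtual Button *as_button();
};

struct Panel : public Widget {};

Widget::~Widget() {}
Button *Widget::as_button() { return 0; }      // default: not a Button
Button::Button(int w) : width(w) {}
Button *Button::as_button() { return this; }

// Where dynamic_cast<Button *>(&w) might otherwise have been used.
int width_or_zero(Widget &w)
{
    Button *b = w.as_button();
    return b ? b->width : 0;
}
```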

Avoid virtual inheritance

If you use virtual inheritance:

class Base { ... };
class D1 : public virtual Base { ... };
class D2 : public virtual Base { ... };
class E : public D1, public D2 { ... };

then you will find that the ctors and dtors of class D1 and D2 no longer call the ctors and dtor of class Base. Instead, class E's ctors and dtors call them. This has the result that your most derived class ctors and dtors get a lot bigger.
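
The effect can be observed directly with a small sketch. The tag member and the two helper functions are hypothetical additions for illustration; the class layout follows the one above.

```cpp
struct Base {
    int tag;
    explicit Base(int t) : tag(t) {}
};

// Each class names the Base constructor it would like to run.
struct D1 : public virtual Base { D1() : Base(1) {} };
struct D2 : public virtual Base { D2() : Base(2) {} };
struct E : public D1, public D2 { E() : Base(3) {} };

// When E is the most derived class, only E's choice of Base
// constructor is used; the ones named by D1 and D2 are skipped.
int base_tag_of_e()
{
    E e;
    return e.tag;
}

// When D1 is itself the most derived class, D1 constructs Base.
int base_tag_of_d1()
{
    D1 d;
    return d.tag;
}
```

Here base_tag_of_e() yields 3 and base_tag_of_d1() yields 1, which is why the code to construct and destroy Base has to live in the most derived class.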

Avoid deep class hierarchies

If you have a deep class hierarchy (with virtual methods) then at each level, the type_info function (the function that does dynamic casting for you) gets bigger (by about 40 bytes as far as I can tell).

Try compiling the following piece of code:

    struct W {
        virtual ~W();
    };
    W::~W() {}

    struct X : public W {
        virtual ~X();
    };
    X::~X() {}

    struct Y : public X {
        virtual ~Y();
    };
    Y::~Y() {}

    struct Z : public Y {
        virtual ~Z();
    };
    Z::~Z() {}

Now run nm --size-sort --demangle -t d on the object file. See how the type_info function for each class is larger than the one for its base class.

Inline Functions With Static Data

Watch out for inline functions with static data in gcc2.

For example:

    struct Bar {
        static const void *func() { return "hello"; }
    };

    int main() {} // Bar not used anywhere!

If you compile and link this code with gcc2.95 (a.out or ELF) you end up with the string "hello" in your image, even though it's not being used. At least in a.out it appears in each and every translation unit! This losing behaviour seems to have been fixed in gcc3.

Templates

Functions in sections

Add -ffunction-sections to the compiler options, and --gc-sections to the linker options. This puts each function in a separate section (this only works for ELF) and then throws away any sections that are unused. You will need to do something by hand to keep any code that you want but that is not directly referenced, such as trap vectors.

This saves a good deal of space if you are in the habit of writing code that is not actually used.
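
As a rough sketch of what this looks like in practice, the build commands in the comment below assume a GCC toolchain targeting ELF, with the linker option passed through the compiler driver using -Wl. The file names demo.cc and demo, and both function names, are hypothetical.

```cpp
// Build sketch (GCC, ELF target):
//   g++ -Os -ffunction-sections -c demo.cc -o demo.o
//   g++ -Wl,--gc-sections demo.o -o demo

// Called from elsewhere in the program, so its section is
// referenced and kept by the linker.
int scale(int x)
{
    return x * 2;
}

// Never called. With -ffunction-sections it is placed in a section
// of its own, which the linker is free to discard under --gc-sections.
int never_called(int x)
{
    return x * 3 + 1;
}
```

Running nm on the linked image before and after adding the two options should show whether never_called was dropped.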

Using gcc3.4.x

gcc3.4 seems to produce code that is about 10% smaller overall. However, in my single experiment, the image ended up larger once compressed (at least with gzip). Also, at least for me, gcc3.4 seems to generate broken code (not sure why). gcc3.4 also might be one way to make use of Thumb mode.

The RVCT compiler from ARM produces much smaller code. But you have to pay for it.

Useful Shell Commands

Find the biggest functions:

    nm --size-sort -t d --demangle file.o

Look for duplicated strings, often a sign of duplicated code (e.g. due to failure to inline as expected):

    strings file.o | sort | uniq -D

Luke Diamand, luke at diamand dot org.