Sure-Fire Stack Reduction Program
Programmers these days usually don't have to worry about stack overflow,
yet there are still some situations where stack size matters. Here's a
recipe to follow if you want to reduce the stack consumption of a
large program. You'll take KILOBYTES off in just DAYS! FDA Approved!
As seen on TV!
In C and C++ on Linux, thread stacks are (for reasons of compiler
convenience) allocated a fixed region of contiguous virtual memory
on startup. The size cannot change during the life of the thread.
Normally, one deals with this restriction by assigning huge regions
(the default is 8 megabytes) of virtual memory to each thread's stack.
Since no physical memory is assigned to the stack until it's needed,
this works out quite well, without wasting physical memory.
There's just one catch: When scaling up programs to thousands
of threads on 32 bit processors, there's a hard limit of
a gigabyte or two on virtual memory available to each process.
That means there's a hard limit on the total size
of thread stacks. Let's say Linux allows user processes
to use 2^31 bytes of virtual address space, and linuxthreads by default
gives each thread 2^23 bytes of virtual stack space. The
hard upper bound on the number of threads is then 2^(31-23) = 256.
To achieve more than 256 threads, you have to allocate less
virtual memory to each stack, using
pthread_attr_setstacksize(). (See the demo below.)
Reducing the stack size is dangerous. If any thread of a program
overflows its stack, the program will malfunction and probably crash.
Worse, there is no general way to know what the required stack size is!
You have to carefully design your program to use a bounded amount of
stack in each thread. Since most threaded programs aren't so carefully
designed, it would be helpful to have a way to track down what functions
are using too much stack.
Fortunately, there's a way to get a list of how much stack space each
function in your program uses. This is somewhat less than a total
solution, as it can't take into account recursive functions
nor functions that use alloca(), but it's helpful nonetheless.
The idea is to recognize the particular instruction used by
the compiler for allocating stack space.
Joern Engel wrote a perl script that implements this idea.
Here is a version of that script modified to have a simpler output format:
It prints out a list of functions and their stack usage, biggest first.
To use it, compile your program normally, then run it through objdump
and the script as follows:
objdump -D a.out | perl checkstack.pl i386
For instance, running this on the program
int blort()
{
char bar[10000];
return bar[0] + bar[9999];
}
int main(int argc, char **argv)
{
char foo[2417];
foo[0] = (char) blort();
exit(0);
}
outputs
10008 blort
2440 main
This script lets you get a quick idea of which functions
use enough stack to be worth looking at. Together with
the technique described below, it can quickly help you
figure out which functions need to be rewritten to use less stack space.
The only way to really tell whether an arbitrary program can run with
small thread stacks is to try it. If it crashes, it can't run with that
size stack. However, it'd sure be nice to know *why* it crashed, i.e.
which functions were on the stack at the time of the crash. You
can easily get a stack dump with gdb or with glibc's backtrace() function;
the problem is triggering the stack dump when overflow occurs.
By default, glibc uses a single guard page between stacks, so if you're
lucky, the overflow will touch the guard page first.
On some platforms, you can control the size of the stack and the guard page; see
the glibc-2.3.x manual
and the POSIX spec for
pthread_attr_setguardsize().
This lets you get rid of the guard page if you really don't need it.
However, that might detect the crash too late, and give a confusing backtrace.
To detect the crash as soon as possible, turn on active stack checking in gcc
with the -fstack-check option. This causes the compiler to write a byte to
each page of the local variable area on entry to each function, thus removing
any chance that the overflow will miss the guard page.
(Note: similar but architecture-specific options such as -mapcs-stack-check are beyond the scope of
this document.)
I tried out -fstack-check on Red Hat 8.0's gcc with the example program
int main(int argc, char **argv)
{
char foo[2417];
exit(0);
}
This generates a SIGSEGV when compiled with -fstack-check when the size
of the local variables on main's stack exceeds 2416 bytes.
This appears to be a bug. However, -fstack-check is not useless; to work
around this problem, just avoid using more than 2KB of stack in your main
thread. This bug may have been reported before to the gcc maintainers -- here are
some related-looking entries in the bug tracking system --
but without enough info to track the problem down.
Hopefully this document makes the issue clearer.
I have filed a new bug report, #10127.
Here's a program that demonstrates how to set thread stack size smaller,
and how to work around the apparent bug in -fstack-check:
Putting the above all together yields the following recipe:
- Until the bug in gcc or glibc is fixed, work around it by wrapping your real main
with a fake main that just starts a thread and waits for it to exit.
- Modify your program to take a parameter for stack size, and pass that
parameter to pthread_attr_setstacksize. Use that attribute for creating your worker
threads.
- Compile your program with -fstack-check.
- Run your program, and search for the lowest value of stack size that
lets your program complete a normal task successfully.
- Run it again with a slightly smaller stack size, causing a crash, and get a
symbolic stack traceback.
- Run 'objdump -d' on your program's binary and pass the result through checkstack.pl to get a map of stack usage by function.
- Find all the functions that occur both in the symbolic stack traceback, and in the output
of checkstack.pl. These are the functions whose stack usage you must reduce to avoid the crash.
Copyright 2003, Dan Kegel