Sure-Fire Stack Reduction Program

Programmers these days usually don't have to worry about stack overflow, yet there are still some situations where stack size matters. Here's a recipe to follow if you want to reduce the stack consumption of a large program. You'll take KILOBYTES off in just DAYS! FDA Approved! As seen on TV!

Contents
Standard Practice: big stacks
Virtual Memory Limit on total stack size
Reducing stack usage is tricky
Static Analysis of Stack Usage
Stack Guard Pages Help Detect Overflow
Active Stack Overflow Detection
Demo Program
Recipe for Slimming Down Your Stacks

Standard Practice: big stacks

In C and C++ on Linux, thread stacks are (for reasons of compiler convenience) allocated a fixed region of contiguous virtual memory on startup. The size cannot change during the life of the thread. Normally, one deals with this restriction by assigning huge regions (the default is 8 megabytes) of virtual memory to each thread's stack. Since no physical memory is assigned to the stack until it's needed, this works out quite well, without wasting physical memory.

Virtual Memory Limit on total stack size

There's just one catch: When scaling up programs to thousands of threads on 32 bit processors, there's a hard limit of a gigabyte or two on virtual memory available to each process. That means there's a hard limit on the total size of thread stacks. Let's say Linux allows user processes to use 2^31 bytes of virtual address space, and linuxthreads by default gives each thread 2^23 bytes of virtual stack space. The hard upper bound on the number of threads is then 2^(31-23) = 256. To achieve more than 256 threads, you have to allocate less virtual memory to each stack, using pthread_attr_setstacksize(). (See the demo below.)

Reducing stack usage is tricky

Reducing the stack size is dangerous. If any thread of a program overflows its stack, the program will malfunction and probably crash. Worse, there is no general way to know what the required stack size is! You have to carefully design your program to use a bounded amount of stack in each thread. Since most threaded programs aren't so carefully designed, it would be helpful to have a way to track down what functions are using too much stack.

Static Analysis of Stack Usage

Fortunately, there's a way to get a list of how much stack space each function in your program uses. This is somewhat less than a total solution, as it can't take into account recursive functions nor functions that use alloca(), but it's helpful nonetheless. The idea is to recognize the particular instruction used by the compiler for allocating stack space.

Joern Engel wrote a perl script that implements this idea. Here is a version of that script modified to have a simpler output format:

checkstack.pl

It prints out a list of functions and their stack usage, biggest first. To use it, compile your program normally, then run it through objdump and the script as follows:

objdump -D a.out | perl checkstack.pl i386

For instance, running this on the program

int blort()
{
	char bar[10000];
	return bar[0] + bar[9999];
}

int main(int argc, char **argv)
{
	char foo[2417];
	foo[0] = (char) blort();
	exit(0);
}

outputs

10008 blort
2440 main

This script lets you get a quick idea of which functions use enough stack to be worth looking at. Together with the technique described below, it can quickly help you figure out which functions need to be rewritten to use less stack space.

Stack Guard Pages Help Detect Overflow

The only way to really tell whether an arbitrary program can run with small thread stacks is to try it. If it crashes, it can't run with that size stack. However, it'd sure be nice to know *why* it crashed, i.e. which functions were on the stack at the time of the crash. You can easily get a stack dump with gdb or with glibc's backtrace() function; the problem is triggering the stack dump when overflow occurs. By default, glibc uses a single guard page between stacks, so if you're lucky, the overflow will touch the guard page first.

On some platforms, you can control the size of the stack and the guard page; see the glibc-2.3.x manual and the POSIX spec for pthread_attr_setguardsize(). This lets you get rid of the guard page if you really don't need it.

Active Stack Overflow Detection

However, that might detect the crash too late, and give a confusing backtrace. To detect the crash as soon as possible, turn on active stack checking in gcc with the -fstack-check option. This causes the compiler to write a byte to each page of the local variable area on entry to each function, thus removing any chance that the overflow will miss the guard page. (Note: similar but architecture-specific options such as -mapcs-stack-check are beyond the scope of this document.)

I tried out -fstack-check on Red Hat 8.0's gcc with the example program

int main(int argc, char **argv)
{
	char foo[2417];
	exit(0);
}

This generates a SIGSEGV when compiled with -fstack-check when the size of the local variables on main's stack exceeds 2416 bytes. This appears to be a bug. However, -fstack-check is not useless; to work around this problem, just avoid using more than 2KB of stack in your main thread. This bug may have been reported before to the gcc maintainers -- here are some related-looking entries in the bug tracking system --

but without enough info to track the problem down. Hopefully this document makes the issue clearer. I have filed a new bug report, #10127.

Demo Program

Here's a program that demonstrates how to set thread stack size smaller, and how to work around the apparent bug in -fstack-check:

demo.c

Recipe for Slimming Down Your Stacks

Putting the above all together yields the following recipe:

Until the bug in gcc or glibc is fixed, work around it by wrapping your real main with a fake main that just starts a thread and waits for it to exit.
Modify your program to take a parameter for stack size, and pass that parameter to pthread_attr_setstacksize. Use that attribute for creating your worker threads.
Compile your program with -fstack-check.
Run your program, and search for the lowest value of stack size that lets your program complete a normal task successfully.
Run it again with a slightly smaller stack size, causing a crash, and get a symbolic stack traceback.
Run 'objdump -d' on your program's binary and pass the result through checkstack.pl to get a map of stack usage by function.
Find all the functions that occur both in the symbolic stack traceback, and in the output of checkstack.pl. These are the functions whose stack usage you must reduce to avoid the crash.