smeso

MIPS stacktrace: an unexpected journey


in Technical, mips, embedded, glibc, gcc, stacktrace, C, assembly, backtrace, libgcc

Automatically receiving a stacktrace when your C program crashes isn't rocket science. But this time it was more difficult than I expected. This is a short recollection of the things I found out few years ago. This post assumes that the reader has some basic knowledge about functions' calling conventions, CPU registers, and assembly.

Some context

A C program running on Linux was randomly crashing on one specific embedded device deployed on the other side of the world. The device architecture was MIPS32. We needed a system to asynchronously receive reports with as much details as possible (i.e. a stacktrace). The device was using glibc, ideally we wanted a solution that could also work with other standard libraries (e.g. musl and uClibc) but even a non-portable solution was okay, at least to fix this one problem. Using external dependencies, especially if they were large, wasn't an option.

The simple solution

glibc already has backtrace(3) and backtrace_symbols_fd(3). They are easy to use and they will certainly work very well! glibc simply calls libgcc. Any code I’ll ever write will never understand the code generated by GCC better than libgcc!

Well, this may be true when you are on x86_64, but on some other architectures like MIPS32 those functions don't work at all.

Some notes on MIPS

Just in case you are not too familiar with MIPS, I wanted to add some notes about how it works with gcc on Linux.

  • The return address of the current function is stored in the $ra register
  • When entering a new function the "old" return address is pushed onto the stack and is the last thing just before the start of the stack frame of the new function.

To reconstruct the stack trace we need to recover all the return addresses in sequence. We can start from $ra and then go backwards, pulling the return addresses from the end of each stack frame.

The problem

It turns out that backtrace(3) does the same thing that too many blog posts on the Internet recommend to do: unwind the stack using the frame pointer register to figure out the position of the previous function's return address. This makes perfect sense, the frame pointer (aka $fp or $30 on MIPS) is usually a register designed to help debuggers to refer to local variables and other information stored on the stack (e.g. the previous function return address) using constant offsets. In theory, while the stack pointer (aka $sp) always points at the top of the stack, the $fp should point at the beginning of the current stack frame and should not move from there. If you want to retrieve the return address using the $sp, you need to know how much stuff you put on your stack since you entered the current stack frame, this depends on: what function you are in, how many automatic variables this functions is using, what type are those functions, and how many bytes do those types use. It would be very inconvenient to work with the $sp during debug, so we are very lucky to have the $fp! Using the $fp we can always retrieve the return address of the previous function without any complex operation! The offset between the $fp and the return address is the same constant for all functions in all programs!

... Or is it?

Well... it turns out it isn't.

In fact, when using GCC on Linux on MIPS32, the frame pointer just works exactly like another stack pointer: it's completely useless! I'm not 100% sure about the reason behind this choice, but I think it could be related to the fact that, on architectures with (relatively) small registers, it would be difficult to reference the top of the stack using a real $fp, but still it sounds like the wrong thing to do.

The funniest thing is that, backtrace(3) implementation from libgcc seems to ignore how gcc works in this context and it just returns random values.

The real solution

We can get rid of the $fp completely and just work with $sp, but how can we figure out the correct offset to use with $sp for any function?

You may not like the answer (or maybe you will) but the only way to know where the beginning of the stack frame is... is to jump into the actual code of the function and parse the opcodes to figure out how much $sp was decremented by the compiler.

Here is one way to do it:

#include <link.h>
#include <sys/ucontext.h>

static inline void my_backtrace(ucontext_t* c)
{
    unsigned long *ra;
    unsigned long *fp;
    unsigned long *sp;
    size_t ra_offset;
    size_t stack_size;
    int reached_start = 0;
    int first_time = 1;

    pc = (unsigned long*)(unsigned long)c->uc_mcontext.pc;
    ra = (unsigned long*)(unsigned long)c->uc_mcontext.gregs[31];
    fp = (unsigned long*)(unsigned long)c->uc_mcontext.gregs[30];
    sp = (unsigned long*)(unsigned long)c->uc_mcontext.gregs[29];
    print(pc - 1);
    print(ra - 1);

    while (!reached_start) {
        int using_fp = 0;
        ra_offset = 0;
        stack_size = 0;
        for (unsigned long* addr = ra;
             (ra_offset == 0 || stack_size == 0) && !reached_start;
             --addr) {
            switch (*addr & 0xffff0000) {
                case 0x27bd0000:
                    // found addiu sp, sp, stack_size
                    stack_size = abs((short)(*addr & 0xffff));
                    break;
                case 0xafbf0000:
                    // found sw ra, ra_offset(sp)
                    ra_offset = (short)(*addr & 0xffff);
                    break;
                case 0x03a00000:
                    if (0x03a0f000 == (*addr & 0xffffff00)) {
                        // found pseudo instruction move fp, sp
                        using_fp = 1;
                    }
                    break;
                case 0x03e00000:
                    if (0x03e00025 == *addr) {
                        // found move zero,ra
                        // so we found the start
                        reached_start = 1;
                    }
                    break;
                default:
                    break;
            }
        }

        if (!ra_offset) {
            break;
        }

        if (using_fp && first_time) {
            sp = fp;
        }
        first_time = 0;

        ra = *(unsigned long**)((unsigned long)sp + ra_offset);
        if (using_fp) {
            sp = *(unsigned long**)((unsigned long)sp + ra_offset - 4);
        } else {
            sp = (unsigned long*)((unsigned long)sp + stack_size);
        }

        print(ra - 1);
    }

}

Symbolizing the addresses

It would be nice to be able to translate the addresses, that we just found, to actual function names. This isn't usually the funniest thing to do, but after what we just did, it seems trivial. We can use dl_iterate_phdr(3) to look into every loaded shared object. Once we found the shared object that contains our address, we can walk through its ELF sections and look at its .symtab to find the function name.

Here is an example of how to do it:

struct file_match {
    const char *file;
    void *address;
    void *base;
};

static int
find_matching_file(struct dl_phdr_info *info,
                   size_t size,
                   void *data)
{
    struct file_match *match = data;
    long n;
    const ElfW(Phdr) *phdr;
    /* This code is modeled from Gfind_proc_info-lsb.c:callback() from libunwind */
    ElfW(Addr) load_base = info->dlpi_addr;
    phdr = info->dlpi_phdr;
    for (n = info->dlpi_phnum; --n >= 0; phdr++) {
        if (phdr->p_type == PT_LOAD) {
            ElfW(Addr) vaddr = phdr->p_vaddr + load_base;
            if (match->address >= vaddr && match->address < vaddr + phdr->p_memsz) {
                match->file = info->dlpi_name;
                match->base = info->dlpi_addr;
            }
        }
    }
    return 0;
}

static inline void
symbolize(const char* progname, void* addr)
{
    struct file_match match = { .address = addr };
    dl_iterate_phdr(find_matching_file, &match);
    if (match.file && strlen(match.file)) {
        fprintf(stderr, " %s(+%p)\n", match.file, addr - match.base);
    } else {
        fprintf(stderr, " %s\n", progname);
    }
    fflush(stderr);
}

Wouldn't it be cool if we also retrieved the source file name and the line? To do that we need to use the information provided in the .debug_line section using the DWARF format. To use as little space as possible DWARF doesn’t simply store a list of address-to-line mappings. It stores a line number program, which is a serialized finite state machine that can be used to find out line numbers and more. Parsing DWARF is left as an exercise for the reader. Alternatively we can manually invoke addr2line.

Conclusion

In the end the bug was found and fixed and everyone lived happily ever after.

I learned, once again, that features/bugs can sometime happen in your most trusted dependency and one should never refrain from doubting the correctness of the most respected software.

I also learned, once again, that the Internet is full of misleading information and blog posts written with authority by people that never actually tried to do the things that they talk about. As of today if you try to lookup information about the frame pointer on MIPS32 it will be very hard to find any mention at all of the issues outlined in this post. So, never trust a blog post! Not even this one! Your combination of compiler and libc might behave differently and have different, new, exciting issues that will ruin your day!

P.S.

This post made HN's front page and received a few comments! You can add your comments here as well.