Thursday 18 January 2018

c - Is it possible to identify whether an address reference belongs to static/heap/stack in the process address space


We have a mechanism that monitors load and store instructions and captures the
address referenced. I'd like to classify each address according to whether it belongs
to the stack, the heap or the region where static variables are allocated. Is there a
way to do this classification programmatically?



My initial thought was to do a malloc() with a small memory request (1?) as soon as
the process starts running so that I could capture the "base address" (or starting
address) of the heap. That way, I can distinguish statically allocated variables from
the rest. For references that do not belong to the static region (that is, heap and
stack), how could I differentiate them?



Some small tests show that the following simple code (run on Linux 3.18/x86-64,
compiled with gcc 4.8.4)




#include <stdio.h>
#include <stdlib.h>

int x;

int foo (void)
{
    int s;
    int *h = malloc (sizeof(int));

    printf ("x = %p, &s = %p, h = %p\n", (void *)&x, (void *)&s, (void *)h);
    return 0;
}

int main (int argc, char *argv[])
{
    foo();
    return 0;
}



shows some randomization of the address space (not for the static variables but for
the rest, that is, heap and stack). This may add some uncertainty, but maybe there is
a way to find the limits of these regions of the address space.



Answer




There is no standard C API for this, which
means that all possible solutions are going to be based on platform-specific hacks.
Also, this answer limits itself to single-threaded
applications.




  1. How to recognize a stack address?



The stack is a contiguous memory region. Therefore all you need to know are two
numbers: the top of the stack and the bottom of the stack. The top of the stack is
bounded by the stack frame of the current function. However, since the size of the
current stack frame cannot be accessed from C code, it's difficult to tell where
exactly the current frame ends. The trick here is to call one more function from the
current one and use an address in the called function's stack frame as the boundary
value for stack_top.




Learning the bottom of the stack is simpler: its value stays constant during the
execution of the program, and is bounded by the stack frame of the entry-point
function (main() in C programs). Therefore taking the address of some local variable
in the main() function is a sufficient approximation.



One more caveat is that the x86 stack grows downward, which means that the top of the
stack has a smaller address than the bottom. This code sums it up:



#include <stdbool.h>

void *stack_bottom;

/* noinline guarantees that the check runs in its own stack frame */
bool IS_IN_STACK(void *x) __attribute__((noinline));
bool IS_IN_STACK(void *x) {
    void *stack_top = &stack_top;
    return x <= stack_bottom && x >= stack_top;
}

int main (int argc, char *argv[])
{
    int x;
    stack_bottom = &x;

    ...
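
The check above relies on the stack growing toward lower addresses. A minimal sketch of how that assumption could be verified at run time follows; the helper names callee_is_below and stack_grows_down are made up for this illustration, and the noinline attribute is assumed to be honored by the compiler (GCC and Clang do).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: runs in its own frame and reports whether one of its
   locals sits at a lower address than a local of its caller. */
__attribute__((noinline))
static bool callee_is_below(const volatile char *caller_local)
{
    volatile char callee_local = 0;
    /* Compare as integers: relational comparison of pointers into
       different objects is not defined by the C standard. */
    return (uintptr_t)&callee_local < (uintptr_t)caller_local;
}

/* Hypothetical helper: true when a nested call lands at a lower address,
   i.e. when the stack grows downward. */
__attribute__((noinline))
bool stack_grows_down(void)
{
    volatile char caller_local = 0;
    return callee_is_below(&caller_local);
}
```

On a platform where stack_grows_down() returns false, the two comparisons in IS_IN_STACK would have to be swapped.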


  2. How to recognize an address of a static variable?




    The logic is even simpler here. Static variables are allocated in a memory region
    starting at a platform-specific address (fixed for non-PIE executables). Usually
    this region precedes the heap and the stack in memory. The only thing that has to
    be learned therefore is the end address of this static memory region.



    Luckily, the linker used by GCC provides the symbols end, edata and etext
    (see http://man7.org/linux/man-pages/man3/end.3.html), which denote the end of the
    .bss, .data and .text segments respectively. The program must declare these
    symbols itself before using them. Static variables are allocated either in the
    .bss or the .data segment, therefore this check should be sufficient on most
    platforms:



    #define IS_STATIC(x) ((void*)(x) <= (void*)&end || (void*)(x) <= (void*)&edata)


    This
    macro checks both edata and end to
    avoid making assumptions about which of .bss and
    .data comes first in
    memory.
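
The macro has no lower bound, so any address below end (code, or a null pointer) also counts as static. A slightly stricter variant is sketched below. It assumes the common Linux layout in which .text, read-only data, .data and .bss follow each other in that order, and the function name is_static_data is made up for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Provided by the linker; end(3) says the program must declare them. */
extern char etext, end;

/* Hypothetical helper: true when p lies between the end of .text and the
   end of .bss. Assumes the layout .text < .rodata < .data < .bss. */
bool is_static_data(const void *p)
{
    uintptr_t a = (uintptr_t)p;
    return a >= (uintptr_t)&etext && a < (uintptr_t)&end;
}
```

Like IS_STATIC, this only covers the static data of the main executable; static variables that live in shared libraries are mapped elsewhere and would not be recognized.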




  3. Heap addresses.



    Heap variables are typically allocated at addresses directly following the
    addresses in the .data and .bss regions. However, heap addresses may sometimes
    belong to non-contiguous memory ranges. Therefore the best you can do here is to
    read the Linux process files (/proc/self/maps) to find out the memory mappings, as
    suggested in the other answer. Alternatively, just check if both IS_IN_STACK and
    IS_STATIC return false.
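
Reading the memory mappings could look roughly like the sketch below. It is an illustration under stated assumptions, not the method from the other answer: the names enum region and classify_address are made up, and it relies on the Linux convention that /proc/self/maps labels the main thread's stack as [stack] and the brk-based heap as [heap].

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

enum region { REGION_UNMAPPED, REGION_STACK, REGION_HEAP, REGION_FILE, REGION_OTHER };

/* Hypothetical helper: looks p up in /proc/self/maps and reports what kind
   of mapping contains it. Returns REGION_UNMAPPED when no mapping matches
   or when the file cannot be opened. */
enum region classify_address(const void *p)
{
    uintptr_t addr = (uintptr_t)p;
    enum region result = REGION_UNMAPPED;
    char line[4096 + 128];
    FILE *maps = fopen("/proc/self/maps", "r");

    if (maps == NULL)
        return REGION_UNMAPPED;

    while (fgets(line, sizeof line, maps) != NULL) {
        unsigned long start, stop;
        char name[256] = "";

        /* Line format: start-end perms offset dev inode [pathname] */
        if (sscanf(line, "%lx-%lx %*s %*s %*s %*s %255[^\n]",
                   &start, &stop, name) < 2)
            continue;
        if (addr < start || addr >= stop)
            continue;

        if (strcmp(name, "[stack]") == 0)
            result = REGION_STACK;
        else if (strcmp(name, "[heap]") == 0)
            result = REGION_HEAP;
        else if (name[0] == '/')
            result = REGION_FILE;   /* backed by a file, e.g. the executable */
        else
            result = REGION_OTHER;  /* anonymous or special mapping */
        break;
    }

    fclose(maps);
    return result;
}
```

Called from the main thread with the address of a local variable, this should return REGION_STACK. Two limitations are worth keeping in mind: glibc serves large malloc() requests with mmap(), so those blocks show up as anonymous mappings (REGION_OTHER here) rather than [heap], and part of .bss may also be an anonymous mapping, so static variables can come back as either REGION_FILE or REGION_OTHER.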



    The complete program using these
    macros:



    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>

    int x;

    /* provided by the linker, see end(3) */
    extern int end, edata;

    void *stack_bottom;

    bool IS_IN_STACK(void *x) __attribute__((noinline));
    bool IS_IN_STACK(void *x) {
        void *stack_top = &stack_top;
        return x <= stack_bottom && x >= stack_top;
    }

    #define IS_STATIC(x) ((void*)(x) <= (void*)&end || (void*)(x) <= (void*)&edata)

    int foo (void)
    {
        int s;
        int *h = malloc (sizeof(int));

        printf ("x = %p, &s = %p, h = %p\n", (void *)&x, (void *)&s, (void *)h);
        // prints 0 1 0
        printf ("%d %d %d\n", IS_IN_STACK(&x), IS_IN_STACK(&s), IS_IN_STACK(h));

        // prints 1 0 0
        printf ("%d %d %d\n", IS_STATIC(&x), IS_STATIC(&s), IS_STATIC(h));
        return 0;
    }

    int main (int argc, char *argv[])
    {
        int x;
        stack_bottom = &x;

        foo();
        return 0;
    }

