c - Is it possible to identify whether an address reference belongs to static/heap/stack in the process address space

Thursday 18 January 2018

We have a mechanism that monitors load and store instructions and captures
the addresses they reference. I'd like to classify these addresses by whether
they belong to the stack, the heap, or the region where static variables are
allocated. Is there a way to do this classification programmatically?

My initial thought was to call malloc() with a small memory request (1 byte?)
as soon as the process starts running, so that I could capture the "base
address" (or starting address) of the heap. That way, I can distinguish
statically allocated variables from the rest. For references that do not
belong to the static region (that is, heap and stack), how could I
differentiate them?

Some small tests show that the following simple code (run on Linux
3.18/x86-64, compiled with gcc 4.8.4)

#include <stdio.h>
#include <stdlib.h>

int x;

int foo (void)
{
    int s;
    int *h = malloc (sizeof(int));

    printf ("x = %p, &s = %p, h = %p\n", (void *)&x, (void *)&s, (void *)h);
    return 0;
}

int main (int argc, char *argv[])
{
    foo();
    return 0;
}

shows some randomization of the address space (not in the static variables,
but in the remaining part: heap and stack), which may add some uncertainty.
Still, maybe there is a way to find the limits of these regions of the
address space.

Answer

There is no standard C API for this, which
means that all possible solutions are going to be based on platform-specific hacks.
Also, this answer limits itself to single-threaded
applications.

How to recognize a stack address?

The stack is a contiguous memory region. Therefore all you need to know are
two numbers: the top of the stack and the bottom of the stack. The top of the
stack is basically limited by the stack frame of the current function.
However, since the size of the current stack frame cannot be accessed from C
code, it is difficult to tell where exactly the current frame ends. The trick
here is to call one more function from the current one and use an address in
the called function's stack frame as the boundary value for stack_top.

Learning the bottom of the stack is simpler - its value stays constant during
the execution of the program, and is bounded by the stack frame of the
entry-point function (main() in C programs). Therefore taking the address of
some local variable in the main() function is a sufficient approximation.

One more caveat is that the x86 stack grows downwards, which means that the
top of the stack has a smaller address than the bottom. This code sums it up:

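#include <stdbool.h>

/* Set once in main() to the address of one of its locals. */
void *stack_bottom;

/* noinline guarantees that this function gets a stack frame of its own. */
bool IS_IN_STACK(void *x) __attribute__((noinline));
bool IS_IN_STACK(void *x) {
    void *stack_top = &stack_top;
    return x <= stack_bottom && x >= stack_top;
}

int main (int argc, char *argv[]) {
    int x;
    stack_bottom = &x;
    ...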
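The claim that the top of the stack has a smaller address than the bottom can be checked on a given platform with a small sketch like the one below. The function names stack_grows_down and callee_is_below are hypothetical, and the code assumes GCC or Clang (for the noinline attribute) and a single conventional call stack.

```c
#include <stdint.h>

/* Hypothetical helper: runs one call level deeper than its caller and
   reports whether its own local sits at a lower address. */
__attribute__((noinline))
static int callee_is_below(uintptr_t caller_local_addr)
{
    volatile char callee_local = 0;
    return (uintptr_t)&callee_local < caller_local_addr;
}

/* Returns 1 if a deeper frame has a lower address, i.e. the stack grows
   towards smaller addresses, and 0 otherwise. */
__attribute__((noinline))
int stack_grows_down(void)
{
    volatile char caller_local = 0;
    int below = callee_is_below((uintptr_t)&caller_local);

    /* Touch the local after the call so the caller's frame stays live
       and the call cannot be turned into a tail call. */
    caller_local = (char)below;
    return below;
}
```

On x86 and x86-64 this is expected to return 1. On a platform where it returns 0, the comparisons in IS_IN_STACK would have to be reversed.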

How to recognize an address of a static variable?

The logic is even simpler here. Static variables are allocated in a memory
region that starts at a platform-specific address (fixed for executables that
are not position-independent). Usually this region precedes all other regions
in memory. The only thing that has to be learned therefore is the end address
of this static memory region.

Luckily, the GNU linker provides symbols end, edata and etext (see
http://man7.org/linux/man-pages/man3/end.3.html) that denote the end of the
.bss, .data and .text segments respectively. Static variables are allocated
either in the .bss or the .data segment, therefore this check should be
sufficient on most platforms:

/* end and edata are the first addresses past their segments, hence '<'. */
#define IS_STATIC(x) ((void*)(x) < (void*)&end || (void*)(x) < (void*)&edata)

This
macro checks both edata and end to
avoid making assumptions about which of .bss and
.data comes first in
memory.
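To see how the three linker symbols partition the static area, here is a minimal sketch that sorts an address into one of the ranges they delimit. The enum and the classify_static helper are hypothetical names introduced for illustration. The sketch assumes a GNU/Linux toolchain whose linker provides etext, edata and end, laid out in that order.

```c
#include <stdint.h>

/* Linker-provided symbols (see end(3)); only their addresses matter. */
extern char etext, edata, end;

enum static_region {
    REGION_TEXT,   /* below etext                            */
    REGION_DATA,   /* between etext and edata (initialized)  */
    REGION_BSS,    /* between edata and end (uninitialized)  */
    REGION_OTHER   /* at or above end: heap, mmap, stack ... */
};

/* Hypothetical helper. Assumes etext < edata < end; it does not check
   that the address is actually mapped. */
enum static_region classify_static(const void *p)
{
    uintptr_t a = (uintptr_t)p;

    if (a < (uintptr_t)&etext) return REGION_TEXT;
    if (a < (uintptr_t)&edata) return REGION_DATA;
    if (a < (uintptr_t)&end)   return REGION_BSS;
    return REGION_OTHER;
}

int initialized_global = 42;  /* has an initializer: placed in .data */
int zeroed_global;            /* no initializer: placed in .bss      */
```

With this helper, classify_static(&initialized_global) should report REGION_DATA and classify_static(&zeroed_global) should report REGION_BSS, while the address of a local variable or a malloc() result should fall into REGION_OTHER.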

Heap addresses.

Heap variables are typically allocated at addresses directly following those
of the .data and .bss regions. However, heap addresses may sometimes belong
to non-contiguous memory ranges. Therefore the best you can do here is to
read the Linux /proc files (such as /proc/self/maps) to find out the memory
mappings, as suggested in the other answer. Alternatively, just check if both
IS_IN_STACK and IS_STATIC return false.
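Reading the memory mappings could look roughly like the sketch below. It scans /proc/self/maps for the mapping that contains an address and reports the name the kernel prints for it, such as "[heap]" or "[stack]". The helpers mapping_name_for and is_in_mapping are hypothetical names, the code is Linux-specific, and the file is re-read on every call, which would be too slow for a real load/store monitor without caching the parsed ranges.

```c
#include <stdio.h>
#include <string.h>
#include <stdint.h>

/* Hypothetical helper: find the mapping that contains addr and copy its
   name ("[heap]", "[stack]", a file path, or "" for anonymous mappings)
   into name. Returns 1 if found, 0 if addr is not mapped, and -1 if
   /proc/self/maps cannot be opened. */
int mapping_name_for(const void *addr, char *name, size_t name_len)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return -1;

    uintptr_t target = (uintptr_t)addr;
    char line[1024];   /* assumption: lines are shorter than this */
    int found = 0;

    while (fgets(line, sizeof line, maps) != NULL) {
        unsigned long start, stop;
        char path[256] = "";

        /* Line format: start-end perms offset dev inode [pathname] */
        int fields = sscanf(line, "%lx-%lx %*s %*s %*s %*s %255[^\n]",
                            &start, &stop, path);
        if (fields < 2)
            continue;   /* not a mapping line */

        if (target >= start && target < stop) {
            snprintf(name, name_len, "%s", path);
            found = 1;
            break;
        }
    }

    fclose(maps);
    return found;
}

/* Hypothetical convenience wrapper: is addr inside the mapping with the
   given name, e.g. "[heap]" or "[stack]"? */
int is_in_mapping(const void *addr, const char *expected)
{
    char name[256];

    return mapping_name_for(addr, name, sizeof name) == 1
        && strcmp(name, expected) == 0;
}
```

For a local variable of the main thread, is_in_mapping(&local, "[stack]") is expected to be true. Heap blocks obtained from malloc() may show up under "[heap]" or in an anonymous mapping, depending on the allocator and the request size, which matches the remark about non-contiguous ranges.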

The complete program using these
macros:

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

int x;

/* Linker-provided symbols; only their addresses are used. */
extern int end, edata;

void *stack_bottom;

bool IS_IN_STACK(void *x) __attribute__((noinline));
bool IS_IN_STACK(void *x) {
    void *stack_top = &stack_top;
    return x <= stack_bottom && x >= stack_top;
}

#define IS_STATIC(x) ((void*)(x) < (void*)&end || (void*)(x) < (void*)&edata)

int foo (void)
{
    int s;
    int *h = malloc (sizeof(int));

    printf ("x = %p, &s = %p, h = %p\n", (void *)&x, (void *)&s, (void *)h);
    // prints 0 1 0
    printf ("%d %d %d\n", IS_IN_STACK(&x), IS_IN_STACK(&s), IS_IN_STACK(h));
    // prints 1 0 0
    printf ("%d %d %d\n", IS_STATIC(&x), IS_STATIC(&s), IS_STATIC(h));
    return 0;
}

int main (int argc, char *argv[])
{
    int x;
    stack_bottom = &x;
    foo();
    return 0;
}
