Sunday, 1 July 2018

What is the difference between char array and char pointer in C?





C99 N1256 draft


There are two different uses of character string literals:



  1. Initialize char[]:


    char c[] = "abc";

    This is the special case, described in 6.7.8/14 "Initialization":



    An array of character type may be initialized by a character string literal, optionally
    enclosed in braces. Successive characters of the character string literal (including the
    terminating null character if there is room or if the array is of unknown size) initialize the
    elements of the array.



    So this is just a shortcut for:


    char c[] = {'a', 'b', 'c', '\0'};

    Like any other regular array, c can be modified.


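  2. Everywhere else: it generates an unnamed array of char with static storage duration, which gives undefined behavior (UB) if modified.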


    So when you write:


    char *c = "abc";

    This is similar to:


    /* __unnamed is magic because modifying it gives UB. */
    static char __unnamed[] = "abc";
    char *c = __unnamed;

    Note the implicit conversion (array-to-pointer decay) from char[] to char *, which is always legal.


    Then if you modify c[0], you also modify __unnamed, which is UB.


    This is documented at 6.4.5 "String literals":



    5 In translation phase 7, a byte or code of value zero is appended to each multibyte
    character sequence that results from a string literal or literals. The multibyte character
    sequence is then used to initialize an array of static storage duration and length just
    sufficient to contain the sequence. For character string literals, the array elements have
    type char, and are initialized with the individual bytes of the multibyte character
    sequence [...]


    6 It is unspecified whether these arrays are distinct provided their elements have the
    appropriate values. If the program attempts to modify such an array, the behavior is
    undefined.
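The two uses described above can be checked side by side with a small sketch. The helper name demo_literal_uses is hypothetical, and declaring the pointer as const char * is a suggested habit rather than something the standard requires: it turns an accidental write through p into a compile-time error instead of run-time UB.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: exercises both uses of the literal "abc". */
static void demo_literal_uses(void) {
    /* Use 1: the literal initializes an ordinary array.
       Equivalent to {'a', 'b', 'c', '\0'}. */
    char arr[] = "abc";
    char braces[] = {'a', 'b', 'c', '\0'};
    assert(sizeof arr == 4);                      /* 3 chars + terminating null */
    assert(memcmp(arr, braces, sizeof arr) == 0);

    arr[0] = 'x';                                 /* fine: arr is a regular array */
    assert(strcmp(arr, "xbc") == 0);

    /* Use 2: the literal is an unnamed static array; p only points to it.
       With const, "p[0] = 'x';" is rejected by the compiler.
       Without const it would compile, but the behavior is undefined. */
    const char *p = "abc";
    assert(sizeof p == sizeof(const char *));     /* size of the pointer, not the string */
    assert(strlen(p) == 3);
}
```

Note that sizeof gives 4 for the array but the pointer size (typically 8 on x86-64) for p, which is one of the most visible practical differences between the two declarations.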




6.7.8/32 "Initialization" gives a direct example:



EXAMPLE 8: The declaration


char s[] = "abc", t[3] = "abc";

defines "plain" char array objects s and t whose elements are initialized with character string literals.


This declaration is identical to


char s[] = { 'a', 'b', 'c', '\0' },
t[] = { 'a', 'b', 'c' };

The contents of the arrays are modifiable. On the other hand, the declaration


char *p = "abc";

defines p with type "pointer to char" and initializes it to point to an object with type "array of char" with length 4 whose elements are initialized with a character string literal. If an attempt is made to use p to modify the contents of the array, the behavior is undefined.
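The claims of EXAMPLE 8 can be verified with a few asserts. This is a minimal sketch; check_example8 is a hypothetical name, and the point it highlights is that t[3] = "abc" leaves no room for the terminating null, so t is a char array but not a string.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: checks EXAMPLE 8 from 6.7.8/32. */
static void check_example8(void) {
    char s[] = "abc", t[3] = "abc";
    char s2[] = { 'a', 'b', 'c', '\0' }, t2[] = { 'a', 'b', 'c' };

    /* s has unknown size, so the terminating null is included. */
    assert(sizeof s == 4);
    assert(memcmp(s, s2, sizeof s) == 0);

    /* t[3] has no room for the null, so it is omitted.
       Passing t to strlen or printf("%s") would read past its end (UB). */
    assert(sizeof t == 3);
    assert(memcmp(t, t2, sizeof t) == 0);

    /* Both arrays are modifiable, unlike the array p points to. */
    s[0] = 'x';
    t[0] = 'x';
    assert(s[0] == 'x' && t[0] == 'x');
}
```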



GCC 4.8 x86-64 ELF implementation


Program:


#include <stdio.h>

int main(void) {
    char *s = "abc";
    printf("%s\n", s);
    return 0;
}

Compile and decompile:


gcc -ggdb -std=c99 -c main.c
objdump -Sr main.o

Output contains:


 char *s = "abc";
8: 48 c7 45 f8 00 00 00 movq $0x0,-0x8(%rbp)
f: 00
c: R_X86_64_32S .rodata

Conclusion: GCC stores the string literal that the char * points to in the .rodata section, not in .text.


If we do the same for char[]:


 char s[] = "abc";

we obtain:


17:   c7 45 f0 61 62 63 00    movl   $0x636261,-0x10(%rbp)

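so the string is written onto the stack (at an offset from %rbp) by a single movl whose immediate 0x636261 holds the bytes 'a', 'b', 'c', '\0' in little-endian order.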
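One practical consequence of this storage difference: a pointer to a string literal stays valid after the function returns, because the unnamed array has static storage duration, while a local char[] is a fresh stack copy that dies with the call. The sketch below illustrates this; literal_name, array_name and demo_storage are hypothetical names, and copying out through snprintf is just one way to avoid returning a dangling pointer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Safe: the literal's array has static storage duration
   (in .rodata with GCC on x86-64 ELF), so it outlives the call. */
static const char *literal_name(void) {
    const char *s = "abc";
    return s;
}

/* A char[] is re-initialized from the literal on every call and lives on
   the stack. Returning s would dangle, so the result is copied out. */
static void array_name(char *out, size_t n) {
    char s[] = "abc";
    s[0] = 'A';                      /* modifies only this call's copy */
    snprintf(out, n, "%s", s);
}

/* Usage: both calls see the same static object, and the write in
   array_name never reaches the literal itself. */
static void demo_storage(void) {
    char buf[8];
    assert(literal_name() == literal_name());
    array_name(buf, sizeof buf);
    assert(strcmp(buf, "Abc") == 0);
    assert(strcmp(literal_name(), "abc") == 0);
}
```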


Note, however, that the default linker script puts .rodata and .text in the same segment, which has execute but no write permission. This can be observed by linking the program (for example, gcc -std=c99 main.c) and running:


readelf -l a.out

which contains:


 Section to Segment mapping:
Segment Sections...
02 .text .rodata
