Monday, 2 September 2019

c++ - Undefined, unspecified and implementation-defined behavior



What is undefined behavior in C and C++? What about unspecified behavior and implementation-defined behavior? What is the difference between them?


Answer



Undefined behavior is one of those aspects of the C and C++ language that can be surprising to programmers coming from other languages (other languages try to hide it better). Basically, it is possible to write C++ programs that do not behave in a predictable way, even though many C++ compilers will not report any errors in the program!



Let's look at a classic example:




#include 

int main()
{
char* p = "hello!\n"; // yes I know, deprecated conversion
p[0] = 'y';
p[5] = 'w';
std::cout << p;
}



The variable p points to the string literal "hello!\n", and the two assignments below try to modify that string literal. What does this program do? According to section 2.14.5 paragraph 11 of the C++ standard, it invokes undefined behavior:




The effect of attempting to modify a string literal is undefined.




I can hear people screaming "But wait, I can compile this no problem and get the output yellow" or "What do you mean undefined, string literals are stored in read-only memory, so the first assignment attempt results in a core dump". This is exactly the problem with undefined behavior. Basically, the standard allows anything to happen once you invoke undefined behavior (even nasal demons). If there is a "correct" behavior according to your mental model of the language, that model is simply wrong; The C++ standard has the only vote, period.



Other examples of undefined behavior include accessing an array beyond its bounds, dereferencing the null pointer, accessing objects after their lifetime ended or writing allegedly clever expressions like i++ + ++i.




Section 1.9 of the C++ standard also mentions undefined behavior's two less dangerous brothers, unspecified behavior and implementation-defined behavior:




The semantic descriptions in this International Standard define a parameterized nondeterministic abstract machine.



Certain aspects and operations of the abstract machine are described in this International Standard as implementation-defined (for example, sizeof(int)). These constitute the parameters of the abstract machine. Each implementation shall include documentation describing its characteristics and behavior in these respects.



Certain other aspects and operations of the abstract machine are described in this International Standard as unspecified (for example, order of evaluation of arguments to a function). Where possible, this International Standard defines a set of allowable behaviors. These define the nondeterministic aspects of the abstract machine.




Certain other operations are described in this International Standard as undefined (for example, the effect of dereferencing the null pointer). [ Note: this International Standard imposes no requirements on the behavior of programs that contain undefined behavior.end note ]




Specifically, section 1.3.24 states:




Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).




What can you do to avoid running into undefined behavior? Basically, you have to read good C++ books by authors who know what they're talking about. Screw internet tutorials. Screw bullschildt.



No comments:

Post a Comment

php - file_get_contents shows unexpected output while reading a file

I want to output an inline jpg image as a base64 encoded string, however when I do this : $contents = file_get_contents($filename); print ...