Saturday 30 December 2017

c++ - Is memory barrier or atomic operation required in a busy-wait loop?



Consider the following spin_lock() implementation, originally from this answer (https://stackoverflow.com/a/32658335/3169754):



void spin_lock(volatile bool* lock) {
    for (;;) {
        // inserts an acquire memory barrier and a compiler barrier
        if (!__atomic_test_and_set(lock, __ATOMIC_ACQUIRE))
            return;

        while (*lock)  // no barriers; is it OK?
            cpu_relax();
    }
}


What I already know:




  • volatile prevents the compiler from optimizing out the *lock re-read on each iteration of the while loop;

  • volatile inserts neither memory nor compiler barriers (https://stackoverflow.com/questions/26307071/does-the-c-volatile-keyword-introduce-a-memory-fence);

  • such an implementation actually works in GCC for x86 (e.g. in the Linux kernel) and some other architectures;

  • at least one memory and compiler barrier is required in a spin_lock() implementation for a generic architecture (https://jfdube.wordpress.com/2012/03/08/understanding-memory-ordering/); this example inserts them in __atomic_test_and_set() (https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html).



Questions:




  1. Is volatile enough here, or are there any architectures or compilers where a memory or compiler barrier or an atomic operation is required in the while loop?

    1.1 According to the C++ standards?

    1.2 In practice, for known architectures and compilers, specifically for GCC and the platforms it supports?

  2. Is this implementation safe on all architectures supported by GCC and Linux? (It is at least inefficient on some architectures, right?)

  3. Is the while loop safe according to C++11 and its memory model?




There are several related questions, but I was unable to construct an explicit and unambiguous answer from them:





Answer







  1. Is volatile enough here, or are there any architectures or compilers where a memory or compiler barrier or atomic operation is required in the while loop?





Will the volatile code see the change? Yes, but not necessarily as quickly as if there were a memory barrier. At some point some form of synchronization will occur and the new state will be read from the variable, but there are no guarantees about how much has happened elsewhere in the code.
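In standard C++ the tool for "eventually see the change" is a std::atomic load rather than a volatile read. The standard only says that implementations should make atomic stores visible to atomic loads within a reasonable amount of time, but unlike a volatile access, concurrent atomic accesses are not a data race. A minimal sketch of a stop-flag poll written that way (the function names are illustrative, not from the question):

```cpp
#include <atomic>
#include <thread>

// Spins until another thread sets 'stop'; returns how many polls it made.
inline long poll_until_stopped(const std::atomic<bool>& stop) {
    long polls = 0;
    // Each iteration performs an atomic load. Concurrent access to an atomic
    // object is not a data race, so (unlike a plain bool) the compiler has no
    // licence to read the flag once and spin forever on a stale copy.
    // Relaxed ordering is enough because no other data is being published.
    while (!stop.load(std::memory_order_relaxed))
        ++polls;
    return polls;
}

inline bool stop_flag_demo() {
    std::atomic<bool> stop{false};
    long polls = -1;
    std::thread worker([&] { polls = poll_until_stopped(stop); });
    stop.store(true, std::memory_order_relaxed);
    worker.join();  // join() makes the worker's write to 'polls' visible here
    return polls >= 0;
}
```

The demo only checks that the worker observes the store and terminates; the number of polls varies from run to run.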






1.1 According to the C++ standards?




From cppreference (memory_order):



It is the memory model and memory order that define the generalized hardware the code needs to work on. For a message to pass between threads of execution, an inter-thread-happens-before relationship needs to exist. This requires one of the following:




  • A synchronizes-with B;

  • A is dependency-ordered before B (for example, a release store read by a consume load);

  • A synchronizes-with some evaluation X, and X is sequenced before B;

  • A is sequenced before some evaluation X, and X inter-thread happens before B;

  • A inter-thread happens before some evaluation X, and X inter-thread happens before B.



As your while loop does not establish any of these relationships, there will be forms of your program which, on some current hardware, may fail.
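For contrast, a minimal sketch of the simplest case in the list above, a release store that synchronizes-with an acquire load. The names (message_passing_demo, payload, ready) are illustrative. Because the acquire load that reads true synchronizes-with the release store, the plain write to payload inter-thread happens before the read of it:

```cpp
#include <atomic>
#include <thread>

// Message passing: the release store synchronizes-with the acquire load that
// reads it, so the write to 'payload' happens before the read of 'payload'.
inline int message_passing_demo() {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};

    std::thread producer([&] {
        payload = 42;                                  // sequenced before the store
        ready.store(true, std::memory_order_release);  // publish
    });

    // Consumer: spin until the flag is observed. An acquire load that reads
    // 'true' synchronizes-with the release store above.
    while (!ready.load(std::memory_order_acquire))
        std::this_thread::yield();

    int seen = payload;  // guaranteed to observe 42; no data race
    producer.join();
    return seen;
}
```

Replacing the two atomic operations with a volatile bool would remove the synchronizes-with edge, and the read of payload would then be a data race.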



In practice, the end of a
time-slice will cause the memory to become coherent, or any form of barrier on the
non-spinlock thread will ensure that the caches are
flushed.




I am not sure what causes the volatile read to get the "current value".




1.2 In practice, for known architectures and compilers, specifically for GCC and the platforms it supports?




As the code is not consistent with the generalized CPU defined by the C++11 memory model, it is likely to fail with C++ implementations that try to adhere to the standard.
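If you want to stay with GCC builtins, one hedged option is to replace the volatile read with __atomic_load_n(..., __ATOMIC_RELAXED), which makes the inner read an atomic access as far as the compiler is concerned; on x86 it typically compiles to the same plain load as the volatile read. This is a minimal sketch assuming GCC or Clang (both provide the __atomic builtins). The cpu_relax() definition, spin_unlock() and the counter demo are assumptions added so it compiles on its own; they are not part of the question's code.

```cpp
#include <thread>
#include <vector>

#if defined(__x86_64__)
#include <immintrin.h>
// Hypothetical helper: emits the x86 PAUSE instruction as a spin-wait hint.
inline void cpu_relax() { _mm_pause(); }
#else
// Hypothetical fallback: no hint on other architectures.
inline void cpu_relax() {}
#endif

// unsigned char rather than bool: GCC documents the "set" value of
// __atomic_test_and_set as an implementation-defined nonzero byte.
inline void spin_lock(unsigned char* lock) {
    for (;;) {
        // Atomic read-modify-write with acquire ordering; it returns true if
        // the byte was already set, so false means we took the lock.
        if (!__atomic_test_and_set(lock, __ATOMIC_ACQUIRE))
            return;
        // Read-only spin using an atomic relaxed load instead of a volatile
        // read. No ordering is needed here: the acquire on the successful
        // test-and-set above is what synchronizes with the unlocking thread.
        while (__atomic_load_n(lock, __ATOMIC_RELAXED))
            cpu_relax();
    }
}

inline void spin_unlock(unsigned char* lock) {
    // Release store of zero, pairing with the acquire above.
    __atomic_clear(lock, __ATOMIC_RELEASE);
}

// Usage demo: several threads increment a plain counter under the lock.
inline long count_with_builtin_lock(int threads, int iters) {
    unsigned char lock = 0;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                spin_lock(&lock);
                ++counter;
                spin_unlock(&lock);
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

The structure is the same test-and-test-and-set loop as in the question; only the inner read changes from a volatile access to an atomic one.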



From cppreference, const volatile qualifiers (http://en.cppreference.com/w/cpp/language/cv): within a single thread of execution, volatile accesses cannot be optimized out or reordered with another visible side effect that is sequenced before or after the volatile access.





"This makes volatile objects suitable for communication with a signal handler, but not with another thread of execution."
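The signal-handler case that the quote endorses looks like this in a minimal sketch: a volatile std::sig_atomic_t flag written by a handler and read by the code the signal interrupted, all on one thread. The names got_signal, on_signal and signal_flag_demo are illustrative.

```cpp
#include <csignal>

// Written by the signal handler, read by the interrupted code.
// volatile std::sig_atomic_t is the type the standard blesses for this.
static volatile std::sig_atomic_t got_signal = 0;

static void on_signal(int) { got_signal = 1; }

inline bool signal_flag_demo() {
    got_signal = 0;
    std::signal(SIGINT, on_signal);  // install the handler
    std::raise(SIGINT);              // raise() returns only after the handler has run
    std::signal(SIGINT, SIG_DFL);    // restore the default disposition
    return got_signal == 1;
}
```

Here handler and reader share a single thread, which is exactly the situation in which volatile is sufficient; the spin lock in the question involves two threads, which it is not.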




So an implementation has to ensure that each access actually reads the memory location rather than a local copy (such as a value cached in a register). But it does not have to ensure that the volatile write is flushed through the caches to produce a coherent view across all the CPUs. In this sense, there is no time bound on how long it takes for a write to a volatile variable to become visible to another thread.



Also see kernel.org on why volatile is nearly always wrong in the kernel (https://www.kernel.org/doc/html/v4.11/process/volatile-considered-harmful.html).






Is this implementation safe on all architectures supported by GCC and Linux? (It is at least inefficient on some architectures, right?)




There is no guarantee that the volatile write gets out of the thread that sets it, so it is not really safe. On Linux it may be safe.




Is the while loop safe according to C++11 and its memory model?




No, as it does not create any of the inter-thread happens-before relationships listed above.
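A minimal sketch of the same test-and-test-and-set idea written entirely against the C++11 memory model, assuming std::atomic<bool> in place of the volatile bool and std::this_thread::yield() as a portable stand-in for cpu_relax(). The class name SpinLock and the counter demo are illustrative:

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Test-and-test-and-set spin lock using only standard C++11 atomics.
class SpinLock {
public:
    void lock() {
        for (;;) {
            // Atomic RMW with acquire ordering: a successful exchange
            // synchronizes-with the release store in the previous unlock().
            if (!locked_.exchange(true, std::memory_order_acquire))
                return;
            // Read-only spin on a relaxed atomic load: no data race, and it
            // avoids hammering the cache line with RMW operations.
            while (locked_.load(std::memory_order_relaxed))
                std::this_thread::yield();
        }
    }

    void unlock() { locked_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> locked_{false};
};

// Usage demo: several threads increment a plain counter under the lock.
inline long count_with_spinlock(int threads, int iters) {
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

Because SpinLock provides lock() and unlock(), it also works with std::lock_guard<SpinLock>. Yielding in the inner loop is a design choice for portability; a real implementation might prefer a CPU pause hint and only fall back to yielding after a number of spins.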


