memory - peterson algorithm about data modification in lock visible to second thread with mfence/lfence/sfence

Thursday, 15 August 2019

memory - peterson algorithm about data modification in lock visible to second thread with mfence/lfence/sfence

https://www.justsoftwaresolutions.co.uk/threading/petersons_lock_with_C++0x_atomics.html

I wrote comments and asked two questions and have another question about Anthony's reply.
here is the reply:

"1. The acquire/release on the flag0 and flag1 variables are necessary to ensure that it acts as a lock: the release store in the unlock synchronizes with the acquire-load in the next lock, to ensure that the data modified while the lock was held is now visible to the second thread."

I have written a peterson lock in C

typedef struct {
  volatile bool flag[2];
   volatile int victim;
  } peterson_lock_t;


  void peterson_lock_init(peterson_lock_t &lock) {
   lock.flag[0] = lock.flag[1] = false;
   lock.victim = 0;
  } 

  void peterson_lock(peterson_lock_t &lock, int id) {
   lock.flag[id] = true;
   lock.victim = id;
   asm volatile ("mfence" : : : "memory");

   while (lock.flag[1 - id] && lock.victim == id) {
   };
  }

  void peterson_unlock(peterson_lock_t &lock, int id) {
   lock.flag[id] = false;
  }

I tested it and I think it's correct, right?

If it's right, my question is do I need to add sfence and lfence to “make sure the data modified while the lock was held is now visible to the second thread” ?
like this,

  void peterson_lock(peterson_lock_t &lock, int id) {
   lock.flag[id] = true;
   lock.victim = id;
   asm volatile ("mfence" : : : "memory");
   asm volatile ("lfence" : : : "memory"); // here, I think this is unnecessary, since mfence will flush load buffer
   while (lock.flag[1 - id] && lock.victim == id) {

   };
  }

  void peterson_unlock(peterson_lock_t &lock, int id) {
   asm volatile ("sfence" : : : "memory"); // here
   lock.flag[id] = false;
  }

I think no need to do this.

My understanding is that on x86/64 'store' has a release semantics, and 'load' has a acquire semantics(the root reason is on x86/64 there is only store load reorder),
and 'lock.flag[id]= false' is a 'store', 'lock.flag[1 - id] ' is a 'load',
so there is no need to do things like the acquire/release on the flag0 and flag1 in Dmitriy's implementation

EDIT @Anthony
very appreciate your replay.
Yes, I need to avoid compiler reorder.
So, modification like below, is it correct?
Because for x86, only need to forbidden compiler reorder in 'peterson_unlock'

void peterson_lock(peterson_lock_t &lock, int id) {
    lock.flag[id] = true;
    lock.victim = id;
    asm volatile ("mfence" : : : "memory");
    while (lock.flag[1 - id] && lock.victim == id) {
    };
}

void peterson_unlock(peterson_lock_t &lock, int id) {
    asm volatile ("" : : : "memory"); // here, forbidden compiler reorder

    lock.flag[id] = false;
}

Blog

Thursday, 15 August 2019

memory - peterson algorithm about data modification in lock visible to second thread with mfence/lfence/sfence

No comments:

Post a Comment

php - file_get_contents shows unexpected output while reading a file