It's never actually explained how n=2 is possible. Here's how it works:
P1 and P2 start executing. P1 sets temp=1, then P2 runs through the loop for i=1..9. Then P1 sets n=temp=1. P2 sets temp=2, then P1 resumes executing for i=2..10, and completes. P2 sets n=temp=2, then completes.
The intuition is P1 checkpoints, P2 runs through the loop up until the last iteration, P1 resets n to the checkpoint + 1, P2 checkpoints, P1 runs through the loop, then P2 resets n to the checkpoint + 1.
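Under the same assumption the rest of the thread makes (each assignment is atomic, and a switch can happen between the read and the write), this trace can be replayed deterministically. A sketch using Python generators as hand-scheduled "processes" (the names `process`, `p1`, `p2` are made up for the demo):

```python
# Deterministic replay of the n=2 trace above. Each process runs
# temp = n + 1 then n = temp ten times, yielding after every statement
# so the "scheduler" below can interleave them exactly as described.
n = 0

def process():
    global n
    for _ in range(10):
        temp = n + 1   # read n, compute the new value
        yield
        n = temp       # write it back
        yield

p1, p2 = process(), process()

next(p1)                 # P1 reads: temp1 = 1
for _ in range(18):      # P2 runs iterations 1..9 to completion (n == 9)
    next(p2)
next(p1)                 # P1 writes: n = 1
next(p2)                 # P2 reads: temp2 = 2
for _ in range(18):      # P1 runs iterations 2..10 to completion (n == 10)
    next(p1)
next(p2)                 # P2 writes: n = 2

print(n)  # 2
```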
Thank you! I initially included the illustration and explanation in the first draft but decided to remove them in favor of the full trace. Looking back, I think that was a mistake. I’ve now added back the illustration along with some explanation.
This was fun to port over to PlusCal to verify it myself:
(I acknowledge that this isn't the most elegant PlusCal, but I think it's roughly accurate?)

Thank you :) I was wondering how this would look in TLA+/PlusCal.
the best thing i think i read somewhere is to think of modern cpus as a bunch of computers on a network. each task is potentially transmitted in messages over the network, executed, and its result sent back in a packet. despite this not being fully accurate, it does give an idea that even in a set of 100 operations, due to caches, latencies, and kernel/hardware stuff happening during these 100 operations however much you try to isolate them, the first operation submitted might see its result sent back last.
this will help you determine or realize better when and where synchronization (or remodelling of the code) is needed, even though you might not expect it.
it's extremely hard to write concurrent code well. i think i've hardly ever managed lockless code that wasn't super buggy (yet! :D). it takes a lot of focus and patience, and tons of details about the platform(s) it might run on.
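The "first submitted, finished last" point can be mimicked with a thread pool, where a per-task delay stands in for cache misses, interrupts, and other latency. A sketch, with the task names and delays made up for the demo:

```python
# Submission order != completion order: op1 is submitted first but has the
# highest "latency", so its result comes back last.
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def op(name, delay):
    time.sleep(delay)     # stand-in for cache misses, interrupts, etc.
    return name

with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(op, "op1", 0.3),   # submitted first, slowest
               pool.submit(op, "op2", 0.2),
               pool.submit(op, "op3", 0.1)]   # submitted last, fastest
    completed = [f.result() for f in as_completed(futures)]

print(completed)  # typically ['op3', 'op2', 'op1']
```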
To be pedantic (which I feel is fair because it's a concurrency question about possibilities):
>If we run P and Q concurrently, with ‘n’ initialized to zero, what would the value of ‘n’ be when the two processes finish executing their statements?
It depends a lot on the observer of `n` and what relationship it has with P and Q. Which isn't defined.
E.g. a trivial boundary-answer is that it can be 0, if nothing guarantees that P's and Q's threads' memory reaches the `n`-observer's thread. This is somewhat common if you try to synchronize via `sleep`, because that usually doesn't guarantee anything at all.
We also don't know the computer architecture. We can probably assume at least one byte moves between memory levels atomically, because basically every practical system does that, but that doesn't have to be true. If bits move between memory levels independently, this could observe 31: the counter legitimately takes values from `00000` (0) up to `10100` (20), and 11 (`01011`) and 20 (`10100`) together have a `1` in every bit position, so a torn read mixing their bits could observe `11111` = 31 -> you can observe a number that was never assigned. (IRL: multi-word values and partial initializations can do this, e.g. it's why double-checked locking is flawed without something to guarantee initialization completes before the pointer is exposed)
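The worst-case torn read is just the bitwise OR of two values the counter legitimately held. A one-liner to check the arithmetic:

```python
# 11 (0b01011) and 20 (0b10100) have complementary bit patterns, so an
# observer sampling each bit independently could see every bit set.
a, b = 0b01011, 0b10100   # both values the counter actually takes
torn = a | b              # 1-bits picked up from either write
print(torn)               # 31, a value the counter never held
```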
You're right, I could have phrased it better, something like:
"If we run P and Q concurrently with ‘n’ initialized to zero, what extreme interleaving could result in the lowest value of 'n' when the two processes finish executing their statements, according to a model checker?"
I'll edit it to improve, thanks.
tbh I think zero is a completely reasonable result for a general thought-exercise like this. sleep-based "memory fixes" are quite common in practice, and there's no synchronization at all in the sample.
though there's a lot of fun in here if you allow for optimizing compilers and lack of synchronization. e.g. without explicit synchronization between P/Q and the observing thread (assuming it's the main thread), it's reasonable for a compiler to simply delete P and Q entirely and replace the whole program with `print(0)`
This implicitly assumes atomic assignments, meaning that during an assignment all bits representing a value are transferred in one atomic unit. This sounds a bit trivial, but if one were working with large integers that are stored in multiple memory words, it would be less trivial.
I think it is possible to implement locking with only atomic assignments.
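One classic construction along those lines is Peterson's algorithm, which builds two-thread mutual exclusion out of plain loads and stores. A sketch: it assumes those accesses are atomic and sequentially consistent, which real compilers and CPUs do not guarantee without fences; CPython's GIL happens to make this demo behave.

```python
# Peterson's algorithm: mutual exclusion from atomic assignments only,
# used here to protect the puzzle's split increment.
import threading

flag = [False, False]   # flag[i]: thread i wants the lock
turn = 0                # tie-breaker: who waits when both want it
n = 0

def worker(i):
    global n, turn
    other = 1 - i
    for _ in range(10):
        # acquire (entry protocol): plain stores and loads only
        flag[i] = True
        turn = other
        while flag[other] and turn == other:
            pass                    # busy-wait
        # critical section: the racy read-then-write, now protected
        temp = n + 1
        n = temp
        # release
        flag[i] = False

threads = [threading.Thread(target=worker, args=(i,)) for i in (0, 1)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(n)  # 20: with mutual exclusion the sequential result is restored
```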
You can replace the read and store with atomic operations and n==2 is still possible. As far as I can tell, the model being verified already uses atomic operations.
I get your point, and my first question was "what operations are even atomic here for this problem to be well-defined?".
But I think "language operations are atomic and everything else isn't" is a reasonable inference for a puzzle.
I'm also intrigued by locking being done through atomic assignments; I thought it required at least something like "write if equal to value" or "write new value and return old".
Yeah, looking at the code my 'intuition' was "this is a classic data race, the program is broken" not "somewhere between 10 and 20".
I suppose if you assume all global accesses are atomic, it's a good demonstration that atomic operations don’t compose atomically (even in trivial cases) and aren't particularly useful as concurrency primitives.
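That non-composition is easy to state in code: even when every individual read and write of `n` is atomic (as under CPython's GIL), the read-then-write pair is not, so both threads can read the same value and an increment gets lost. A sketch; note that whether updates are actually lost in practice depends on where the interpreter permits thread switches, so any final value from 2 to 2,000,000 is a legal outcome:

```python
# Atomic operations don't compose: two threads split the increment into an
# atomic read and an atomic write, with no atomicity across the pair.
import threading

n = 0
ITERS = 1_000_000   # arbitrary; large enough that preemption can land
                    # between the read and the write

def worker():
    global n
    for _ in range(ITERS):
        temp = n + 1   # atomic read...
        n = temp       # ...atomic write, but not atomic together

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(n)  # anywhere from 2 to 2_000_000; below the max means lost updates
```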
The question is more testing your full understanding of why this is a race condition, including understanding the possible side effects, since that's the only thing you start out with when someone sends in a bug report.
I was once asked a very similar problem during a job interview for a software engineer position. I was given the option of solving it on paper or orally. We ran out of time and they let me take it home. I did, and I spent a few hours thinking about it. Google and ChatGPT were not able to produce better results than the intuitive answer, so I built a test program that confirmed the weird behaviour. After meditating for a few hours I was able to find the correct answer, and I was indeed fascinated by this. Do you also think it's too much for a usual interview?
Kudos for solving this. And yes, this is too much for a job interview.
I don't think this is an unreasonable puzzle for a role that involves problem-solving or security.
You can read the code, figure out what primitives you have, translate the problem into something simple (I like card and board games), then map it back.
Problem: two processes each do this ten times:
- read the counter
- do any number of other things in other processes
- write back what you read, plus one.
Game: You have two piles of cards. Each card costs 1 move to play, but lets you discard any number of cards from other piles at the same time.
Solution: play one red card, discarding everything but one blue card. Play the single remaining blue card, discarding the leftover red cards.
Back: process 1 reads, process 2 runs 9 steps, process 1 writes, process 2 reads, process 1 finishes, process 2 writes.
As another commenter mentioned, you have violated a precondition; there's no point trying to understand what would happen because it's indeterminate.
Seems pretty clear that the result could be anything from 1 to 20.
(Assumes atomic assignments, as noted by someone else.)
What sort of an interleaving would produce 1? Seems provably impossible to me, assuming atomic assignments.
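One way to check that is a tiny brute-force "model check", assuming the split read/write model used elsewhere in the thread: each process runs `temp = n + 1; n = temp` N times, and the scheduler may switch between the read and the write. The function and variable names here are made up for the sketch.

```python
# Enumerate every reachable final value of n for two processes of N
# iterations each, where each iteration is a read step then a write step.
N = 4  # iterations per process, kept small so the state space stays tiny

def explore():
    # state: (n, ((remaining, pending_temp), (remaining, pending_temp)))
    start = (0, ((N, None), (N, None)))
    stack, seen, finals = [start], set(), set()
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        n, procs = state
        moved = False
        for i in (0, 1):
            remaining, temp = procs[i]
            if temp is not None:                 # write step: n = temp
                nxt = list(procs); nxt[i] = (remaining, None)
                stack.append((temp, tuple(nxt)))
                moved = True
            elif remaining > 0:                  # read step: temp = n + 1
                nxt = list(procs); nxt[i] = (remaining - 1, n + 1)
                stack.append((n, tuple(nxt)))
                moved = True
        if not moved:                            # both processes finished
            finals.add(n)
    return finals

finals = explore()
print(min(finals), max(finals))  # 2 8: n = 1 is never reachable
```

Intuitively, 1 is impossible because no write ever stores a value below 1, so once any write has happened, every later read sees at least 1 and every later write stores at least 2.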
The intuition I developed over the years says the result is unknown: it's a race condition, all bets are off. Especially since it's doing n+1.