Discussion: bugs in test-lock
Bruno Haible
2017-01-05 18:42:47 UTC
Torvald Riegel wrote [in private email, should have CCed bug-gnulib]:

[about the use of a lock in 'struct atomic_int' in test-lock.c:]
The delivered code, with a lock, is correct, however, right?
Not quite, unfortunately, because locks do not make any guarantees
regarding fairness. Thus, there is no guarantee that critical sections
querying the flag do not starve a critical section that wants to set
the flag.
Good implementations will try to avoid starvation to some extent, but
that's not simple without potentially decreasing overall throughput.
Thus, in practice, you can't expect to never observe starvation.
Good point. Yes, if I have multiple threads executing a loop

for (;;)
  {
    pthread_mutex_lock (&lock);
    do something;
    pthread_mutex_unlock (&lock);
    do very little other things;
  }

it is very possible that a specific thread that also wants this lock
will have to wait way too long.

That's why I inserted the yield statements in the lock_checker_thread:

for (;;)
  {
    pthread_mutex_lock (&lock);
    do something;
    pthread_mutex_unlock (&lock);
    do very little other things;
    pthread_yield ();
  }

It should have (and in my experiments, actually has) the effect that each
waiting thread has a fair chance to get the lock -- this relies on a
fair scheduler in the kernel, of course -- and thus should limit the
execution time of test-lock.

EXPLICIT_YIELD = 0 -> 3.2 ... 3.5 sec (especially slow in test_once).
EXPLICIT_YIELD = 1 -> 1.8 sec
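
For reference, a compilable version of such a checker loop could look like
this (a sketch only; the names and the invariant check are illustrative, not
the actual test-lock.c code, which goes through the glthread/* wrappers):

#include <pthread.h>
#include <sched.h>
#include <stdlib.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int shared_counter;          /* protected by 'lock' */

static void *
checker_thread (void *arg)
{
  for (;;)
    {
      pthread_mutex_lock (&lock);
      if (shared_counter < 0)       /* verify some invariant */
        abort ();
      pthread_mutex_unlock (&lock);
      /* Give the other threads a chance to grab the lock before this
         thread tries to re-acquire it immediately.  (pthread_yield is
         non-standard; sched_yield is the POSIX spelling.)  */
      sched_yield ();
    }
  return NULL;                      /* not reached */
}
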
POSIX makes no explicit guarantee regarding forward progress (i.e., things
like guaranteeing no starvation), but the specification gives an indication
that concurrent sem_trywait or sem_wait should not starve a sem_post
(IOW, simplified, sem_post should complete eventually). I'd also say
that good-quality implementations of semaphores would not starve
sem_post, simply because you'd want to implement sem_trywait as just
atomic reads internally, and those won't starve concurrent atomic writes
(as is, for example, explicitly required in the C11/C++11 memory model).
Therefore, I'd suggest using semaphores for the completion flags.
What I actually need here, in lock_checker_done, is a notification mechanism
from the main thread to the checker threads.
I used 'volatile' - it turned out to be very slow.
Now I use a pthread_mutex_t, which is a bit of an abuse.
sem_t may well be better than pthread_mutex_t, but is still a bit of an abuse:
semaphores are about taking and releasing access to a resource, without the
notion of "owner".
For a real notification mechanism, probably a pthread_cond_t would be the right
thing.

The problem here is: it's a unit test. If it fails, I don't want to search
for bugs in the condition variable subsystem, semaphore subsystem, AND the
lock subsystem. I want the test to rely essentially only on the lock subsystem.
Second, the mutex test is incorrect because the checker thread's
critical sections are allowed to starve the mutator threads' critical
sections. I'd suggest either using a fixed number of iterations or a
fixed duration for both mutators and checkers, or moving the setting of
the completion flag to before the pthread_join for the mutators.
I think the yield () calls handle this danger; see above.
Third, the rwlock test is also incorrect because it is, in the general
case, implementation-defined whether readers or writers take preference.
https://bugzilla.redhat.com/show_bug.cgi?id=1410052
Fixing this requires either similar steps as for the exclusive/recursive
mutex tests, or using something like
PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP.
This was being discussed in the other thread. The glthread_rwlock_* functions
now give preference to the writer - on all platforms, not only on some as it
was before (Solaris, Mac OS X did it right already).

Bruno
Torvald Riegel
2017-01-05 20:57:12 UTC
Post by Bruno Haible
[about the use of a lock in 'struct atomic_int' in test-lock.c:]
The delivered code, with a lock, is correct, however, right?
Not quite, unfortunately, because locks do not make any guarantees
regarding fairness. Thus, there is no guarantee that critical sections
querying the flag do not starve a critical section that wants to set
the flag.
Good implementations will try to avoid starvation to some extent, but
that's not simple without potentially decreasing overall throughput.
Thus, in practice, you can't expect to never observe starvation.
Good point. Yes, if I have multiple threads executing a loop
for (;;)
  {
    pthread_mutex_lock (&lock);
    do something;
    pthread_mutex_unlock (&lock);
    do very little other things;
  }
it is very possible that a specific thread that also wants this lock
will have to wait way too long.
for (;;)
  {
    pthread_mutex_lock (&lock);
    do something;
    pthread_mutex_unlock (&lock);
    do very little other things;
    pthread_yield ();
  }
Why do you think this works? We don't have uniprocessors anymore,
there's more that's affecting execution order than just the OS
scheduler.
Post by Bruno Haible
It should have (and in my experiments, actually has) the effect that each
waiting thread has a fair chance to get the lock -- this relies on a
fair scheduler in the kernel, of course -- and thus should limit the
execution time of test-lock.
There's no guarantee that this will give fairness to calls before or
after it. The OS scheduler may let other runnable processes run
first, but modern synchronization primitives try not to involve the OS
as much as possible because of the overheads this typically involves.
The hardware itself, cache misses, etc. have a huge influence on the
interleaving of executions.
Post by Bruno Haible
EXPLICIT_YIELD = 0 -> 3.2 ... 3.5 sec (especially slow in test_once).
EXPLICIT_YIELD = 1 -> 1.8 sec
POSIX makes no explicit guarantee regarding forward progress (i.e., things
like guaranteeing no starvation), but the specification gives an indication
that concurrent sem_trywait or sem_wait should not starve a sem_post
(IOW, simplified, sem_post should complete eventually). I'd also say
that good-quality implementations of semaphores would not starve
sem_post, simply because you'd want to implement sem_trywait as just
atomic reads internally, and those won't starve concurrent atomic writes
(as is, for example, explicitly required in the C11/C++11 memory model).
Therefore, I'd suggest using semaphores for the completion flags.
What I actually need here, in lock_checker_done, is a notification mechanism
from the main thread to the checker threads.
I used 'volatile' - it turned out to be very slow.
Now I use a pthread_mutex_t, which is a bit of an abuse.
semaphores are about taking and releasing access to a resource, without the
notion of "owner".
The notification is not tied to a particular owner.
sem_trywait allows you to query whether a notification is available.
sem_post allows you to make a notification available.
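
Sketched out (illustrative names only, assuming POSIX semaphores), such a
queryable one-shot notification looks like this:

#include <errno.h>
#include <semaphore.h>
#include <stdlib.h>

static sem_t done_sem;              /* created with sem_init (&done_sem, 0, 0) */

/* Post the notification.  Done at most once, by the main thread.  */
static void
notify_done (void)
{
  if (sem_post (&done_sem))
    abort ();
}

/* Query, without blocking, whether the notification has been posted.  */
static int
done_posted (void)
{
  if (sem_trywait (&done_sem) == 0)
    {
      sem_post (&done_sem);         /* put the token back for other queriers */
      return 1;
    }
  if (errno == EAGAIN)
    return 0;
  abort ();
}

This is essentially the shape of the semaphore-based 'struct atomic_int' in
the patch further down the thread.
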
Post by Bruno Haible
For a real notification mechanism, probably a pthread_cond_t would be the right
thing.
But you don't want to wait for a condition, you want to query whether a
condition holds. The latter is not supported by a condvar.
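
For comparison, a condvar-based version still needs a separate flag plus a
mutex to make the state queryable, since the condvar itself stores no state
(a sketch with illustrative names):

#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t done_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t done_cond = PTHREAD_COND_INITIALIZER;
static bool done_flag;              /* the state lives here, not in the condvar */

static void
done_set (void)
{
  pthread_mutex_lock (&done_mutex);
  done_flag = true;
  pthread_cond_broadcast (&done_cond);
  pthread_mutex_unlock (&done_mutex);
}

/* Non-blocking query: the condvar plays no role here at all.  */
static bool
done_query (void)
{
  pthread_mutex_lock (&done_mutex);
  bool result = done_flag;
  pthread_mutex_unlock (&done_mutex);
  return result;
}

/* Blocking wait: this is what a condvar is actually for.  */
static void
done_wait (void)
{
  pthread_mutex_lock (&done_mutex);
  while (!done_flag)                /* guards against spurious wake-ups */
    pthread_cond_wait (&done_cond, &done_mutex);
  pthread_mutex_unlock (&done_mutex);
}
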
Post by Bruno Haible
The problem here is: it's a unit test. If it fails, I don't want to search
for bugs in the condition variable subsystem, semaphore subsystem, AND the
lock subsystem. I want the test to rely essentially only on the lock subsystem.
Locks are the wrong mechanism. Locks give you mutual exclusion. If you
want a simple notification that you can query without blocking, use a
semaphore.
Post by Bruno Haible
Second, the mutex test is incorrect because the checker thread's
critical sections are allowed to starve the mutator threads' critical
sections. I'd suggest either using a fixed number of iterations or a
fixed duration for both mutators and checkers, or moving the setting of
the completion flag to before the pthread_join for the mutators.
I think the yield () calls handle this danger; see above.
No, unfortunately not; see above.
Post by Bruno Haible
Third, the rwlock test is also incorrect because it is, in the general
case, implementation-defined whether readers or writers take preference.
https://bugzilla.redhat.com/show_bug.cgi?id=1410052
Fixing this requires either similar steps as for the exclusive/recursive
mutex tests, or using something like
PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP.
This was being discussed in the other thread. The glthread_rwlock_* functions
now give preference to the writer - on all platforms, not only on some as it
was before (Solaris, Mac OS X did it right already).
I disagree with that design choice, as I explained in the other thread.
Bruno Haible
2017-01-05 23:11:56 UTC
Post by Torvald Riegel
Post by Bruno Haible
The problem here is: it's a unit test. If it fails, I don't want to search
for bugs in the condition variable subsystem, semaphore subsystem, AND the
lock subsystem. I want the test to rely essentially only on the lock subsystem.
Locks are the wrong mechanism. Locks give you mutual exclusion. If you
want a simple notification that you can query without blocking, use a
semaphore.
OK, thanks for this clear advice. I've pushed the change below.

The timings look OK:
EXPLICIT_YIELD = 1 USE_SEMAPHORE = 1 1.79 sec
EXPLICIT_YIELD = 1 USE_SEMAPHORE = 0 1.78 sec
EXPLICIT_YIELD = 0 USE_SEMAPHORE = 1 3.3 .. 3.8 sec
EXPLICIT_YIELD = 0 USE_SEMAPHORE = 0 3.3 .. 3.9 sec
Post by Torvald Riegel
Post by Bruno Haible
for (;;)
  {
    pthread_mutex_lock (&lock);
    do something;
    pthread_mutex_unlock (&lock);
    do very little other things;
    pthread_yield ();
  }
Why do you think this works?
I think this works because
- the manual page http://man7.org/linux/man-pages/man3/pthread_yield.3.html
says "The thread is placed at the end of the run queue". To me, this means
that if all participating threads call pthread_yield () once in a while,
all threads have a chance to be woken up and get the lock within a reasonable
amount of time.
- the timings of EXPLICIT_YIELD = 1 vs. 0 (above) show that it has some
effect.
Post by Torvald Riegel
We don't have uniprocessors anymore,
there's more that's affecting execution order than just the OS
scheduler.
...
There's no guarantee that this will give fairness to calls before or
after it. The OS scheduler may let other runnable processes run
first, but modern synchronization primitives try not to involve the OS
as much as possible because of the overheads this typically involves.
Are you suggesting that the pthread_yield manual page is wrong? Or that
some warning should be added to it? I'm sure Michael Kerrisk will accept inputs.
Post by Torvald Riegel
Post by Bruno Haible
For a real notification mechanism, probably a pthread_cond_t would be the right
thing.
But you don't want to wait for a condition, you want to query whether a
condition holds. The latter is not supported by a condvar.
Oh. Really, "wait queue" would be a better name than "condition variable"
for this thing.

Bruno


2017-01-05 Bruno Haible <***@clisp.org>

lock tests: Prefer semaphore over mutex.
* tests/test-lock.c (USE_SEMAPHORE): New constant.
(struct atomic_int, init_atomic_int, get_atomic_int_value,
set_atomic_int_value) [USE_SEMAPHORE]: Define using a POSIX semaphore.
Suggested by Torvald Riegel <***@redhat.com>.

diff --git a/tests/test-lock.c b/tests/test-lock.c
index 095511e..f3da4cc 100644
--- a/tests/test-lock.c
+++ b/tests/test-lock.c
@@ -51,12 +51,20 @@
 #define EXPLICIT_YIELD 1

 /* Whether to use 'volatile' on some variables that communicate information
-   between threads. If set to 0, a lock is used to protect these variables.
-   If set to 1, 'volatile' is used; this is theoretically equivalent but can
-   lead to much slower execution (e.g. 30x slower total run time on a 40-core
-   machine. */
+   between threads. If set to 0, a semaphore or a lock is used to protect
+   these variables. If set to 1, 'volatile' is used; this is theoretically
+   equivalent but can lead to much slower execution (e.g. 30x slower total
+   run time on a 40-core machine), because 'volatile' does not imply any
+   synchronization/communication between different CPUs. */
 #define USE_VOLATILE 0

+#if USE_POSIX_THREADS
+/* Whether to use a semaphore to communicate information between threads.
+   If set to 0, a lock is used. If set to 1, a semaphore is used.
+   Uncomment this to reduce the dependencies of this test. */
+# define USE_SEMAPHORE 1
+#endif
+
 /* Whether to print debugging messages. */
 #define ENABLE_DEBUGGING 0

@@ -97,6 +105,10 @@

 #include "glthread/thread.h"
 #include "glthread/yield.h"
+#if USE_SEMAPHORE
+# include <errno.h>
+# include <semaphore.h>
+#endif

 #if ENABLE_DEBUGGING
 # define dbgprintf printf
@@ -128,6 +140,41 @@ set_atomic_int_value (struct atomic_int *ai, int new_value)
 {
   ai->value = new_value;
 }
+#elif USE_SEMAPHORE
+/* This atomic_int implementation can only support the values 0 and 1.
+   It is initially 0 and can be set to 1 only once. */
+struct atomic_int {
+  sem_t semaphore;
+};
+static void
+init_atomic_int (struct atomic_int *ai)
+{
+  sem_init (&ai->semaphore, 0, 0);
+}
+static int
+get_atomic_int_value (struct atomic_int *ai)
+{
+  if (sem_trywait (&ai->semaphore) == 0)
+    {
+      if (sem_post (&ai->semaphore))
+        abort ();
+      return 1;
+    }
+  else if (errno == EAGAIN)
+    return 0;
+  else
+    abort ();
+}
+static void
+set_atomic_int_value (struct atomic_int *ai, int new_value)
+{
+  if (new_value == 0)
+    /* It's already initialized with 0. */
+    return;
+  /* To set the value 1: */
+  if (sem_post (&ai->semaphore))
+    abort ();
+}
 #else
 struct atomic_int {
   gl_lock_define (, lock)
Torvald Riegel
2017-01-06 12:38:46 UTC
Post by Bruno Haible
Post by Torvald Riegel
Post by Bruno Haible
The problem here is: it's a unit test. If it fails, I don't want to search
for bugs in the condition variable subsystem, semaphore subsystem, AND the
lock subsystem. I want the test to rely essentially only on the lock subsystem.
Locks are the wrong mechanism. Locks give you mutual exclusion. If you
want a simple notification that you can query without blocking, use a
semaphore.
OK, thanks for this clear advice. I've pushed the change below.
EXPLICIT_YIELD = 1 USE_SEMAPHORE = 1 1.79 sec
EXPLICIT_YIELD = 1 USE_SEMAPHORE = 0 1.78 sec
EXPLICIT_YIELD = 0 USE_SEMAPHORE = 1 3.3 .. 3.8 sec
EXPLICIT_YIELD = 0 USE_SEMAPHORE = 0 3.3 .. 3.9 sec
IIRC that's for the test where some threads have unlimited work and some
have bounded work, right? I would be careful in interpreting these
numbers, because you measure not just fairness, but also lock
contention, etc. These can behave in ways that might be surprising if
you're not looking at stuff like this constantly.
Post by Bruno Haible
Post by Torvald Riegel
Post by Bruno Haible
for (;;)
  {
    pthread_mutex_lock (&lock);
    do something;
    pthread_mutex_unlock (&lock);
    do very little other things;
    pthread_yield ();
  }
Why do you think this works?
I think this works because
- the manual page http://man7.org/linux/man-pages/man3/pthread_yield.3.html
says "The thread is placed at the end of the run queue". To me, this means
that if all participating threads call pthread_yield () once in a while,
all threads have a chance to be woken up and get the lock within a reasonable
amount of time.
- the timings of EXPLICIT_YIELD = 1 vs. 0 (above) show that it has some
effect.
Post by Torvald Riegel
We don't have uniprocessors anymore,
there's more that's affecting execution order than just the OS
scheduler.
...
There's no guarantee that this will give fairness to calls before or
after it. The OS scheduler may let other runnable processes run
first, but modern synchronization primitives try not to involve the OS
as much as possible because of the overheads this typically involves.
Are you suggesting that the pthread_yield manual page is wrong? Or that
some warning should be added to it? I'm sure Michael Kerrisk will accept inputs.
I don't think it's necessarily wrong, but if there's no actual run queue
because you have at least as many cores (or HW threads) available as there
are runnable OS threads, then this won't push the threads through a FIFO
run queue and thus won't order them one after the other.
Post by Bruno Haible
Post by Torvald Riegel
Post by Bruno Haible
For a real notification mechanism, probably a pthread_cond_t would be the right
thing.
But you don't want to wait for a condition, you want to query whether a
condition holds. The latter is not supported by a condvar.
Oh. Really, "wait queue" would be a better name than "condition variable"
for this thing.
Well, it's not a queue either because of spurious wake-ups being
allowed, for example.
What you can look at as a hint is that there's no pthread_cond_trywait,
but there is a sem_trywait, for example.

Regarding the patch, why don't you just use the semaphore directly, but
wrap it as an atomic_int (which it is not)? Do you have to support
platforms that offer something like a semaphore, or a latch, or an
atomic flag (with a happens-before between setter and observer seeing
the flag set)?
Bruno Haible
2017-01-06 13:45:30 UTC
Post by Torvald Riegel
Post by Bruno Haible
Are you suggesting that the pthread_yield manual page is wrong? Or that
some warning should be added to it? I'm sure Michael Kerrisk will accept inputs.
I don't think it's necessarily wrong, but if there's no actual run queue
because you have at least as many cores (or HW threads) available as there
are runnable OS threads, then this won't push the threads through a FIFO
run queue and thus won't order them one after the other.
OK, I've forwarded your statement to Michael Kerrisk so that he may improve
the man page.
Post by Torvald Riegel
Regarding the patch, why don't you just use the semaphore directly, but
wrap it as an atomic_int (which it is not)?
1) I did so because of portability issues:
- GNU Pth (one of the platforms supported by the gnulib module) does not
have semaphores.
- I didn't want to dig into the semaphore API of Windows at this point.
- Testing among POSIX platforms is incomplete, and I may need to revert
to mutexes on some (weird) platform.
2) In 10 years or so, we may wish to replace this 'struct atomic_int' by
something provided by <stdatomic.h>.
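
For illustration, such a future replacement could look roughly like this (a
sketch assuming C11 <stdatomic.h>; not part of the change above):

#include <stdatomic.h>

struct atomic_int {
  _Atomic int value;
};

static void
init_atomic_int (struct atomic_int *ai)
{
  atomic_init (&ai->value, 0);
}

static int
get_atomic_int_value (struct atomic_int *ai)
{
  /* The acquire load pairs with the release store below, so everything the
     setter wrote before setting the flag is visible to the reader.  */
  return atomic_load_explicit (&ai->value, memory_order_acquire);
}

static void
set_atomic_int_value (struct atomic_int *ai, int new_value)
{
  atomic_store_explicit (&ai->value, new_value, memory_order_release);
}

A lock-based fallback would of course still be needed for compilers without
C11 atomics.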

Bruno
