Pádraig Brady
2018-05-15 06:49:55 UTC
Libvirt CI recently started running "make check" on our FreeBSD 10 & 11
hosts, and we see frequent failures of the test-poll unit test in gnulib.
IIUC, gnulib is not actually building a replacement poll() function
on FreeBSD, it is merely running the gnulib test suite against the
FreeBSD system impl of poll() and hitting this behaviour.
$ ./gnulib/tests/test-poll
Unconnected socket test... passed
Connected sockets test... failed (expecting POLLHUP after shutdown)
General socket test with fork... failed (expecting POLLHUP after shutdown)
Pipe test... passed
Looking at the first failure, in the test_socket_pair function:
It sets up a listener socket, connects a client, accepts the client
and then closes the remote end. It expects the server's client socket
to thus show POLLHUP or POLLERR.
When it fails, the poll() is in fact still showing POLLOUT. If you put
a sleep between the close () and poll() calls, it will succeed.
So, IIUC, the test is racing with the BSD kernel's handling of socket
close - the test can't assume that just because the remote end of the
client has been closed, that poll() will immediately show POLLHUP|ERR.
Anyone have ideas on how to make this test more reliable and not depend
on the kernel synchronizing the close() state with poll() results
immediately?
Regards,
Daniel
Yes, that test looks racy as the network shutdown is async.
How about we s/nowait/wait/, and only check for input events?
The following works on Linux at least:
--- tests/test-poll.c 2018-05-14 23:46:09.595448490 -0700
+++ pb/gltests/test-poll.c 2018-05-14 23:45:46.827048159 -0700
@@ -334,8 +334,9 @@
test_pair (c1, c2);
close (c1);
- ASSERT (write (c2, "foo", 3) == 3);
- if ((poll1_nowait (c2, POLLIN | POLLOUT) & (POLLHUP | POLLERR)) == 0)
+
+ (void) write (c2, "foo", 3); // Initiate shutdown
+ if ((poll1_wait (c2, POLLIN) & (POLLHUP | POLLERR)) == 0)
failed ("expecting POLLHUP after shutdown");
close (c2);
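
For anyone wanting to try the idea outside the gnulib test harness, here is a
minimal standalone sketch of the same approach: close the remote end, then
block in poll() on input events only instead of sampling once with a zero
timeout. The helper names (loopback_pair, wait_for_peer_close,
peer_close_is_observed) and the 5 second timeout are made up for this
illustration and are not part of test-poll.c. Unlike the patch, the sketch
skips the write() and also accepts POLLIN as evidence of the close, on the
assumption that a plain FIN may surface only as a readable end-of-file.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: build a connected TCP pair over loopback.
   On success returns 0, with the accepted (server-side) socket in
   fds[0] and the connecting (client-side) socket in fds[1].  */
static int
loopback_pair (int fds[2])
{
  struct sockaddr_in addr;
  socklen_t len = sizeof addr;
  int lfd = socket (AF_INET, SOCK_STREAM, 0);
  if (lfd < 0)
    return -1;

  memset (&addr, 0, sizeof addr);
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl (INADDR_LOOPBACK);
  addr.sin_port = 0;            /* let the kernel pick a free port */

  if (bind (lfd, (struct sockaddr *) &addr, sizeof addr) < 0
      || listen (lfd, 1) < 0
      || getsockname (lfd, (struct sockaddr *) &addr, &len) < 0)
    {
      close (lfd);
      return -1;
    }

  fds[1] = socket (AF_INET, SOCK_STREAM, 0);
  if (fds[1] < 0
      || connect (fds[1], (struct sockaddr *) &addr, sizeof addr) < 0)
    {
      if (fds[1] >= 0)
        close (fds[1]);
      close (lfd);
      return -1;
    }

  fds[0] = accept (lfd, NULL, NULL);
  close (lfd);
  if (fds[0] < 0)
    {
      close (fds[1]);
      return -1;
    }
  return 0;
}

/* Block until FD reports an input-side event, or TIMEOUT_MS elapses.
   Returns the revents mask, or -1 on error or timeout.  */
static int
wait_for_peer_close (int fd, int timeout_ms)
{
  struct pollfd pfd;

  /* A zero timeout would be the racy "nowait" case again.  */
  assert (timeout_ms != 0);

  pfd.fd = fd;
  pfd.events = POLLIN;          /* input events only; POLLOUT is not requested */
  pfd.revents = 0;
  if (poll (&pfd, 1, timeout_ms) <= 0)
    return -1;
  return pfd.revents;
}

/* Returns 1 if the server side observes the client's close, else 0.  */
static int
peer_close_is_observed (void)
{
  int fds[2];
  int revents;

  if (loopback_pair (fds) < 0)
    return 0;
  close (fds[1]);               /* close the remote end */
  revents = wait_for_peer_close (fds[0], 5000);
  close (fds[0]);
  return revents > 0 && (revents & (POLLIN | POLLHUP | POLLERR)) != 0;
}
```

Calling peer_close_is_observed () from a small main () should return 1 without
any sleep between close () and poll (), because the wait happens inside poll ()
itself. Whether POLLHUP or POLLERR specifically gets reported is still up to
the platform.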