Discussion:
test-poll failing on FreeBSD 10/11
Pádraig Brady
2018-05-15 06:49:55 UTC
Permalink
Libvirt CI recently started running "make check" on our FreeBSD 10 & 11
hosts, and we see frequent failure of the test-poll unit test in gnulib
IIUC, gnulib is not actually building a replacement poll() function
on FreeBSD, it is merely running the gnulib test suite against the
FreeBSD system impl of poll() and hitting this behaviour.
$ ./gnulib/tests/test-poll
Unconnected socket test... passed
Connected sockets test... failed (expecting POLLHUP after shutdown)
General socket test with fork... failed (expecting POLLHUP after shutdown)
Pipe test... passed
Looking at the first failure in test_socket_pair method.
It sets up a listener socket, connects a client, accepts the client
and then closes the remote end. It expects the server's client socket
to thus show POLLHUP or POLLERR.
When it fails, the poll() is in fact still showing POLLOUT. If you put
a sleep between the close () and poll() calls, it will succeed.
So, IIUC, the test is racing with the BSD kernel's handling of socket
close - the test can't assume that just because the remote end of the
client has been closed, that poll() will immediately show POLLHUP|ERR.
Anyone have ideas on how to make this test more reliable and not depend
on the kernel synchronizing the close() state with poll() results
immediately ?
Regards,
Daniel
Yes that test looks racy as the network shutdown is async.
How about we s/nowait/wait/, and only check for input events.
The following works on Linux at least:

--- tests/test-poll.c 2018-05-14 23:46:09.595448490 -0700
+++ pb/gltests/test-poll.c 2018-05-14 23:45:46.827048159 -0700
@@ -334,8 +334,9 @@

test_pair (c1, c2);
close (c1);
- ASSERT (write (c2, "foo", 3) == 3);
- if ((poll1_nowait (c2, POLLIN | POLLOUT) & (POLLHUP | POLLERR)) == 0)
+
+ (void) write (c2, "foo", 3); // Initiate shutdown
+ if ((poll1_wait (c2, POLLIN) & (POLLHUP | POLLERR)) == 0)
failed ("expecting POLLHUP after shutdown");

close (c2);
Daniel P. Berrangé
2018-05-15 07:39:20 UTC
Permalink
Post by Pádraig Brady
Libvirt CI recently started running "make check" on our FreeBSD 10 & 11
hosts, and we see frequent failure of the test-poll unit test in gnulib
IIUC, gnulib is not actually building a replacement poll() function
on FreeBSD, it is merely running the gnulib test suite against the
FreeBSD system impl of poll() and hitting this behaviour.
$ ./gnulib/tests/test-poll
Unconnected socket test... passed
Connected sockets test... failed (expecting POLLHUP after shutdown)
General socket test with fork... failed (expecting POLLHUP after shutdown)
Pipe test... passed
Looking at the first failure in test_socket_pair method.
It sets up a listener socket, connects a client, accepts the client
and then closes the remote end. It expects the server's client socket
to thus show POLLHUP or POLLERR.
When it fails, the poll() is in fact still showing POLLOUT. If you put
a sleep between the close () and poll() calls, it will succeed.
So, IIUC, the test is racing with the BSD kernel's handling of socket
close - the test can't assume that just because the remote end of the
client has been closed, that poll() will immediately show POLLHUP|ERR.
Anyone have ideas on how to make this test more reliable and not depend
on the kernel synchronizing the close() state with poll() results
immediately ?
Regards,
Daniel
Yes that test looks racy as the network shutdown is async.
How about we s/nowait/wait/, and only check for input events.
--- tests/test-poll.c 2018-05-14 23:46:09.595448490 -0700
+++ pb/gltests/test-poll.c 2018-05-14 23:45:46.827048159 -0700
@@ -334,8 +334,9 @@
test_pair (c1, c2);
close (c1);
- ASSERT (write (c2, "foo", 3) == 3);
- if ((poll1_nowait (c2, POLLIN | POLLOUT) & (POLLHUP | POLLERR)) == 0)
+
+ (void) write (c2, "foo", 3); // Initiate shutdown
+ if ((poll1_wait (c2, POLLIN) & (POLLHUP | POLLERR)) == 0)
failed ("expecting POLLHUP after shutdown");
Unfortunately that doesn't help at all. I've tried various other tricks
to help synchronization too. For example I put sockets into blocking
mode and then did a read() which returns 0 for EOF as expected, but
poll() still returns POLLIN rather than POLLHUP after that read. I
also tried tweaking SO_LINGER to not effect. The only awful solution
I can make work is to simply sleep() a short while. eg this patch

diff --git a/tests/test-poll.c b/tests/test-poll.c
index c5e0a92de..e99ab3f4d 100644
--- a/tests/test-poll.c
+++ b/tests/test-poll.c
@@ -252,7 +252,7 @@ test_accept_first (void)
struct sockaddr_in ia;
socklen_t addrlen;
char buf[3];
- int c, pid;
+ int c, pid, i, rv;

pid = fork ();
if (pid < 0)
@@ -284,7 +284,12 @@ test_accept_first (void)
failed ("cannot read data left in the socket by closed process");
ASSERT (read (c, buf, 3) == 3);
ASSERT (write (c, "foo", 3) == 3);
- if ((poll1_wait (c, POLLIN | POLLOUT) & (POLLHUP | POLLERR)) == 0)
+ do {
+ rv = poll1_nowait (c, POLLIN | POLLOUT);
+ usleep(1000);
+ i++;
+ } while ((rv & (POLLHUP | POLLERR)) == 0 && i < 20);
+ if ((rv & (POLLHUP | POLLERR)) == 0)
failed ("expecting POLLHUP after shutdown");
close (c);
}
@@ -330,6 +335,7 @@ test_socket_pair (void)
int s = open_server_socket ();
int c1 = connect_to_socket (false);
int c2 = accept (s, (struct sockaddr *) &ia, &addrlen);
+ int i, rv;
ASSERT (s >= 0);
ASSERT (c1 >= 0);
ASSERT (c2 >= 0);
@@ -339,9 +345,14 @@ test_socket_pair (void)
test_pair (c1, c2);
close (c1);
ASSERT (write (c2, "foo", 3) == 3);
- if ((poll1_nowait (c2, POLLIN | POLLOUT) & (POLLHUP | POLLERR)) == 0)
- failed ("expecting POLLHUP after shutdown");
-
+
+ do {
+ rv = poll1_nowait (c2, POLLIN | POLLOUT);
+ usleep(1000);
+ i++;
+ } while ((rv & (POLLHUP | POLLERR)) == 0 && i < 20);
+ if ((rv & (POLLHUP | POLLERR)) == 0)
+ failed ("expecting POLLHUP after shutdown");
close (c2);
}



Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
Loading...