Discussion:
gnulib poll() returns POLLERR on Win32 platform, breaking libvirt
(too old to reply)
Daniel P. Berrange
2017-04-18 15:07:02 UTC
Permalink
Raw Message
We're seeing a failure in libvirt on Win32 platforms whereby poll() often
returns POLLERR. I traced this down to the rpl_recv() function calling
FD_TO_SOCKET() and getting INVALID_SOCKET back. This is propagated back
to windows_compute_revents_socket() which sets POLLERR, because
WSAGetLastError() returns 0.

The windows_compute_revents_socket() is indeed passing a bad file descriptor
to recv, because it is actually passing in a SOCKET handle.

After bisecting gnulib, I found the root cause of this problem is this commit:

commit 17f1e64f00011fb745019119e21b26e4aba65e4b
Author: Paul Eggert <***@cs.ucla.edu>
Date: Tue Feb 24 16:16:19 2015 -0800

poll: port to MSVC v18 on MS-Windows 8.1

Problem reported by Gisle Vanem in:
http://lists.gnu.org/archive/html/bug-gnulib/2015-02/msg00139.html
* lib/poll.c: Always include <sys/select.h> and <sys/socket.h>.
* modules/poll (Depends-on) [!HAVE_POLL || REPLACE_POLL]:
Add sys_socket.


by unconditionally pulling in sys/select.c & sys/socket.c, we are causing
poll() to call gnulib's replacement rpl_recv() wrapper, rather than the
original bare Winsock recv() method.

Reverting that change fixes libvirt, but obviously that's not a valid
approach, since it'll re-break MSVC v18 on Win8.1


So I tried the following change to make it pass in an FD into the rpl_recv
method instead of a HANDLE

@@ -571,7 +568,7 @@
if (FD_ISSET ((SOCKET) h, &xfds))
ev.lNetworkEvents |= FD_OOB;

- happened = windows_compute_revents_socket (pfd[i].fd, pfd[i].events,
+ happened = windows_compute_revents_socket ((SOCKET) h, pfd[i].events,
ev.lNetworkEvents);
}
else


that makes the recv() work, but now poll() just gets into an infinite loop
calling select() over and over again, never reporting that the socket is
now read()able. So something else is now broken. I guess we're now calling
into gnulib's rpl_select() instead of the bare Winsock select() and so
triggering other badness - presumably more fd vs handle mixups ?

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|
Gisle Vanem
2017-04-18 20:07:42 UTC
Permalink
Raw Message
Post by Daniel P. Berrange
that makes the recv() work, but now poll() just gets into an infinite loop
calling select() over and over again, never reporting that the socket is
now read()able. So something else is now broken. I guess we're now calling
into gnulib's rpl_select() instead of the bare Winsock select() and so
triggering other badness - presumably more fd vs handle mixups ?
Maybe the fix by Tim Rühsen here:
https://github.com/rockdaboot/wget2/issues/136#issuecomment-270692530

would help. But I too still have occasional problems with pool().
--
--gv
Paolo Bonzini
2017-04-19 15:36:54 UTC
Permalink
Raw Message
Post by Daniel P. Berrange
We're seeing a failure in libvirt on Win32 platforms whereby poll() often
returns POLLERR. I traced this down to the rpl_recv() function calling
FD_TO_SOCKET() and getting INVALID_SOCKET back. This is propagated back
to windows_compute_revents_socket() which sets POLLERR, because
WSAGetLastError() returns 0.
The windows_compute_revents_socket() is indeed passing a bad file descriptor
to recv, because it is actually passing in a SOCKET handle.
commit 17f1e64f00011fb745019119e21b26e4aba65e4b
Date: Tue Feb 24 16:16:19 2015 -0800
poll: port to MSVC v18 on MS-Windows 8.1
http://lists.gnu.org/archive/html/bug-gnulib/2015-02/msg00139.html
* lib/poll.c: Always include <sys/select.h> and <sys/socket.h>.
Add sys_socket.
by unconditionally pulling in sys/select.c & sys/socket.c, we are causing
poll() to call gnulib's replacement rpl_recv() wrapper, rather than the
original bare Winsock recv() method.
Reverting that change fixes libvirt, but obviously that's not a valid
approach, since it'll re-break MSVC v18 on Win8.1
So I tried the following change to make it pass in an FD into the rpl_recv
method instead of a HANDLE
@@ -571,7 +568,7 @@
if (FD_ISSET ((SOCKET) h, &xfds))
ev.lNetworkEvents |= FD_OOB;
- happened = windows_compute_revents_socket (pfd[i].fd, pfd[i].events,
+ happened = windows_compute_revents_socket ((SOCKET) h, pfd[i].events,
ev.lNetworkEvents);
}
else
You mean you applied the reverse of this patch, of course? For a
complete patch you'd also change windows_compute_revents_socket's first
argument to "int fd", and change it to use errno instead of WSAGetLastError.

But perhaps it's simpler to add "#undef recv" and "#undef select" near
the top of the file, in addition to this change.

Thanks,

Paolo
Daniel P. Berrange
2017-04-19 15:50:11 UTC
Permalink
Raw Message
Post by Paolo Bonzini
Post by Daniel P. Berrange
We're seeing a failure in libvirt on Win32 platforms whereby poll() often
returns POLLERR. I traced this down to the rpl_recv() function calling
FD_TO_SOCKET() and getting INVALID_SOCKET back. This is propagated back
to windows_compute_revents_socket() which sets POLLERR, because
WSAGetLastError() returns 0.
The windows_compute_revents_socket() is indeed passing a bad file descriptor
to recv, because it is actually passing in a SOCKET handle.
commit 17f1e64f00011fb745019119e21b26e4aba65e4b
Date: Tue Feb 24 16:16:19 2015 -0800
poll: port to MSVC v18 on MS-Windows 8.1
http://lists.gnu.org/archive/html/bug-gnulib/2015-02/msg00139.html
* lib/poll.c: Always include <sys/select.h> and <sys/socket.h>.
Add sys_socket.
by unconditionally pulling in sys/select.c & sys/socket.c, we are causing
poll() to call gnulib's replacement rpl_recv() wrapper, rather than the
original bare Winsock recv() method.
Reverting that change fixes libvirt, but obviously that's not a valid
approach, since it'll re-break MSVC v18 on Win8.1
So I tried the following change to make it pass in an FD into the rpl_recv
method instead of a HANDLE
@@ -571,7 +568,7 @@
if (FD_ISSET ((SOCKET) h, &xfds))
ev.lNetworkEvents |= FD_OOB;
- happened = windows_compute_revents_socket (pfd[i].fd, pfd[i].events,
+ happened = windows_compute_revents_socket ((SOCKET) h, pfd[i].events,
ev.lNetworkEvents);
}
else
You mean you applied the reverse of this patch, of course? For a
Sigh, yes.
Post by Paolo Bonzini
complete patch you'd also change windows_compute_revents_socket's first
argument to "int fd", and change it to use errno instead of WSAGetLastError.
Yep, I changed the arg, but didn't try the WSAGetLastError change.
Post by Paolo Bonzini
But perhaps it's simpler to add "#undef recv" and "#undef select" near
the top of the file, in addition to this change.
If we add the undef's then we don't need to change anything related to
windows_compute_revents_socket() as it'd be calling the native recv()
which expects the SOCKET rather than fd.

If we undef those functions, would that not re-introduce the bug mentioned
in http://lists.gnu.org/archive/html/bug-gnulib/2015-02/msg00139.html ?

Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
Paolo Bonzini
2017-04-19 16:23:33 UTC
Permalink
Raw Message
Post by Daniel P. Berrange
Post by Paolo Bonzini
But perhaps it's simpler to add "#undef recv" and "#undef select" near
the top of the file, in addition to this change.
If we add the undef's then we don't need to change anything related to
windows_compute_revents_socket() as it'd be calling the native recv()
which expects the SOCKET rather than fd.
If we undef those functions, would that not re-introduce the bug mentioned
in http://lists.gnu.org/archive/html/bug-gnulib/2015-02/msg00139.html ?
No, because that's itself a #define:

# define recv recv_used_without_including_sys_socket_h

that is introduced, for whatever reason I cannot fathom, by
lib/sys_time.in.h.

Thanks,

Paolo
Daniel P. Berrange
2017-04-19 16:30:17 UTC
Permalink
Raw Message
Post by Paolo Bonzini
Post by Daniel P. Berrange
Post by Paolo Bonzini
But perhaps it's simpler to add "#undef recv" and "#undef select" near
the top of the file, in addition to this change.
If we add the undef's then we don't need to change anything related to
windows_compute_revents_socket() as it'd be calling the native recv()
which expects the SOCKET rather than fd.
If we undef those functions, would that not re-introduce the bug mentioned
in http://lists.gnu.org/archive/html/bug-gnulib/2015-02/msg00139.html ?
# define recv recv_used_without_including_sys_socket_h
that is introduced, for whatever reason I cannot fathom, by
lib/sys_time.in.h.
Ok, it sounds like we only need the #undef to fix this all then

Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
Loading...