Discussion:
Retrying close() after error on Linux !?
Tim Rühsen
2018-03-14 15:04:32 UTC
Permalink
Hi,


just stumbled upon nonintr_close() in spawn-pipe.c which has this loop:

do
retval = close (fd);
while (retval < 0 && errno == EINTR);


Regarding the man page of libc, this shouldn't be done on certain
systems. E.g. it says the Linux kernel closes the file descriptor even
on EINTR. With the effect of possible race conditions in multi-threaded
code.


What do the experts here say ?


With Best Regards, Tim
Eric Blake
2018-03-14 15:13:37 UTC
Permalink
Post by Tim Rühsen
Hi,
do
retval = close (fd);
while (retval < 0 && errno == EINTR);
Regarding the man page of libc, this shouldn't be done on certain
systems. E.g. it says the Linux kernel closes the file descriptor even
on EINTR. With the effect of possible race conditions in multi-threaded
code.
What do the experts here say ?
Here's what POSIX has to say:
http://austingroupbugs.net/view.php?id=529#c1200

You are correct that retrying close (fd) is a bug in multi-threaded
programs. It is better to leak the fd (on the few implementations that
leave fd open on EINTR failure) than to risk corrupting multithreaded
state. If you are on a system that has posix_close() implemented (glibc
appears to not have done it yet, and I don't know if any other systems
have), then you are guaranteed that you can write race-free leak-free
code. But until then, it's better to err on the side of safety,
especially since it was only a minority of systems that leave fd open on
EINTR failure.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
Eric Blake
2018-03-14 15:19:24 UTC
Permalink
Post by Eric Blake
Hi,
   do
     retval = close (fd);
   while (retval < 0 && errno == EINTR);
Regarding the man page of libc, this shouldn't be done on certain
systems. E.g. it says the Linux kernel closes the file descriptor even
on EINTR. With the effect of possible race conditions in multi-threaded
code.
What do the experts here say ?
http://austingroupbugs.net/view.php?id=529#c1200
You are correct that retrying close (fd) is a bug in multi-threaded
programs.  It is better to leak the fd (on the few implementations that
leave fd open on EINTR failure) than to risk corrupting multithreaded
state.  If you are on a system that has posix_close() implemented (glibc
appears to not have done it yet, and I don't know if any other systems
have), then you are guaranteed that you can write race-free leak-free
code.  But until then, it's better to err on the side of safety,
especially since it was only a minority of systems that leave fd open on
EINTR failure.
I didn't offer my conclusion: I think the existence of nonintr_close()
in gnulib is a bug, and should be removed. It's better to assume Linux
semantics (close ALWAYS destroys the fd, even when an error is reported)
which is safe for multithreaded apps, than to assume that EINTR leaves
the fd open (where assuming Linux semantics leaks on the few platforms
that leave the fd open, but a leak is safer than multithreaded
corruption when the wrong fd is closed on retry).
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
Tim Rühsen
2018-03-14 15:28:46 UTC
Permalink
Post by Eric Blake
Post by Eric Blake
Hi,
   do
     retval = close (fd);
   while (retval < 0 && errno == EINTR);
Regarding the man page of libc, this shouldn't be done on certain
systems. E.g. it says the Linux kernel closes the file descriptor even
on EINTR. With the effect of possible race conditions in multi-threaded
code.
What do the experts here say ?
http://austingroupbugs.net/view.php?id=529#c1200
You are correct that retrying close (fd) is a bug in multi-threaded
programs.  It is better to leak the fd (on the few implementations
that leave fd open on EINTR failure) than to risk corrupting
multithreaded state.  If you are on a system that has posix_close()
implemented (glibc appears to not have done it yet, and I don't know
if any other systems have), then you are guaranteed that you can write
race-free leak-free code.  But until then, it's better to err on the
side of safety, especially since it was only a minority of systems
that leave fd open on EINTR failure.
I didn't offer my conclusion: I think the existence of nonintr_close()
in gnulib is a bug, and should be removed.  It's better to assume Linux
semantics (close ALWAYS destroys the fd, even when an error is reported)
which is safe for multithreaded apps, than to assume that EINTR leaves
the fd open (where assuming Linux semantics leaks on the few platforms
that leave the fd open, but a leak is safer than multithreaded
corruption when the wrong fd is closed on retry).
Thanks for the explanation, Eric. Sounds reasonable to me.

There is also another occurrence in execute.c.

With Best Regards, Tim
Bruno Haible
2018-03-14 17:09:34 UTC
Permalink
Hi Eric,
Post by Eric Blake
Post by Tim Rühsen
do
retval = close (fd);
while (retval < 0 && errno == EINTR);
Regarding the man page of libc, this shouldn't be done on certain
systems. E.g. it says the Linux kernel closes the file descriptor even
on EINTR. With the effect of possible race conditions in multi-threaded
code.
What do the experts here say ?
http://austingroupbugs.net/view.php?id=529#c1200
Glad to see that this has been addressed in POSIX.

I definitely saw the need to handle EINTR with various system calls (read,
write, close) on early versions of Solaris, when a signal handler for SIGINT
is installed and the user presses Ctrl-C.

The types of fd on which this occurred were likely ttys and possibly NFS
mounted files (don't remember exactly).
Post by Eric Blake
It is better to leak the fd (on the few implementations that
leave fd open on EINTR failure) than to risk corrupting multithreaded
state.
No, not in all situations. When fd is a tty and you spawn a process that
is intended to communicate through a pipe, the usual end-of-stream handling
(EOF for reading, SIGPIPE for writing) will NOT happen if there is an fd
that has accidentally been left open. Therefore in spawn_pipe it is essential
that close() really closes what was intended.

I think we'll need to fall back to a #if based on platforms.

Bruno
Eric Blake
2018-03-14 17:30:46 UTC
Permalink
Post by Bruno Haible
Post by Eric Blake
http://austingroupbugs.net/view.php?id=529#c1200
Glad to see that this has been addressed in POSIX.
I think we'll need to fall back to a #if based on platforms.
If I'm reading the POSIX bug correctly, HPUX is the only system where
fds are left open on EINTR. Seems like a reasonable enough #if when
coupled with good comments pointing back to the POSIX discussion.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
Eric Blake
2018-03-14 17:36:36 UTC
Permalink
Post by Eric Blake
Post by Bruno Haible
Post by Eric Blake
http://austingroupbugs.net/view.php?id=529#c1200
Glad to see that this has been addressed in POSIX.
I think we'll need to fall back to a #if based on platforms.
If I'm reading the POSIX bug correctly, HPUX is the only system where
fds are left open on EINTR.  Seems like a reasonable enough #if when
coupled with good comments pointing back to the POSIX discussion.
One other possible alternative: instead of calling close(fd) on a random
unknown fd, and worrying about the non-portability of EINTR behavior
when that fd is something that takes a long time to close, you could use
dup2()'s semantics to replace the original fd with something known to be
safe to close (for example, closing an fd open on /dev/null doesn't have
the risk of EINTR like you would have with closing an fd open on a pty
or tape device).
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
Bruno Haible
2018-03-14 17:52:16 UTC
Permalink
Hi Eric,
Post by Eric Blake
Post by Eric Blake
Post by Bruno Haible
I think we'll need to fall back to a #if based on platforms.
If I'm reading the POSIX bug correctly, HPUX is the only system where
fds are left open on EINTR.
Early Solaris is definitely in this camp as well.
Post by Eric Blake
Post by Eric Blake
Seems like a reasonable enough #if when
coupled with good comments pointing back to the POSIX discussion.
One other possible alternative: instead of calling close(fd) on a random
unknown fd, and worrying about the non-portability of EINTR behavior
when that fd is something that takes a long time to close, you could use
dup2()'s semantics to replace the original fd with something known to be
safe to close (for example, closing an fd open on /dev/null doesn't have
the risk of EINTR like you would have with closing an fd open on a pty
or tape device).
This is an interesting, creative idea. However, because few people use this
'dup2 as a kind of atomic close' workaround, I fear that it might lead us
into the proximity of OS bugs. Hardly anyone has tested how dup2 works
in the presence of signals and ttys, in the past.

Bruno
Paul Eggert
2018-03-14 18:06:39 UTC
Permalink
Post by Bruno Haible
This is an interesting, creative idea. However, because few people use this
'dup2 as a kind of atomic close' workaround, I fear that it might lead us
into the proximity of OS bugs. Hardly anyone has tested how dup2 works
in the presence of signals and ttys, in the past.
For what it's worth, in Emacs I wrote the attached code (in
src/sysdep.c) and it seems to have worked well enough for Emacs so far.
Perhaps something like this should be Gnulibified, and the rest of
Gnulib should call posix_close.

Loading...