[PATCH 04/18] posix: Allow glob to match dangling symlinks [BZ #866]

Discussion:

Paul Eggert

2017-08-31 22:11:39 UTC

Thanks for working to clear up this longstanding problem in glibc glob.
Here are the issues I found with the proposed patch:

* glob_in_dir calls gl_stat, which is a typo; it should call gl_lstat.

* The GLOB_MARK checks should use stat not lstat, since symlinks to
directories should be marked.

* For GLOB_TILDE and GLOB_TILDE_CHECK, the revised code still tests for
existence of the home directory as a directory. Other glob
implementations merely expand the ~ or ~foo and treat the result as a
literal string, and glibc should be consistent here.Â This is simpler
and avoids a stat call.

* __lstat64 needs to be defined in the !_LIBC case too.

* While looking into this I noticed that glob ignores directory entries
with d_ino == 0. Although currently GNU/Linux cannot return such
entries, POSIX allows d_ino == 0 so this assumption is not portable, and
this bug is in the neighborhood so we might as well fix it in a
separated patch (which simplifies the code).

* commit message says "remove tst-glob3" but the patch actually removes
bug-glob1-ARGS.

I fixed these problems (except for the last one) while merging the patch
into Gnulib, and this resulted in the attached patches which I installed
into Gnulib master. I'll CC: this to bug-gnulib accordingly. Please
consider merging the glibc-relevant parts of these patches back into
glibc, for your next iteration of this glibc proposal. As always, the
goal is for glob.c and related files to be identical in glibc and Gnulib.

Bruno Haible

2017-09-02 10:13:51 UTC

Permalink

Hi Paul,

Use uint_fast8_t not uint8_t, as C99 does not require uint8_t.

Such precautions are not needed, because
- gnulib for sure will not be ported to architectures where no
8-bit integer type exists.
- Whether C99 requires it or not, does not matter here: The 'glob'
module depends on 'stdint', and the 'stdint' module provides
uint8_t unconditionally (for several years already).

Bruno

Paul Eggert

2017-09-02 11:17:33 UTC

Permalink

Post by Bruno Haible
Such precautions are not needed, because
- gnulib for sure will not be ported to architectures where no
8-bit integer type exists.

Although that's true for much of Gnulib, many Gnulib modules would work just
fine on such hosts.

Post by Bruno Haible
- Whether C99 requires it or not, does not matter here: The 'glob'
module depends on 'stdint', and the 'stdint' module provides
uint8_t unconditionally (for several years already).

gl_STDINT_H's tests of uint8_t all depend on UINT8_MAX being defined, which is
not true on hosts lacking uint8_t, so stdint won't provide uint8_t merely
because a standard-conforming stdint.h lacks uint8_t. At least, that's the
intent of gl_STDINT_H.

Admittedly this issue is pretty esoteric nowadays.

We could finesse all this by simply using char instead of uint8_t, since the
values in question are all in the range 0..127 on all platforms we know about.

Bruno Haible

2017-09-02 11:46:12 UTC

Permalink

Hi Paul,

Post by Paul Eggert

What I meant is that test-stdint.c contains this code (unconditionally):

uint8_t b1[2] = { UINT8_C (17), UINT8_MAX };
verify (TYPE_MAXIMUM (uint8_t) == UINT8_MAX);

On all platform we've tested on, since 2006, this worked fine.

gl_STDINT_H is just a heuristic that implements the decision whether to provide
a substitute stdint.h or use the system-provided one.

A propos

verify (TYPE_MAXIMUM (uint8_t) == UINT8_MAX);

Hmm, should we make this test stronger by verifying that uint8_t is really
exactly 8 bits wide?

verify (UINT8_MAX == 0xFF);

Post by Paul Eggert
We could finesse all this by simply using char instead of uint8_t, since the
values in question are all in the range 0..127 on all platforms we know about.

This would produce code maintainability problems. It's good to distinguish
'char' and 'unsigned char', which are used for the elements of strings, from
uint8_t, for use cases such as an arithmetic type or enum value (DT_DIR).

Bruno

Paul Eggert

2017-09-02 18:23:38 UTC

Permalink

Post by Bruno Haible
On all platform we've tested on, since 2006, this worked fine.

Ah, I didn't see that. You're right, it's testing for POSIX, not just the C
standard.

Post by Bruno Haible
Hmm, should we make this test stronger by verifying that uint8_t is really
exactly 8 bits wide?
verify (UINT8_MAX == 0xFF);

I suppose it wouldn't hurt.

Paul Eggert

2017-09-02 10:40:18 UTC

Permalink

+ p = getpwnam (pwtmpbuf.data);

That won't work on non-glibc platforms that lack getpwnam_r, as the argument
should be 'name'. I installed the attached patch to Gnulib to fix this.

Bruno Haible

2017-09-02 11:17:32 UTC

Permalink

Document more readdir portability issues.

Let me add the list of platforms:

2017-09-02 Bruno Haible <***@clisp.org>

dirent: Update doc.
* doc/posix-headers/dirent.texi: More concrete list of platforms.

diff --git a/doc/posix-headers/dirent.texi b/doc/posix-headers/dirent.texi
index 0a19fd3..eab30ca 100644
--- a/doc/posix-headers/dirent.texi
+++ b/doc/posix-headers/dirent.texi
@@ -9,7 +9,8 @@ Portability problems fixed by Gnulib:
@itemize
@item
The type @code{ino_t} is missing on some platforms:
-glibc 2.8 and others.
+glibc 2.23 and others.
+
@end itemize

Portability problems not fixed by Gnulib:
@@ -21,8 +22,8 @@ MSVC 14.
@item
Although many systems define a @code{struct dirent} member named
@code{d_type} and directory entry type macros like @code{DT_DIR} and
-@code{DT_LINK}, some do not:
-AIX 7.2, HP-UX 11, Solaris 11, probably others.
+@code{DT_LNK}, some do not:
+Minix 3.1.8, AIX 7.2, HP-UX 11, IRIX 6.5, OSF/1 5.1, Solaris 11 2011-11, mingw, Interix 3.5, BeOS.

@item
On systems with @code{d_type}, not every filesystem supports
@@ -30,12 +31,16 @@ On systems with @code{d_type}, not every filesystem supports

@item
Some systems define a @code{struct dirent} member named @code{d_namlen}
-containing the string length of @code{d_name}, but others do not.
+containing the string length of @code{d_name}, but others do not:
+glibc 2.23 on Linux, Minix 3.1.8, Solaris 11 2011-11, Cygwin, BeOS.
+All of these, except Cygwin, have a member @code{d_reclen} instead,
+that has a different semantics.

@item
Some systems define a @code{struct dirent} member named @code{d_off}
containing a magic cookie suitable as an argument to @code{seekdir},
-but others do not.
+but others do not:
+glibc 2.23 on Hurd, Mac OS X 10.5, FreeBSD 6.0, NetBSD 5.0, OpenBSD 3.8, AIX 5.1, HP-UX 11, OSF/1 5.1, Cygwin, mingw, Interix 3.5, BeOS.

@item
Some systems define a @code{struct dirent} member named

Paul Eggert

2017-09-02 22:50:36 UTC

Permalink

There is no actual login to resize the buffer in case of the resizing
the buffer in case of ERANGE, so a static buffer using glibc default
LOGIN_NAME_MAX is suffice.

Although I had trouble parsing that, I think you're saying that because
the current glob.c goes awry when sysconf (_SC_LOGIN_NAME_MAX) < 0, it's
OK if we change glob.c to insist on a fixed-size limit of 255 bytes on
user name length so that glob continues to mishandle (presumably
mostly-theoretical) environments with longer user names. But that's not
the GNU style, which is to avoid arbitrary limits. Instead, let's fix
glob.c so that it doesn't need to know the user name length limit.
Obviously glob should use heap allocation for anything large, which
suggests that it should use a scratch buffer for the login name.

I looked into this, and it's easy enough to change glob.c to use the
tail of the scratch buffer that it's already using for getpwnam_r (given
your previously-proposed patches), and this simplifies glob's
memory-allocation code. I installed the attached patch into Gnulib to do
that. Please take a look at it for your next go-round with glibc. Thanks.

This is mostly-theoretical stuff, of course, as this code is exercised
only when $HOME is unset or empty.

+ char user_name[LOGIN_NAME_MAX];

A nit: that array needs to be one byte bigger, for the trailing NULL.
This point is irrelevant to the attached Gnulib patch, which doesn't use
LOGIN_NAME_MAX.