[PATCH 2/6] parse-datetime: fix dependency

Discussion:

[PATCH 2/6] parse-datetime: fix dependency

Paul Eggert

2017-09-26 01:29:09 UTC

* modules/parse-datetime (Depends-on): Depend
on nstrftime, not strftime.
---
ChangeLog | 4 ++++
modules/parse-datetime | 2 +-
2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/ChangeLog b/ChangeLog
index 9c6d73f72..912a48e80 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,9 @@
2017-09-25 Paul Eggert <***@cs.ucla.edu>

+ parse-datetime: fix dependency
+ * modules/parse-datetime (Depends-on): Depend
+ on nstrftime, not strftime.
+
parse-datetime, posixtm: avoid uninit access
* lib/parse-datetime.y (parse_datetime2):
* lib/posixtm.c (posixtime):
diff --git a/modules/parse-datetime b/modules/parse-datetime
index 5dc6d4510..692da2572 100644
--- a/modules/parse-datetime
+++ b/modules/parse-datetime
@@ -17,8 +17,8 @@ gettext-h
intprops
inttypes
mktime
+nstrftime
setenv
-strftime
unsetenv
time
time_r

--
2.13.5

Paul Eggert

2017-09-26 01:29:10 UTC

* m4/sys_types_h.m4: Use https: URL.
---
ChangeLog | 3 +++
m4/sys_types_h.m4 | 4 ++--
2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 912a48e80..0995e5d53 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,8 @@
2017-09-25 Paul Eggert <***@cs.ucla.edu>

+ sys_types: update URL
+ * m4/sys_types_h.m4: Use https: URL.
+
parse-datetime: fix dependency
* modules/parse-datetime (Depends-on): Depend
on nstrftime, not strftime.
diff --git a/m4/sys_types_h.m4 b/m4/sys_types_h.m4
index 06268cfb2..de56d04fc 100644
--- a/m4/sys_types_h.m4
+++ b/m4/sys_types_h.m4
@@ -1,4 +1,4 @@
-# sys_types_h.m4 serial 8
+# sys_types_h.m4 serial 9
dnl Copyright (C) 2011-2017 Free Software Foundation, Inc.
dnl This file is free software; the Free Software Foundation
dnl gives unlimited permission to copy and/or distribute it,
@@ -40,7 +40,7 @@ AC_DEFUN([gl_SYS_TYPES_H_DEFAULTS],
m4_version_prereq([2.70], [], [

# This is taken from the following Autoconf patch:
-# http://git.savannah.gnu.org/gitweb/?p=autoconf.git;a=commitdiff;h=e17a30e987d7ee695fb4294a82d987ec3dc9b974
+# https://git.savannah.gnu.org/cgit/autoconf.git/commit/?id=e17a30e987d7ee695fb4294a82d987ec3dc9b974

m4_undefine([AC_HEADER_MAJOR])
AC_DEFUN([AC_HEADER_MAJOR],

--
2.13.5

Paul Eggert

2017-09-26 01:29:13 UTC

* tests/uniname/test-uninames.c (fill_names, fill_aliases):
Check for integer overflow.
---
ChangeLog | 4 ++++
tests/uniname/test-uninames.c | 12 ++++++++----
2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index c1bf07eff..4585cf12b 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,9 @@
2017-09-25 Paul Eggert <***@cs.ucla.edu>

+ uniname/uniname-tests: integer overflow fix
+ * tests/uniname/test-uninames.c (fill_names, fill_aliases):
+ Check for integer overflow.
+
duplocale-tests: fix unlikely crash
* tests/test-duplocale.c (get_locale_dependent_values):
Don’t crash with absurdly long month names.
diff --git a/tests/uniname/test-uninames.c b/tests/uniname/test-uninames.c
index dfa2c4230..476bb07f0 100644
--- a/tests/uniname/test-uninames.c
+++ b/tests/uniname/test-uninames.c
@@ -64,6 +64,7 @@ fill_names (const char *unicodedata_filename)
char *p;
char *comment;
unsigned int i;
+ unsigned long ul;

lineno++;

@@ -94,12 +95,13 @@ fill_names (const char *unicodedata_filename)
exit (EXIT_FAILURE);
}
*p = '\0';
- i = strtoul (field0, NULL, 16);
- if (i >= 0x110000)
+ ul = strtoul (field0, NULL, 16);
+ if (ul >= 0x110000)
{
fprintf (stderr, "index too large\n");
exit (EXIT_FAILURE);
}
+ i = ul;
unicode_names[i] = xstrdup (field1);
}
if (ferror (stream) || fclose (stream))
@@ -132,6 +134,7 @@ fill_aliases (const char *namealiases_filename)
char *p;
char *comment;
unsigned int uc;
+ unsigned long ul;

comment = strchr (line, '#');
if (comment != NULL)
@@ -161,12 +164,13 @@ fill_aliases (const char *namealiases_filename)
}
*p = '\0';

- uc = strtoul (field0, NULL, 16);
- if (uc >= 0x110000)
+ ul = strtoul (field0, NULL, 16);
+ if (ul >= 0x110000)
{
fprintf (stderr, "index too large\n");
exit (EXIT_FAILURE);
}
+ uc = ul;

if (aliases_count == ALIASLEN)
{

--
2.13.5

Bruno Haible

2017-09-27 00:01:20 UTC

Post by Paul Eggert
+ uniname/uniname-tests: integer overflow fix
+ Check for integer overflow.

Thanks, Paul. A small followup, to reduce the number of local variables:

2017-09-26 Bruno Haible <***@clisp.org>

uniname/uniname-tests: Tighten code.
* tests/uniname/test-uninames.c (fill_names, fill_aliases): Merge two
local variables into one.

diff --git a/tests/uniname/test-uninames.c b/tests/uniname/test-uninames.c
index 476bb07..46a9a91 100644
--- a/tests/uniname/test-uninames.c
+++ b/tests/uniname/test-uninames.c
@@ -63,8 +63,7 @@ fill_names (const char *unicodedata_filename)
{
char *p;
char *comment;
- unsigned int i;
- unsigned long ul;
+ unsigned long i;

lineno++;

@@ -95,13 +94,12 @@ fill_names (const char *unicodedata_filename)
exit (EXIT_FAILURE);
}
*p = '\0';
- ul = strtoul (field0, NULL, 16);
- if (ul >= 0x110000)
+ i = strtoul (field0, NULL, 16);
+ if (i >= 0x110000)
{
fprintf (stderr, "index too large\n");
exit (EXIT_FAILURE);
}
- i = ul;
unicode_names[i] = xstrdup (field1);
}
if (ferror (stream) || fclose (stream))
@@ -133,8 +131,7 @@ fill_aliases (const char *namealiases_filename)
{
char *p;
char *comment;
- unsigned int uc;
- unsigned long ul;
+ unsigned long uc;

comment = strchr (line, '#');
if (comment != NULL)
@@ -164,13 +161,12 @@ fill_aliases (const char *namealiases_filename)
}
*p = '\0';

- ul = strtoul (field0, NULL, 16);
- if (ul >= 0x110000)
+ uc = strtoul (field0, NULL, 16);
+ if (uc >= 0x110000)
{
fprintf (stderr, "index too large\n");
exit (EXIT_FAILURE);
}
- uc = ul;

if (aliases_count == ALIASLEN)
{

Paul Eggert

2017-09-26 01:29:11 UTC

* modules/chown-tests:
* modules/fchownat-tests, modules/fdutimensat-tests:
* modules/futimens-tests, modules/lchown-tests:
* modules/stat-time-tests, modules/utime-tests:
* modules/utimens-tests, modules/utimensat-tests:
Depend on intprops.
* tests/nap.h: Include intprops.h.
(diff_timespec): Handle overflow properly.
---
ChangeLog | 10 ++++++++++
modules/chown-tests | 1 +
modules/fchownat-tests | 1 +
modules/fdutimensat-tests | 1 +
modules/futimens-tests | 1 +
modules/lchown-tests | 1 +
modules/stat-time-tests | 1 +
modules/utime-tests | 1 +
modules/utimens-tests | 1 +
modules/utimensat-tests | 1 +
tests/nap.h | 21 +++++++++++++--------
11 files changed, 32 insertions(+), 8 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 0995e5d53..9935941df 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,15 @@
2017-09-25 Paul Eggert <***@cs.ucla.edu>

+ maint: fix overflow checking in nap.h
+ * modules/chown-tests:
+ * modules/fchownat-tests, modules/fdutimensat-tests:
+ * modules/futimens-tests, modules/lchown-tests:
+ * modules/stat-time-tests, modules/utime-tests:
+ * modules/utimens-tests, modules/utimensat-tests:
+ Depend on intprops.
+ * tests/nap.h: Include intprops.h.
+ (diff_timespec): Handle overflow properly.
+
sys_types: update URL
* m4/sys_types_h.m4: Use https: URL.

diff --git a/modules/chown-tests b/modules/chown-tests
index 58b965682..13138f438 100644
--- a/modules/chown-tests
+++ b/modules/chown-tests
@@ -7,6 +7,7 @@ tests/macros.h

Depends-on:
ignore-value
+intprops
lstat
mgetgroups
nanosleep
diff --git a/modules/fchownat-tests b/modules/fchownat-tests
index 4e289bda5..81adf7fe8 100644
--- a/modules/fchownat-tests
+++ b/modules/fchownat-tests
@@ -8,6 +8,7 @@ tests/macros.h

Depends-on:
ignore-value
+intprops
mgetgroups
nanosleep
openat-h
diff --git a/modules/fdutimensat-tests b/modules/fdutimensat-tests
index a77ada4f0..37f70c652 100644
--- a/modules/fdutimensat-tests
+++ b/modules/fdutimensat-tests
@@ -10,6 +10,7 @@ tests/macros.h
Depends-on:
fcntl-h
ignore-value
+intprops
nanosleep
openat
timespec
diff --git a/modules/futimens-tests b/modules/futimens-tests
index c7e9db2f0..519141300 100644
--- a/modules/futimens-tests
+++ b/modules/futimens-tests
@@ -10,6 +10,7 @@ Depends-on:
gettext-h
fcntl-h
ignore-value
+intprops
nanosleep
timespec
dup
diff --git a/modules/lchown-tests b/modules/lchown-tests
index d7288fea6..c5bba89d5 100644
--- a/modules/lchown-tests
+++ b/modules/lchown-tests
@@ -7,6 +7,7 @@ tests/macros.h

Depends-on:
ignore-value
+intprops
mgetgroups
nanosleep
stat-time
diff --git a/modules/stat-time-tests b/modules/stat-time-tests
index 18843de5a..c512eca76 100644
--- a/modules/stat-time-tests
+++ b/modules/stat-time-tests
@@ -4,6 +4,7 @@ tests/macros.h
tests/nap.h

Depends-on:
+intprops
nanosleep
time

diff --git a/modules/utime-tests b/modules/utime-tests
index 1d3da120e..a64d0a006 100644
--- a/modules/utime-tests
+++ b/modules/utime-tests
@@ -8,6 +8,7 @@ Depends-on:
dup
gettext-h
ignore-value
+intprops
nanosleep
symlink
timespec
diff --git a/modules/utimens-tests b/modules/utimens-tests
index d5e3085d2..2a95346a0 100644
--- a/modules/utimens-tests
+++ b/modules/utimens-tests
@@ -11,6 +11,7 @@ Depends-on:
dup
gettext-h
ignore-value
+intprops
nanosleep
symlink
timespec
diff --git a/modules/utimensat-tests b/modules/utimensat-tests
index 09e5cb15b..15c79407e 100644
--- a/modules/utimensat-tests
+++ b/modules/utimensat-tests
@@ -9,6 +9,7 @@ tests/macros.h

Depends-on:
ignore-value
+intprops
nanosleep
timespec
utimecmp
diff --git a/tests/nap.h b/tests/nap.h
index c16ee904e..24043c612 100644
--- a/tests/nap.h
+++ b/tests/nap.h
@@ -22,6 +22,8 @@
# include <limits.h>
# include <stdbool.h>

+# include <intprops.h>
+
/* Name of the witness file. */
#define TEMPFILE BASE "nap.tmp"

@@ -38,17 +40,20 @@ diff_timespec (struct timespec a, struct timespec b)
time_t bs = b.tv_sec;
int ans = a.tv_nsec;
int bns = b.tv_nsec;
+ int sdiff;
+
+ ASSERT (0 <= ans && ans < 2000000000);
+ ASSERT (0 <= bns && bns < 2000000000);

if (! (bs < as || (bs == as && bns < ans)))
return 0;
- if (as - bs <= INT_MAX / 1000000000)
- {
- int sdiff = (as - bs) * 1000000000;
- int usdiff = ans - bns;
- if (usdiff < INT_MAX - sdiff)
- return sdiff + usdiff;
- }
- return INT_MAX;
+
+ if (INT_SUBTRACT_WRAPV (as, bs, &sdiff)
+ || INT_MULTIPLY_WRAPV (sdiff, 1000000000, &sdiff)
+ || INT_ADD_WRAPV (sdiff, ans - bns, &sdiff))
+ return INT_MAX;
+
+ return sdiff;
}

/* If DO_WRITE, bump the modification time of the file designated by NAP_FD.

--
2.13.5

Paul Eggert

2017-09-26 01:29:12 UTC

* tests/test-duplocale.c (get_locale_dependent_values):
Don’t crash with absurdly long month names.
---
ChangeLog | 4 ++++
tests/test-duplocale.c | 3 ++-
2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/ChangeLog b/ChangeLog
index 9935941df..c1bf07eff 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,9 @@
2017-09-25 Paul Eggert <***@cs.ucla.edu>

+ duplocale-tests: fix unlikely crash
+ * tests/test-duplocale.c (get_locale_dependent_values):
+ Don’t crash with absurdly long month names.
+
maint: fix overflow checking in nap.h
* modules/chown-tests:
* modules/fchownat-tests, modules/fdutimensat-tests:
diff --git a/tests/test-duplocale.c b/tests/test-duplocale.c
index 9b2c4cf1d..f48fedf6a 100644
--- a/tests/test-duplocale.c
+++ b/tests/test-duplocale.c
@@ -48,7 +48,8 @@ get_locale_dependent_values (struct locale_dependent_values *result)
snprintf (result->numeric, sizeof (result->numeric),
"%g", 3.5);
/* result->numeric is usually "3,5" */
- strcpy (result->time, nl_langinfo (MON_1));
+ strncpy (result->time, nl_langinfo (MON_1), sizeof result->time - 1);
+ result->time[sizeof result->time - 1] = '\0';
/* result->time is usually "janvier" */
}

--
2.13.5

Bruno Haible

2017-09-27 22:46:19 UTC

Post by Paul Eggert
@@ -48,7 +48,8 @@ get_locale_dependent_values (struct locale_dependent_values *result)
snprintf (result->numeric, sizeof (result->numeric),
"%g", 3.5);
/* result->numeric is usually "3,5" */
- strcpy (result->time, nl_langinfo (MON_1));
+ strncpy (result->time, nl_langinfo (MON_1), sizeof result->time - 1);
+ result->time[sizeof result->time - 1] = '\0';
/* result->time is usually "janvier" */
}

This change has replaced code with 1 drawback
- The string copy may overrun the buffer.
by code with 3 drawbacks
- The string copy may be silently truncated.
- The code needs 2 lines, instead of 1 line.
- In the common cases, the large result buffer gets needlessly filled
with NULs.

I think the best way to deal with this situation is the function 'strlcpy':

ASSERT (strlcpy (result->time, nl_langinfo (MON_1), sizeof result->time) < sizeof result->time);

This way,
- The string copy will not overrun the buffer.
- The string copy will always be NUL-terminated.
- Silent truncation does not occur.
- The code fits in one line.
- The code is not needlessly inefficient.

Here's a proposal to add 'strlcpy' to gnulib.

Yes, I have read the relevant documentation:
https://www.freebsd.org/cgi/man.cgi?query=strlcpy&sektion=3

and the discussions:
https://www.sourceware.org/ml/libc-alpha/2000-08/msg00052.html
https://lwn.net/Articles/612244/
https://sourceware.org/ml/libc-alpha/2014-09/msg00350.html
https://sourceware.org/glibc/wiki/strlcpy

The major argument against strlcpy is that it is not fool-proof:
If the caller ignores the return value, silent truncation can occur.
To prevent this, the proposed patch declares strlcpy with
__attribute__((__warn_unused_result__)) on all platforms.

Bruno

Paul Eggert

2017-09-27 23:27:01 UTC

I'd really rather not promote the use of strlcpy in GNU code, as it is
contrary to GNU style. Plus, I'm not a fan of __warn_unused_result__; in
my experience it's too often more trouble than it's worth, and I expect
this trouble would occur with strlcpy if we started using it.

That being said, I take your point that I should not have used strncpy
to pacify GCC in that test case: I plead guilty to being lazy because
the test-case code already has arbitrary limits.

Anyway, strlcpy is overkill here, as snprintf does the job just as well
here, and is already standard and supported by Gnulib. So I propose the
attached simpler patch instead.

Bruno Haible

2017-09-28 00:35:58 UTC

Hi Paul,

Post by Paul Eggert
Anyway, strlcpy is overkill here, as snprintf does the job just as well
here

Not at all. snprintf is not at the right level of abstraction.

There are three levels of abstractions to consider:

(1) C code which composes a memory region by writing to bytes individually.
(2) C code which works with strings.
(3) C code which does memory allocations, formatted output etc.

Here the task is to copy a string to a bounded memory area.

The level (1) - which is embodied by the "strncpy and set NUL byte" approach -
is too low-level, because
- it is easy to make mistakes at this level (off-by-1),
- it does not encourage the user to think about truncation and how to
handle it.

The level (3) - which is embodied by the "snprintf %s" approach =
is too high-level, because
- it is overkill to parse a format string when all you want to do is
copy a single string,
- snprintf can call malloc() and can fail with ENOMEM; these are undesired
outcomes here.

The level (2) is right for the task, but the standardized functions
(memcpy, strcpy, strncpy) are not up to the job:
- memcpy because it requires too much preparation code, and is still
vulnerable to off-by-1 mistakes,
- strcpy because it may produce buffer overruns,
- strncpy because it may produce silent truncation.

strlcpy with __warn_unused_result__ is the best solution for this task.

Post by Paul Eggert
I'd really rather not promote the use of strlcpy in GNU code, as it is
contrary to GNU style.

1) I do not promote "strlcpy" with the flaw of ignoring the return value.
I promote "strlcpy with __warn_unused_result__", which is an entirely
different beast, because it will make the users aware of the return
value and of the silent truncation issue. In some places the users
will notice that strlcpy does not buy them much, compared to the
"avoid arbitrary limits"[1] approach, and will switch over to what
you call "GNU style". In other places, they will insert an abort()
or assert() to handle the truncation case - which is *better* than
the strncpy approach.

2) You need to distinguish application code and test code.
The GNU standards [1] request to avoid arbitrary limits for programs.
It is completely OK to assume that month names are less than 100 bytes
long, *in unit tests*. If the same standards were set for test code
than for application code, it would become even more tedious and
boring to write unit tests than it already is.

Probably I should configure the Coverity scan of Gnulib such that
it puts the test code in a different area, so that we can apply
different filters and prioritizations to the tests, and so that
we won't be troubled by "sloppy" style in the tests.

Post by Paul Eggert
Plus, I'm not a fan of __warn_unused_result__; in
my experience it's too often more trouble than it's worth

We have a module 'ignore-value' that deals with it; so the trouble
you are mentioning must be a trouble in the past, not in the future.

Bruno

[1] https://www.gnu.org/prep/standards/html_node/Semantics.html

Bruno Haible

2017-09-28 00:44:53 UTC

Post by Bruno Haible
In some places the users
will notice that strlcpy does not buy them much, compared to the
"avoid arbitrary limits"[1] approach, and will switch over to what
you call "GNU style". In other places, they will insert an abort()
or assert() to handle the truncation case - which is *better* than
the strncpy approach.

For example, in gnulib's setlocale.c override. This file has fixed-size
buffers and silent truncation - because it uses the "strncpy and set NUL byte"
approach. As soon as someone (me) will use strlcpy with __warn_unused_result__
there, he will change the code to do
- either dynamic allocation and no arbitrary limits,
- or provide a good alternative to the silent truncation.
Either result will be better than the current one.

Bruno

Jim Meyering

2017-09-28 00:59:45 UTC

Post by Bruno Haible

Post by Bruno Haible
In some places the users
will notice that strlcpy does not buy them much, compared to the
"avoid arbitrary limits"[1] approach, and will switch over to what
you call "GNU style". In other places, they will insert an abort()
or assert() to handle the truncation case - which is *better* than
the strncpy approach.

For example, in gnulib's setlocale.c override. This file has fixed-size
buffers and silent truncation - because it uses the "strncpy and set NUL byte"
approach. As soon as someone (me) will use strlcpy with __warn_unused_result__
there, he will change the code to do
- either dynamic allocation and no arbitrary limits,
- or provide a good alternative to the silent truncation.
Either result will be better than the current one.

I have to agree that we need something, and it may even be spelled
strlcpy. It seems reasonable to allow new uses of strlcpy, given a
__warn_unused_result__ attribute -- that should prevent those new
users from ignoring that return value. I developed a strong aversion
to strncpy, and wrote this about it:
https://meyering.net/crusade-to-eliminate-strncpy/

Bruno Haible

2017-09-28 17:54:37 UTC

Hi Jim,

Post by Jim Meyering
https://meyering.net/crusade-to-eliminate-strncpy/

Thanks for your voice and past effort here.

Here's doc I propose to add to the gnulib documentation (and similar one
to wcscpy and wcsncpy):

diff --git a/doc/posix-functions/strcpy.texi b/doc/posix-functions/strcpy.texi
index 3289362..89c6cd3 100644
--- a/doc/posix-functions/strcpy.texi
+++ b/doc/posix-functions/strcpy.texi
@@ -17,3 +17,6 @@ OS X 10.8.
Portability problems not fixed by Gnulib:
@itemize
@end itemize
+
+Note: @code{strcpy (dst, src)} is only safe to use when you can guarantee that
+there are at least @code{strlen (src) + 1} bytes allocated at @code{dst}.
diff --git a/doc/posix-functions/strncpy.texi b/doc/posix-functions/strncpy.texi
index 3cc6b45..087acaf 100644
--- a/doc/posix-functions/strncpy.texi
+++ b/doc/posix-functions/strncpy.texi
@@ -17,3 +17,12 @@ OS X 10.8.
Portability problems not fixed by Gnulib:
@itemize
@end itemize
+
+Note: This function was designed for the use-case of filling a fixed-size
+record with a string, before writing it to a file. This function is
+@strong{not} appropriate for copying a string into a bounded memory area,
+because you have no guarantee that the result will be NUL-terminated.
+Even if you add the NUL byte at the end yourself, this function is
+inefficient (as it spends time clearing unused memory) and will allow
+silent truncation occur, which is not a good behavior for GNU programs.
+For more details, see @see{https://meyering.net/crusade-to-eliminate-strncpy/}.

Paul Eggert

2017-09-28 18:02:45 UTC

Post by Bruno Haible
Here's doc I propose to add to the gnulib documentation (and similar one

Thanks, that looks good. I too share an aversion to strncpy, and wish
that I had not stirred up this hornet's nest by using it in a moment of
weakness.

Bruno Haible

2017-10-03 20:03:29 UTC

Post by Paul Eggert

Post by Bruno Haible
Here's doc I propose to add to the gnulib documentation (and similar one

Thanks, that looks good. I too share an aversion to strncpy

ok, I'm committing this doc change:

2017-10-03 Bruno Haible <***@clisp.org>

doc: warn about misuse of strncpy and wcsncpy.
* doc/posix-functions/strcpy.texi: Describe requirements on prior
memory allocation.
* doc/posix-functions/wcscpy.texi: Likewise.
* doc/posix-functions/strncpy.texi: Describe what this function is not
useful for.
* doc/posix-functions/wcsncpy.texi: Likewise.

diff --git a/doc/posix-functions/strcpy.texi b/doc/posix-functions/strcpy.texi
index 3289362..89c6cd3 100644
--- a/doc/posix-functions/strcpy.texi
+++ b/doc/posix-functions/strcpy.texi
@@ -17,3 +17,6 @@ OS X 10.8.
Portability problems not fixed by Gnulib:
@itemize
@end itemize
+
+Note: @code{strcpy (dst, src)} is only safe to use when you can guarantee that
+there are at least @code{strlen (src) + 1} bytes allocated at @code{dst}.
diff --git a/doc/posix-functions/strncpy.texi b/doc/posix-functions/strncpy.texi
index 3cc6b45..5e22233 100644
--- a/doc/posix-functions/strncpy.texi
+++ b/doc/posix-functions/strncpy.texi
@@ -17,3 +17,12 @@ OS X 10.8.
Portability problems not fixed by Gnulib:
@itemize
@end itemize
+
+Note: This function was designed for the use-case of filling a fixed-size
+record with a string, before writing it to a file. This function is
+@strong{not} appropriate for copying a string into a bounded memory area,
+because you have no guarantee that the result will be NUL-terminated.
+Even if you add the NUL byte at the end yourself, this function is
+inefficient (as it spends time clearing unused memory) and will allow
+silent truncation to occur, which is not a good behavior for GNU programs.
+For more details, see @see{https://meyering.net/crusade-to-eliminate-strncpy/}.
diff --git a/doc/posix-functions/wcscpy.texi b/doc/posix-functions/wcscpy.texi
index fbac143..5e4cb01 100644
--- a/doc/posix-functions/wcscpy.texi
+++ b/doc/posix-functions/wcscpy.texi
@@ -19,3 +19,7 @@ Portability problems not fixed by Gnulib:
On AIX and Windows platforms, @code{wchar_t} is a 16-bit type and therefore cannot
accommodate all Unicode characters.
@end itemize
+
+Note: @code{wcscpy (dst, src)} is only safe to use when you can guarantee that
+there are at least @code{wcslen (src) + 1} wide characters allocated at
+@code{dst}.
diff --git a/doc/posix-functions/wcsncpy.texi b/doc/posix-functions/wcsncpy.texi
index 4a6794a..a9a555e 100644
--- a/doc/posix-functions/wcsncpy.texi
+++ b/doc/posix-functions/wcsncpy.texi
@@ -16,6 +16,15 @@ IRIX 5.3, Solaris 2.5.1.
Portability problems not fixed by Gnulib:
@itemize
@item
-On AIX and Windows platforms, @code{wchar_t} is a 16-bit type and therefore cannot
-accommodate all Unicode characters.
+On AIX and Windows platforms, @code{wchar_t} is a 16-bit type and therefore
+cannot accommodate all Unicode characters.
@end itemize
+
+Note: This function has no real use: It cannot be used for filling a fixed-size
+record with a wide string, before writing it to a file, because the wide string
+encoding is platform dependent and, on some platforms, also locale dependent.
+And this function is @strong{not} appropriate for copying a wide string into a
+bounded memory area, because you have no guarantee that the result will be
+null-terminated. Even if you add the null character at the end yourself, this
+function is inefficient (as it spends time clearing unused memory) and will
+allow silent truncation to occur, which is not a good behavior for GNU programs.

Paul Eggert

2017-09-28 01:23:32 UTC

Post by Bruno Haible
strlcpy with __warn_unused_result__ is the best solution for this task.

No it's not, because strlcpy always computes the string length of the
source, and that is neither necessary nor desirable. Furthermore, in the
bad style of programming that uses strlcpy, it's often acceptable to
ignore the result since the programmer *wants* silent truncation. That
is what strlcpy means in OpenBSD etc., and we shouldn't be trying to
reuse their name for a function with different semantics.

A better way to fix the test case is to remove its arbitrary limit on
month name lengths. That way, it won't need snprintf or strlcpy or
strncpy or anything like that. And it'll be following the GNU coding
standards better. I can propose a patch along those lines, if you like.
I do not want Gnulib to promote the use of strlcpy; that's a step in the
wrong direction.

Post by Bruno Haible
If the same standards were set for test code
than for application code, it would become even more tedious and
boring to write unit tests than it already is.

In that case, snprintf suffices: maybe snprintf is not good enough for
production code, but it's plenty good for this test case.

If you really want a function whose semantics are like strlcpy's but has
__warn_unused_result__ semantics (and while we're at it, also does not
count overlong sources, because that's silly), then I suppose we could
invent one and use it for Gnulib. But we should not call it strlcpy, and
we shouldn't use it in production code.

Dmitry Selyutin

2017-09-28 06:29:42 UTC

How about strscpy from the Linux kernel?

https://www.kernel.org/doc/htmldocs/kernel-api/API-strscpy.html

Post by Paul Eggert

Post by Bruno Haible
strlcpy with __warn_unused_result__ is the best solution for this task.

No it's not, because strlcpy always computes the string length of the
source, and that is neither necessary nor desirable. Furthermore, in the
bad style of programming that uses strlcpy, it's often acceptable to ignore
the result since the programmer *wants* silent truncation. That is what
strlcpy means in OpenBSD etc., and we shouldn't be trying to reuse their
name for a function with different semantics.
A better way to fix the test case is to remove its arbitrary limit on
month name lengths. That way, it won't need snprintf or strlcpy or strncpy
or anything like that. And it'll be following the GNU coding standards
better. I can propose a patch along those lines, if you like. I do not want
Gnulib to promote the use of strlcpy; that's a step in the wrong direction.
If the same standards were set for test code

Post by Bruno Haible
than for application code, it would become even more tedious and
boring to write unit tests than it already is.

In that case, snprintf suffices: maybe snprintf is not good enough for
production code, but it's plenty good for this test case.
If you really want a function whose semantics are like strlcpy's but has
__warn_unused_result__ semantics (and while we're at it, also does not
count overlong sources, because that's silly), then I suppose we could
invent one and use it for Gnulib. But we should not call it strlcpy, and we
shouldn't use it in production code.

Tim Rühsen

2017-09-28 10:08:02 UTC

Post by Dmitry Selyutin
How about strscpy from the Linux kernel?
https://www.kernel.org/doc/htmldocs/kernel-api/API-strscpy.html

As an application library & programmer, I need this (thanks, Dmitry)
*and* strlcpy. A gnulib module would reduce configure.ac code and
several checks - and thus would be very welcome (thanks, Bruno).

In production code strlcpy replaces snprintf with format string "%s" -
the return value (length of source string) often comes in handy for
malloc fallbacks.

But I already stumbled into the side-effects of strlcpy in the past.
Determining the length of the source is sometimes dangerous and may lead
to unexpected CPU cycle waste. So this strscpy function seems to be a
good choice if you are not interested in the source length.

With Best Regards, Tim

Paul Eggert

2017-09-28 18:01:34 UTC

Post by Dmitry Selyutin
How about strscpy from the Linux kernel?

Yes, that's a better API than strlcpy. Though I would change its ssize_t
to ptrdiff_t, so that it depends only on stddef.h. Plus, I would define
its behavior even if the buffers overlap - that's safer, and the
function is for when safety is more important than performance or
correctness.

Bruno Haible

2017-09-28 18:39:23 UTC

Post by Paul Eggert
If you really want a function whose semantics are like strlcpy's but has
__warn_unused_result__ semantics ..., then I suppose we could
invent one and use it for Gnulib. But we should not call it strlcpy

Yes, I do want such a function for copying a string to a bounded memory area.

What you don't like about strlcpy:
- The name.
- The return value: why compute strlen(src) when the function does not
otherwise need it?

The reason for this design is that in BSD, it is common practice to try
a call with a fixed-size stack-allocated or statically allocated buffer,
and try dynamic memory only when this first attempt fails. The return value
is thus useful for the second step. For system calls this is a reasonable
design, but not here: the caller can call strlen(src) by himself. It is
strange design to do the first part of the second step in the function
which is supposed to do the first step. My take on this is that such
an algorithm should be abstracted in a facility of its own, like
<scratch-buffer.h> or <linebuffer.h>.

Other things I don't like about strlcpy:
- It's hard to remember whether the return value should be tested
as >= dstsize, > dstsize, <= dstsize, < dstsize. As Jim writes [1]
"Some of us forget to read documentation. Or read it, and then forget
details."
- The argument order: while snprintf, strftime, strfmon, and others
receive the destination buffer size immediately following the
destination buffer address, strlcpy wants it as third argument.
Which is against the principle "keep together what belongs together".

Dmitry suggested to look at strscpy [2]. About this one I don't like
- The return value convention: It's suitable for in-kernel APIs only.
- The mixed use of ssize_t vs. size_t.
- The argument order.

Here's my take:

/* Copies the string SRC, possibly truncated, into the bounded memory area
[DST,...,DST+DSTSIZE-1].
If the result fits, it returns 0.
If DSTSIZE is 0, it returns -1 and sets errno to EINVAL.
If DSTSIZE is nonzero and the result does not fit, it produces a NUL-
terminated truncated result, and returns -1 and sets errno to E2BIG. */
extern int strgcpy (char *__dst, size_t __dstsize, const char *__src)
__attribute__ ((__nonnull__ (1, 3)))
__attribute__ ((__warn_unused_result__));

Bruno

[1] https://meyering.net/crusade-to-eliminate-strncpy/
[2] https://www.kernel.org/doc/htmldocs/kernel-api/API-strscpy.html

Dmitry Selyutin

2017-09-28 19:36:39 UTC

Post by Bruno Haible

Post by Paul Eggert
Though I would change its ssize_t
to ptrdiff_t, so that it depends only on stddef.h.

- The mixed use of ssize_t vs. size_t.

What's wrong with ssize_t? As for me, ptrdiff_t is not a good choice here,
because it represents not length, but difference between pointers. String
length IS a difference between pointer in some way, but well, I think that's
not the first allegory that comes to mind when one talks about strings. As for
me, the only reason to avoid ssize_t is a dependency to stddef.h, but well, I
think it is going to be used in most projects anyway, especially if these
projects are somewhat complex. I thought size_t also comes from stddef, but I
may be wrong. :-)

Post by Bruno Haible
- The argument order.

That's also I think largely a historical issue. BTW, on x86_64 it makes some
sense due to use of $rdi/$rsi. The only register that must be copied is $rcx.
After then, one can implement memcpy/strcpy in just a few lines of assembly.
Anyway, I agree that the argument order should better be changed, to avoid
confusing strgcpy with its relatives (strncpy, strlcpy, strscpy).

Post by Bruno Haible
If DSTSIZE is 0, it returns -1 and sets errno to EINVAL.
If DSTSIZE is nonzero and the result does not fit, it produces a NUL-
terminated truncated result, and returns -1 and sets errno to E2BIG.

I really dislike the idea of setting errno, since it involves thread-local
storage. I know it's a "cold" path, but nevertheless, I really favor returning
error codes if it is possible. And it is possible with ssize_t. Moreover, more
modern POSIX functions do actually favor returning error code directly.

Anyway, I really think it is a good idea to provide yet another function, for
me it seems the most logical choice; I cannot guess what "g" in strgcpy means
though. Guarded? GNU? Generic? Godlike? :-)
The only possible reason to avoid doing such change is that some people may
think that these Open Source guys went crazy and invented at least three
functions to copy strings (four, including strscpy from the Linux kernel);
moreover, since some people do not actually know the difference between strcpy
and strcat, the situation is even worse. :-)

--
With best regards,
Dmitry Selyutin

Paul Eggert

2017-09-28 22:23:32 UTC

ptrdiff_t is not a good choice here, > because it represents not length, but difference between pointers

They're the same concept in C. In 7th Edition Unix strlen returned int,
and this was better than having it return an unsigned type due to common
problems with wraparound arithmetic and comparisons with unsigned types.
In GNU Emacs source code, the general style has been to prefer signed
integers for indexes and sizes and the like, and this has helped us gain
reliability because we can catch integer overflows better by compiling
with -fsanitize=undefined, bugs that we couldn't catch so easily if we
used unsigned integers.

This is why I suggest ptrdiff_t for new low-level APIs. Although it
limits object size to PTRDIFF_MAX, that limitation is present already
because in practice behavior is undefined when you compute p1-p2 when
((char *) p1 - (char *) p2) does not fit into ptrdiff_t, which means
that objects larger than PTRDIFF_MAX are dangerous and should be avoided
anyway.

ssize_t is a worse choice for portable software, as it is not in
stddef.h and on a few old platforms ssize_t is narrower than size_t
(POSIX allows this).

All this is moot of course if any new function returns bool or void, as
I suspect it should.

Paul Eggert

2017-09-28 22:19:07 UTC

Post by Bruno Haible
in BSD, it is common practice to try
a call with a fixed-size stack-allocated or statically allocated buffer,
and try dynamic memory only when this first attempt fails.

This doesn't match my experience with BSD application code, where the
most common practice is to call strlcpy and ignore the return value. I
looked at the source for OpenSSH 7.5 (the current version). Of its 56
calls to strlcpy, only 15 check the return value, and none of these
calls try dynamic allocation later (they all simply fail in the caller
somehow). Of the 41 calls that ignore the return value, sometimes this
is a bug and sometimes not, and often when it is not a bug plain strcpy
would work as well. So the strlcpy API appears to be a poor fit for
OpenSSH's needs.

Post by Bruno Haible
/* Copies the string SRC, possibly truncated, into the bounded memory area
[DST,...,DST+DSTSIZE-1].
If the result fits, it returns 0.
If DSTSIZE is 0, it returns -1 and sets errno to EINVAL.
If DSTSIZE is nonzero and the result does not fit, it produces a NUL-
terminated truncated result, and returns -1 and sets errno to E2BIG. */
extern int strgcpy (char *__dst, size_t __dstsize, const char *__src)
__attribute__ ((__nonnull__ (1, 3)))
__attribute__ ((__warn_unused_result__));

Some comments:

* I suggest a name that does not begin with "str", as that's reserved to
the implementation.

* I too am bothered by this function setting errno (and using two
different errno values to boot); it'd be simpler for it to just to
return a value, or to abort if the destination is too small.

* If the intent is that this function be used for test-case code that
aborts if the source is too long, then I suggest that the function
simply return 'void' and abort if the source is too long, as that would
be more convenient for the callers. If the intent is something else, I'd
like to know what it is so that I can think about it more.

Thien-Thi Nguyen

2017-09-26 06:19:37 UTC

() Paul Eggert <***@cs.ucla.edu>
() Mon, 25 Sep 2017 18:29:08 -0700

Do not access uninitialized storage, even though the
resulting value is never used.

[...]

- tm0 = tm;
+ tm0.tm_sec = tm.tm_sec;
+ tm0.tm_min = tm.tm_min;
+ tm0.tm_hour = tm.tm_hour;
+ tm0.tm_mday = tm.tm_mday;
+ tm0.tm_mon = tm.tm_mon;
+ tm0.tm_year = tm.tm_year;
+ tm0.tm_isdst = tm.tm_isdst;

Start = mktime_z (tz, &tm);

These changes look like a step backward in code readability,
which could prompt a naive programmer to propose their reversion
in the future (ping-pong problem). Maybe add a preemptive
comment, or factor the assignments into a macro (w/ comment)?

--
Thien-Thi Nguyen -----------------------------------------------
(defun responsep (query)
(pcase (context query)
(`(technical ,ml) (correctp ml))
...)) 748E A0E8 1CB8 A748 9BFA
--------------------------------------- 6CE4 6703 2224 4C80 7502

Paul Eggert

2017-09-26 06:46:38 UTC

Post by Thien-Thi Nguyen
Maybe add a preemptive
comment, or factor the assignments into a macro (w/ comment)?

It's not worth a macro. Feel free to add a comment.

Pádraig Brady

2017-11-24 01:10:25 UTC

Do not access uninitialized storage, even though the resulting
value is never used.
---
ChangeLog | 8 ++++++++
lib/parse-datetime.y | 16 ++++++++++++++--
lib/posixtm.c | 7 ++++++-
3 files changed, 28 insertions(+), 3 deletions(-)
diff --git a/ChangeLog b/ChangeLog
index 386986ee7..9c6d73f72 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,11 @@
+
+ parse-datetime, posixtm: avoid uninit access
+ Do not access uninitialized storage, even though the resulting
+ value is never used.
+
vma-iter: Improvements for BSD platforms.
diff --git a/lib/parse-datetime.y b/lib/parse-datetime.y
index 9eff2dc3b..f8da02d3f 100644
--- a/lib/parse-datetime.y
+++ b/lib/parse-datetime.y
@@ -2034,7 +2034,13 @@ parse_datetime2 (struct timespec *result, char const *p,
if (pc.local_zones_seen)
tm.tm_isdst = pc.local_isdst;
- tm0 = tm;
+ tm0.tm_sec = tm.tm_sec;
+ tm0.tm_min = tm.tm_min;
+ tm0.tm_hour = tm.tm_hour;
+ tm0.tm_mday = tm.tm_mday;
+ tm0.tm_mon = tm.tm_mon;
+ tm0.tm_year = tm.tm_year;
+ tm0.tm_isdst = tm.tm_isdst;
Start = mktime_z (tz, &tm);
@@ -2064,7 +2070,13 @@ parse_datetime2 (struct timespec *result, char const *p,
dbg_printf (_("error: tzalloc (\"%s\") failed\n"), tz2buf);
goto fail;
}
- tm = tm0;
+ tm.tm_sec = tm0.tm_sec;
+ tm.tm_min = tm0.tm_min;
+ tm.tm_hour = tm0.tm_hour;
+ tm.tm_mday = tm0.tm_mday;
+ tm.tm_mon = tm0.tm_mon;
+ tm.tm_year = tm0.tm_year;
+ tm.tm_isdst = tm0.tm_isdst;
Start = mktime_z (tz2, &tm);
repaired = mktime_ok (tz2, &tm0, &tm, Start);
tzfree (tz2);
diff --git a/lib/posixtm.c b/lib/posixtm.c
index 26a35dd3f..030f704f0 100644
--- a/lib/posixtm.c
+++ b/lib/posixtm.c
@@ -182,7 +182,12 @@ posixtime (time_t *p, const char *s, unsigned int syntax_bits)
if (! posix_time_parse (&tm0, s, syntax_bits))
return false;
- tm1 = tm0;
+ tm1.tm_sec = tm0.tm_sec;
+ tm1.tm_min = tm0.tm_min;
+ tm1.tm_hour = tm0.tm_hour;
+ tm1.tm_mday = tm0.tm_mday;
+ tm1.tm_mon = tm0.tm_mon;
+ tm1.tm_year = tm0.tm_year;
tm1.tm_isdst = -1;
t = mktime (&tm1);

This triggers the following warning with gcc-6.3

lib/posixtm.c:214:20: error: '*((void *)&tm0+20)' may be used
uninitialized in this function [-Werror=maybe-uninitialized]
if ((tm0.tm_year ^ tm->tm_year)
~~~~~~~~~~~~~^~~~~~~~~~~~~~

It's not introducing any new issue I think, but
seems to be triggering the compiler warning due
to tm_year being explicitly set?

How about the attached to ensure tm_year is set?

cheers,
Pádraig

Paul Eggert

2017-11-24 08:31:38 UTC

Post by PÃ¡draig Brady
It's not introducing any new issue I think, but
seems to be triggering the compiler warning due
to tm_year being explicitly set?

Actually, I think it's pointing out an issue with the API. If the caller to
posixtime were to specify neither PDS_LEADING_YEAR nor PDS_TRAILING_YEAR, the
'year' function would never get called, which means that tm0.tm_year would be
uninitialized.

No caller ever does that. In practice, PDS_LEADING_YEAR and PDS_TRAILING_YEAR
are mutually exclusive and one or the other should always be specified. This
suggests that there should be just a single bit in the API, not two bits. I
installed the attached patch, which changes the API accordingly, in a way that I
think will not break any current callers. Does it pacify your GCC?

Pádraig Brady

2017-11-24 19:47:01 UTC

Post by Paul Eggert

Post by PÃ¡draig Brady
It's not introducing any new issue I think, but
seems to be triggering the compiler warning due
to tm_year being explicitly set?

Actually, I think it's pointing out an issue with the API. If the caller to
posixtime were to specify neither PDS_LEADING_YEAR nor PDS_TRAILING_YEAR, the
'year' function would never get called, which means that tm0.tm_year would be
uninitialized.
No caller ever does that. In practice, PDS_LEADING_YEAR and PDS_TRAILING_YEAR
are mutually exclusive and one or the other should always be specified. This
suggests that there should be just a single bit in the API, not two bits. I
installed the attached patch, which changes the API accordingly, in a way that I
think will not break any current callers. Does it pacify your GCC?

It does indeed since bitwise ops are not in all paths to initializing tm_year.

thanks!
Pádraig

26 Replies
4 Views
Permalink to this page
Disable enhanced parsing

Thread Navigation

Paul Eggert 2017-09-26 01:29:09 UTC

Paul Eggert 2017-09-26 01:29:10 UTC

Paul Eggert 2017-09-26 01:29:13 UTC

Bruno Haible 2017-09-27 00:01:20 UTC

Paul Eggert 2017-09-26 01:29:11 UTC

Paul Eggert 2017-09-26 01:29:12 UTC

Bruno Haible 2017-09-27 22:46:19 UTC

Paul Eggert 2017-09-27 23:27:01 UTC

Bruno Haible 2017-09-28 00:35:58 UTC

Bruno Haible 2017-09-28 00:44:53 UTC

Jim Meyering 2017-09-28 00:59:45 UTC

Bruno Haible 2017-09-28 17:54:37 UTC

Paul Eggert 2017-09-28 18:02:45 UTC

Bruno Haible 2017-10-03 20:03:29 UTC

Paul Eggert 2017-09-28 01:23:32 UTC

Dmitry Selyutin 2017-09-28 06:29:42 UTC

Tim Rühsen 2017-09-28 10:08:02 UTC

Paul Eggert 2017-09-28 18:01:34 UTC

Bruno Haible 2017-09-28 18:39:23 UTC

Dmitry Selyutin 2017-09-28 19:36:39 UTC

Paul Eggert 2017-09-28 22:23:32 UTC

Paul Eggert 2017-09-28 22:19:07 UTC

Thien-Thi Nguyen 2017-09-26 06:19:37 UTC

Paul Eggert 2017-09-26 06:46:38 UTC

Pádraig Brady 2017-11-24 01:10:25 UTC

Paul Eggert 2017-11-24 08:31:38 UTC

Pádraig Brady 2017-11-24 19:47:01 UTC

about - legalese

Loading...