Discussion:
quotearg test failures
(too old to reply)
Bruno Haible
2016-10-16 16:49:55 UTC
Permalink
Raw Message
Hi Pádraig,

On a glibc system (glibc 2.15, Linux 3.8) I get these test failures
from a gnulib testdir:

1) For module 'quotearg':

FAIL: test-quotearg.sh

$ LOCALE=fr_FR.UTF-8 LOCALEDIR=locale ./test-quotearg
test-quotearg.h:53: assertion 'la == lb' failed

(gdb) where
#0 0x00007ffff7a52035 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007ffff7a5579b in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00000000004011a3 in compare (a=0x4063d5 "««\\»»", la=9, b=0x608460 "«a' b»", lb=8) at test-quotearg.h:53
#3 0x000000000040159f in compare_strings (func=0x401684 <use_quotearg_buffer>, results=0x6081e0, ascii_only=false) at test-quotearg.h:91
#4 0x0000000000401ae3 in main (argc=1, argv=0x7fffffffdb98) at test-quotearg.c:87

Apparently the quoting result of string "a' b", which ought to have been
"«a' b»", is now "««\\»»" - which is complete nonsense.

Can you reproduce this? You must first make sure that you have a fr_FR.UTF-8
locale on your system. If not, create it using
# localedef -i fr_FR -f UTF-8 fr_FR.UTF-8'


2) For module 'sh-quote', which relies on 'quotearg':

FAIL: test-sh-quote

test-sh-quote.c:48: assertion 'buf[output_len + 1] == '%'' failed
FAIL test-sh-quote (exit status: 134)


3) For module 'system-quote', which relies on 'sh-quote':

FAIL: test-system-quote.sh

test-system-quote-main.c:65: assertion 'buf[output_len + 1] == '%'' failed
FAIL test-system-quote.sh (exit status: 134)


I suspect the commit that you delivered on 2016-10-03, since that's the only
recent change in this area. Can you please take a deep look?

Bruno
Pádraig Brady
2016-10-16 22:00:35 UTC
Permalink
Raw Message
Post by Bruno Haible
Hi Pádraig,
On a glibc system (glibc 2.15, Linux 3.8) I get these test failures
FAIL: test-quotearg.sh
$ LOCALE=fr_FR.UTF-8 LOCALEDIR=locale ./test-quotearg
test-quotearg.h:53: assertion 'la == lb' failed
(gdb) where
#0 0x00007ffff7a52035 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007ffff7a5579b in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00000000004011a3 in compare (a=0x4063d5 "««\\»»", la=9, b=0x608460 "«a' b»", lb=8) at test-quotearg.h:53
#3 0x000000000040159f in compare_strings (func=0x401684 <use_quotearg_buffer>, results=0x6081e0, ascii_only=false) at test-quotearg.h:91
#4 0x0000000000401ae3 in main (argc=1, argv=0x7fffffffdb98) at test-quotearg.c:87
Apparently the quoting result of string "a' b", which ought to have been
"«a' b»", is now "««\\»»" - which is complete nonsense.
Can you reproduce this? You must first make sure that you have a fr_FR.UTF-8
locale on your system. If not, create it using
# localedef -i fr_FR -f UTF-8 fr_FR.UTF-8'
FAIL: test-sh-quote
test-sh-quote.c:48: assertion 'buf[output_len + 1] == '%'' failed
FAIL test-sh-quote (exit status: 134)
FAIL: test-system-quote.sh
test-system-quote-main.c:65: assertion 'buf[output_len + 1] == '%'' failed
FAIL test-system-quote.sh (exit status: 134)
I suspect the commit that you delivered on 2016-10-03, since that's the only
recent change in this area. Can you please take a deep look?
The attached should fix up those tests.
Sorry for missing those.

Pádraig.
Bruno Haible
2016-10-17 00:41:46 UTC
Permalink
Raw Message
Hi Pádraig,
Post by Pádraig Brady
The attached should fix up those tests.
I disagree with the removal of the '%' checks in tests/test-sh-quote.c
and tests/test-system-quote-main.c. These checks are there to guarantee
that the functions won't write past the allocated buffer. For example,
sh-quote.h specifies how large the buffer must be:

/* Copies the quoted string to p and returns the incremented p.
There must be room for shell_quote_length (string) + 1 bytes at p. */
extern char * shell_quote_copy (char *p, const char *string);

system-quote.h specifies it similarly:

/* Copies the quoted string to p and returns the incremented p.
There must be room for system_quote_length (string) + 1 bytes at p. */
extern char *
system_quote_copy (char *p,
enum system_command_interpreter interpreter,
const char *string);

The tests verify this. Please can you restore the checks and instead
see why quotearg now writes more bytes than necessary?

Note that both shell_quote_length and shell_quote_copy are defined
through the 'quotearg' module, with the same options (called
'sh_quoting_options' there). The buffer overrun indicates an internal
inconsistency in the 'quotearg' module.

Bruno
Bruno Haible
2016-10-18 12:15:28 UTC
Permalink
Raw Message
Hi Pádraig,
Post by Bruno Haible
I disagree with the removal of the '%' checks in tests/test-sh-quote.c
Agreed. There is a disparity now because there is a switch
from a longer (shell) to a more concise (c) quoting style
in certain cases (when just a single quote is encountered).
And what happens more precisely (I've single-stepped it) is:

1) Call to
quotearg_buffer_restyled (buffer=0x7fffffffd620 "", buffersize=18446744073709551615,
arg=0x40433f "'foo'bar", argsize=8,
quoting_style=shell_quoting_style, flags=0,
quote_these_too=0x607018, left_quote=0x0, right_quote=0x0)
at quotearg.c:252

In line 372 it sets
quoting_style = shell_always_quoting_style;

It parses the first character and from line 546 jumps to force_outer_quoting_style.

2) Recursive call to
quotearg_buffer_restyled (buffer=0x7fffffffd620 "", buffersize=18446744073709551615,
arg=0x40433f "'foo'bar", argsize=8,
quoting_style=shell_always_quoting_style, flags=0,
quote_these_too=0x0, left_quote=0x0, right_quote=0x0)
at quotearg.c:252

It processes the entire string and produces
"''\\''foo'\\''bar"
which is a string of length 15+NUL.

Then, in line 715, it finds the following values:
(gdb) print quoting_style
$17 = shell_always_quoting_style
(gdb) print elide_outer_quotes
$18 = false
(gdb) print all_c_and_shell_quote_compat
$19 = true
(gdb) print encountered_single_quote
$20 = true

3) Recursive call to
quotearg_buffer_restyled (buffer=0x7fffffffd620 "", buffersize=18446744073709551615,
arg=0x40433f "'foo'bar", argsize=8,
quoting_style=c_quoting_style, flags=1,
quote_these_too=0x0, left_quote=0x0, right_quote=0x0)
at quotearg.c:252
and this one produces
"\"'foo'bar\\'"
which is a string of length 11+NUL.

The problem is that this code writes 15+1 bytes into the buffer,
then restarts and then reports that it has written 11+1 bytes into
the buffer. shell_quote then knows that it has to allocate 11+1 bytes,
and the second call to quotearg_buffer makes a buffer overrun.
I'll fix this up either by unsetting a new USE_MOST_CONCISE option
from shell_quote_{length,copy}
Ugh. This will add complexity to ugliness.

How about this? You are doing multiple passes through the argument string
anyway now. (Originally quotearg was meant as a single-pass implementation,
but it isn't any more.) So
- remove the code block of lines 711..720,
- instead, before the for(...) loop, determine all_c_and_shell_quote_compat
and encountered_single_quote in an extra single pass through the string,
- and put the code block from lines 711..720 *BEFORE* the for(...) loop.

This way, the buffer will only be filled once. => No buffer overrun.

I'm reverting the unintented parts of test-sh-quote.c. You can then decide
how you want to modify quotearg.c (and sh-quote.c if you want).

Bruno


2016-10-18 Bruno Haible <***@clisp.org>

sh-quote, system-quote: revert regression of unit test.
* tests/test-sh-quote.c (check_one): Do detect buffer overruns.
* tests/test-system-quote-main.c (check_one): Likewise.

diff --git a/tests/test-sh-quote.c b/tests/test-sh-quote.c
index 091da34..ff05c8e 100644
--- a/tests/test-sh-quote.c
+++ b/tests/test-sh-quote.c
@@ -41,9 +41,11 @@ check_one (const char *input, const char *expected)

ASSERT (output_len <= sizeof (buf) - 2);
memset (buf, '\0', output_len + 1);
+ buf[output_len + 1] = '%';
bufend = shell_quote_copy (buf, input);
ASSERT (bufend == buf + output_len);
ASSERT (memcmp (buf, output, output_len + 1) == 0);
+ ASSERT (buf[output_len + 1] == '%');

ASSERT (strcmp (output, expected) == 0);

diff --git a/tests/test-system-quote-main.c b/tests/test-system-quote-main.c
index 40d7ec6..8eb6a17 100644
--- a/tests/test-system-quote-main.c
+++ b/tests/test-system-quote-main.c
@@ -58,9 +58,11 @@ check_one (enum system_command_interpreter interpreter, const char *prog,

ASSERT (output_len <= sizeof (buf) - 2);
memset (buf, '\0', output_len + 1);
+ buf[output_len + 1] = '%';
bufend = system_quote_copy (buf, interpreter, input);
ASSERT (bufend == buf + output_len);
ASSERT (memcmp (buf, output, output_len + 1) == 0);
+ ASSERT (buf[output_len + 1] == '%');

/* Store INPUT in EXPECTED_DATA_FILE, for verification by the child
process. */
Padraig Brady
2016-10-18 20:43:35 UTC
Permalink
Raw Message
Post by Bruno Haible
How about this? You are doing multiple passes through the argument string
anyway now. (Originally quotearg was meant as a single-pass implementation,
but it isn't any more.) So
- remove the code block of lines 711..720,
- instead, before the for(...) loop, determine all_c_and_shell_quote_compat
and encountered_single_quote in an extra single pass through the string,
- and put the code block from lines 711..720 *BEFORE* the for(...) loop.
This way, the buffer will only be filled once. => No buffer overrun.
I'm reverting the unintented parts of test-sh-quote.c. You can then decide
how you want to modify quotearg.c (and sh-quote.c if you want).
In the attached I added a single extra orig_buffersize variable to
control an extra read-only scan.

Pádraig.
Bruno Haible
2016-10-18 23:50:55 UTC
Permalink
Raw Message
Hi Padraig,
Post by Padraig Brady
In the attached I added a single extra orig_buffersize variable to
control an extra read-only scan.
Thanks, it looks good.

The other recursive call to quotearg_buffer_restyled

return quotearg_buffer_restyled (buffer, buffersize, arg, argsize,
quoting_style,
flags & ~QA_ELIDE_OUTER_QUOTES, NULL,
left_quote, right_quote);

is also OK, because the code is careful not to make more than one
STORE (...) call per input byte if elide_outer_quotes is true. There are
enough occurrences of

if (elide_outer_quotes)
goto force_outer_quoting_style;

so that the string that was produced before this recursive call will
not be longer than the one produced by the recursive call. => All fine.

Bruno

Loading...