Discussion:
[bug-gnulib] bugs in regexp.c
Sam Steingold
2005-03-08 00:00:20 UTC
regexp.c exhibits the following 8 bugs:

1. regcomp("*a"): "Invalid preceding regular expression"
2. regcomp("^*"): "Invalid preceding regular expression"
3. regcomp("$*"): "Invalid preceding regular expression"
4. regcomp("(*)b"): "Invalid preceding regular expression"
5. regcomp("{[^}\n]*}"): "Invalid content of \\{\\}"
6. regcomp("{[^}\n]*}"): "Invalid content of \\{\\}"
7. regcomp ("{[^}\n]*}"): "Invalid content of \\{\\}"

8. regcomp("[[:xdigit]+") works, but when the result is matched against
"0[x:dig", it matches just "[", not "[x:dig"

bugs discovered by the CLISP regexp test suite.
--
Sam Steingold (http://www.podval.org/~sds) running w2k
<http://www.memri.org/> <http://www.honestreporting.com>
<http://pmw.org.il/> <http://www.iris.org.il> <http://www.mideasttruth.com/>
The early bird may get the worm, but the second mouse gets the cheese.
Paul Eggert
2005-03-09 01:03:27 UTC
Post by Sam Steingold
1. regcomp("*a"): "Invalid preceding regular expression"
2. regcomp("^*"): "Invalid preceding regular expression"
3. regcomp("$*"): "Invalid preceding regular expression"
4. regcomp("(*)b"): "Invalid preceding regular expression"
5. regcomp("{[^}\n]*}"): "Invalid content of \\{\\}"
6. regcomp("{[^}\n]*}"): "Invalid content of \\{\\}"
7. regcomp ("{[^}\n]*}"): "Invalid content of \\{\\}"
8. regcomp("[[:xdigit]+") works, but when the result is matched against
"0[x:dig", it matches just "[", not "[x:dig"
(3) is a bug, but the others aren't bugs, surely, since POSIX says
those other EREs are invalid or have undefined behavior. Perhaps you
need to update the CLISP regexp test suite?
Sam Steingold
2005-03-09 19:37:29 UTC
Post by Paul Eggert
Post by Sam Steingold
1. regcomp("*a"): "Invalid preceding regular expression"
2. regcomp("^*"): "Invalid preceding regular expression"
3. regcomp("$*"): "Invalid preceding regular expression"
4. regcomp("(*)b"): "Invalid preceding regular expression"
5. regcomp("{[^}\n]*}"): "Invalid content of \\{\\}"
6. regcomp("{[^}\n]*}"): "Invalid content of \\{\\}"
7. regcomp ("{[^}\n]*}"): "Invalid content of \\{\\}"
8. regcomp("[[:xdigit]+") works, but when the result is matched against
"0[x:dig", it matches just "[", not "[x:dig"
(3) is a bug, but the others aren't bugs, surely, since POSIX says
those other EREs are invalid or have undefined behavior.
what's wrong with 8?
Post by Paul Eggert
Perhaps you need to update the CLISP regexp test suite?
maybe.
Paul Eggert
2005-03-11 00:29:50 UTC
Post by Sam Steingold
Post by Sam Steingold
3. regcomp("$*"): "Invalid preceding regular expression"
8. regcomp("[[:xdigit]+") works, but when the result is matched against
"0[x:dig", it matches just "[", not "[x:dig"
what's wrong with 8?
POSIX says:

The character sequences "[.", "[=", and "[:" (left-bracket followed
by a period, equals-sign, or colon) shall be special inside a bracket
expression and are used to delimit collating symbols, equivalence
class expressions, and character class expressions. These symbols
shall be followed by a valid expression and the matching terminating
sequence ".]", "=]", or ":]", as described in the following items.

(8) violates a "shall" so it's not a good test case.
Paul Eggert
2005-03-11 22:10:29 UTC
Post by Paul Eggert
Post by Sam Steingold
3. regcomp("$*"): "Invalid preceding regular expression"
8. regcomp("[[:xdigit]+") works, but when the result is matched against
"0[x:dig", it matches just "[", not "[x:dig"
(8) violates a "shall" so it's not a good test case.
then why doesn't regcomp signal an error?
Because regcomp is required to signal an error only for invalid
regular expressions that do not have undefined behavior. As I
understand it (8) has undefined behavior, so as far as POSIX is
concerned regcomp can do anything it pleases (including dump core).
For more, please see
<http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html>
and look for "invalid".
Post by Paul Eggert
I don't know how to reproduce it easily.
main () {regcomp(NULL,"$*",0);}
As I understand it that's not portable code, since POSIX doesn't allow
the first arg of regcomp to be null.

I tried to correct it as follows, using the gnulib version of regex.h
and regex.c, but this test case worked for me. Do you have a complete
example illustrating the problem?

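#include <sys/types.h>
#include <regex.h>
#include <stdlib.h>   /* for abort () */

int
main (int argc, char **argv)
{
  regex_t r;
  /* Compile "$*" as a BRE (cflags == 0).  */
  if (regcomp (&r, "$*", 0) != 0)
    abort ();
  /* The compiled pattern should match the subject string.  */
  if (regexec (&r, "ab$", 0, 0, 0) != 0)
    abort ();
  return 0;
}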
Sam Steingold
2005-03-18 16:03:07 UTC
Post by Paul Eggert
Post by Paul Eggert
Post by Sam Steingold
3. regcomp("$*"): "Invalid preceding regular expression"
8. regcomp("[[:xdigit]+") works, but when the result is matched against
"0[x:dig", it matches just "[", not "[x:dig"
(8) violates a "shall" so it's not a good test case.
then why doesn't regcomp signal an error?
Because regcomp is required to signal an error only for invalid
regular expressions that do not have undefined behavior. As I
understand it (8) has undefined behavior, so as far as POSIX is
concerned regcomp can do anything it pleases (including dump core).
For more, please see
<http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html>
and look for "invalid".
indeed, regcomp is not _required_ to signal an error in this case,
but I think it is a good practice to treat undefined behavior as an
error. this encourages safe portable code.
Post by Paul Eggert
Post by Paul Eggert
I don't know how to reproduce it easily.
I tried to correct it as follows, using the gnulib version of regex.h
and regex.c, but this test case worked for me. Do you have a complete
example illustrating the problem?
#include <sys/types.h>
#include <regex.h>
int
main (int argc, char **argv)
{
regex_t r;
if (regcomp (&r, "$*", 0) != 0)
(regcomp (&r, "$*", REG_EXTENDED) != 0)
Post by Paul Eggert
abort ();
if (regexec (&r, "ab$", 0, 0, 0) != 0)
abort ();
return 0;
}
abnormal program termination
--
Sam Steingold (http://www.podval.org/~sds) running w2k
<http://www.openvotingconsortium.org/> <http://www.memri.org/>
<http://www.jihadwatch.org/> <http://www.dhimmi.com/>
Warning! Dates in calendar are closer than they appear!
Paul Eggert
2005-03-18 21:23:41 UTC
Post by Sam Steingold
I think it is a good practice to treat undefined behavior as an
error.
That may be, but in that case the test cases assume a particular
extension of POSIX, and the test cases therefore will not be portable
to all POSIX implementations. I suspect that in some cases the
regexp.c behavior is compatible with traditional UNIX, which did not
always treat undefined behavior as an error; changing it might break
applications assuming the traditional UNIX behavior.
Post by Sam Steingold
(regcomp (&r, "$*", REG_EXTENDED) != 0)
Ah, OK, I see now. That has undefined behavior. "$*" is a valid BRE,
but it is not a valid ERE.

<http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html#tag_09_04_06>
defines the behavior of "*" only if it is preceded by "an ERE matching
a single character or an ERE enclosed in parentheses". But "$" is not
such an ERE.

Sam Steingold
2005-03-11 13:56:22 UTC
Post by Paul Eggert
Post by Sam Steingold
Post by Sam Steingold
3. regcomp("$*"): "Invalid preceding regular expression"
8. regcomp("[[:xdigit]+") works, but when the result is matched against
"0[x:dig", it matches just "[", not "[x:dig"
what's wrong with 8?
The character sequences "[.", "[=", and "[:" (left-bracket followed
by a period, equals-sign, or colon) shall be special inside a bracket
expression and are used to delimit collating symbols, equivalence
class expressions, and character class expressions. These symbols
shall be followed by a valid expression and the matching terminating
sequence ".]", "=]", or ":]", as described in the following items.
(8) violates a "shall" so it's not a good test case.
then why doesn't regcomp signal an error?