Sync gnulib regex with glibc
Adhemerval Zanella
2017-12-21 19:50:53 UTC
Hi Paul,

While working on syncing the gnulib regex files with glibc, I found that
a gnulib change triggers a regression in a glibc test case:

posix/bug-regex28.c:

struct tests
{
  const char *regex;
  const char *string;
  reg_syntax_t syntax;
  int retval;
} tests[] = {
#define EGREP RE_SYNTAX_EGREP
#define EGREP_NL (RE_SYNTAX_EGREP | RE_DOT_NEWLINE) & ~RE_HAT_LISTS_NOT_NEWLINE
  { "a.b", "a\nb", EGREP, -1 },
  { "a.b", "a\nb", EGREP_NL, 0 },
  { "a[^x]b", "a\nb", EGREP, -1 },
  { "a[^x]b", "a\nb", EGREP_NL, 0 },

Basically, the test expects "a.b" with RE_SYNTAX_EGREP not to match
"a\nb", but with the way RE_SYNTAX_EGREP is now defined in gnulib
this no longer holds:

--- posix/regex.h 2017-12-21 12:57:39.814761254 -0200
+++ ../../gnulib/gnulib-lib/lib/regex.h 2017-09-14 14:20:50.809030813 -0300

-#define RE_SYNTAX_GREP \
- (RE_BK_PLUS_QM | RE_CHAR_CLASSES \
- | RE_HAT_LISTS_NOT_NEWLINE | RE_INTERVALS \
- | RE_NEWLINE_ALT)
-
-#define RE_SYNTAX_EGREP \
- (RE_CHAR_CLASSES | RE_CONTEXT_INDEP_ANCHORS \
- | RE_CONTEXT_INDEP_OPS | RE_HAT_LISTS_NOT_NEWLINE \
- | RE_NEWLINE_ALT | RE_NO_BK_PARENS \
- | RE_NO_BK_VBAR)
-
-#define RE_SYNTAX_POSIX_EGREP \
- (RE_SYNTAX_EGREP | RE_INTERVALS | RE_NO_BK_BRACES \
- | RE_INVALID_INTERVAL_ORD)
+# define RE_SYNTAX_GREP \
+ ((RE_SYNTAX_POSIX_BASIC | RE_NEWLINE_ALT) \
+ & ~(RE_CONTEXT_INVALID_DUP | RE_DOT_NOT_NULL))
+
+# define RE_SYNTAX_EGREP \
+ ((RE_SYNTAX_POSIX_EXTENDED | RE_INVALID_INTERVAL_ORD | RE_NEWLINE_ALT) \
+ & ~(RE_CONTEXT_INVALID_OPS | RE_DOT_NOT_NULL))
+
+/* POSIX grep -E behavior is no longer incompatible with GNU. */
+# define RE_SYNTAX_POSIX_EGREP \
+ RE_SYNTAX_EGREP


The glibc test file, bug-regex28.c, is related to BZ#3957 [1], which is not
strictly related to the RE_SYNTAX_{E}GREP definition. On the gnulib side, the
change was made fairly recently (2015) in commit 5a5a9388e.

It does look like a correct change, but what worries me from the glibc
standpoint is whether it would require a compatibility implementation
(potentially mapping RE_SYNTAX_{E}GREP to the old definition on a compat symbol).

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=3957
Zack Weinberg
2017-12-21 20:15:51 UTC
On Thu, Dec 21, 2017 at 11:50 AM, Adhemerval Zanella wrote:
> The glibc test file, bug-regex28.c, is related to BZ#3957 [1], which is not
> strictly related to the RE_SYNTAX_{E}GREP definition. On the gnulib side, the
> change was made fairly recently (2015) in commit 5a5a9388e.
>
> It does look like a correct change, but what worries me from the glibc
> standpoint is whether it would require a compatibility implementation
> (potentially mapping RE_SYNTAX_{E}GREP to the old definition on a compat symbol).

I don't think that would actually help anything in this case. "This
program's embedded regexps are misinterpreted by a newer libc.so" is
not noticeably different from "This program's embedded regexps are
misinterpreted if the program has been recompiled with newer glibc".
Especially when it has to do with a corner case like whether . matches
\n. Release notes are probably the best we can do here.

zw
Paul Eggert
2017-12-21 21:51:24 UTC
Adhemerval Zanella wrote:
> It does look like a correct change, but what worries me from the glibc
> standpoint is whether it would require a compatibility implementation
> (potentially mapping RE_SYNTAX_{E}GREP to the old definition on a compat symbol).

I doubt that would be needed (or, as Zack writes, helpful) in this case.
(Neither RE_SYNTAX_GREP nor RE_SYNTAX_EGREP is documented in any way, amusingly
enough, so one cannot say we're changing documented behavior....)