Discussion:
dfa is slower than regex in some cases
Add Reply
Norihiro Tanaka
2017-12-03 05:22:23 UTC
Reply
Permalink
Raw Message
Hi,

A case that dfa is slower than regex on bug-gawk mailing list is
discussed 4 months ago.

http://lists.gnu.org/archive/html/bug-gawk/2017-07/msg00002.html

Although gawk is fixed to not use dfa in some cases, it looks like a
temporary treatment. (11d4ea518166ffbc0c2fe85d090723e8f299486c)

--------
$ env LC_ALL=C ./gawk --version
GNU Awk 4.2.0, API: 2.0 (GNU MPFR 3.1.5, GNU MP 4.3.2)
Copyright (C) 1989, 1991-2017 Free Software Foundation.

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see http://www.gnu.org/licenses/.
$ time -p env LC_ALL=C ./gawk -f test.awk test.inp >/dev/null
real 1.38
user 1.17
sys 0.11
$ time -p env LC_ALL=C GAWK_NO_DFA=1 ./gawk -f test.awk test.inp >/dev/null
real 0.51
user 0.43
sys 0.07
--------

This problem seems to be due to dfa.c. In fact, grep is also slower
than not to use dfa in the same cases.

--------
$ env LC_ALL=C src/grep --version
grep (GNU grep) 3.1
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Mike Haertel and others, see <http://git.sv.gnu.org/cgit/grep.git/tree/AUTHORS>.
$ time -p env LC_ALL=C src/grep -X gawk -f test.pat test.inp >/dev/null
real 1.06
user 0.97
sys 0.08
--------

# grep does not have control by GREP_NO_DFA environment, so I changed
# as following and rebuilded, and run the same test as above.

========
diff --git a/lib/dfa.c b/lib/dfa.c
index 2b9c80e..f234b88 100644
--- a/lib/dfa.c
+++ b/lib/dfa.c
@@ -3474,21 +3474,7 @@ dfacomp (char const *s, size_t len, struct dfa *d, bool searchflag)
dfaparse (s, len, d);
dfassbuild (d);

- if (dfa_supported (d))
- {
- dfaoptimize (d);
- dfaanalyze (d, searchflag);
- }
- else
- {
- d->dfaexec = dfaexec_noop;
- }
-
- if (d->superset)
- {
- d->fast = true;
- dfaanalyze (d->superset, searchflag);
- }
+ d->dfaexec = dfaexec_noop;
}

/* Free the storage held by the components of a dfa. */
========

--------
$ time -p env LC_ALL=C src/grep -X gawk -f test.pat test.inp >/dev/null
real 0.43
user 0.37
sys 0.05
--------

However, I do not still know why dfa is slower than regex in this case.

Thanks,
Norihiro
Norihiro Tanaka
2017-12-03 05:23:50 UTC
Reply
Permalink
Raw Message
I attached a test case on this mail.

On Sun, 03 Dec 2017 14:22:23 +0900
Post by Norihiro Tanaka
Hi,
A case that dfa is slower than regex on bug-gawk mailing list is
discussed 4 months ago.
http://lists.gnu.org/archive/html/bug-gawk/2017-07/msg00002.html
Although gawk is fixed to not use dfa in some cases, it looks like a
temporary treatment. (11d4ea518166ffbc0c2fe85d090723e8f299486c)
--------
$ env LC_ALL=C ./gawk --version
GNU Awk 4.2.0, API: 2.0 (GNU MPFR 3.1.5, GNU MP 4.3.2)
Copyright (C) 1989, 1991-2017 Free Software Foundation.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see http://www.gnu.org/licenses/.
$ time -p env LC_ALL=C ./gawk -f test.awk test.inp >/dev/null
real 1.38
user 1.17
sys 0.11
$ time -p env LC_ALL=C GAWK_NO_DFA=1 ./gawk -f test.awk test.inp >/dev/null
real 0.51
user 0.43
sys 0.07
--------
This problem seems to be due to dfa.c. In fact, grep is also slower
than not to use dfa in the same cases.
--------
$ env LC_ALL=C src/grep --version
grep (GNU grep) 3.1
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Mike Haertel and others, see <http://git.sv.gnu.org/cgit/grep.git/tree/AUTHORS>.
$ time -p env LC_ALL=C src/grep -X gawk -f test.pat test.inp >/dev/null
real 1.06
user 0.97
sys 0.08
--------
# grep does not have control by GREP_NO_DFA environment, so I changed
# as following and rebuilded, and run the same test as above.
========
diff --git a/lib/dfa.c b/lib/dfa.c
index 2b9c80e..f234b88 100644
--- a/lib/dfa.c
+++ b/lib/dfa.c
@@ -3474,21 +3474,7 @@ dfacomp (char const *s, size_t len, struct dfa *d, bool searchflag)
dfaparse (s, len, d);
dfassbuild (d);
- if (dfa_supported (d))
- {
- dfaoptimize (d);
- dfaanalyze (d, searchflag);
- }
- else
- {
- d->dfaexec = dfaexec_noop;
- }
-
- if (d->superset)
- {
- d->fast = true;
- dfaanalyze (d->superset, searchflag);
- }
+ d->dfaexec = dfaexec_noop;
}
/* Free the storage held by the components of a dfa. */
========
--------
$ time -p env LC_ALL=C src/grep -X gawk -f test.pat test.inp >/dev/null
real 0.43
user 0.37
sys 0.05
--------
However, I do not still know why dfa is slower than regex in this case.
Thanks,
Norihiro
--
$B!!EDCf!!5*MN(B (Norihiro TANAKA)
$B!!(BE-mail : ***@kcn.ne.jp
Paul Eggert
2017-12-03 05:48:34 UTC
Reply
Permalink
Raw Message
Post by Norihiro Tanaka
- if (dfa_supported (d))
- {
- dfaoptimize (d);
- dfaanalyze (d, searchflag);
- }
- else
- {
- d->dfaexec = dfaexec_noop;
- }
-
- if (d->superset)
- {
- d->fast = true;
- dfaanalyze (d->superset, searchflag);
- }
+ d->dfaexec = dfaexec_noop;
Are you suggesting that we install this patch? Sorry, this was not clear from
your email.
Norihiro Tanaka
2017-12-03 06:14:21 UTC
Reply
Permalink
Raw Message
On Sat, 2 Dec 2017 21:48:34 -0800
Post by Norihiro Tanaka
- if (dfa_supported (d))
- {
- dfaoptimize (d);
- dfaanalyze (d, searchflag);
- }
- else
- {
- d->dfaexec = dfaexec_noop;
- }
-
- if (d->superset)
- {
- d->fast = true;
- dfaanalyze (d->superset, searchflag);
- }
+ d->dfaexec = dfaexec_noop;
Are you suggesting that we install this patch? Sorry, this was not clear from your email.
No. I installed this patch only to test the problem.

Loading...