Discussion:
[PATCH] dfa: save memory for states
(too old to reply)
Norihiro Tanaka
2016-10-10 14:28:48 UTC
Permalink
Raw Message
Hi,

dfa matcher is not good at matching for long patterns as following.

seq -f '%g bottles of beer on the wall' 2400 >in
env LC_ALL=C src/grep -vf in in

In fact, for this case GNU grep is considerably slower than ripgrep
which is released recently. I think that it is cased by some reasons
and one of them is in memory allocation.

This patch checks number of states in beginning of dfa execution, and if
there are a lot of states, release them. it can not only save memory,
but may improve performance, as when the number of the state becomes
large, it takes time to find the same state from caches of states.

Thanks.
Norihiro
Bruno Haible
2016-10-10 15:41:03 UTC
Permalink
Raw Message
Hi Norihiro,

I'm not maintainer of the 'dfa' module, but nevertheless I'd like to know
what is going on more precisely (because clearing a cache more often
deteriorates running times than improve running times). 3 questions:

1)
Post by Norihiro Tanaka
This patch checks number of states in beginning of dfa execution, and if
there are a lot of states, release them.
Have you measured what are the execution times without/before and with/after
your patch, for example in the "seq -f '%g bottles of beer on the wall' 2400"
example that you showed? Can you show us the benchmark results?

2) Your patch cuts down on three different caches:
- d->states,
- d->trans, d->fails,
- d->mb_trans.
Can you prepare a separate patch for each of the three changes, and
present the benchmark results of each? It would be interesting to know
which of the three has the most effect on the timings.

3) If you are right that reducing the size of a cache improves the timings,
then it means the lookup in the cache is not O(1)? In other words, can the
lookup in the cache be optimized (through more adapted data structures -
I'm thinking of binary trees or hash tables), so that a larger cache size
will NOT lead to a longer execution time?

Bruno
Norihiro Tanaka
2016-10-10 23:25:39 UTC
Permalink
Raw Message
On Mon, 10 Oct 2016 17:41:03 +0200
Post by Bruno Haible
Hi Norihiro,
I'm not maintainer of the 'dfa' module, but nevertheless I'd like to know
what is going on more precisely (because clearing a cache more often
1)
Post by Norihiro Tanaka
This patch checks number of states in beginning of dfa execution, and if
there are a lot of states, release them.
Have you measured what are the execution times without/before and with/after
your patch, for example in the "seq -f '%g bottles of beer on the wall' 2400"
example that you showed? Can you show us the benchmark results?
Hi Bruno,

I tested on following machine, and I looked at slite performance
improvement.

Intel(R) Core(TM)2 Duo CPU E7500 @ 2.93GHz
gcc version 4.4.7 (GCC)
Linux rhel6 2.6.32-642.el6.x86_64 #1 SMP Wed Apr 13 00:51:26 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux

- before

$ time -p env LC_ALL=C grep -vf 600 600
real 3.95
user 3.35
sys 0.59

- after

$ time -p env LC_ALL=C grep -vf 600 600
real 3.75
user 2.81
sys 0.70

However, focus saving memory please. See also below.

https://debbugs.gnu.org/cgi/bugreport.cgi?bug=22357
grep -f huge memory usage
Post by Bruno Haible
- d->states,
- d->trans, d->fails,
- d->mb_trans.
Can you prepare a separate patch for each of the three changes, and
present the benchmark results of each? It would be interesting to know
which of the three has the most effect on the timings.
No, we can not clear D->state without clearing D->trans, D->fails and
D->mb_trans, as state numbers are refered in them.

BTW, dfa had mechanism to clear caches for D->trans, D->fails and
D->mb_trans. See build_state() and transit_state().
Post by Bruno Haible
3) If you are right that reducing the size of a cache improves the timings,
then it means the lookup in the cache is not O(1)? In other words, can the
lookup in the cache be optimized (through more adapted data structures -
I'm thinking of binary trees or hash tables), so that a larger cache size
will NOT lead to a longer execution time?
To achieve it, we need a lot of changes. So I am considering it as next
step.

Thanks,
Norihiro
Norihiro Tanaka
2016-10-10 23:35:24 UTC
Permalink
Raw Message
Hi Bruno,

On Tue, 11 Oct 2016 08:25:39 +0900
Post by Norihiro Tanaka
gcc version 4.4.7 (GCC)
Linux rhel6 2.6.32-642.el6.x86_64 #1 SMP Wed Apr 13 00:51:26 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
- before
$ time -p env LC_ALL=C grep -vf 600 600
real 3.95
user 3.35
sys 0.59
- after
$ time -p env LC_ALL=C grep -vf 600 600
real 3.75
user 2.81
sys 0.70
Sorry, I mistakenenly used pre-installed grep. I tested it with grep in
correct path. The version of grep is 2.26.

$ seq -f '%g bottoms bear on the wall' 600 >600

- before

$ time -p env LC_ALL=C src/grep -vf ~/600 ~/600
real 3.70
user 2.26
sys 1.42

- after

$ time -p env LC_ALL=C src/grep -vf ~/600 ~/600
real 2.57
user 2.20
sys 0.31

Thanks,
Norihiro
Bruno Haible
2016-10-11 05:30:16 UTC
Permalink
Raw Message
Hi Norihiro,
Post by Norihiro Tanaka
$ seq -f '%g bottoms bear on the wall' 600 >600
- before
$ time -p env LC_ALL=C src/grep -vf ~/600 ~/600
real 3.70
user 2.26
sys 1.42
- after
$ time -p env LC_ALL=C src/grep -vf ~/600 ~/600
real 2.57
user 2.20
sys 0.31
Terrific! Congratulations and thank you!!

Bruno

Paul Eggert
2016-10-10 15:32:47 UTC
Permalink
Raw Message
Thanks for that performance improvement. I installed it into gnulib and
grep.
Norihiro Tanaka
2016-10-10 23:26:37 UTC
Permalink
Raw Message
On Mon, 10 Oct 2016 08:32:47 -0700
Thanks for that performance improvement. I installed it into gnulib and grep.
Hi Paul,

Thanks for installing it to both.

Norihiro
Loading...