[pygnulib] simplify cache configure.ac parsing
Dmitry Selyutin
2017-09-12 15:54:08 UTC
This change does not affect the current gnulib-tool.py, just `python`
Still, this change is going to be integrated into gnulib-tool.py later.

I've been testing the new command-line parsing along with the parsing of the
cached configuration (configure.ac, gnulib-cache.m4 and gnulib-comp.m4
processing). I've noticed that we spend a lot of time processing the contents
of the AC_PREREQ and AC_CONFIG_AUX_DIR macros. These regular expressions have
the following form (I've removed some junk):


In Python, however, it seems to be enough to just use the following form:


Once I started using the latter form, the time required to process each of
these regular expressions decreased by about half a second. The regex works
even in the following cases:

" AC_PREREQ([2.67])"
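
The exact patterns did not survive in this archive, so what follows is only a
minimal sketch of what a simplified pattern of this kind could look like in
Python. The pattern itself and the helper name `find_prereq` are assumptions
made for illustration, not the code from the patch.

```python
import re

# Hypothetical simplified pattern: the macro name, an opening parenthesis,
# optional square-bracket quoting, then the version string.
AC_PREREQ = re.compile(r"AC_PREREQ\(\[?([0-9.]+)\]?\)")


def find_prereq(text):
    """Return the first AC_PREREQ version found in text, or None."""
    match = AC_PREREQ.search(text)
    return match.group(1) if match else None


# search() scans the whole string, so leading indentation is not a problem.
print(find_prereq(" AC_PREREQ([2.67])"))
```

Because `search()` is not anchored to the start of the line, the pattern does
not need a leading `.*` to skip over whitespace before the macro name.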

I suspect that the original form was just a copy-paste from the original
gnulib-tool, where it may have been needed because sed is used to parse
the contents of the configure.ac file. So the questions are:
1. Is the new behavior correct?
2. Shall I push this small optimization?

I'd like to do it, because right now everything else I've rewritten works
almost instantly, but I still have some doubts. What do you think?
BTW, the version from pygnulib already differs a bit from the gnulib-tool
shell script; I've attached the patch. I've also decided to use raw string
literals just to make the regexes less verbose.
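
To illustrate the point about raw string literals, here is a small sketch
comparing the two spellings of one pattern. The pattern is an assumption
chosen for the example and is not taken from the patch.

```python
import re

# Both literals describe the same regex; the raw one avoids doubled backslashes.
plain = "AC_CONFIG_AUX_DIR\\(\\[?([^\\])]*)"
raw = r"AC_CONFIG_AUX_DIR\(\[?([^\])]*)"

same = plain == raw

# The captured group is the macro argument without the surrounding brackets.
aux = re.search(raw, "AC_CONFIG_AUX_DIR([build-aux])").group(1)
print(same, aux)
```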
With best regards,
Dmitry Selyutin
Bruno Haible
2017-09-12 17:08:55 UTC
Hi Dmitry,
Post by Dmitry Selyutin
BTW, the version from the pygnulib differs a bit already from the
gnulib-tool shell script
The shell script contains this code (unchanged since 2009):

s,^dnl .*$,,
s, dnl .*$,,
s,^.*AC_CONFIG_AUX_DIR([[ ]*\([^]"$`\\)]*\).*$,guessed_auxdir="\1",p
eval `sed -n -e "$my_sed_traces" < "$configure_ac"`

The first 3 lines of the sed script remove comments; I guess gnulib-tool.py
ought to do the same, because we really don't want to be fooled by invocations
that have been commented out.
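
A rough Python sketch of the same idea follows. It mirrors only the two `dnl`
substitutions and the AC_CONFIG_AUX_DIR pattern quoted above; the names
`DNL_COMMENT`, `AUX_DIR` and `guess_auxdir` are hypothetical, and unlike the
`eval` in the shell script it returns the first match rather than the last.

```python
import re

# Mirrors the sed rules s,^dnl .*$,, and s, dnl .*$,, on every line.
DNL_COMMENT = re.compile(r"(?:^| )dnl .*$", re.MULTILINE)

# Python spelling of the sed pattern for AC_CONFIG_AUX_DIR: skip brackets
# and spaces, then capture until one of ] " $ ` \ ) is reached.
AUX_DIR = re.compile(r'AC_CONFIG_AUX_DIR\([\[ ]*([^\]"$`\\)]*)')


def guess_auxdir(text):
    """Return the AC_CONFIG_AUX_DIR argument outside dnl comments, or None."""
    stripped = DNL_COMMENT.sub("", text)
    match = AUX_DIR.search(stripped)
    return match.group(1) if match else None


sample = (
    "dnl AC_CONFIG_AUX_DIR([old-aux])\n"
    "AC_CONFIG_AUX_DIR([build-aux]) dnl helper scripts\n"
)
print(guess_auxdir(sample))
```

Stripping the comments first keeps the extraction pattern itself simple, since
it no longer has to distinguish live invocations from commented-out ones.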
Post by Dmitry Selyutin
" AC_PREREQ([2.67])"
You can reasonably assume that a configure.ac will not contain the first
or third of these lines, because AC_PREREQ (and likewise A[CM]_PROG_LIBTOOL)
are usually used at the top-level only. The second line can occur, though,
as nothing forces programmers to avoid indentation.