Discussion:
Automatically-generated regexp documentation
Reuben Thomas
2016-10-21 12:52:22 UTC
I just noticed while working on Emacs's whitespace.el that Emacs’s syntax
coloring does not color the sequence “\\{” specially in a string, though it
recognises other regexp metacharacters such as “\\(”.

I looked in the Emacs manuals, and found no mention of braces as a special
character, despite the fact that they’re already used in
whitespace-space-after-tab-regexp, for example; and of course, when I
looked in the source code, I found the support there, as one would
expect, in src/regex.c.

The definitive documentation for the GNU regex implementation, as far as I
know, is in gnulib’s doc/regex.texi, but as that file says:

We wrote this chapter with programmers in mind, not users of
programs---such as Emacs---that use Regex. We describe the Regex
library in its entirety, not how to write regular expressions that a
particular program understands.

Since the regexp code is widely used, with mechanically-defined syntax
variants, in GNU grep, GNU sed etc., it would seem possible to have a
mechanically-generated manual, which takes a set of GNU regexp syntax
flags, as taken by re_set_syntax, and spits out suitably-configured
documentation.

This could be either incorporated into other programs' manuals, or, perhaps
better, supplied as standalone info files and man pages, suitable for
cross-referencing from other programs' manuals, rather like a user-level
version of the current gnulib/regex.texi.

In principle it would be possible to compile a manual for every possible
syntax variant, but in practice I imagine one would want to supply one for
POSIX BREs and EREs, GNU-extended EREs (i.e. GNU egrep syntax) and Emacs.
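As a small illustration of why per-variant documentation matters, here is a sketch of how the brace (interval) operator differs between two of the variants named above. It assumes a POSIX-style grep on the PATH; the sample strings are made up for the example. Emacs, like a BRE, writes the interval with backslashes (hence "\\{" inside a Lisp string).

```shell
# BRE (plain grep): the interval operator is written with backslashes.
bre_count=$(printf 'aaa\n' | grep -c 'a\{3\}')

# ERE (grep -E): the same interval is written with bare braces.
ere_count=$(printf 'aaa\n' | grep -E -c 'a{3}')

# In a BRE, bare braces are ordinary characters, so this pattern
# matches the literal text "a{3}" rather than three a's.
literal_count=$(printf 'a{3}\n' | grep -c 'a{3}')

echo "$bre_count $ere_count $literal_count"
```

Each count is the number of matching input lines, so all three should come out as 1 on the sample input.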
--
http://rrt.sc3d.org
Eric Blake
2016-10-21 13:25:15 UTC
Post by Reuben Thomas
Since the regexp code is widely used, with mechanically-defined syntax
variants, in GNU grep, GNU sed etc., it would seem possible to have a
mechanically-generated manual, which takes a set of GNU regexp syntax
flags, as taken by re_set_syntax, and spits out suitably-configured
documentation.
This could be either incorporated into other programs' manuals, or, perhaps
better, supplied as standalone info files and man pages, suitable for
cross-referencing from other programs' manuals, rather like a user-level
version of the current gnulib/regex.texi.
In principle it would be possible to compile a manual for every possible
syntax variant, but in practice I imagine one would want to supply one for
POSIX BREs and EREs, GNU-extended EREs (i.e. GNU egrep syntax) and Emacs.
In fact, such documentation already exists, and IS used in other GNU
manuals: at least findutils and m4 use the regexprops-generic module,
which generates the gnulib file doc/regexprops-generic.texi as a
human-readable user description of all regex flavors.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
Reuben Thomas
2016-10-21 13:46:49 UTC
Post by Eric Blake
In fact, such documentation already exists, and IS used in other GNU
manuals: at least findutils and m4 use the regexprops-generic module,
which generates the gnulib file doc/regexprops-generic.texi as a
human-readable user description of all regex flavors.
That's excellent! Another undiscovered gnulib gem.
--
http://rrt.sc3d.org
Reuben Thomas
2016-10-21 18:43:37 UTC
Post by Reuben Thomas
Post by Eric Blake
In fact, such documentation already exists, and IS used in other GNU
manuals: at least findutils and m4 use the regexprops-generic module,
which generates the gnulib file doc/regexprops-generic.texi as a
human-readable user description of all regex flavors.
That's excellent! Another undiscovered gnulib gem.
I looked at the Emacs Lisp manual to see what it currently looks like. The
relevant part, on the syntax of regexps, is here:

https://www.gnu.org/software/emacs/manual/html_node/elisp/Syntax-of-Regexps.html#Syntax-of-Regexps

Currently, it seems rather more elaborated than the gnulib documentation.

Would it be acceptable to gnulib to have its template updated to be more
like Emacs's? Essentially, this means the inclusion of more explanation. I
imagine this would be necessary for the Emacs maintainers to consider
taking their documentation from regexprops-generic.texi.

Note that I'm not talking about introducing full-on tutorial and example
sections. In the case of Emacs, the user manual has a section on regexps
which is more tutorial in nature, while the Elisp manual has a separate
examples section.

There are also notes in the Emacs manual, as comments in the source, that
suggest differences that are not currently documented: for example, a claim
that character class ranges such as [a-z] respect LC_COLLATE in grep, but
not in Emacs. This sort of difference could usefully be studied, and
perhaps hooks introduced for program-specific notes.

I'm happy to look into this if it's agreeable to the maintainers. It seems
to me an obvious opportunity to extend the good work already done by this
module in increasing documentation quality while reducing maintenance
effort.
--
http://rrt.sc3d.org
Eric Blake
2016-10-21 18:59:52 UTC
[adding James and findutils]
Post by Reuben Thomas
Post by Reuben Thomas
Post by Eric Blake
In fact, such documentation already exists, and IS used in other GNU
manuals: at least findutils and m4 use the regexprops-generic module,
which generates the gnulib file doc/regexprops-generic.texi as a
human-readable user description of all regex flavors.
That's excellent! Another undiscovered gnulib gem.
I looked at the Emacs Lisp manual to see what it currently looks like. The
relevant part, on the syntax of regexps, is here:
https://www.gnu.org/software/emacs/manual/html_node/elisp/Syntax-of-Regexps.html#Syntax-of-Regexps
Currently, it seems rather more elaborated than the gnulib documentation.
Would it be acceptable to gnulib to have its template updated to be more
like Emacs's? Essentially, this means the inclusion of more explanation. I
imagine this would be necessary for the Emacs maintainers to consider
taking their documentation from regexprops-generic.texi.
Gnulib is currently just a distribution point; the upstream template was
written by James Youngman for findutils (see
findutils.git/lib/regexprops.c). But I suspect patches would be
welcome, if you'd like to contribute some improvements.
Post by Reuben Thomas
Note that I'm not talking about introducing full-on tutorial and example
sections. In the case of Emacs, the user manual has a section on regexps
which is more tutorial in nature, while the Elisp manual has a separate
examples section.
There are also notes in the Emacs manual, as comments in the source, that
suggest differences that are not currently documented: for example, a claim
that character class ranges such as [a-z] respect LC_COLLATE in grep, but
not in Emacs. This sort of difference could usefully be studied, and
perhaps hooks introduced for program-specific notes.
I'm happy to look into this if it's agreeable to the maintainers. It seems
to me an obvious opportunity to extend the good work already done by this
module in increasing documentation quality while reducing maintenance
effort.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
Reuben Thomas
2016-10-21 19:04:30 UTC
Post by Eric Blake
[adding James and findutils]
Gnulib is currently just a distribution point; the upstream template was
written by James Youngman for findutils (see
findutils.git/lib/regexprops.c). But I suspect patches would be
welcome, if you'd like to contribute some improvements.
Thanks. I tried checking out findutils git from Savannah but got:

$ git clone git://git.savannah.gnu.org/findutils.git
Cloning into 'findutils'...
remote: Counting objects: 20126, done.
remote: Compressing objects: 100% (3683/3683), done.
error: object ac0afbe494560392efad871a77f18c0aaa4d504a: missingEmail: invalid author/committer line - missing email
fatal: Error in object
fatal: index-pack failed
$ git --version
git version 2.7.4
--
http://rrt.sc3d.org
Paul Eggert
2016-10-21 19:11:26 UTC
Post by Reuben Thomas
invalid author/committer line - missing email
I am not observing that problem, also with Git 2.7.4:

$ git clone git://git.savannah.gnu.org/findutils.git tmp
Cloning into 'tmp'...
remote: Counting objects: 20126, done.
remote: Compressing objects: 100% (3683/3683), done.
remote: Total 20126 (delta 16037), reused 20126 (delta 16037)
Receiving objects: 100% (20126/20126), 10.96 MiB | 569.00 KiB/s, done.
Resolving deltas: 100% (16037/16037), done.
Checking connectivity... done.
$ git --version
git version 2.7.4

You might try running it again. Maybe it's the DDOS on Dyn?

Leyden J. Dyn dinged by DDoS: US DNS firm gives web a bad hair day. The
Register 2016-10-21. http://www.theregister.co.uk/2016/10/21/dns_dyn_ddos/
Reuben Thomas
2016-10-21 19:18:21 UTC
Post by Paul Eggert
Post by Reuben Thomas
invalid author/committer line - missing email
$ git clone git://git.savannah.gnu.org/findutils.git tmp
Cloning into 'tmp'...
remote: Counting objects: 20126, done.
remote: Compressing objects: 100% (3683/3683), done.
remote: Total 20126 (delta 16037), reused 20126 (delta 16037)
Receiving objects: 100% (20126/20126), 10.96 MiB | 569.00 KiB/s, done.
Resolving deltas: 100% (16037/16037), done.
Checking connectivity... done.
$ git --version
git version 2.7.4
You might try running it again. Maybe it's the DDOS on Dyn?
I had the following setting in my .gitconfig:

[transfer]
# for $deity's sake, check that anything we're getting is complete and sane on a regular basis
# See https://groups.google.com/forum/m/#!topic/binary-transparency/f-BI4o8HZW0
fsckobjects = true

I commented it out, and the checkout succeeded. If I then run "git fsck",
I get lots of "error in tag foo: missingEmail" errors, and an error exit
code. I guess this is just a minor annoyance?
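For anyone wanting to keep the check on in general, here is a sketch of switching it off for a single command rather than editing .gitconfig. It assumes git is installed; the scratch repository is a throwaway created only for the demonstration, and git treats the variable name case-insensitively, so fsckobjects and fsckObjects are the same setting.

```shell
# Throwaway repository, used only to show how the setting is read.
repo=$(mktemp -d)
git init -q "$repo"

# Turn the object check on for this repository (same effect as the
# [transfer] stanza quoted above, but scoped to one repository).
git -C "$repo" config transfer.fsckObjects true
stored=$(git -C "$repo" config --get transfer.fsckObjects)

# "git -c name=value" overrides the setting for one invocation only.
overridden=$(git -C "$repo" -c transfer.fsckObjects=false \
    config --get transfer.fsckObjects)

echo "$stored $overridden"
```

The same one-off override should work for the clone that failed, i.e. "git -c transfer.fsckObjects=false clone" followed by the repository URL, leaving the stricter default in place for everything else.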
--
http://rrt.sc3d.org
Paul Eggert
2016-10-21 19:25:45 UTC
Post by Reuben Thomas
fsckobjects = true
I commented it out, and the checkout succeeded. If I then run "git
fsck", I get lots of "error in tag foo: missingEmail" errors, and an
error exit code. I guess this is just a minor annoyance?
More like a major annoyance, I'm afraid. It means that the findutils
repository is slightly incompatible with Git and needs to be fixed on
Savannah. (If memory serves, this is due to a bug in older Git.) Fixing
a repository is not something one does lightly since it involves
changing history.
Eric Blake
2016-10-21 19:32:44 UTC
Post by Paul Eggert
Post by Reuben Thomas
fsckobjects = true
I commented it out, and the checkout succeeded. If I then run "git
fsck", I get lots of "error in tag foo: missingEmail" errors, and an
error exit code. I guess this is just a minor annoyance?
More like a major annoyance, I'm afraid. It means that the findutils
repository is slightly incompatible with Git and needs to be fixed on
Savannah. (If memory serves, this is due to a bug in older Git.) Fixing
a repository is not something one does lightly since it involves
changing history.
Can we ascertain whether it is only unsigned tags that are affected (in
which case deleting the broken tags and pushing corrected replacement
tags with the same name in their place will solve the issue for all new
clones, because commits remain unchanged), or are there also commits
affected (in which case rewriting history will involve creating new
commit ids for all descendants of the problematic commit, which is a
rather dramatic effect, and which breaks any signed tags to old commit ids)?
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
Reuben Thomas
2016-10-21 19:39:45 UTC
Post by Eric Blake
Post by Paul Eggert
Post by Reuben Thomas
fsckobjects = true
I commented it out, and the checkout succeeded. If I then run "git
fsck", I get lots of "error in tag foo: missingEmail" errors, and an
error exit code. I guess this is just a minor annoyance?
More like a major annoyance, I'm afraid. It means that the findutils
repository is slightly incompatible with Git and needs to be fixed on
Savannah. (If memory serves, this is due to a bug in older Git.) Fixing
a repository is not something one does lightly since it involves
changing history.
Can we ascertain whether it is only unsigned tags that are affected (in
which case deleting the broken tags and pushing corrected replacement
tags with the same name in their place will solve the issue for all new
clones, because commits remain unchanged), or are there also commits
affected (in which case rewriting history will involve creating new
commit ids for all descendants of the problematic commit, which is a
rather dramatic effect, and which breaks any signed tags to old commit ids)?
They seem to be only tags; for each error, git fsck --verbose reports just:

Checking tag 68c0e85dba93471b4f7d450ae2e6139a56574549
error in tag 68c0e85dba93471b4f7d450ae2e6139a56574549: missingEmail: invalid author/committer line - missing email

I can't see any errors for commits.
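One way to double-check that claim is to tally the fsck errors by object type. The following is only a sketch: the function name is made up for this example, and the sample line is the error quoted above, fed in by hand so that the snippet runs without a repository.

```shell
# Summarise "error in <type> <hash>: ..." lines by object type.
# Reads git fsck output on standard input.
count_fsck_errors_by_type() {
  awk '/^error in / { n[$3]++ } END { for (t in n) print t, n[t] }'
}

# Sample input: the error line reported in this thread.
sample='error in tag 68c0e85dba93471b4f7d450ae2e6139a56574549: missingEmail: invalid author/committer line - missing email'
summary=$(printf '%s\n' "$sample" | count_fsck_errors_by_type)
echo "$summary"
```

Against a real clone the input would come from "git fsck 2>&1 | count_fsck_errors_by_type"; any line of the summary starting with "commit" would mean history itself is affected.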
--
http://rrt.sc3d.org
Eric Blake
2016-10-21 20:17:21 UTC
Post by Reuben Thomas
Checking tag 68c0e85dba93471b4f7d450ae2e6139a56574549
invalid author/committer line - missing email
I can't see any errors for commits.
I count 97 such broken tags; and I verified I can push replacement tags
(tag 68c0e85 was for FINDUTILS_4_3_5-1; if you do a fresh clone, tag
68c0e85 is now gone, and replaced by an identically-named annotated tag
6ee72b929 with my email as tagger). I'm planning to script the
conversion of all the tags, but am trying to figure out if it is worth
back-dating the tags, and whether an unsigned annotated tag or a simpler
lightweight tag is the better thing to push in place of each tag that
gets corrected.

At any rate, I'll post the commands I used to make the conversion, once
it is complete; here's my starting point.

$ git fsck 2>&1 | sed -n 's/error in tag \([^:]*\).*/\1/p' > broken
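To make the sed expression concrete, here is a sketch of what it does to a single fsck error line. The sample line is the one quoted earlier in the thread, written on one line (the mail client wrapped it); no repository is needed to run it.

```shell
# One fsck error line, as reported earlier in this thread.
sample='error in tag 68c0e85dba93471b4f7d450ae2e6139a56574549: missingEmail: invalid author/committer line - missing email'

# -n suppresses default output; the trailing p prints only lines where
# the substitution matched. \([^:]*\) captures everything up to the
# first colon, i.e. the object hash of the broken tag.
extracted=$(printf '%s\n' "$sample" |
    sed -n 's/error in tag \([^:]*\).*/\1/p')
echo "$extracted"
```

So the file 'broken' ends up holding one tag object hash per line, and "wc -l < broken" gives the number of affected tags.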
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
Eric Blake
2016-10-21 20:59:20 UTC
Post by Eric Blake
Post by Reuben Thomas
Checking tag 68c0e85dba93471b4f7d450ae2e6139a56574549
invalid author/committer line - missing email
I can't see any errors for commits.
I count 97 such broken tags; and I verified I can push replacement tags
(tag 68c0e85 was for FINDUTILS_4_3_5-1; if you do a fresh clone, tag
68c0e85 is now gone, and replaced by an identically-named annotated tag
6ee72b929 with my email as tagger). I'm planning to script the
Typo; that's 6eee72b. But again, it may be short-lived, if I can
recreate tags with original dates rather than today's date.
Post by Eric Blake
conversion of all the tags, but am trying to figure out if it is worth
back-dating the tags, and whether an unsigned annotated tag or a simpler
lightweight tag is the better thing to push in place of each tag that
gets corrected.
At any rate, I'll post the commands I used to make the conversion, once
it is complete; here's my starting point.
$ git fsck 2>&1 | sed -n 's/error in tag \([^:]*\).*/\1/p' > broken
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
Bernhard Voelker
2016-10-22 09:25:34 UTC
Post by Eric Blake
I count 97 such broken tags; and I verified I can push replacement tags (tag 68c0e85 was for FINDUTILS_4_3_5-1; if
you do a fresh clone, tag 68c0e85 is now gone, and replaced by an identically-named annotated tag 6ee72b929 with my
email as tagger). I'm planning to script the conversion of all the tags, but am trying to figure out if it is
worth back-dating the tags, and whether an unsigned annotated tag or a simpler lightweight tag is the better thing
to push in place of each tag that gets corrected.
So why is this "broken"? I can see 'git fsck' still running well with
version 2.1.4 while a later version complains about it. Isn't this a
(maybe known?) regression or break in backward-compatibility in git itself?

Have a nice day,
Berny
Jim Meyering
2016-10-23 15:57:27 UTC
Post by Eric Blake
I'm planning to script the
conversion of all the tags, but am trying to figure out if it is worth
back-dating the tags, and whether an unsigned annotated tag or a simpler
lightweight tag is the better thing to push in place of each tag that
gets corrected.
I have found that it is worth the small additional effort to back-date
tags when doing that. Otherwise, they will all be misleadingly listed
with the date of the conversion, and in some clients will be shown as
"new".
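For reference, a sketch of back-dating an annotated tag by hand: git takes the tagger date from GIT_COMMITTER_DATE when the tag is created. The scratch repository, the identity, the tag name DEMO_TAG and the date are all placeholders for the example, not anything from findutils.

```shell
# Throwaway repository with a single empty commit to hang a tag on.
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" -c user.name='Example User' -c user.email='user@example.org' \
    commit -q --allow-empty -m 'placeholder commit'

# Create an annotated tag whose tagger date is set explicitly instead
# of defaulting to the current time.
GIT_COMMITTER_DATE='2006-10-02 10:31:00 +0000' \
git -C "$repo" -c user.name='Example User' -c user.email='user@example.org' \
    tag -a -m 'back-dated demo tag' DEMO_TAG

# Read the recorded tagger date back.
tag_date=$(git -C "$repo" for-each-ref \
    --format='%(taggerdate:short)' refs/tags/DEMO_TAG)
echo "$tag_date"
```

The date read back should be 2006-10-02 rather than today's, which is what keeps clients from listing the replacement tags as new.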
Eric Blake
2016-10-25 14:26:03 UTC
Post by Eric Blake
At any rate, I'll post the commands I used to make the conversion, once
it is complete; here's my starting point.
$ git fsck 2>&1 | sed -n 's/error in tag \([^:]*\).*/\1/p' > broken
Here are the final steps I used; the findutils.git repository is now
clean, using a script graciously provided by Kyle J. McKay, where the
attached file 'broken' is the list of tags that I fixed:

$ # get the cleanup script
$ wget https://gist.githubusercontent.com/mackyle/9ea081513f6b90bb4470b7b2bc6e4bce/raw/7f1814a5b9823278852731e7f13e717937dffec2/export-fixed-tags
$ chmod +x export-fixed-tags
$ # create fresh clone, so none of my local tags get pushed
$ git clone git://git.sv.gnu.org/findutils.git findutils-cleanup
$ cd findutils-cleanup
$ # fix up the repository, creating a backdated tag for every broken tag
$ ../export-fixed-tags | git fast-import
processing invalid tag refs/tags/FINDUTILS-4_1-10
...
skipping OK tag refs/tags/v4.6.0 (v4.6.0)
git-fast-import statistics:
---------------------------------------------------------------------
Alloc'd objects: 5000
Total objects: 97 ( 0 duplicates )
...
$ # 97 tags were replaced; grab their names into 'broken'
$ git push -n --tags 2>&1 | \
sed -n 's/.* \([a-zA-Z0-9_-]*\) .already.*/\1/p' > broken
$ # delete the old tags, then push the fixed ones
$ git push origin $(sed s/^/:/ broken)
$ git push --tags


If you have previously checked out the repository, you will NOT get the
updated tags unless you FIRST delete all the broken tags you have
previously downloaded. If you want to clean things up locally, you can
use the following steps:

$ git tag -d $(cat broken)
$ git fetch origin --tags

You can then use git prune to remove the now-dangling broken tags.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
James Youngman
2016-11-27 17:09:02 UTC
Post by Eric Blake
Post by Eric Blake
At any rate, I'll post the commands I used to make the conversion, once
it is complete; here's my starting point.
$ git fsck 2>&1 | sed -n 's/error in tag \([^:]*\).*/\1/p' > broken
Here's the final steps I used; the findutils.git repository is now
clean, using a script graciously provided by Kyle J. McKay, where the
Thanks for doing this!
James.

Eric Blake
2016-10-21 19:26:20 UTC
Post by Reuben Thomas
[transfer]
# for $deity's sake, check that anything we're getting is complete and sane on a regular basis
# See https://groups.google.com/forum/m/#!topic/binary-transparency/f-BI4o8HZW0
fsckobjects = true
I commented it out, and the checkout succeeded. If I then run "git fsck",
I get lots of "error in tag foo: missingEmail" errors, and an error exit
code. I guess this is just a minor annoyance?
Sounds like the tags were created in a much older version of git, and
that newer git complains about what the older version was able to
create. The solution is probably to recreate those tags with a valid
format, although propagating new tags to anyone that already has the old
ones checked out is not easy.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
Bruno Haible
2016-10-21 19:46:40 UTC
Post by Eric Blake
Post by Reuben Thomas
I get lots of "error in tag foo: missingEmail" errors, and an error exit
code. I guess this is just a minor annoyance?
Sounds like the tags were created in a much older version of git, and
that newer git complains about what the older version was able to
create. The solution is probably to recreate those tags with a valid
format, although propagating new tags to anyone that already has the old
ones checked out is not easy.
Here's how the Automake repository got fixed regarding this type of error:
https://lists.gnu.org/archive/html/bug-automake/2016-06/msg00004.html
Eric Blake
2016-10-21 20:42:59 UTC
Post by Bruno Haible
Post by Eric Blake
Post by Reuben Thomas
I get lots of "error in tag foo: missingEmail" errors, and an error exit
code. I guess this is just a minor annoyance?
Sounds like the tags were created in a much older version of git, and
that newer git complains about what the older version was able to
create. The solution is probably to recreate those tags with a valid
format, although propagating new tags to anyone that already has the old
ones checked out is not easy.
https://lists.gnu.org/archive/html/bug-automake/2016-06/msg00004.html
Well, it's a (partial) recipe for how to fix automake, although I note
that automake.git has not actually been fixed yet. I've asked for more
information on that thread on the rest of the steps used in that recipe
(namely, how to generate the file to feed to git fast-import in the
first place). Then I can probably try to fix automake.git as well as
findutils.git.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org