Discussion:
[PATCH] maint: set LANG=C instead of LC_ALL=C
Daniel P. Berrange
2017-08-10 16:36:24 UTC
The maint.mk file currently sets LC_ALL=C so that build rules get a
predictable locale, independant of the user's environment settings.

It is sometimes neccesssary to override the locale when running
build commands from make rules, but as maint.mk sets LC_ALL, it
is impossible to selectively override individual locale categories;
e.g. LC_CTYPE=C.UTF-8 will have no effect if LC_ALL is already set.

To deal with this, maint.mk should instead set LANG=C, and then
explicitly unset all the other LC_* variables.

Signed-off-by: Daniel P. Berrange <***@redhat.com>
---
top/maint.mk | 24 +++++++++++++++++++++---
1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/top/maint.mk b/top/maint.mk
index 09a98bce8..2d8f1391b 100644
--- a/top/maint.mk
+++ b/top/maint.mk
@@ -144,9 +144,27 @@ today = $(shell date +%Y-%m-%d)
news-check-lines-spec ?= 1,10
news-check-regexp ?= '^\*.* $(VERSION_REGEXP) \($(today)\)'

-# Prevent programs like 'sort' from considering distinct strings to be equal.
-# Doing it here saves us from having to set LC_ALL elsewhere in this file.
-export LC_ALL = C
+# Ensure a stable locale, independant of the user's environment, so that
+# locale-sensitive programs used in the build (e.g. 'sort') have predictable
+# output.
+#
+# We want apps to be able to override individual locale categories in their
+# make rules though, so we must not set LC_ALL ourselves, but instead use
+# LANG as the lowest priority variable.
+unexport LC_ALL
+unexport LC_CTYPE
+unexport LC_NUMERIC
+unexport LC_TIME
+unexport LC_COLLATE
+unexport LC_MONETARY
+unexport LC_MESSAGES
+unexport LC_PAPER
+unexport LC_NAME
+unexport LC_ADDRESS
+unexport LC_TELEPHONE
+unexport LC_MEASUREMENT
+unexport LC_IDENTIFICATION
+export LANG = C

## --------------- ##
## Sanity checks. ##
--
2.13.3
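The patch relies on the POSIX lookup order for a locale category: a non-empty LC_ALL overrides everything, then a non-empty LC_<category>, then a non-empty LANG, then the implementation default. A minimal shell sketch of that order follows. effective_locale is a hypothetical helper that only inspects the environment (it does not check whether a locale is actually installed), and the final "C" fallback is an assumption, since POSIX leaves the default implementation-defined.

```sh
#!/bin/sh
# Sketch of the POSIX lookup order for one locale category:
#   LC_ALL (set and non-empty) > LC_<category> (set and non-empty)
#   > LANG (set and non-empty) > implementation default.
# effective_locale is hypothetical; it only reads variables and never
# checks whether the named locale exists on the system.
effective_locale() {
  category=$1                      # e.g. LC_CTYPE (trusted input; used in eval)
  eval "cat_value=\${$category-}"  # value of that variable, or empty if unset
  if [ -n "${LC_ALL-}" ]; then
    printf '%s\n' "$LC_ALL"        # a non-empty LC_ALL always wins
  elif [ -n "$cat_value" ]; then
    printf '%s\n' "$cat_value"     # then the per-category variable
  elif [ -n "${LANG-}" ]; then
    printf '%s\n' "$LANG"          # LANG is the lowest-priority variable
  else
    printf '%s\n' "C"              # assumed default; POSIX says implementation-defined
  fi
}
```

For example, with LC_ALL=C and LC_CTYPE=C.UTF-8, effective_locale LC_CTYPE prints C, which is why an LC_CTYPE override in a make rule has no effect under the current maint.mk. With LC_ALL unset, LANG=C and LC_CTYPE=C.UTF-8, it prints C.UTF-8 for LC_CTYPE and C for every other category. A set-but-empty LC_ALL (as in "LC_ALL= foo") falls through exactly like an unset one.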
Paul Eggert
2017-08-10 17:59:47 UTC
Post by Daniel P. Berrange
It is sometimes neccesssary to override the locale when running
build commands from make rules, but as maint.mk sets LC_ALL, it
is impossible to selectively override individual locale categories;
e.g. LC_CTYPE=C.UTF-8 will have no effect if LC_ALL is already set.
Why not unset LC_ALL?
Eric Blake
2017-08-10 19:14:57 UTC
Post by Paul Eggert
Post by Daniel P. Berrange
It is sometimes neccesssary to override the locale when running
build commands from make rules, but as maint.mk sets LC_ALL, it
is impossible to selectively override individual locale categories;
e.g. LC_CTYPE=C.UTF-8 will have no effect if LC_ALL is already set.
Why not unset LC_ALL?
You still want a sane fallback for all the categories that you are not
explicitly setting. It's harder to type:

LANG=C LC_CTYPE=C.UTF-8 env -u LC_ALL foo

than it is to type

LC_CTYPE=C.UTF-8 foo

where we know that LANG=C is already set and LC_ALL is already
out-of-the-way.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
Bruno Haible
2017-08-10 22:37:18 UTC
Post by Eric Blake
You still want a sane fallback for all the categories that you are not
LANG=C LC_CTYPE=C.UTF-8 env -u LC_ALL foo
or:
LANG=C LC_CTYPE=C.UTF-8 LC_ALL= foo
Post by Eric Blake
than it is to type
LC_CTYPE=C.UTF-8 foo
where we know that LANG=C is already set and LC_ALL is already
out-of-the-way.
In the big picture, I find the current situation more maintainable and robust:
Everyone knows that when specifying a "mixed" locale they need to set all
of LC_ALL, LC_<category>, and LANG.

Whereas in the world you are depicting, you would be doing half of a
mixed locale specification and depending on maint.mk to do the other half.
Call it "dependency" or call it "distributed responsibility" - in any
case it would require coordination and reduce the freedom of action
for the maintainers of maint.mk.

In particular, you don't even have a guarantee that LANG=C and LC_CTYPE=C.UTF-8
fit well together. (It might work on some libcs and fail on others.) With your
proposal, we don't have a clear responsibility: it's not clear whether
the setting of LANG=C by maint.mk or the setting of LC_CTYPE=C.UTF-8 is to
be changed.

Which is why I prefer the current situation, without mixed or intertwined
responsibilities.

Bruno
Eric Blake
2017-08-11 01:22:42 UTC
Post by Bruno Haible
Post by Eric Blake
You still want a sane fallback for all the categories that you are not
LANG=C LC_CTYPE=C.UTF-8 env -u LC_ALL foo
LANG=C LC_CTYPE=C.UTF-8 LC_ALL= foo
If setting LC_ALL to empty forces fallback, then that is indeed easier
than explicitly unsetting it. Still, if we are advocating mixed locale
execution, we MUST ensure sane defaults for ALL of the LC_* variables.
So even if you are advocating for keeping LC_ALL set, we should STILL
sanitize LANG and all the other LC_* variables (either unset them, or
set them to C). The above command line will not work if LC_MESSAGES is
still set to some other locale, particularly one not encoded in UTF-8.
Post by Bruno Haible
Everyone knows that when specifying a "mixed" locale they need to set all
of LC_ALL, LC_<category>, and LANG.
Not just the LC_<category> they are overriding, but every other
LC_<category> as well.
Post by Bruno Haible
Whereas in the world you are depicting, you would be doing half of a
mixed locale specification and depending on maint.mk to do the other half.
Call it "dependency" or call it "distributed responsibility" - in any
case it would require coordination and reduce the freedom of action
for the maintainers of maint.mk.
But the command lines without the distributed responsibility are longer, so
sane defaults ARE worth having. Maybe you can still argue that
the sane default should include LC_ALL=C, but that does NOT mean that
the sane default can overlook the other tiers.
Post by Bruno Haible
In particular, you don't even have a guarantee that LANG=C and LC_CTYPE=C.UTF-8
fit well together. (It might work on some libcs and fail on others.)
We already have problems with Python refusing to import UTF-8 data in
LC_ALL=C environments (which is arguably a bug in python, since POSIX
says locale "C" is 8-bit clean and therefore cannot cause encoding
errors). C.UTF-8 does not exist everywhere, but does appear to shut up
the python problem when mixed with LANG=C, at least on the platforms
where the problems are encountered in the first place.
Post by Bruno Haible
With your
proposal, we don't have a clear responsibility: it's not clear whether
the setting of LANG=C by maint.mk or the setting of LC_CTYPE=C.UTF-8 is to
be changed.
I think maint.mk should still default to the C locale, but make it easy
to do a mixed-locale override. With LANG set and all other tiers
cleared, mixed-locale overrides are easy; but with LC_ALL set, an
override has to worry about all three tiers.
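The "LANG set, all other tiers cleared" arrangement, which the patch's unexport/export lines set up for make recipes, can be approximated for a single command outside make. A minimal sketch, assuming a POSIX shell and the standard env utility; run_with_c_base is a hypothetical name, and the variable list is copied from the patch:

```sh
#!/bin/sh
# Hypothetical helper mirroring the patch: clear LC_ALL and every
# LC_<category>, set LANG=C as the lowest-priority fallback, then let
# the caller add per-category overrides as NAME=VALUE arguments to env.
run_with_c_base() {
  (
    # Runs in a subshell, so the caller's environment is left untouched.
    unset LC_ALL LC_CTYPE LC_NUMERIC LC_TIME LC_COLLATE LC_MONETARY \
          LC_MESSAGES LC_PAPER LC_NAME LC_ADDRESS LC_TELEPHONE \
          LC_MEASUREMENT LC_IDENTIFICATION
    LANG=C
    export LANG
    # env applies any leading NAME=VALUE operands, then runs the command.
    exec env "$@"
  )
}
```

With this, "run_with_c_base LC_CTYPE=C.UTF-8 sort file" (file being a placeholder) gets a UTF-8 character type while every other category falls back to C, even if the caller had LC_ALL or LC_MESSAGES set. That is the short form of override being argued for here.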
Post by Bruno Haible
Which is why I prefer the current situation, without mixed or intertwined
responsibilities.
Bruno
Bruno Haible
2017-08-11 21:17:54 UTC
Post by Eric Blake
Still, if we are advocating mixed locale
execution, we MUST ensure sane defaults for ALL of the LC_* variables.
In theory all the LC_* variables, yes. In practice, only those locale categories
that the programs actually use (e.g. test-quotearg.c), and those rarely go
beyond LC_CTYPE, LC_NUMERIC, and LC_COLLATE.

But whether that makes 3 variables that you have to set or unset, or 12,
is irrelevant. My point is that it's simpler, both conceptually and in
practice, to define a self-contained locale specification (whether it is
15 or 200 characters long doesn't matter) than to define half of it and
then stumble across issues.
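The self-contained alternative, where the command line names every category itself instead of relying on a base set by maint.mk, can be sketched the same way. This is illustrative only: run_self_contained is a hypothetical name, the 12 categories are the ones listed in the patch, and the C.UTF-8 value for LC_CTYPE is taken from the example in this thread and may not exist on every system. LC_ALL is set to the empty string, which POSIX treats like unset, so the per-category values apply even while maint.mk exports LC_ALL=C.

```sh
#!/bin/sh
# Hypothetical helper: run a command under a fully spelled-out locale,
# so nothing inherited from the caller (or from maint.mk) leaks through.
run_self_contained() {
  # LC_ALL= is set but empty; POSIX ignores it, so the LC_* values win.
  env LC_ALL= LANG=C \
      LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C LC_COLLATE=C \
      LC_MONETARY=C LC_MESSAGES=C LC_PAPER=C LC_NAME=C \
      LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=C LC_IDENTIFICATION=C \
      "$@"
}
```

Compared with relying on a LANG=C base exported by maint.mk, the command line is longer but does not depend on anything maint.mk does; that is the trade-off being weighed in this thread.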
Post by Eric Blake
We already have problems with Python refusing to import UTF-8 data in
LC_ALL=C environments (which is arguably a bug in python, since POSIX
says locale "C" is 8-bit clean and therefore cannot cause encoding
errors). C.UTF-8 does not exist everywhere, but does appear to shut up
the python problem when mixed with LANG=C, at least on the platforms
where the problems are encountered in the first place.
So there is not even a universally good value for LC_ALL!
C programs linked against libc need one set of variables; Python programs
another set; Java programs another set, etc.

I am in favour of putting common "default code" or fallbacks into gnulib,
when there is no doubt that
1. it is the correct default,
2. it actually significantly helps the projects that use gnulib.
This is not the case here:
1. You have shown that LC_ALL=C or LANG=C is not optimal for Python.
2. The self-contained locale specification fits in 200 characters.

Bruno

Eric Blake
2017-08-10 19:18:44 UTC
Post by Daniel P. Berrange
The maint.mk file currently sets LC_ALL=C so that build rules get a
predictable locale, independant of the user's environment settings.
s/independant/independent/
Post by Daniel P. Berrange
It is sometimes neccesssary to override the locale when running
s/neccesssary/necessary/
Post by Daniel P. Berrange
build commands from make rules, but as maint.mk sets LC_ALL, it
is impossible to selectively override individual locale categories;
e.g. LC_CTYPE=C.UTF-8 will have no effect if LC_ALL is already set.
To deal with this, maint.mk should instead set LANG=C, and then
explicitly unset all the other LC_* variables.
---
top/maint.mk | 24 +++++++++++++++++++++---
1 file changed, 21 insertions(+), 3 deletions(-)
-# Prevent programs like 'sort' from considering distinct strings to be equal.
-# Doing it here saves us from having to set LC_ALL elsewhere in this file.
-export LC_ALL = C
+# Ensure a stable locale, independant of the user's environment, so that
s/independant/independent/
Post by Daniel P. Berrange
+# locale-sensitive programs used in the build (e.g. 'sort') have predictable
+# output.
+#
+# We want apps to be able to override individual locale categories in their
+# make rules though, so we must not set LC_ALL ourselves, but instead use
+# LANG as the lowest priority variable.
+unexport LC_ALL
+unexport LC_CTYPE
Why not do a single line:

unexport LC_ALL LC_CTYPE ...
Post by Daniel P. Berrange
+export LANG = C
Otherwise, the idea is sane to me.