Discussion:
Default Windows locale for localename.c
l***@gmail.com
2018-03-08 22:51:44 UTC
Currently localename.c (which is used by gettext, so it's really everywhere)
calls GetThreadLocale() to get current locale identifier (if LC_* or LANG are
not set).

One problem with that is that GetThreadLocale() is unaffected by setlocale()
(already discussed in "Fix libunistring in MS-Windows locales" back in 2014).

Another problem is that GetThreadLocale() defaults like this (according to MSDN):
"When a new thread is created in a process, it inherits the locale of the
creating thread. This locale can be either the default Standards and Formats
locale or a different locale set for the creating thread in a call to
SetThreadLocale"

It was pointed out to me recently that the locale obtained from "Standards and
Formats" (also known as "intl.cpl") is *not* the "UI locale". That is, the
settings from intl.cpl influence the way timestamps, dates, currency and
numbers are displayed. But they do not decide the UI language.

Because of this gettext-using programs tend to pick the wrong default locale
when running on Windows.

I'm currently thinking about what to do here. Maybe use GetThreadUILanguage()
instead? OTOH, date, time, number, currency formats and days-of-the-week names
should probably continue to be obtained from intl.cpl (directly or indirectly;
i'm not sure whether C runtime functions do that, but there are W32 APIs for
that, such as GetLocaleInfo(), and i remember seeing them being used in a plibc
nl_langinfo() implementation back in the day; gnulib uses strftime() for this,
and it seems to be working, so that could be expanded too, i guess).
Eli Zaretskii
2018-03-12 17:20:05 UTC
Date: Fri, 9 Mar 2018 01:51:44 +0300
Currently localename.c (which is used by gettext, so it's really everywhere)
calls GetThreadLocale() to get current locale identifier (if LC_* or LANG are
not set).
That's not entirely accurate, at least not wrt the version in the
Gnulib Git repository. There we have this:

const char *
gl_locale_name (int category, const char *categoryname)
{
  const char *retval;

  retval = gl_locale_name_thread (category, categoryname);
  if (retval != NULL)
    return retval;

  retval = gl_locale_name_posix (category, categoryname);
  if (retval != NULL)
    return retval;

  return gl_locale_name_default ();
}

GetThreadLocale is called from gl_locale_name_default, which is the
last resort. Before that, the function calls gl_locale_name_thread,
which on MS-Windows calls setlocale, as you wanted.

So I'm not sure what problem you had and why. Or maybe I'm
missing something.
Because of this gettext-using programs tend to pick the wrong default locale
when running on Windows.
Could you please describe the situation where this happens in detail?
And what exactly do you mean by "the wrong default locale", given that
a locale could be used for different purposes, and on Windows, as you
correctly point out, there are several "kinds" of default locales.
I'm currently thinking what to do about this. Maybe GetThreadUILanguage()
instead?
That function is available only since Vista, so IMO it cannot be used
unconditionally, and some fallback should be available when it isn't.
OTOH, date, time, number, currency formats and days-of-the-week names
should probably continue to be obtained from intl.cpl (directly or indirectly;
i'm not sure whether C runtime functions do that, but there are W32 APIs for
that, such as GetLocaleInfo(), and i remember seeing them being used in a plibc
nl_langinfo() implementation back in the day; gnulib uses strftime() for this,
and it seems to be working, so that could be expanded too, i guess).
Programs that mix CRT functions, such as setlocale, and Win32 APIs,
such as GetLocaleInfo and GetThreadLocale, are IME asking for trouble,
and people who do that should know very well what they are doing,
because such mixtures have various subtle issues.

On balance, I think using setlocale whenever possible is a good way of
getting Posix-like behavior that programs supported by Gnulib expect.
It isn't perfect, because using other APIs it might be possible to
provide some enhanced features, but OTOH using those APIs could lose
big time in other situations. So I think Gnulib is correct in using
setlocale whenever possible, and only falling back on other APIs if
setlocale somehow fails.

It's possible that you bumped into some specialized situation where
this default is somehow inappropriate, which is why I would ask you
to kindly describe your use case in more detail.

Thanks.
l***@gmail.com
2018-03-12 20:23:21 UTC
Post by Eli Zaretskii
From: lrn
Date: Fri, 9 Mar 2018 01:51:44 +0300
Currently localename.c (which is used by gettext, so it's really everywhere)
calls GetThreadLocale() to get current locale identifier (if LC_* or LANG are
not set).
That's not entirely accurate, at least not wrt the version in the
const char *
gl_locale_name (int category, const char *categoryname)
{
const char *retval;
retval = gl_locale_name_thread (category, categoryname);
if (retval != NULL)
return retval;
retval = gl_locale_name_posix (category, categoryname);
if (retval != NULL)
return retval;
return gl_locale_name_default ();
}
GetThreadLocale is called from gl_locale_name_default, which is the
last resort. Before that, the function calls gl_locale_name_thread,
which on MS-Windows calls setlocale, as you wanted.
So I'm not sure what problem did you have and why. Or maybe I'm
missing something.
I've re-traced this, here's how it works:

libintl_dcigettext (domainname=0x45e3718 "gedit", ***@entry=0x0,
msgid1=***@entry=0x6887f8af <__FUNCTION__.74302+47> "",
msgid2=***@entry=0x0, plural=***@entry=0, n=***@entry=0,
category=***@entry=1729)
at gettext-0.19.7/gettext-tools/../gettext-runtime/intl/dcigettext.c:667

guess_category_value (categoryname=0x61ed714d "LC_MESSAGES", category=1729)
at gettext-0.19.7/gettext-tools/../gettext-runtime/intl/dcigettext.c:1561

_nl_locale_name_posix (category=1729, categoryname=0x61ed714d "LC_MESSAGES")
at gettext-0.19.7/gettext-tools/../gettext-runtime/intl/localename.c:2815

_nl_locale_name_environ (category=1729, categoryname=0x61ed714d "LC_MESSAGES")
at gettext-0.19.7/gettext-tools/../gettext-runtime/intl/localename.c:2819

does a number of getenv() calls, all fail, so it returns NULL, so back to
guess_category_value()

_nl_locale_name_default ()
at gettext-0.19.7/gettext-tools/../gettext-runtime/intl/localename.c:2941

which in the source code is called gl_locale_name_default() (so i guess this is
gnulib name mangling in action? Come to think of it, all the other "_nl" functions
above also have a "gl" prefix in the source code), which just calls
GetThreadLocale () immediately.
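
The environment step in the trace above can be sketched as follows. This is a minimal illustration of the usual POSIX precedence (LC_ALL, then the category's own variable, then LANG), not the gettext or gnulib source; the names choose_env_locale and locale_from_environ are hypothetical. When all three variables are unset or empty the result is NULL, which is the case where the caller falls through to the default locale lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A variable counts as set only if it exists and is non-empty.  */
static int
is_nonempty (const char *value)
{
  return value != NULL && value[0] != '\0';
}

/* Pure precedence rule (hypothetical helper): LC_ALL beats the
   category variable, which beats LANG.  Returns NULL when none of
   them is usable.  */
const char *
choose_env_locale (const char *lc_all, const char *lc_category,
                   const char *lang)
{
  if (is_nonempty (lc_all))
    return lc_all;
  if (is_nonempty (lc_category))
    return lc_category;
  if (is_nonempty (lang))
    return lang;
  return NULL;
}

/* Same rule applied to the process environment, one getenv() call
   at a time.  CATEGORYNAME is e.g. "LC_MESSAGES".  */
const char *
locale_from_environ (const char *categoryname)
{
  const char *names[3];
  size_t i;

  assert (categoryname != NULL);
  names[0] = "LC_ALL";
  names[1] = categoryname;
  names[2] = "LANG";
  for (i = 0; i < 3; i++)
    {
      const char *value = getenv (names[i]);
      if (is_nonempty (value))
        return value;
    }
  return NULL;
}
```

With none of the variables set, locale_from_environ ("LC_MESSAGES") returns NULL, matching the situation described in the trace.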
Post by Eli Zaretskii
Because of this gettext-using programs tend to pick the wrong default locale
when running on Windows.
Could you please describe the situation where this happens in detail?
And what exactly do you mean by "the wrong default locale", given that
a locale could be used for different purposes, and on Windows, as you
correctly point out, there are several "kinds" of default locales.
My Windows installation is in English (i.e. this is the language of the
installer and the UI language for all applications and system software), but i
have Russian regional settings (time, date, weekday names, currency, number
formatting). Because of this any program using gettext is in Russian, unless i
deliberately set LANG or LC_*.

Note that the "Language for non-Unicode programs" setting (also in intl.cpl)
does not affect the value that GetThreadLocale() returns. The only thing that
matters is what you select in the "Format:" dropdown list at the "Formats" tab.
Even if you go ahead and rearrange all specific formats to something completely
different afterwards, the item selected in that list will define the value that
GetThreadLocale() returns.
Post by Eli Zaretskii
I'm currently thinking what to do about this. Maybe GetThreadUILanguage()
instead?
That function is available only since Vista, so IMO it cannot be used
unconditionally, and some fallback should be available when it isn't.
Well, that was just a shot in the dark. First i'll need to verify that the
function(s) in question actually return(s) the right things (which means
running test programs on different Windows versions with different UI languages
(with or without using MUI) and regional settings).
Eli Zaretskii
2018-03-12 20:50:56 UTC
[Please CC me on the responses, as I'm not subscribed to this list.]
Date: Mon, 12 Mar 2018 23:23:21 +0300
Post by Eli Zaretskii
const char *
gl_locale_name (int category, const char *categoryname)
{
const char *retval;
retval = gl_locale_name_thread (category, categoryname);
if (retval != NULL)
return retval;
retval = gl_locale_name_posix (category, categoryname);
if (retval != NULL)
return retval;
return gl_locale_name_default ();
}
GetThreadLocale is called from gl_locale_name_default, which is the
last resort. Before that, the function calls gl_locale_name_thread,
which on MS-Windows calls setlocale, as you wanted.
So I'm not sure what problem did you have and why. Or maybe I'm
missing something.
at gettext-0.19.7/gettext-tools/../gettext-runtime/intl/dcigettext.c:667
guess_category_value (categoryname=0x61ed714d "LC_MESSAGES", category=1729)
at gettext-0.19.7/gettext-tools/../gettext-runtime/intl/dcigettext.c:1561
_nl_locale_name_posix (category=1729, categoryname=0x61ed714d "LC_MESSAGES")
at gettext-0.19.7/gettext-tools/../gettext-runtime/intl/localename.c:2815
This shows that guess_category_value calls gl_locale_name_posix
directly, instead of calling gl_locale_name. So the question now
becomes why does gettext do that, and whether it should do something
different at least on MS-Windows.

Also note that LC_MESSAGES is not supported by the Windows setlocale,
so either gettext should use some other category, or it will have to
invent stuff out of thin air, something I don't recommend.
My Windows installation is in English (i.e. this is the language of the
installer and the UI language for all applications and system software), but i
have Russian regional settings (time, date, weekday names, currency, number
formatting). Because of this any program using gettext is in Russian, unless i
deliberately set LANG or LC_*.
Which in general sounds quite reasonable to me, btw, given the
description of your setup.

And what does setlocale(LC_ALL,NULL) return on that system?
l***@gmail.com
2018-03-13 16:11:35 UTC
Post by Eli Zaretskii
From: LRN
Date: Mon, 12 Mar 2018 23:23:21 +0300
Post by Eli Zaretskii
const char *
gl_locale_name (int category, const char *categoryname)
{
const char *retval;
retval = gl_locale_name_thread (category, categoryname);
if (retval != NULL)
return retval;
retval = gl_locale_name_posix (category, categoryname);
if (retval != NULL)
return retval;
return gl_locale_name_default ();
}
GetThreadLocale is called from gl_locale_name_default, which is the
last resort. Before that, the function calls gl_locale_name_thread,
which on MS-Windows calls setlocale, as you wanted.
So I'm not sure what problem did you have and why. Or maybe I'm
missing something.
at gettext-0.19.7/gettext-tools/../gettext-runtime/intl/dcigettext.c:667
guess_category_value (categoryname=0x61ed714d "LC_MESSAGES", category=1729)
at gettext-0.19.7/gettext-tools/../gettext-runtime/intl/dcigettext.c:1561
_nl_locale_name_posix (category=1729, categoryname=0x61ed714d "LC_MESSAGES")
at gettext-0.19.7/gettext-tools/../gettext-runtime/intl/localename.c:2815
This shows that guess_category_value calls gl_locale_name_posix
directly, instead of calling gl_locale_name. So the question now
becomes why does gettext do that, and whether it should do something
different at least on MS-Windows.
Try git blame? I don't have a git clone of gettext at hand.
Post by Eli Zaretskii
Also note that LC_MESSAGES is not supported by the Windows setlocale,
so either gettext should use some other category, or it will have to
invent stuff out of thin air, something I don't recommend.
Don't know about gettext (i'm not an active participant there), but glib has
#ifdefs for this, and a special function to call to get message locale on W32,
instead of passing LC_MESSAGES to setlocale(). A side-note: that special
function also calls GetThreadLocale(), so i clearly have my work cut out for me...
Post by Eli Zaretskii
My Windows installation is in English (i.e. this is the language of the
installer and the UI language for all applications and system software), but i
have Russian regional settings (time, date, weekday names, currency, number
formatting). Because of this any program using gettext is in Russian, unless i
deliberately set LANG or LC_*.
Which in general sounds quite reasonable to me, btw, given the
description of your setup.
Well, on one hand, i agree that it's a setting that a user can change easily
enough (no extra data needed, no admin privileges, no need to re-login), and it
is kind of related to user locale.

On the other hand, this clashes with the actual language used by the OS. I
mean, *everything* (text in all GUI and commandline programs that come with the
OS) is in English. That can be changed by downloading a language pack (formerly
known as a MUI pack), installing it, and then switching the language (control
panel -> language -> Options (for a language in the list; add one if it isn't
in the list) -> Override display language). This is the MS-blessed
language-changing procedure, and this is how you are supposed to change the
system UI language after installation (the OS has a pre-set language used
during the installation; it's also the language used by the OS if it's
pre-installed on your PC).

The Formats language setting is more about *where you are*: how dates are
written here, which decimal separator is used, what's the local currency, how
the time is displayed, what are the names of the days of the week - even if you
know multiple languages, it's easier to have these set in a way that matches
the things used at your location.
UI language is the W32 equivalent of LC_MESSAGES, i guess.
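
The split described here can be written down as a small policy function. This is only a sketch of the idea under discussion, not existing gnulib or gettext behaviour: the struct windows_defaults and the function default_locale_for_category are hypothetical, and sending every category other than LC_MESSAGES to the Formats locale is an assumption. The fallback value 1729 for LC_MESSAGES is the one visible in the backtrace earlier in the thread.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>

/* The Windows CRT headers do not define LC_MESSAGES; 1729 is the
   value that shows up as "category" in the backtrace.  */
#ifndef LC_MESSAGES
# define LC_MESSAGES 1729
#endif

/* Hypothetical container for the two user-level defaults.  */
struct windows_defaults
{
  const char *ui_language;     /* what the UI is displayed in */
  const char *formats_locale;  /* the "Format:" choice in intl.cpl */
};

/* Messages follow the UI language; everything else (dates, numbers,
   currency, ...) follows the Formats locale.  Assumption, see above.  */
const char *
default_locale_for_category (int category,
                             const struct windows_defaults *defaults)
{
  assert (defaults != NULL);
  if (category == LC_MESSAGES)
    return defaults->ui_language;
  return defaults->formats_locale;
}
```

For the setup described in this thread (English UI, Russian formats), LC_MESSAGES would resolve to the English entry and LC_TIME to the Russian one.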

It seems that proprietary programs follow the UI language setting, not the
formats locale (that said, i'm not the best person to ask, since i don't really
have much to do with proprietary apps). MSDN [1] explains the multiple language
situation in Vista+ (which is what we should aim to use; XP can just keep using
GetThreadLocale() for all i care; that said, GetUserDefaultUILanguage() is
claimed to be 2000+, so we could use that) and that
GetUserDefaultUILanguage()/GetUserPreferredUILanguages() return the UI
language(s) for the user.

MSDN [1] also lists per-process and per-thread language functions, but these
either default to empty return string or explicitly fall back to more generic
User/System/Install language.
Post by Eli Zaretskii
And what does setlocale(LC_ALL,NULL) return on that system?
It returns "C".


[1] https://msdn.microsoft.com/en-us/library/windows/desktop/dd374098(v=vs.85).aspx
Eli Zaretskii
2018-03-13 17:06:32 UTC
Date: Tue, 13 Mar 2018 19:11:35 +0300
Post by Eli Zaretskii
This shows that guess_category_value calls gl_locale_name_posix
directly, instead of calling gl_locale_name. So the question now
becomes why does gettext do that, and whether it should do something
different at least on MS-Windows.
Try git blame? I don't have a git clone of gettext at hand.
I'd prefer to hear from Bruno instead...
Post by Eli Zaretskii
And what does setlocale(LC_ALL,NULL) return on that system?
It returns "C".
Really? That's strange. And what does GetACP return?
l***@gmail.com
2018-03-13 20:45:08 UTC
Post by Eli Zaretskii
Cc: Eli Zaretskii
From: LRN
Date: Tue, 13 Mar 2018 19:11:35 +0300
Post by Eli Zaretskii
And what does setlocale(LC_ALL,NULL) return on that system?
It returns "C".
Really? That's strange. And what does GetACP return?
1251. Most likely because "Current language for non-Unicode programs" in
intl.cpl is set to "Russian". Changing it to "English (US)" would
cause GetACP() to return 1252.
Eli Zaretskii
2018-03-15 08:37:10 UTC
Date: Tue, 13 Mar 2018 23:45:08 +0300
Post by Eli Zaretskii
Post by l***@gmail.com
Post by Eli Zaretskii
And what does setlocale(LC_ALL,NULL) return on that system?
It returns "C".
Really? That's strange.
Sorry, I guess I didn't make myself clear enough. Every C program
starts with a C locale; what I meant is what does setlocale return
after you do this:

setlocale (LC_ALL, "");

I expect it to return "English_United States.1251" in all the
categories. If that's what happens in your case, then the only
problem with gettext is that it doesn't call gl_locale_name, but
instead calls gl_locale_name_posix directly. I'd say it's a gettext
bug.
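
The distinction between the two calls can be shown with a minimal sketch in standard C. The helper names (current_locale, in_c_locale, adopt_user_default_locale) are hypothetical. What the second call returns depends entirely on the system configuration, so no particular locale name is assumed here.

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Reads the current locale for CATEGORY without changing it:
   a NULL second argument makes setlocale() a pure query.  */
const char *
current_locale (int category)
{
  return setlocale (category, NULL);
}

/* Nonzero while the program is still in the startup "C" locale.  */
int
in_c_locale (void)
{
  const char *name = current_locale (LC_ALL);
  return name != NULL && strcmp (name, "C") == 0;
}

/* Switches every category to the user's default locale.  Returns the
   resulting locale name, or NULL if the request could not be honored.  */
const char *
adopt_user_default_locale (void)
{
  return setlocale (LC_ALL, "");
}
```

A program that has not yet called adopt_user_default_locale () is still in the "C" locale, which is why a bare query reports "C".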

Bruno?
l***@gmail.com
2018-03-15 16:45:11 UTC
Post by Eli Zaretskii
Cc: Eli Zaretskii
From: LRN
Date: Tue, 13 Mar 2018 23:45:08 +0300
Post by Eli Zaretskii
Post by l***@gmail.com
Post by Eli Zaretskii
And what does setlocale(LC_ALL,NULL) return on that system?
It returns "C".
Really? That's strange.
Sorry, I guess I didn't make myself clear enough. Every C program
starts with a C locale; what I meant is what does setlocale return
setlocale (LC_ALL, "");
I expect it to return "English_United States.1251" in all the
categories.
Nope, after i call setlocale (LC_ALL, ""), all subsequent calls to
setlocale(category, NULL) return "Russian_Russia.1251".
Post by Eli Zaretskii
If that's what happens in your case, then the only
problem with gettext is that it doesn't call gl_locale_name, but
instead calls gl_locale_name_posix directly. I'd say it's a gettext
bug.
gl_locale_name() calls gl_locale_name_thread(), which returns the result of
setlocale(category, NULL) for all supported categories (i.e. for everything
except LC_MESSAGES; it returns NULL for those). If it returns NULL, then the
rest of it is the same as what gettext does (posix, then default).
Eli Zaretskii
2018-03-15 16:56:59 UTC
Date: Thu, 15 Mar 2018 19:45:11 +0300
Post by Eli Zaretskii
setlocale (LC_ALL, "");
I expect it to return "English_United States.1251" in all the
categories.
Nope, after i call setlocale (LC_ALL, ""), all subsequent calls to
setlocale(category, NULL) return "Russian_Russia.1251".
That's really strange, given the description of how you configured
your system.
Post by Eli Zaretskii
If that's what happens in your case, then the only
problem with gettext is that it doesn't call gl_locale_name, but
instead calls gl_locale_name_posix directly. I'd say it's a gettext
bug.
gl_locale_name() calls gl_locale_name_thread(), which returns the result of
setlocale(category, NULL) for all supported categories (i.e. for everything
except LC_MESSAGES; it returns NULL for those). If it returns NULL, then the
rest of it is the same as what gettext does (posix, then default).
Gettext (or any other program) shouldn't be calling setlocale with
LC_MESSAGES on MS-Windows.
l***@gmail.com
2018-03-15 17:01:35 UTC
Post by Eli Zaretskii
From: LRN
Date: Thu, 15 Mar 2018 19:45:11 +0300
gl_locale_name() calls gl_locale_name_thread(), which returns the result of
setlocale(category, NULL) for all supported categories (i.e. for everything
except LC_MESSAGES; it returns NULL for those). If it returns NULL, then the
rest of it is the same as what gettext does (posix, then default).
Gettext (or any other program) shouldn't be calling setlocale with
LC_MESSAGES on MS-Windows.
gettext (or any other program) calls gnulib gl_locale_name() with LC_MESSAGES,
which calls gnulib gl_locale_name_thread() with LC_MESSAGES, which does not
call CRT setlocale() with LC_MESSAGES, because it shouldn't be calling
setlocale with LC_MESSAGES on MS-Windows. So that's not an issue here.
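
The guard described in this message can be sketched as below. This is a simplified model of the behaviour as stated here, not the gnulib code: crt_locale_name is a hypothetical name, and the platform check is turned into a parameter (crt_lacks_lc_messages) so that both paths can be exercised on any system. In real code the check would be a compile-time test for native Windows.

```c
#include <locale.h>
#include <stddef.h>

/* The Windows CRT headers do not define LC_MESSAGES; 1729 is the
   value seen for that category in the backtrace.  */
#ifndef LC_MESSAGES
# define LC_MESSAGES 1729
#endif

/* Returns the CRT's idea of the current locale for CATEGORY, or NULL
   when the CRT cannot be asked about it.  setlocale() is never called
   with LC_MESSAGES if CRT_LACKS_LC_MESSAGES is nonzero.  */
const char *
crt_locale_name (int category, int crt_lacks_lc_messages)
{
  if (crt_lacks_lc_messages && category == LC_MESSAGES)
    return NULL;
  return setlocale (category, NULL);
}
```

A NULL result for LC_MESSAGES is what sends the caller on to the environment lookup and then to the default.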
l***@gmail.com
2018-03-13 22:26:27 UTC
Post by l***@gmail.com
MSDN [1] explains the multiple language
situation in Vista+ (which is what we should aim to use; XP can just keep using
GetThreadLocale() for all i care; that said, GetUserDefaultUILanguage() is
claimed to be 2000+, so we could use that) and that
GetUserDefaultUILanguage()/GetUserPreferredUILanguages() return the UI
language(s) for the user.
MSDN [1] also lists per-process and per-thread language functions, but these
either default to empty return string or explicitly fall back to more generic
User/System/Install language.
[1] https://msdn.microsoft.com/en-us/library/windows/desktop/dd374098(v=vs.85).aspx
Tested it out.

For English UI and Russian format setting:

GetUserDefaultUILanguage() returns:
0x409 (primary ID 0x9 sub ID 0x1 (meaning en_US)).

GetUserPreferredUILanguages() returns:
"0409" for language ID and "en-US" for language name.
(1 language, in a 00-terminated string).


For Russian UI and English format setting:

GetUserDefaultUILanguage() returns:
0x419 (primary ID 0x19 sub ID 0x1 (meaning ru_RU)).

GetUserPreferredUILanguages() returns:
"0419" for language ID and "ru-RU" for language name.
"0409" for language ID and "en-US" for language name.
(2 languages, in a 0-delimited, 00-terminated string).
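
The primary and sub IDs quoted above follow from the LANGID layout: the low 10 bits are the primary language and the high 6 bits are the sublanguage, which is the same split the Windows PRIMARYLANGID and SUBLANGID macros perform. A portable sketch of that arithmetic follows; the function names are hypothetical, and the two-entry name table is purely illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Low 10 bits of a 16-bit LANGID: the primary language.  */
unsigned int
primary_lang_id (unsigned int langid)
{
  assert (langid <= 0xFFFFu);
  return langid & 0x3FFu;
}

/* High 6 bits of a 16-bit LANGID: the sublanguage.  */
unsigned int
sub_lang_id (unsigned int langid)
{
  assert (langid <= 0xFFFFu);
  return langid >> 10;
}

/* Illustrative table covering only the two values from the test
   results above; a real mapping needs far more rows.  */
const char *
posix_name_for_langid (unsigned int langid)
{
  switch (langid)
    {
    case 0x0409: return "en_US";
    case 0x0419: return "ru_RU";
    default:     return NULL;
    }
}
```

On Windows the input would come from GetUserDefaultUILanguage(); here it is passed in as a plain number so the arithmetic can be checked anywhere.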

Since there's no POSIX API for returning a list of locales, we'd be OK with
just GetUserDefaultUILanguage() (although MSDN recommends
GetUserPreferredUILanguages(), since it returns strings and thus can represent
custom locales; though we might need new code for parsing its output).
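
As a rough idea of what such parsing code might look like, here is a sketch that walks a 0-delimited, 00-terminated wide string and turns its first entry into a POSIX-style name. Both function names are hypothetical, and the conversion is deliberately naive: it only replaces '-' with '_' and accepts ASCII, so it handles simple language-region names like "en-US" and nothing more elaborate.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Counts the entries of a 0-delimited, 00-terminated wide string.  */
size_t
multi_string_count (const wchar_t *list)
{
  size_t count = 0;

  assert (list != NULL);
  while (*list != L'\0')
    {
      count++;
      list += wcslen (list) + 1;   /* skip the entry and its terminator */
    }
  return count;
}

/* Copies the first entry of LIST into OUT, replacing '-' with '_'.
   Returns 0 on success, -1 if the list is empty, the entry does not
   fit in OUTSIZE bytes, or it contains a non-ASCII character.  */
int
first_language_as_posix (const wchar_t *list, char *out, size_t outsize)
{
  size_t len, i;

  assert (list != NULL && out != NULL);
  len = wcslen (list);
  if (len == 0 || len >= outsize)
    return -1;
  for (i = 0; i < len; i++)
    {
      if ((unsigned long) list[i] > 127UL)
        return -1;
      out[i] = (list[i] == L'-') ? '_' : (char) list[i];
    }
  out[len] = '\0';
  return 0;
}
```

On Windows the buffer would be the one filled by GetUserPreferredUILanguages() when language names rather than IDs are requested; the first entry is the most preferred language.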
Eli Zaretskii
2018-03-15 08:40:54 UTC
Date: Wed, 14 Mar 2018 01:26:27 +0300
0x409 (primary ID 0x9 sub ID 0x1 (meaning en_US)).
"0409" for language ID and "en-US" for language name.
(1 language, in a 00-terminated string).
0x419 (primary ID 0x19 sub ID 0x1 (meaning ru_RU)).
"0419" for language ID and "ru-RU" for language name.
"0409" for language ID and "en-US" for language name.
(2 languages, in a 0-delimited, 00-terminated string).
Since there's no POSIX api for returning a list of locales, we'd be OK with
just GetUserDefaultUILanguage() (although MSDN recommends
GetUserPreferredUILanguages(), since it returns strings and thus can represent
custom locales; though we might need new code for parsing its output).
I think just calling setlocale should be enough, and if gettext calls
gl_locale_name, you will have what you want. Calling Win32 APIs
directly is not something I recommend, unless there's no other way of
doing something reasonable by using CRT functions.
l***@gmail.com
2018-03-15 16:57:40 UTC
Post by Eli Zaretskii
Date: Wed, 14 Mar 2018 01:26:27 +0300
0x409 (primary ID 0x9 sub ID 0x1 (meaning en_US)).
"0409" for language ID and "en-US" for language name.
(1 language, in a 00-terminated string).
0x419 (primary ID 0x19 sub ID 0x1 (meaning ru_RU)).
"0419" for language ID and "ru-RU" for language name.
"0409" for language ID and "en-US" for language name.
(2 languages, in a 0-delimited, 00-terminated string).
Since there's no POSIX api for returning a list of locales, we'd be OK with
just GetUserDefaultUILanguage() (although MSDN recommends
GetUserPreferredUILanguages(), since it returns strings and thus can represent
custom locales; though we might need new code for parsing its output).
I think just calling setlocale should be enough, and if gettext calls
gl_locale_name, you will have what you want. Calling Win32 APIs
directly is not something I recommend, unless there's no other way of
doing something reasonable by using CRT functions.
Well, that's a pity. And here i already made a patch for this...

Sorry, it's not a git format-patch (turns out gettext git repo doesn't carry
all these multiple copies of gnulib around, so there's almost nothing to patch
there).
l***@gmail.com
2018-03-22 07:51:06 UTC
Post by l***@gmail.com
Post by Eli Zaretskii
From: LRN
Cc: Eli Zaretskii
Date: Wed, 14 Mar 2018 01:26:27 +0300
0x409 (primary ID 0x9 sub ID 0x1 (meaning en_US)).
"0409" for language ID and "en-US" for language name.
(1 language, in a 00-terminated string).
0x419 (primary ID 0x19 sub ID 0x1 (meaning ru_RU)).
"0419" for language ID and "ru-RU" for language name.
"0409" for language ID and "en-US" for language name.
(2 languages, in a 0-delimited, 00-terminated string).
Since there's no POSIX api for returning a list of locales, we'd be OK with
just GetUserDefaultUILanguage() (although MSDN recommends
GetUserPreferredUILanguages(), since it returns strings and thus can represent
custom locales; though we might need new code for parsing its output).
I think just calling setlocale should be enough, and if gettext calls
gl_locale_name, you will have what you want. Calling Win32 APIs
directly is not something I recommend, unless there's no other way of
doing something reasonable by using CRT functions.
Well, that's a pity. And here i already made a patch for this...
Sorry, it's not a git format-patch (turns out gettext git repo doesn't carry
all these multiple copies of gnulib around, so there's almost nothing to patch
there).
So, it's been a week. Is no one interested in discussing this?