Discussion:
Fix libunistring in MS-Windows locales
(too old to reply)
Eli Zaretskii
2014-07-03 15:23:32 UTC
Permalink
Raw Message
I submitted these changes to the libunistring mailing list, but was
advised to send them here, as libunistring appears to be a collection
of Gnulib modules. So here I am, repeating what I sent to
libunistring list. (The patch below is relative to the latest Gnulib
master.)

Libunistring on MS-Windows is currently less useful than it could be.
In a nutshell, it supports only the default system locale; setting the
locale to anything else using 'setlocale' will cause many libunistring
functions to fail.

One reason is that libunistring needs to know the codeset of the
locale, to use it when it converts strings to and from Unicode using
iconv. However, on Windows the function locale_charset always returns
the default ANSI codepage, using the GetACP API. That API always
returns the same codepage regardless of locale changes. The result is
that when libunistring is passed a string in any encoding incompatible
with the default system codepage, any calls to libiconv will fail with
EILSEQ. And since most libunistring functions call libiconv, they all
fail.

The second problem that gets in the way is the one in gl_locale_name.
On MS-Windows, this function returns only the default locale (by
calling GetThreadLocale, which is again oblivious to any locale
changes made by 'setlocale'). So again, only the default locale can
be supported correctly. E.g., functions that need to apply
language-specific rules, like for Turkish, will work incorrectly
because the language they get from gl_locale_name is not Turkish, even
though 'setlocale' was previously called to set the Turkish locale.

To fix these problems, I propose the following changes. They have
been extensively tested using Guile's test suite, in particular the
i18n.test there.

I hope these changes will be accepted, because as I said, without them
libunistring is much less useful on MS-Windows than it could be.

Thanks.

2014-07-02 Eli Zaretskii <***@gnu.org>

Improve support for non-default locales on MS-Windows.
* lib/localcharset.c (locale_charset) [WINDOWS_NATIVE]: Before
falling back on the default system codepage, try extracting
the codepage from what 'setlocale' returns. This allows to
take into account changes of the codeset due to non-default
locale set by a previous call to 'setlocale'.

* lib/localename.c (LOCALE_NAME_MAX_LENGTH) [WINDOWS_NATIVE]:
Define if not already defined.
(enum_locales_fn, get_lcid) [WINDOWS_NATIVE]: New functions.
(gl_locale_name_thread) [WINDOWS_NATIVE]: Produce the
current locale by calling 'setlocale', then converting the
locale name into LCID by calling 'get_lcid'. This allows to
take into account changes in the current locale from the
default one, in contrast to GetThreadLocale.

--- lib/localcharset.c~0 2014-02-09 09:41:26 +0200
+++ lib/localcharset.c 2014-07-03 18:17:44 +0300
@@ -34,6 +34,7 @@

#if defined _WIN32 || defined __WIN32__
# define WINDOWS_NATIVE
+# include <locale.h>
#endif

#if defined __EMX__
@@ -461,14 +462,34 @@ locale_charset (void)

static char buf[2 + 10 + 1];

- /* The Windows API has a function returning the locale's codepage as a
- number: GetACP().
- When the output goes to a console window, it needs to be provided in
- GetOEMCP() encoding if the console is using a raster font, or in
- GetConsoleOutputCP() encoding if it is using a TrueType font.
- But in GUI programs and for output sent to files and pipes, GetACP()
- encoding is the best bet. */
- sprintf (buf, "CP%u", GetACP ());
+ /* The Windows API has a function returning the locale's codepage as
+ a number, but the value doesn't change according to what the
+ 'setlocale' call specified. So we use it as a last resort, in
+ case the string returned by 'setlocale' doesn't specify the
+ codepage. */
+ char *current_locale = setlocale (LC_ALL, NULL);
+ char *pdot;
+
+ /* If they set different locales for different categories,
+ 'setlocale' will return a semi-colon separated list of locale
+ values. To make sure we use the correct one, we choose LC_CTYPE. */
+ if (strchr (current_locale, ';'))
+ current_locale = setlocale (LC_CTYPE, NULL);
+
+ pdot = strrchr (current_locale, '.');
+ if (pdot)
+ sprintf (buf, "CP%s", pdot + 1);
+ else
+ {
+ /* The Windows API has a function returning the locale's codepage as a
+ number: GetACP().
+ When the output goes to a console window, it needs to be provided in
+ GetOEMCP() encoding if the console is using a raster font, or in
+ GetConsoleOutputCP() encoding if it is using a TrueType font.
+ But in GUI programs and for output sent to files and pipes, GetACP()
+ encoding is the best bet. */
+ sprintf (buf, "CP%u", GetACP ());
+ }
codeset = buf;

#elif defined OS2


--- lib/localename.c~0 2014-02-09 09:41:26 +0200
+++ lib/localename.c 2014-07-03 18:11:45 +0300
@@ -60,6 +60,7 @@
#if defined WINDOWS_NATIVE || defined __CYGWIN__ /* Native Windows or Cygwin */
# define WIN32_LEAN_AND_MEAN
# include <windows.h>
+# include <winnls.h>
/* List of language codes, sorted by value:
0x01 LANG_ARABIC
0x02 LANG_BULGARIAN
@@ -1124,6 +1125,9 @@
# ifndef LOCALE_SNAME
# define LOCALE_SNAME 0x5c
# endif
+# ifndef LOCALE_NAME_MAX_LENGTH
+# define LOCALE_NAME_MAX_LENGTH 85
+# endif
#endif


@@ -2502,7 +2506,69 @@ gl_locale_name_from_win32_LCID (LCID lci
return gl_locale_name_from_win32_LANGID (langid);
}

-#endif
+#ifdef WINDOWS_NATIVE
+
+/* Two variables to interface between get_lcid and the EnumLocales
+ callback function below. */
+static LCID found_lcid;
+static char lname[LC_MAX * (LOCALE_NAME_MAX_LENGTH + 1) + 1];
+
+/* Callback function for EnumLocales. */
+static BOOL CALLBACK
+enum_locales_fn (LPTSTR locale_num_str)
+{
+ char *endp;
+ char locval[2 * LOCALE_NAME_MAX_LENGTH + 1 + 1];
+ LCID try_lcid = strtoul (locale_num_str, &endp, 16);
+
+ if (GetLocaleInfo (try_lcid, LOCALE_SENGLANGUAGE,
+ locval, LOCALE_NAME_MAX_LENGTH))
+ {
+ strcat (locval, "_");
+ if (GetLocaleInfo (try_lcid, LOCALE_SENGCOUNTRY,
+ locval + strlen (locval), LOCALE_NAME_MAX_LENGTH))
+ {
+ size_t locval_len = strlen (locval);
+
+ if (strncmp (locval, lname, locval_len) == 0
+ && (lname[locval_len] == '.'
+ || lname[locval_len] == '\0'))
+ {
+ found_lcid = try_lcid;
+ return FALSE;
+ }
+ }
+ }
+ return TRUE;
+}
+
+/* Return the Locale ID (LCID) number given the locale's name, a
+ string, in LOCALE_NAME. This works by enumerating all the locales
+ supported by the system, until we find one whose name matches
+ LOCALE_NAME. */
+static LCID
+get_lcid (const char *locale_name)
+{
+ /* A simple cache. */
+ static LCID last_lcid;
+ static char last_locale[1000];
+
+ if (last_lcid > 0 && strcmp (locale_name, last_locale) == 0)
+ return last_lcid;
+ strncpy (lname, locale_name, sizeof (lname) - 1);
+ lname[sizeof (lname) - 1] = '\0';
+ found_lcid = 0;
+ EnumSystemLocales (enum_locales_fn, LCID_SUPPORTED);
+ if (found_lcid > 0)
+ {
+ last_lcid = found_lcid;
+ strcpy (last_locale, locale_name);
+ }
+ return found_lcid;
+}
+
+#endif /* WINDOWS_NATIVE */
+#endif /* WINDOWS_NATIVE || __CYGWIN__ */


#if HAVE_USELOCALE /* glibc or Mac OS X */
@@ -2660,6 +2726,26 @@ gl_locale_name_thread (int category, con
const char *name = gl_locale_name_thread_unsafe (category, categoryname);
if (name != NULL)
return struniq (name);
+#elif defined WINDOWS_NATIVE
+ if (LC_MIN <= category && category <= LC_MAX)
+ {
+ char *locname = setlocale (category, NULL);
+
+ /* If CATEGORY is LC_ALL, the result might be a semi-colon
+ separated list of locales. We need only one, so we take the
+ one corresponding to LC_CTYPE, as the most important for
+ character translations. */
+ if (strchr (locname, ';'))
+ locname = setlocale (LC_CTYPE, NULL);
+
+ /* Convert locale name to LCID. We don't want to use
+ LocaleNameToLCID because (a) it is only available since Vista,
+ and (b) it doesn't accept locale names returned by 'setlocale'. */
+ LCID lcid = get_lcid (locname);
+
+ if (lcid > 0)
+ return gl_locale_name_from_win32_LCID (lcid);
+ }
#endif
return NULL;
}
Karl Berry
2014-07-03 22:19:20 UTC
Permalink
Raw Message
As far as I know, bug-libunistring is still intended to exist
independent of gnulib. At least I have never heard otherwise from
Bruno.

However, Bruno has just today officially stepped down as maintainer of
libunistring, and all his other remaining packages.

Daiki, would you like to become the new maintainer? Alternatively, if
you think libunistring as a separate entity should go away and it just
be part of gnulib, that's fine. Ditto libiconv (another of Bruno's
packages), which I suspect is in a similar situation, on both counts.

karl
Daiki Ueno
2014-07-04 03:39:32 UTC
Permalink
Raw Message
Post by Karl Berry
Daiki, would you like to become the new maintainer?
If nobody else is interested, I can take it.
Post by Karl Berry
Alternatively, if you think libunistring as a separate entity should
go away and it just be part of gnulib, that's fine. Ditto libiconv
(another of Bruno's packages), which I suspect is in a similar
situation, on both counts.
I think a separate project (at least a git repository) is still needed
for standalone releases. Since those libraries include large data,
providing them as a shared library can be more efficient than as part of
gnulib.

Regards,
--
Daiki Ueno
Paul Eggert
2014-07-05 15:46:24 UTC
Permalink
Raw Message
Post by Daiki Ueno
If nobody else is interested, I can take it.
Thanks for volunteering. It's a great loss for us that Bruno has
stepped down, and we appreciate your stepping in. I have added you to
the gnulib group on Savannah and have installed the attached patch
(compressed) to try to keep track of this.

This is my guess at the libunistring modules; since I don't know
libunistring well, I could have guessed wrong, in which case we should
fix it. You didn't volunteer for libiconv- and gettext-related modules
so I reverted their maintainer to 'all'; if you'd like to volunteer for
them too please let us know.
Karl Berry
2014-07-06 23:57:06 UTC
Permalink
Raw Message
You didn't volunteer for libiconv- and gettext-related modules
so I reverted their maintainer to 'all';

Daiki has been the maintainer of gettext (the package) for quite some
time, as you know. I asked him about libiconv (the package) in other
email; I'm guessing he'll reply after the weekend, hopefully with an
"ok" :).

However, what is the benefit to labelling individuals as maintainers of
modules in the first place, rather than just having everything be "all"?
Was it that Bruno wanted to personally approve changes to "his" modules?

I seem to remember proposing (years ago) to automatically sync gnulib
from gettext (the source repository or release, as preferred), and
Bruno did not want to do anything like that, but wanted to do everything
by hand. I never understood, though it seemed to all magically work out.

I guess we should figure out how the gnulib updates of this stuff are
going to work in the post-Bruno world.

karl
Paul Eggert
2014-07-07 01:09:21 UTC
Permalink
Raw Message
Post by Karl Berry
However, what is the benefit to labelling individuals as maintainers of
modules in the first place, rather than just having everything be "all"?
Was it that Bruno wanted to personally approve changes to "his" modules?
Yes, that was basically it. As I recall, Bruno was the one who felt
most strongly about it. It'd be fine with me if we got rid of the
Maintainer: section entirely; it'd be one less thing to maintain, at any
rate. It'd also be fine with me if we used more automation to merge
updates from upstream sources.
Eli Zaretskii
2014-07-15 15:53:39 UTC
Permalink
Raw Message
Ping! Almost 2 weeks went by with no response.

TIA
Date: Thu, 03 Jul 2014 18:23:32 +0300
I submitted these changes to the libunistring mailing list, but was
advised to send them here, as libunistring appears to be a collection
of Gnulib modules. So here I am, repeating what I sent to
libunistring list. (The patch below is relative to the latest Gnulib
master.)
Libunistring on MS-Windows is currently less useful than it could be.
In a nutshell, it supports only the default system locale; setting the
locale to anything else using 'setlocale' will cause many libunistring
functions to fail.
One reason is that libunistring needs to know the codeset of the
locale, to use it when it converts strings to and from Unicode using
iconv. However, on Windows the function locale_charset always returns
the default ANSI codepage, using the GetACP API. That API always
returns the same codepage regardless of locale changes. The result is
that when libunistring is passed a string in any encoding incompatible
with the default system codepage, any calls to libiconv will fail with
EILSEQ. And since most libunistring functions call libiconv, they all
fail.
The second problem that gets in the way is the one in gl_locale_name.
On MS-Windows, this function returns only the default locale (by
calling GetThreadLocale, which is again oblivious to any locale
changes made by 'setlocale'). So again, only the default locale can
be supported correctly. E.g., functions that need to apply
language-specific rules, like for Turkish, will work incorrectly
because the language they get from gl_locale_name is not Turkish, even
though 'setlocale' was previously called to set the Turkish locale.
To fix these problems, I propose the following changes. They have
been extensively tested using Guile's test suite, in particular the
i18n.test there.
I hope these changes will be accepted, because as I said, without them
libunistring is much less useful on MS-Windows than it could be.
Thanks.
Improve support for non-default locales on MS-Windows.
* lib/localcharset.c (locale_charset) [WINDOWS_NATIVE]: Before
falling back on the default system codepage, try extracting
the codepage from what 'setlocale' returns. This allows to
take into account changes of the codeset due to non-default
locale set by a previous call to 'setlocale'.
Define if not already defined.
(enum_locales_fn, get_lcid) [WINDOWS_NATIVE]: New functions.
(gl_locale_name_thread) [WINDOWS_NATIVE]: Produce the
current locale by calling 'setlocale', then converting the
locale name into LCID by calling 'get_lcid'. This allows to
take into account changes in the current locale from the
default one, in contrast to GetThreadLocale.
--- lib/localcharset.c~0 2014-02-09 09:41:26 +0200
+++ lib/localcharset.c 2014-07-03 18:17:44 +0300
@@ -34,6 +34,7 @@
#if defined _WIN32 || defined __WIN32__
# define WINDOWS_NATIVE
+# include <locale.h>
#endif
#if defined __EMX__
@@ -461,14 +462,34 @@ locale_charset (void)
static char buf[2 + 10 + 1];
- /* The Windows API has a function returning the locale's codepage as a
- number: GetACP().
- When the output goes to a console window, it needs to be provided in
- GetOEMCP() encoding if the console is using a raster font, or in
- GetConsoleOutputCP() encoding if it is using a TrueType font.
- But in GUI programs and for output sent to files and pipes, GetACP()
- encoding is the best bet. */
- sprintf (buf, "CP%u", GetACP ());
+ /* The Windows API has a function returning the locale's codepage as
+ a number, but the value doesn't change according to what the
+ 'setlocale' call specified. So we use it as a last resort, in
+ case the string returned by 'setlocale' doesn't specify the
+ codepage. */
+ char *current_locale = setlocale (LC_ALL, NULL);
+ char *pdot;
+
+ /* If they set different locales for different categories,
+ 'setlocale' will return a semi-colon separated list of locale
+ values. To make sure we use the correct one, we choose LC_CTYPE. */
+ if (strchr (current_locale, ';'))
+ current_locale = setlocale (LC_CTYPE, NULL);
+
+ pdot = strrchr (current_locale, '.');
+ if (pdot)
+ sprintf (buf, "CP%s", pdot + 1);
+ else
+ {
+ /* The Windows API has a function returning the locale's codepage as a
+ number: GetACP().
+ When the output goes to a console window, it needs to be provided in
+ GetOEMCP() encoding if the console is using a raster font, or in
+ GetConsoleOutputCP() encoding if it is using a TrueType font.
+ But in GUI programs and for output sent to files and pipes, GetACP()
+ encoding is the best bet. */
+ sprintf (buf, "CP%u", GetACP ());
+ }
codeset = buf;
#elif defined OS2
--- lib/localename.c~0 2014-02-09 09:41:26 +0200
+++ lib/localename.c 2014-07-03 18:11:45 +0300
@@ -60,6 +60,7 @@
#if defined WINDOWS_NATIVE || defined __CYGWIN__ /* Native Windows or Cygwin */
# define WIN32_LEAN_AND_MEAN
# include <windows.h>
+# include <winnls.h>
0x01 LANG_ARABIC
0x02 LANG_BULGARIAN
@@ -1124,6 +1125,9 @@
# ifndef LOCALE_SNAME
# define LOCALE_SNAME 0x5c
# endif
+# ifndef LOCALE_NAME_MAX_LENGTH
+# define LOCALE_NAME_MAX_LENGTH 85
+# endif
#endif
@@ -2502,7 +2506,69 @@ gl_locale_name_from_win32_LCID (LCID lci
return gl_locale_name_from_win32_LANGID (langid);
}
-#endif
+#ifdef WINDOWS_NATIVE
+
+/* Two variables to interface between get_lcid and the EnumLocales
+ callback function below. */
+static LCID found_lcid;
+static char lname[LC_MAX * (LOCALE_NAME_MAX_LENGTH + 1) + 1];
+
+/* Callback function for EnumLocales. */
+static BOOL CALLBACK
+enum_locales_fn (LPTSTR locale_num_str)
+{
+ char *endp;
+ char locval[2 * LOCALE_NAME_MAX_LENGTH + 1 + 1];
+ LCID try_lcid = strtoul (locale_num_str, &endp, 16);
+
+ if (GetLocaleInfo (try_lcid, LOCALE_SENGLANGUAGE,
+ locval, LOCALE_NAME_MAX_LENGTH))
+ {
+ strcat (locval, "_");
+ if (GetLocaleInfo (try_lcid, LOCALE_SENGCOUNTRY,
+ locval + strlen (locval), LOCALE_NAME_MAX_LENGTH))
+ {
+ size_t locval_len = strlen (locval);
+
+ if (strncmp (locval, lname, locval_len) == 0
+ && (lname[locval_len] == '.'
+ || lname[locval_len] == '\0'))
+ {
+ found_lcid = try_lcid;
+ return FALSE;
+ }
+ }
+ }
+ return TRUE;
+}
+
+/* Return the Locale ID (LCID) number given the locale's name, a
+ string, in LOCALE_NAME. This works by enumerating all the locales
+ supported by the system, until we find one whose name matches
+ LOCALE_NAME. */
+static LCID
+get_lcid (const char *locale_name)
+{
+ /* A simple cache. */
+ static LCID last_lcid;
+ static char last_locale[1000];
+
+ if (last_lcid > 0 && strcmp (locale_name, last_locale) == 0)
+ return last_lcid;
+ strncpy (lname, locale_name, sizeof (lname) - 1);
+ lname[sizeof (lname) - 1] = '\0';
+ found_lcid = 0;
+ EnumSystemLocales (enum_locales_fn, LCID_SUPPORTED);
+ if (found_lcid > 0)
+ {
+ last_lcid = found_lcid;
+ strcpy (last_locale, locale_name);
+ }
+ return found_lcid;
+}
+
+#endif /* WINDOWS_NATIVE */
+#endif /* WINDOWS_NATIVE || __CYGWIN__ */
#if HAVE_USELOCALE /* glibc or Mac OS X */
@@ -2660,6 +2726,26 @@ gl_locale_name_thread (int category, con
const char *name = gl_locale_name_thread_unsafe (category, categoryname);
if (name != NULL)
return struniq (name);
+#elif defined WINDOWS_NATIVE
+ if (LC_MIN <= category && category <= LC_MAX)
+ {
+ char *locname = setlocale (category, NULL);
+
+ /* If CATEGORY is LC_ALL, the result might be a semi-colon
+ separated list of locales. We need only one, so we take the
+ one corresponding to LC_CTYPE, as the most important for
+ character translations. */
+ if (strchr (locname, ';'))
+ locname = setlocale (LC_CTYPE, NULL);
+
+ /* Convert locale name to LCID. We don't want to use
+ LocaleNameToLCID because (a) it is only available since Vista,
+ and (b) it doesn't accept locale names returned by 'setlocale'. */
+ LCID lcid = get_lcid (locname);
+
+ if (lcid > 0)
+ return gl_locale_name_from_win32_LCID (lcid);
+ }
#endif
return NULL;
}
Paul Eggert
2014-07-15 19:20:44 UTC
Permalink
Raw Message
Thanks, I installed that, with minor changes to indentation and suchlike.
Eli Zaretskii
2014-07-15 20:06:45 UTC
Permalink
Raw Message
Date: Tue, 15 Jul 2014 12:20:44 -0700
Thanks, I installed that, with minor changes to indentation and suchlike.
Thank you.
Daiki Ueno
2014-07-16 10:02:03 UTC
Permalink
Raw Message
Post by Eli Zaretskii
Date: Tue, 15 Jul 2014 12:20:44 -0700
Thanks, I installed that, with minor changes to indentation and suchlike.
Thank you.
Sorry for not looking at it timely. Do you need a new libunistring
release with this change? I don't even know if Guile is still using the
packaged libunistring.

By the way, isn't there a minor race in gl_locale_name_thread, since it
depends on a global variable found_lcid?

Regards,
--
Daiki Ueno
Eli Zaretskii
2014-07-16 15:19:46 UTC
Permalink
Raw Message
Date: Wed, 16 Jul 2014 19:02:03 +0900
Do you need a new libunistring release with this change?
I think it would be good, yes.
I don't even know if Guile is still using the packaged libunistring.
It does, at least as of its latest official release 2.0.11.
By the way, isn't there a minor race in gl_locale_name_thread, since it
depends on a global variable found_lcid?
Yes. (It is not a problem for Guile, since Guile built with threads
on Windows is severely broken, so the only way of having a useful
Guile on Windows is to build it without threads.) Suggestions for
fixing that are welcome. Maybe just use the '__thread' qualifier for
that variable?
Daiki Ueno
2014-07-17 03:59:59 UTC
Permalink
Raw Message
Post by Eli Zaretskii
Post by Daiki Ueno
By the way, isn't there a minor race in gl_locale_name_thread, since it
depends on a global variable found_lcid?
Yes. (It is not a problem for Guile, since Guile built with threads
on Windows is severely broken, so the only way of having a useful
Guile on Windows is to build it without threads.) Suggestions for
fixing that are welcome. Maybe just use the '__thread' qualifier for
that variable?
I'm not familiar with Windows TLS support, but '__thread' seems to be
compiler specific. Gnulib has glthread/tls but I'm not sure if it is
worth adding another dependency to localename.c (which is also shared
with libintl...) for this specific issue.

Maybe just lock around the use?

Regards,
--
Daiki Ueno
Eli Zaretskii
2014-07-17 15:52:13 UTC
Permalink
Raw Message
Date: Thu, 17 Jul 2014 12:59:59 +0900
Post by Eli Zaretskii
Post by Daiki Ueno
By the way, isn't there a minor race in gl_locale_name_thread, since it
depends on a global variable found_lcid?
Yes. (It is not a problem for Guile, since Guile built with threads
on Windows is severely broken, so the only way of having a useful
Guile on Windows is to build it without threads.) Suggestions for
fixing that are welcome. Maybe just use the '__thread' qualifier for
that variable?
I'm not familiar with Windows TLS support, but '__thread' seems to be
compiler specific.
Do we care for any Windows compilers except GCC?
Gnulib has glthread/tls but I'm not sure if it is worth adding
another dependency to localename.c (which is also shared with
libintl...) for this specific issue.
It isn't worth it (and AFAICS, that module doesn't have a Windows
implementation anyway).
Maybe just lock around the use?
I don't like locking for no good reason.

I could use explicit Windows API calls (TlsGetValue etc.) instead of
GCC-specific features, if that is more acceptable.
Paul Eggert
2014-07-17 15:56:06 UTC
Permalink
Raw Message
Post by Eli Zaretskii
Do we care for any Windows compilers except GCC?
Not as far as I know, for Gnulib. (Though I'm no Windows expert.)
Bruno Haible
2016-12-02 12:45:28 UTC
Permalink
Raw Message
Post by Paul Eggert
Post by Eli Zaretskii
Do we care for any Windows compilers except GCC?
Not as far as I know, for Gnulib. (Though I'm no Windows expert.)
Well, I care about it. Just added updated build instructions for use with MSVC
to libiconv and gperf:
http://git.savannah.gnu.org/gitweb/?p=libiconv.git;a=blob_plain;f=README.windows;hb=HEAD

Compared to MSVC 9, MSVC 14 now has <stdint.h> (finally!).

Bruno
Daiki Ueno
2014-07-17 17:23:11 UTC
Permalink
Raw Message
Post by Eli Zaretskii
Post by Daiki Ueno
I'm not familiar with Windows TLS support, but '__thread' seems to be
compiler specific.
Do we care for any Windows compilers except GCC?
I was rather worried about particular versions of GCC, which might not
support it at all (< 3.3?), or might require extra compiler options or
runtime dependency:

http://sourceforge.net/p/mingw/bugs/1333/
http://sourceforge.net/p/mingw-w64/mailman/message/31213279/

I could be wrong and perhaps it might not be a problem with the recent
MinGW releases.

Regards,
--
Daiki Ueno
Daiki Ueno
2014-07-18 07:06:11 UTC
Permalink
Raw Message
Post by Daiki Ueno
I could be wrong and perhaps it might not be a problem with the recent
MinGW releases.
FWIW I just tried myself (on wine, though) and it seems to work with:

- GCC 4.8.3 (Fedora mingw64-gcc package, cross compiling)
- GCC 4.7.2 (mingw-w32-bin_i686-mingw_20111219, self compiling)

So, I guess it is OK to use __thread, though other 3 static variables
(lname, last_lcid, and last_locale) would also need the same care.

For the archive, I'm attaching a test program:
Eli Zaretskii
2014-07-18 09:29:09 UTC
Permalink
Raw Message
Date: Fri, 18 Jul 2014 16:06:11 +0900
- GCC 4.8.3 (Fedora mingw64-gcc package, cross compiling)
- GCC 4.7.2 (mingw-w32-bin_i686-mingw_20111219, self compiling)
So, I guess it is OK to use __thread, though other 3 static variables
(lname, last_lcid, and last_locale) would also need the same care.
Thanks.

Googling suggests that:

. __thread works as expected in MinGW GCC since version 4.5.0

. DLLs that use __thread (a.k.a. "implicit TLS") cannot be safely
loaded at run time, using LoadLibrary, on Windows versions before
Vista, they can only be loaded at program startup time (IOW, the
corresponding -lLIBRARY switch should be passed on the link
command line)

. DLLs that need to be able to support LoadLibrary on Windows XP and
older need to use explicit TLS API functions, which requires a
call to TlsAlloc in the DllMain function, another complication

Given the above, is it OK to use __thread and live with the
limitations? Or maybe we should simply go back to your idea of
locking, e.g., by using EnterCriticalSection and LeaveCriticalSection?
Eli Zaretskii
2014-07-18 09:42:52 UTC
Permalink
Raw Message
Date: Fri, 18 Jul 2014 12:29:09 +0300
Or maybe we should simply go back to your idea of locking, e.g., by
using EnterCriticalSection and LeaveCriticalSection?
Actually, I see that libunistring already includes functions for
locking (which on Windows wrap the 2 APIs I mentioned above). So I
guess it's best to use them.
Daiki Ueno
2014-07-18 23:11:11 UTC
Permalink
Raw Message
Post by Eli Zaretskii
. __thread works as expected in MinGW GCC since version 4.5.0
. DLLs that use __thread (a.k.a. "implicit TLS") cannot be safely
loaded at run time, using LoadLibrary, on Windows versions before
Vista, they can only be loaded at program startup time (IOW, the
corresponding -lLIBRARY switch should be passed on the link
command line)
. DLLs that need to be able to support LoadLibrary on Windows XP and
older need to use explicit TLS API functions, which requires a
call to TlsAlloc in the DllMain function, another complication
Thanks for checking in detail. Then it would make more sense to use
locking rather than TLS.
Post by Eli Zaretskii
Actually, I see that libunistring already includes functions for
locking (which on Windows wrap the 2 APIs I mentioned above). So I
guess it's best to use them.
Yes, could you create a patch in that direction? By the way, though a
minor nitpicking, the indentation in gl_locale_name_thread still doesn't
look correct (there is a variable declaration in the middle of block).

Regards,
--
Daiki Ueno
Eli Zaretskii
2014-07-19 07:11:07 UTC
Permalink
Raw Message
Date: Sat, 19 Jul 2014 08:11:11 +0900
Post by Eli Zaretskii
. __thread works as expected in MinGW GCC since version 4.5.0
. DLLs that use __thread (a.k.a. "implicit TLS") cannot be safely
loaded at run time, using LoadLibrary, on Windows versions before
Vista, they can only be loaded at program startup time (IOW, the
corresponding -lLIBRARY switch should be passed on the link
command line)
. DLLs that need to be able to support LoadLibrary on Windows XP and
older need to use explicit TLS API functions, which requires a
call to TlsAlloc in the DllMain function, another complication
Thanks for checking in detail. Then it would make more sense to use
locking rather than TLS.
Post by Eli Zaretskii
Actually, I see that libunistring already includes functions for
locking (which on Windows wrap the 2 APIs I mentioned above). So I
guess it's best to use them.
Yes, could you create a patch in that direction?
Will do.
By the way, though a minor nitpicking, the indentation in
gl_locale_name_thread still doesn't look correct (there is a
variable declaration in the middle of block).
Oops! Here's the trivial patch to fix that:

2014-07-19 Eli Zaretskii <***@gnu.org>

localename: Enforce declarations before statements.

* localename.c (gl_locale_name_thread): Declare 'lcid' before
the first statement.

diff --git a/lib/localename.c b/lib/localename.c
index bcef8a0..0c33ddc 100644
--- a/lib/localename.c
+++ b/lib/localename.c
@@ -2730,6 +2730,7 @@ gl_locale_name_thread (int category, const char *categoryname)
if (LC_MIN <= category && category <= LC_MAX)
{
char *locname = setlocale (category, NULL);
+ LCID lcid = 0;

/* If CATEGORY is LC_ALL, the result might be a semi-colon
separated list of locales. We need only one, so we take the
@@ -2741,7 +2742,7 @@ gl_locale_name_thread (int category, const char *categoryname)
/* Convert locale name to LCID. We don't want to use
LocaleNameToLCID because (a) it is only available since Vista,
and (b) it doesn't accept locale names returned by 'setlocale'. */
- LCID lcid = get_lcid (locname);
+ lcid = get_lcid (locname);

if (lcid > 0)
return gl_locale_name_from_win32_LCID (lcid);
Daiki Ueno
2014-08-07 00:22:32 UTC
Permalink
Raw Message
Post by Eli Zaretskii
Post by Daiki Ueno
Post by Eli Zaretskii
Actually, I see that libunistring already includes functions for
locking (which on Windows wrap the 2 APIs I mentioned above). So I
guess it's best to use them.
Yes, could you create a patch in that direction?
Will do.
Here is a simple patch for this. Is it OK for you?
Eli Zaretskii
2014-08-07 15:51:57 UTC
Permalink
Raw Message
Date: Thu, 07 Aug 2014 09:22:32 +0900
Post by Eli Zaretskii
Post by Daiki Ueno
Post by Eli Zaretskii
Actually, I see that libunistring already includes functions for
locking (which on Windows wrap the 2 APIs I mentioned above). So I
guess it's best to use them.
Yes, could you create a patch in that direction?
Will do.
Here is a simple patch for this. Is it OK for you?
Yes, it does. Thanks!

(And sorry for not getting to that myself: life intervened big time.)
Bruno Haible
2016-12-02 12:53:30 UTC
Permalink
Raw Message
Post by Eli Zaretskii
One reason is that libunistring needs to know the codeset of the
locale, to use it when it converts strings to and from Unicode using
iconv. However, on Windows the function locale_charset always returns
the default ANSI codepage, using the GetACP API. That API always
returns the same codepage regardless of locale changes. The result is
that when libunistring is passed a string in any encoding incompatible
with the default system codepage, any calls to libiconv will fail with
EILSEQ. And since most libunistring functions call libiconv, they all
fail.
...
To fix these problems, I propose the following changes. They have
been extensively tested using Guile's test suite, in particular the
i18n.test there.
Thank you very much for these well-tested changes, and explanations!

I see the theoretical possibility of a buffer overrun, so I'll sleep
better with this change:


2016-12-02 Bruno Haible <***@clisp.org>

localcharset: Avoid theoretical buffer overrun.
* lib/localcharset.c (locale_charset) [WINDOWS_NATIVE]: Don't use the
return value from setlocale if it would lead to a buffer overrun.

diff --git a/lib/localcharset.c b/lib/localcharset.c
index ff192f8..5060aaa 100644
--- a/lib/localcharset.c
+++ b/lib/localcharset.c
@@ -507,7 +507,7 @@ locale_charset (void)
current_locale = setlocale (LC_CTYPE, NULL);

pdot = strrchr (current_locale, '.');
- if (pdot)
+ if (pdot && 2 + strlen (pdot + 1) + 1 <= sizeof (buf))
sprintf (buf, "CP%s", pdot + 1);
else
{

Loading...