Discussion:
Overriding the MS-Windows wint_t type considered harmful
(too old to reply)
Eli Zaretskii
2017-04-30 15:20:57 UTC
Permalink
Raw Message
Two Gnulib header files, wctype.h and wchar.h, override the wint_t
data type as defined in the MS-Windows header files, like this:

/* mingw and MSVC define wint_t as 'unsigned short' in <crtdefs.h> or
<stddef.h>. This is too small: ISO C 99 section 7.24.1.(2) says that
wint_t must be "unchanged by default argument promotions". Override it. */
# if @GNULIB_OVERRIDES_WINT_T@
# if !GNULIB_defined_wint_t
# if @HAVE_CRTDEFS_H@
# include <crtdefs.h>
# else
# include <stddef.h>
# endif
typedef unsigned int rpl_wint_t;
# undef wint_t
# define wint_t rpl_wint_t
# define GNULIB_defined_wint_t 1
# endif
# endif

IME, this causes more trouble than it solves. While building a
pretest of Texinfo 4 with mingw.org's MinGW, it has bitten me twice.

One scenario where this gets in the way is when the application
includes some other header, e.g. ctype.h, which declares the same isw*
functions as wchar.h and wctype.h -- if the inclusion of ctype.h is
_after_ wchar.h/wctype.h are included, this produces compilation
errors, because the prototypes of the isw* functions were first seen
with the original wint_t type, and now are seen with the wider wint_t.

Another scenario is when wint_t variables are passed to functions that
expect pointers to WCHAR data type used by the lower-level Win32 APIs
(LPCWSTR etc.) -- with the overridden type, this causes compilation
warnings or errors, and requires casts or intermediate variables.

More generally, I don't see how this overriding of a native type could
possibly be a good idea, when Gnulib doesn't replace library functions
which use this data type.

So I propose to stop overriding wint_t in native Windows builds, as it
generally means trouble. I understand the rationale as expressed in
the comment above, but I think that in this case the cure is worse
than the decease.

Thanks.
Bruno Haible
2017-04-30 15:27:50 UTC
Permalink
Raw Message
Hi Eli,
Post by Eli Zaretskii
One scenario where this gets in the way is when the application
includes some other header, e.g. ctype.h, which declares the same isw*
functions as wchar.h and wctype.h -- if the inclusion of ctype.h is
_after_ wchar.h/wctype.h are included, this produces compilation
errors, because the prototypes of the isw* functions were first seen
with the original wint_t type, and now are seen with the wider wint_t.
We know how to fix such situations in gnulib. Thanks for the report.
Post by Eli Zaretskii
Another scenario is when wint_t variables are passed to functions that
expect pointers to WCHAR data type used by the lower-level Win32 APIs
(LPCWSTR etc.) -- with the overridden type, this causes compilation
warnings or errors, and requires casts or intermediate variables.
Can you elaborate on these, please? WCHAR = wchar_t != win_t.

Bruno
Eli Zaretskii
2017-04-30 18:36:02 UTC
Permalink
Raw Message
Date: Sun, 30 Apr 2017 17:27:50 +0200
Post by Eli Zaretskii
One scenario where this gets in the way is when the application
includes some other header, e.g. ctype.h, which declares the same isw*
functions as wchar.h and wctype.h -- if the inclusion of ctype.h is
_after_ wchar.h/wctype.h are included, this produces compilation
errors, because the prototypes of the isw* functions were first seen
with the original wint_t type, and now are seen with the wider wint_t.
We know how to fix such situations in gnulib. Thanks for the report.
I'm asking why get into such situations in the first place. What do
we gain?

As for fixing, the fixes are usually fragile and tend to break
whenever the system headers are rearranged, for whatever reasons.
Problems that stem from the order of including header files are always
like that, and IME it's best to avoid them in the first place.
Post by Eli Zaretskii
Another scenario is when wint_t variables are passed to functions that
expect pointers to WCHAR data type used by the lower-level Win32 APIs
(LPCWSTR etc.) -- with the overridden type, this causes compilation
warnings or errors, and requires casts or intermediate variables.
Can you elaborate on these, please? WCHAR = wchar_t != win_t.
Not sure what you want me to elaborate on. With the native data
types, I can freely assign wint_t to wchar_t and vice versa; with the
overridden Gnulib type, I cannot.
Bruno Haible
2017-05-01 00:50:25 UTC
Permalink
Raw Message
Post by Eli Zaretskii
I'm asking why get into such situations in the first place. What do
we gain?
In general, by using gnulib, you gain higher compliance to the POSIX
and C standard. In other words, code that was written with these standards
in mind has higher chances of working correctly with gnulib than without.

In the particular case of 'wint_t': The problem is no such much that
wint_t = 'unsigned short'
is not unchanged by argument promotions, but rather that this type
is not larger than 'wchar_t'. This piece of code

wint_t c = getwchar();
if (c == WEOF) goto eof_handling;

is only correct when wint_t is larger than wchar_t. The original
definition of WEOF is (wint_t)0xFFFF. This is an unassigned Unicode
character, but it can occur in files. You must not misinterpret it
as being the end of the file or stream.

It's like 20 years ago, when people were writing

char c = getchar();
if (c == EOF) goto eof_handling;

and were wondering why this was buggy.
Post by Eli Zaretskii
As for fixing, the fixes are usually fragile and tend to break
whenever the system headers are rearranged, for whatever reasons.
This is manageable.
Post by Eli Zaretskii
Post by Bruno Haible
Post by Eli Zaretskii
Another scenario is when wint_t variables are passed to functions that
expect pointers to WCHAR data type used by the lower-level Win32 APIs
(LPCWSTR etc.) -- with the overridden type, this causes compilation
warnings or errors, and requires casts or intermediate variables.
Can you elaborate on these, please? WCHAR = wchar_t != win_t.
Not sure what you want me to elaborate on. With the native data
types, I can freely assign wint_t to wchar_t and vice versa; with the
overridden Gnulib type, I cannot.
You CAN freely assign wint_t to wchar_t and vice versa: they are integer
types. What you cannot assign freely is a wint_t* to a wchar_t* or vice
versa. Which means that you discovered a bug in the source code of your
package. There is NO STANDARD that would state that wint_t and wchar_t
are of the same size.

""I have to port code that makes invalid assumptions. Gnulib detects that
this code makes invalid assumptions. Let me complain about Gnulib.""

Bruno
Eli Zaretskii
2017-05-01 07:06:54 UTC
Permalink
Raw Message
Date: Mon, 01 May 2017 02:50:25 +0200
Post by Eli Zaretskii
I'm asking why get into such situations in the first place. What do
we gain?
In general, by using gnulib, you gain higher compliance to the POSIX
and C standard. In other words, code that was written with these standards
in mind has higher chances of working correctly with gnulib than without.
In the particular case of 'wint_t': The problem is no such much that
wint_t = 'unsigned short'
is not unchanged by argument promotions, but rather that this type
is not larger than 'wchar_t'. This piece of code
wint_t c = getwchar();
if (c == WEOF) goto eof_handling;
is only correct when wint_t is larger than wchar_t. The original
definition of WEOF is (wint_t)0xFFFF. This is an unassigned Unicode
character, but it can occur in files. You must not misinterpret it
as being the end of the file or stream.
Is this really a more important use case than the ones I had to deal
with during the last week? (See bug-texinfo discussions for the
details.)

Let me turn the table, and ask you why didn't you provide a valid
inode values in your latest rework of fstat and stat for MS-Windows?
The comment was that the ino_t type is not wide enough to support
that. IOW, providing a useful inode (and device) would have meant
overriding the native definition of 'struct stat', and you've decided
not to do that. I understand the dilemma, and therefore have
sympathies to your decision, even though I myself took the other
alternatives a few times (see Emacs, for example).

But if such considerations are valid in the case of stat/fstat, where
we know that gobs of GNU software use the device and inode for
implementing important features and optimizations, then how come the
much less important use cases with semi-buggy WEOF processing are not
handled with the same caution? Why override the native definition of
a type to cater to such unimportant use case, and risk breaking
applications which assume wint_t is what it is on Windows? The
difference between these two decisions makes very little sense to me.
Post by Eli Zaretskii
As for fixing, the fixes are usually fragile and tend to break
whenever the system headers are rearranged, for whatever reasons.
This is manageable.
I propose to remove the reason that would require such management. If
possible, that's a better engineering solution, won't you agree?
You CAN freely assign wint_t to wchar_t and vice versa: they are integer
types. What you cannot assign freely is a wint_t* to a wchar_t* or vice
versa. Which means that you discovered a bug in the source code of your
package. There is NO STANDARD that would state that wint_t and wchar_t
are of the same size.
This is C we are talking about. Code that uses data types of the same
size interchangeably is all over the place. And coding for Windows,
trying to support applications that assume Posix, means standards mean
very little. Doing that is hard as it is, as you well know, so it's
best not to make any harder than it has to be.
""I have to port code that makes invalid assumptions. Gnulib detects that
this code makes invalid assumptions. Let me complain about Gnulib.""
Thank you for mocking me. I will in the future think twice before
posting to this list.
Bruno Haible
2017-05-01 09:40:14 UTC
Permalink
Raw Message
Hi Eli,
Post by Eli Zaretskii
Post by Bruno Haible
wint_t c = getwchar();
if (c == WEOF) goto eof_handling;
Is this really a more important use case than the ones I had to deal
with during the last week? (See bug-texinfo discussions for the
details.)
There is no "more important use case". Scanning
https://lists.gnu.org/archive/html/bug-texinfo/2017-04/
http://svn.savannah.gnu.org/viewvc/*checkout*/texinfo/trunk/ChangeLog
http://svn.savannah.gnu.org/viewvc/texinfo/trunk/tp/Texinfo/MiscXS/misc.c?r1=7762&r2=7765
http://svn.savannah.gnu.org/viewvc/texinfo/trunk/tp/Texinfo/Convert/XSParagraph/xspara.c?r1=7651&r2=7765
I can only see 3 issues, 2 of them we already discussed:

1) The <wchar.h> problem, which you already reported in
https://lists.gnu.org/archive/html/bug-gnulib/2017-04/msg00166.html
- thank you for this report.

2) The previous code was defining iswspace and iswupper functions.
Defining symbols with the same name as system function (at the linker
level) is dangerous practice on all non-ELF systems (Mac OS X,
AIX, Cygwin, Windows) anyway. And yes, this problem got worse
through Gnulib's type redefinition.

3) The previous code assumed that a wint_t* is the same as a WCHAR*.
Not guaranteed by any standard nor Microsoft documentation.
Post by Eli Zaretskii
Let me turn the table, and ask you why didn't you provide a valid
inode values in your latest rework of fstat and stat for MS-Windows?
The comment was that the ino_t type is not wide enough to support
that. IOW, providing a useful inode (and device) would have meant
overriding the native definition of 'struct stat'
Yes, I agree with that, and it is in fact part of the next steps, that
I mentioned yesterday in
https://lists.gnu.org/archive/html/bug-gnulib/2017-04/msg00159.html

But since overriding 'struct stat' can cause problems (e.g. some code
may assume that stat is _stati64) and not every program needs this,
I plan to make provide this support through a transversal module
https://lists.gnu.org/archive/html/bug-gnulib/2017-04/msg00117.html -
so that packages can choose whether they want it or not.
Post by Eli Zaretskii
we know that gobs of GNU software use the device and inode for
implementing important features and optimizations, then how come the
much less important use cases with semi-buggy WEOF processing are not
handled with the same caution?
It happens that simpler features often get implement earlier than more
complex ones. Overriding wint_t is simpler than writing new stat, fstat
implementations.

Bruno
Eli Zaretskii
2017-05-01 11:01:16 UTC
Permalink
Raw Message
Date: Mon, 01 May 2017 11:40:14 +0200
1) The <wchar.h> problem, which you already reported in
https://lists.gnu.org/archive/html/bug-gnulib/2017-04/msg00166.html
- thank you for this report.
2) The previous code was defining iswspace and iswupper functions.
Defining symbols with the same name as system function (at the linker
level) is dangerous practice on all non-ELF systems (Mac OS X,
AIX, Cygwin, Windows) anyway. And yes, this problem got worse
through Gnulib's type redefinition.
3) The previous code assumed that a wint_t* is the same as a WCHAR*.
Not guaranteed by any standard nor Microsoft documentation.
You forget the issue with perl.h including ctype.h after wint_t was
already redefined. This was reported here:

http://lists.gnu.org/archive/html/bug-texinfo/2017-04/msg00055.html

Anyway, this means in 2 out of 3 issues you agree that the
redefinition causes trouble. And the only argument for the
redefinition seems to be compliance to ISO C99, which at this point
sounds like a purely theoretical issue.

So I still think this redefinition does more harm than help, and
therefore Gnulib should not do that.

Thanks.
Bruno Haible
2017-05-01 10:27:57 UTC
Permalink
Raw Message
Post by Bruno Haible
In the particular case of 'wint_t': The problem is no such much that
wint_t = 'unsigned short'
is not unchanged by argument promotions, but rather that this type
is not larger than 'wchar_t'. This piece of code
wint_t c = getwchar();
if (c == WEOF) goto eof_handling;
is only correct when wint_t is larger than wchar_t. The original
definition of WEOF is (wint_t)0xFFFF.
Oops, gnulib actually redefines wint_t but not WEOF. I was too shy to
redefine WEOF.

So, the only reason why gnulib redefines wint_t is standards compliance
(ISO C 99 section 7.24.1.(2), to be precise).

Bruno
Bruno Haible
2017-05-01 10:59:04 UTC
Permalink
Raw Message
Post by Eli Zaretskii
One scenario where this gets in the way is when the application
includes some other header, e.g. ctype.h, which declares the same isw*
functions as wchar.h and wctype.h -- if the inclusion of ctype.h is
_after_ wchar.h/wctype.h are included, this produces compilation
errors, because the prototypes of the isw* functions were first seen
with the original wint_t type, and now are seen with the wider wint_t.
Thanks for the report. The case of <ctype.h> after <wctype.h> was already
handled for mingw. Here's a patch to cover the case <wchar.h> after <wctype.h>
as well.


2017-05-01 Bruno Haible <***@clisp.org>

wctype: Fix problems if <wchar.h> gets included after <wctype.h>.
* lib/wctype.in.h: Include not only <ctype.h> but also <wchar.h>. Do so
also on MSVC.
Reported by Eli Zaretskii <***@gnu.org>.

diff --git a/lib/wctype.in.h b/lib/wctype.in.h
index 933bf78..759659b 100644
--- a/lib/wctype.in.h
+++ b/lib/wctype.in.h
@@ -56,11 +56,13 @@
# include <wchar.h>
#endif

-/* mingw has declarations of towupper and towlower in <ctype.h> as
- well <wctype.h>. Include <ctype.h> in advance to avoid rpl_ prefix
- being added to the declarations. */
-#ifdef __MINGW32__
+/* Native Windows (mingw, MSVC) have declarations of towupper, towlower, and
+ isw* functions in <ctype.h>, <wchar.h> as well as in <wctype.h>. Include
+ <ctype.h>, <wchar.h> in advance to avoid rpl_ prefix being added to the
+ declarations. */
+#if (defined _WIN32 || defined __WIN32__) && ! defined __CYGWIN__
# include <ctype.h>
+# include <wchar.h>
#endif

/* Include the original <wctype.h> if it exists.
Eli Zaretskii
2017-05-01 11:29:20 UTC
Permalink
Raw Message
Date: Mon, 01 May 2017 12:59:04 +0200
Post by Eli Zaretskii
One scenario where this gets in the way is when the application
includes some other header, e.g. ctype.h, which declares the same isw*
functions as wchar.h and wctype.h -- if the inclusion of ctype.h is
_after_ wchar.h/wctype.h are included, this produces compilation
errors, because the prototypes of the isw* functions were first seen
with the original wint_t type, and now are seen with the wider wint_t.
Thanks for the report. The case of <ctype.h> after <wctype.h> was already
handled for mingw. Here's a patch to cover the case <wchar.h> after <wctype.h>
as well.
Thanks.

Loading...