Discussion:
Changes in nl_langinfo() and strftime() API in glibc
Rafal Luzynski
2018-01-22 23:41:27 UTC
Hello,

I'd like to notify you that today some changes have been introduced
to the nl_langinfo() and strftime() families in glibc, including strptime().
They should also be ported to the implementations in Gnulib. This is
not only to make the changes available on other systems but also to
port the changes to the fprintftime() function, which exists only in Gnulib
and which is used by the date(1) command line utility on Linux as well as
by a few more utilities. I am the author of the changes so I can assist
you with porting them to Gnulib.

I am not subscribed to this list, please reply both to this list and
to my email address.

I'm sorry if this list is not the correct way to notify you; I haven't
found anything like a Bugzilla or a source code repository supporting
pull requests.

Regards,

Rafal Luzynski


Links:

https://sourceware.org/bugzilla/show_bug.cgi?id=10871
https://sourceware.org/git/?p=glibc.git;a=blob;f=NEWS;hb=HEAD
https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=95cb863
https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=761a585
https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=2239076
Paul Eggert
2018-01-23 08:47:33 UTC
Thanks, I merged the glibc changes into gnulib/lib/nstrftime.c by installing the
attached into gnulib master. I don't know the nl_langinfo code as well, though,
and so would appreciate advice as to how to merge that change in.

PS. This doesn't matter for Gnulib, but why define ABALTMON_1 only for the
!COMPILE_WIDE case? Not that anyone ever uses the wide code....
Rafal Luzynski
2018-01-23 13:21:41 UTC
Post by Paul Eggert
Thanks, I merged the glibc changes into gnulib/lib/nstrftime.c by installing the
attached into gnulib master. I don't know the nl_langinfo code as well, though,
and so would appreciate advice as to how to merge that change in.
[...]
Thank you, Paul. Your patch looks correct at first sight. Unfortunately,
I'm afraid it will not even compile without the changes in nl_langinfo()
because ALTMON_1 and _NL_ABALTMON_1 symbols are undefined.

Also, I'm not sure whether gnulib provides its own implementation of
strptime(); if it does, it may need an update as well.

I will provide more details later in the evening, CET timezone.

Regards,

Rafal
Bruno Haible
2018-01-24 09:03:00 UTC
Post by Rafal Luzynski
Unfortunately,
I'm afraid it will not even compile without the changes in nl_langinfo()
because ALTMON_1 and _NL_ABALTMON_1 symbols are undefined.
No, Paul's patch was correct. Paul would never push a commit that does not
compile.
Post by Paul Eggert
I don't know the nl_langinfo code as well, though,
Done as follows:


2018-01-24 Bruno Haible <***@clisp.org>

langinfo, nl_langinfo: Add support for alternative month names.
* m4/langinfo_h.m4 (gl_LANGINFO_H): Define HAVE_LANGINFO_ALTMON.
* lib/langinfo.in.h (ALTMON_1...ALTMON_12): New macros.
* lib/nl_langinfo.c (rpl_nl_langinfo): Treat ALTMON_i like MON_i.
* tests/test-nl_langinfo.c (main): Test ALTMON_*.
* doc/posix-headers/langinfo.texi: Document support of ALTMON_*.
* doc/posix-functions/nl_langinfo.texi: Likewise.

diff --git a/doc/posix-functions/nl_langinfo.texi b/doc/posix-functions/nl_langinfo.texi
index cd9e523..529b0e9 100644
--- a/doc/posix-functions/nl_langinfo.texi
+++ b/doc/posix-functions/nl_langinfo.texi
@@ -15,6 +15,10 @@ Minix 3.1.8, mingw, MSVC 14, BeOS.
The constant @code{CODESET} is not supported on some platforms:
glibc 2.0.6, OpenBSD 3.8.
@item
+The constants @code{ALTMON_1} to @code{ALTMON_12} are not defined on some
+platforms:
+glibc 2.26 and many others.
+@item
The constants @code{ERA}, @code{ERA_D_FMT}, @code{ERA_D_T_FMT},
@code{ERA_T_FMT}, @code{ALT_DIGITS} are not supported on some platforms:
OpenBSD 3.8.
diff --git a/doc/posix-headers/langinfo.texi b/doc/posix-headers/langinfo.texi
index a4d1516..30ae66c 100644
--- a/doc/posix-headers/langinfo.texi
+++ b/doc/posix-headers/langinfo.texi
@@ -14,6 +14,10 @@ Minix 3.1.8, mingw, MSVC 14, BeOS.
The constant @code{CODESET} is not defined on some platforms:
glibc 2.0.6, OpenBSD 3.8.
@item
+The constants @code{ALTMON_1} to @code{ALTMON_12} are not defined on some
+platforms:
+glibc 2.26 and many others.
+@item
The constants @code{ERA}, @code{ERA_D_FMT}, @code{ERA_D_T_FMT},
@code{ERA_T_FMT}, @code{ALT_DIGITS} are not defined on some platforms:
OpenBSD 3.8.
diff --git a/lib/langinfo.in.h b/lib/langinfo.in.h
index e51bb57..31ac575 100644
--- a/lib/langinfo.in.h
+++ b/lib/langinfo.in.h
@@ -86,6 +86,18 @@ typedef int nl_item;
# define MON_10 (MON_1 + 9)
# define MON_11 (MON_1 + 10)
# define MON_12 (MON_1 + 11)
+# define ALTMON_1 10200
+# define ALTMON_2 (ALTMON_1 + 1)
+# define ALTMON_3 (ALTMON_1 + 2)
+# define ALTMON_4 (ALTMON_1 + 3)
+# define ALTMON_5 (ALTMON_1 + 4)
+# define ALTMON_6 (ALTMON_1 + 5)
+# define ALTMON_7 (ALTMON_1 + 6)
+# define ALTMON_8 (ALTMON_1 + 7)
+# define ALTMON_9 (ALTMON_1 + 8)
+# define ALTMON_10 (ALTMON_1 + 9)
+# define ALTMON_11 (ALTMON_1 + 10)
+# define ALTMON_12 (ALTMON_1 + 11)
# define ABMON_1 10035
# define ABMON_2 (ABMON_1 + 1)
# define ABMON_3 (ABMON_1 + 2)
@@ -138,6 +150,22 @@ typedef int nl_item;
# define GNULIB_defined_T_FMT_AMPM 1
# endif

+# if !@HAVE_LANGINFO_ALTMON@
+# define ALTMON_1 10200
+# define ALTMON_2 (ALTMON_1 + 1)
+# define ALTMON_3 (ALTMON_1 + 2)
+# define ALTMON_4 (ALTMON_1 + 3)
+# define ALTMON_5 (ALTMON_1 + 4)
+# define ALTMON_6 (ALTMON_1 + 5)
+# define ALTMON_7 (ALTMON_1 + 6)
+# define ALTMON_8 (ALTMON_1 + 7)
+# define ALTMON_9 (ALTMON_1 + 8)
+# define ALTMON_10 (ALTMON_1 + 9)
+# define ALTMON_11 (ALTMON_1 + 10)
+# define ALTMON_12 (ALTMON_1 + 11)
+# define GNULIB_defined_ALTMON 1
+# endif
+
# if !@HAVE_LANGINFO_ERA@
# define ERA 10047
# define ERA_D_FMT 10048
diff --git a/lib/nl_langinfo.c b/lib/nl_langinfo.c
index 725ccf6..b93f7be 100644
--- a/lib/nl_langinfo.c
+++ b/lib/nl_langinfo.c
@@ -100,6 +100,24 @@ rpl_nl_langinfo (nl_item item)
case T_FMT_AMPM:
return (char *) "%I:%M:%S %p";
# endif
+# if GNULIB_defined_ALTMON
+ case ALTMON_1:
+ case ALTMON_2:
+ case ALTMON_3:
+ case ALTMON_4:
+ case ALTMON_5:
+ case ALTMON_6:
+ case ALTMON_7:
+ case ALTMON_8:
+ case ALTMON_9:
+ case ALTMON_10:
+ case ALTMON_11:
+ case ALTMON_12:
+ /* We don't ship the appropriate localizations with gnulib. Therefore,
+ treat ALTMON_i like MON_i. */
+ item = item - ALTMON_1 + MON_1;
+ break;
+# endif
# if GNULIB_defined_ERA
case ERA:
/* The format is not standardized. In glibc it is a sequence of strings
@@ -228,28 +246,49 @@ nl_langinfo (nl_item item)
return (char *) abdays[item - ABDAY_1];
return nlbuf;
}
- case MON_1:
- case MON_2:
- case MON_3:
- case MON_4:
- case MON_5:
- case MON_6:
- case MON_7:
- case MON_8:
- case MON_9:
- case MON_10:
- case MON_11:
- case MON_12:
- {
- static char const months[][sizeof "September"] = {
- "January", "February", "March", "April", "May", "June", "July",
- "September", "October", "November", "December"
- };
+ {
+ static char const months[][sizeof "September"] = {
+ "January", "February", "March", "April", "May", "June", "July",
+ "September", "October", "November", "December"
+ };
+ case MON_1:
+ case MON_2:
+ case MON_3:
+ case MON_4:
+ case MON_5:
+ case MON_6:
+ case MON_7:
+ case MON_8:
+ case MON_9:
+ case MON_10:
+ case MON_11:
+ case MON_12:
tmm.tm_mon = item - MON_1;
if (!strftime (nlbuf, sizeof nlbuf, "%B", &tmm))
return (char *) months[item - MON_1];
return nlbuf;
- }
+ case ALTMON_1:
+ case ALTMON_2:
+ case ALTMON_3:
+ case ALTMON_4:
+ case ALTMON_5:
+ case ALTMON_6:
+ case ALTMON_7:
+ case ALTMON_8:
+ case ALTMON_9:
+ case ALTMON_10:
+ case ALTMON_11:
+ case ALTMON_12:
+ tmm.tm_mon = item - ALTMON_1;
+ /* The platforms without nl_langinfo() don't support strftime with %OB.
+ We don't even need to try. */
+ #if 0
+ if (!strftime (nlbuf, sizeof nlbuf, "%OB", &tmm))
+ #endif
+ if (!strftime (nlbuf, sizeof nlbuf, "%B", &tmm))
+ return (char *) months[item - ALTMON_1];
+ return nlbuf;
+ }
case ABMON_1:
case ABMON_2:
case ABMON_3:
diff --git a/m4/langinfo_h.m4 b/m4/langinfo_h.m4
index 9ae375c..de077c3 100644
--- a/m4/langinfo_h.m4
+++ b/m4/langinfo_h.m4
@@ -1,4 +1,4 @@
-# langinfo_h.m4 serial 7
+# langinfo_h.m4 serial 8
dnl Copyright (C) 2009-2018 Free Software Foundation, Inc.
dnl This file is free software; the Free Software Foundation
dnl gives unlimited permission to copy and/or distribute it,
@@ -17,6 +17,7 @@ AC_DEFUN([gl_LANGINFO_H],
dnl Determine whether <langinfo.h> exists. It is missing on mingw and BeOS.
HAVE_LANGINFO_CODESET=0
HAVE_LANGINFO_T_FMT_AMPM=0
+ HAVE_LANGINFO_ALTMON=0
HAVE_LANGINFO_ERA=0
HAVE_LANGINFO_YESEXPR=0
AC_CHECK_HEADERS_ONCE([langinfo.h])
@@ -24,6 +25,7 @@ AC_DEFUN([gl_LANGINFO_H],
HAVE_LANGINFO_H=1
dnl Determine what <langinfo.h> defines. CODESET and ERA etc. are missing
dnl on OpenBSD 3.8. T_FMT_AMPM and YESEXPR, NOEXPR are missing on IRIX 5.3.
+ dnl ALTMON_* are missing on glibc 2.26 and many other systems.
AC_CACHE_CHECK([whether langinfo.h defines CODESET],
[gl_cv_header_langinfo_codeset],
[AC_COMPILE_IFELSE(
@@ -48,6 +50,18 @@ int a = T_FMT_AMPM;
if test $gl_cv_header_langinfo_t_fmt_ampm = yes; then
HAVE_LANGINFO_T_FMT_AMPM=1
fi
+ AC_CACHE_CHECK([whether langinfo.h defines ALTMON_1],
+ [gl_cv_header_langinfo_altmon],
+ [AC_COMPILE_IFELSE(
+ [AC_LANG_PROGRAM([[#include <langinfo.h>
+int a = ALTMON_1;
+]])],
+ [gl_cv_header_langinfo_altmon=yes],
+ [gl_cv_header_langinfo_altmon=no])
+ ])
+ if test $gl_cv_header_langinfo_altmon = yes; then
+ HAVE_LANGINFO_ALTMON=1
+ fi
AC_CACHE_CHECK([whether langinfo.h defines ERA],
[gl_cv_header_langinfo_era],
[AC_COMPILE_IFELSE(
@@ -78,6 +92,7 @@ int a = YESEXPR;
AC_SUBST([HAVE_LANGINFO_H])
AC_SUBST([HAVE_LANGINFO_CODESET])
AC_SUBST([HAVE_LANGINFO_T_FMT_AMPM])
+ AC_SUBST([HAVE_LANGINFO_ALTMON])
AC_SUBST([HAVE_LANGINFO_ERA])
AC_SUBST([HAVE_LANGINFO_YESEXPR])

diff --git a/tests/test-nl_langinfo.c b/tests/test-nl_langinfo.c
index 8ad9df1..a13fb71 100644
--- a/tests/test-nl_langinfo.c
+++ b/tests/test-nl_langinfo.c
@@ -92,6 +92,32 @@ main (int argc, char *argv[])
ASSERT (strlen (nl_langinfo (MON_10)) > 0);
ASSERT (strlen (nl_langinfo (MON_11)) > 0);
ASSERT (strlen (nl_langinfo (MON_12)) > 0);
+ ASSERT (strlen (nl_langinfo (ALTMON_1)) > 0);
+ ASSERT (strlen (nl_langinfo (ALTMON_2)) > 0);
+ ASSERT (strlen (nl_langinfo (ALTMON_3)) > 0);
+ ASSERT (strlen (nl_langinfo (ALTMON_4)) > 0);
+ ASSERT (strlen (nl_langinfo (ALTMON_5)) > 0);
+ ASSERT (strlen (nl_langinfo (ALTMON_6)) > 0);
+ ASSERT (strlen (nl_langinfo (ALTMON_7)) > 0);
+ ASSERT (strlen (nl_langinfo (ALTMON_8)) > 0);
+ ASSERT (strlen (nl_langinfo (ALTMON_9)) > 0);
+ ASSERT (strlen (nl_langinfo (ALTMON_10)) > 0);
+ ASSERT (strlen (nl_langinfo (ALTMON_11)) > 0);
+ ASSERT (strlen (nl_langinfo (ALTMON_12)) > 0);
+ /* In the tested locales, alternate month names and month names ought to be
+ the same. */
+ ASSERT (strcmp (nl_langinfo (ALTMON_1), nl_langinfo (MON_1)) == 0);
+ ASSERT (strcmp (nl_langinfo (ALTMON_2), nl_langinfo (MON_2)) == 0);
+ ASSERT (strcmp (nl_langinfo (ALTMON_3), nl_langinfo (MON_3)) == 0);
+ ASSERT (strcmp (nl_langinfo (ALTMON_4), nl_langinfo (MON_4)) == 0);
+ ASSERT (strcmp (nl_langinfo (ALTMON_5), nl_langinfo (MON_5)) == 0);
+ ASSERT (strcmp (nl_langinfo (ALTMON_6), nl_langinfo (MON_6)) == 0);
+ ASSERT (strcmp (nl_langinfo (ALTMON_7), nl_langinfo (MON_7)) == 0);
+ ASSERT (strcmp (nl_langinfo (ALTMON_8), nl_langinfo (MON_8)) == 0);
+ ASSERT (strcmp (nl_langinfo (ALTMON_9), nl_langinfo (MON_9)) == 0);
+ ASSERT (strcmp (nl_langinfo (ALTMON_10), nl_langinfo (MON_10)) == 0);
+ ASSERT (strcmp (nl_langinfo (ALTMON_11), nl_langinfo (MON_11)) == 0);
+ ASSERT (strcmp (nl_langinfo (ALTMON_12), nl_langinfo (MON_12)) == 0);
ASSERT (strlen (nl_langinfo (ABMON_1)) > 0);
ASSERT (strlen (nl_langinfo (ABMON_2)) > 0);
ASSERT (strlen (nl_langinfo (ABMON_3)) > 0);
Rafal Luzynski
2018-01-25 01:39:59 UTC
Post by Bruno Haible
Post by Rafal Luzynski
Unfortunately,
I'm afraid it will not even compile without the changes in nl_langinfo()
because ALTMON_1 and _NL_ABALTMON_1 symbols are undefined.
No, Paul's patch was correct. Paul would never push a commit that does not
compile.
I didn't mean it does not compile, I just wasn't sure because I did not
test this update.
Post by Bruno Haible
Post by Paul Eggert
I don't know the nl_langinfo code as well, though,
Thank you, Bruno. I'm sorry I don't have time to compile and test
it now. But just at first sight: I can't see _NL_ABALTMON_* here.
Isn't it necessary?
Post by Bruno Haible
[...]
diff --git a/doc/posix-functions/nl_langinfo.texi
b/doc/posix-functions/nl_langinfo.texi
index cd9e523..529b0e9 100644
--- a/doc/posix-functions/nl_langinfo.texi
+++ b/doc/posix-functions/nl_langinfo.texi
@@ -15,6 +15,10 @@ Minix 3.1.8, mingw, MSVC 14, BeOS.
glibc 2.0.6, OpenBSD 3.8.
@item
+glibc 2.26 and many others.
To be more precise: we are adding ALTMON_1 since glibc 2.27 (as GNU
extension) so it is not defined on glibc 2.26 and older.
(This is to avoid confusion: glibc 2.26 and newer? just glibc 2.26?)
Post by Bruno Haible
[...]
diff --git a/lib/nl_langinfo.c b/lib/nl_langinfo.c
index 725ccf6..b93f7be 100644
--- a/lib/nl_langinfo.c
+++ b/lib/nl_langinfo.c
[...]
@@ -228,28 +246,49 @@ nl_langinfo (nl_item item)
return (char *) abdays[item - ABDAY_1];
return nlbuf;
}
- {
- static char const months[][sizeof "September"] = {
- "January", "February", "March", "April", "May", "June", "July",
- "September", "October", "November", "December"
- };
+ {
+ static char const months[][sizeof "September"] = {
+ "January", "February", "March", "April", "May", "June", "July",
+ "September", "October", "November", "December"
+ };
tmm.tm_mon = item - MON_1;
if (!strftime (nlbuf, sizeof nlbuf, "%B", &tmm))
return (char *) months[item - MON_1];
return nlbuf;
This looks correct: if nl_langinfo(MON_*) is not supported then you
try to retrieve the month name with strftime("%B"). Although I don't
understand why "case MON_*" has been removed and added. Formatting
changes maybe?
Post by Bruno Haible
- }
+ tmm.tm_mon = item - ALTMON_1;
+ /* The platforms without nl_langinfo() don't support strftime with %OB.
+ We don't even need to try. */
+ #if 0
+ if (!strftime (nlbuf, sizeof nlbuf, "%OB", &tmm))
+ #endif
Not really, I think that this removed implementation would be useful
sometimes. As far as I know OS X does support strftime("%OB") and
does support nl_langinfo() but does not support ALTMON_* series.
Post by Bruno Haible
+ if (!strftime (nlbuf, sizeof nlbuf, "%B", &tmm))
+ return (char *) months[item - ALTMON_1];
+ return nlbuf;
+ }
Otherwise OK: if neither ALTMON_* nor strftime("%OB") is available
on a particular platform then falling back to MON_* series is correct.
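
The fallback order described above can be sketched from the caller's side.
This is only a minimal illustration, not code from gnulib or glibc: the
helper name standalone_month_name is hypothetical, it assumes that ALTMON_1
is visible as a preprocessor macro (as in the langinfo.in.h hunk above, or
in glibc 2.27 when GNU extensions are enabled), and it assumes that the
twelve items are numbered consecutively, as the patch itself does.

```c
#define _GNU_SOURCE 1   /* glibc exposes ALTMON_* only as a GNU extension */
#include <langinfo.h>

/* Hypothetical helper: return the standalone name of a month (0..11) in
   the current locale.  Prefer ALTMON_* when the header provides it and
   fall back to MON_* otherwise.  Assumes consecutive item values.  */
const char *
standalone_month_name (int month)
{
  if (month < 0 || month > 11)
    return "";
#ifdef ALTMON_1
  return nl_langinfo (ALTMON_1 + month);
#else
  return nl_langinfo (MON_1 + month);
#endif
}
```

In the default "C" locale both branches are expected to give the same English
month names; a difference only shows up in locales that distinguish the two
forms.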

I'm sorry for this short review. I'm afraid I will not have time
to review more thoroughly this week.

Thank you for your support.

Regards,

Rafal
Bruno Haible
2018-01-25 08:04:42 UTC
Hi Rafal,
Post by Rafal Luzynski
I can't see _NL_ABALTMON_* here. Isn't it necessary?
Given the name of these constants (they start with an underscore),
it appears that they are not proposed for standardization. So,
as far as I understand, the primary way to use the abbreviated
alternate month names is through nstrftime %Ob, not through nl_langinfo.

I hope (haven't checked) that Paul's changes to nstrftime.c will,
on platforms that don't support strftime %Ob, fall back to strftime %b.
Post by Bruno Haible
+glibc 2.26 and many others.
Post by Rafal Luzynski
To be more precise: we are adding ALTMON_1 since glibc 2.27 (as GNU
extension) so it is not defined on glibc 2.26 and older.
Yup, that's what I understood from the glibc commit history.
Post by Rafal Luzynski
(This is to avoid confusion: glibc 2.26 and newer? just glibc 2.26?)
That's the style we use in the gnulib manual. When something is broken
in SW version x, the reader can't assume that it works in SW versions < x.
And we try to list the highest version in which it is broken (although
often we don't know precisely).
Post by Rafal Luzynski
Although I don't
understand why "case MON_*" has been removed and added. Formatting
changes maybe?
I added a level of braces. This required reindentation.
Post by Bruno Haible
+ tmm.tm_mon = item - ALTMON_1;
+ /* The platforms without nl_langinfo() don't support strftime with %OB.
+ We don't even need to try. */
+ #if 0
+ if (!strftime (nlbuf, sizeof nlbuf, "%OB", &tmm))
+ #endif
Post by Rafal Luzynski
Not really, I think that this removed implementation would be useful
sometimes.
No, this code is meant for the platforms Minix, mingw, MSVC, BeOS.
mingw and MSVC don't support strftime with %OB; I checked the documentation.
And for Minix and BeOS I can tell it without even checking the documentation.

Bruno
Paul Eggert
2018-01-25 01:07:23 UTC
Post by Rafal Luzynski
I'm afraid it will not even compile without the changes in nl_langinfo()
because ALTMON_1 and _NL_ABALTMON_1 symbols are undefined.
It should work in the Gnulib context, because those symbols are never
used in that context. At least, it worked for me when I compiled it. I
think the patch will also work in the glibc context but have not tested
this (the patch would have to be backported to glibc anyway).
Rafal Luzynski
2018-01-30 22:11:56 UTC
Hello,

Sorry for this late response, I was focused on more urgent tasks.
Post by Paul Eggert
[...]
PS. This doesn't matter for Gnulib, but why define ABALTMON_1 only for the
!COMPILE_WIDE case?
This is only to make sure that the NLW() macro works as expected.
In the !COMPILE_WIDE case it just outputs its argument without any
change, so NLW(ABDAY_1) will be ABDAY_1, NLW(MON_1) will be MON_1,
and so on. In the COMPILE_WIDE case it prepends _NL_W to the argument,
so NLW(ABDAY_1) will be _NL_WABDAY_1, NLW(MON_1) will be _NL_WMON_1,
and so on. For the abbreviated alternative month names we need
_NL_ABALTMON_1 and _NL_WABALTMON_1. If we passed _NL_ABALTMON_1
directly it would make _NL_ABALTMON_1 in the !COMPILE_WIDE case (correct)
and _NL_W_NL_ABALTMON_1 in the COMPILE_WIDE case (incorrect). So I've
decided to use NLW(ABALTMON_1), which in the COMPILE_WIDE case generates
_NL_WABALTMON_1 (correct). In the !COMPILE_WIDE case I define ABALTMON_1
as _NL_ABALTMON_1, so NLW() generates ABALTMON_1, which is actually
_NL_ABALTMON_1. Again correct.

It would be easier if ABALTMON_1 (and all ABALTMON_*) were defined
officially, which I hope happens one day, but for now this simple
workaround will do.
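
The token pasting described above can be reproduced in isolation. The
following is a sketch under stated assumptions, not the glibc source: the
enum values are stand-ins, the two macro variants are renamed NLW_NARROW and
NLW_WIDE (hypothetical names) so that both fit in one file, and the use of
the ## operator for the wide variant is assumed from the description
"prepends _NL_W to the argument".

```c
/* Stand-in item values for illustration only; the real items are defined
   by glibc's <langinfo.h>, which is deliberately not included here.  */
enum { _NL_ABALTMON_1 = 1, _NL_WABALTMON_1 = 2 };

/* The two variants of NLW(), renamed so that both can coexist.  */
#define NLW_NARROW(Sym) Sym          /* !COMPILE_WIDE: pass through */
#define NLW_WIDE(Sym)   _NL_W##Sym   /* COMPILE_WIDE: prepend _NL_W */

/* The workaround: a private alias for the narrow item.  */
#define ABALTMON_1 _NL_ABALTMON_1

/* Narrow case: the argument is macro-expanded, giving _NL_ABALTMON_1.  */
int narrow_item (void) { return NLW_NARROW (ABALTMON_1); }

/* Wide case: ## pastes the unexpanded token, giving _NL_WABALTMON_1.
   NLW_WIDE (_NL_ABALTMON_1) would instead paste _NL_W_NL_ABALTMON_1,
   which is not declared anywhere and would fail to compile.  */
int wide_item (void) { return NLW_WIDE (ABALTMON_1); }
```

The alias works because an argument next to ## is not macro-expanded before
pasting, while an argument used on its own is.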
Post by Paul Eggert
Not that anyone ever uses the wide code....
AFAIK it's commonly used on Windows platform. I guess this is a good
target if Gnulib is supposed to provide the GNU API on non-GNU platforms.

Thank you for your support. Best regards,

Rafal
Bruno Haible
2018-01-31 07:28:07 UTC
Post by Rafal Luzynski
Post by Paul Eggert
Not that anyone ever uses the wide code....
AFAIK it's commonly used on Windows platform. I guess this is a good
target if Gnulib is supposed to provide the GNU API on non-GNU platforms.
No. wchar_t[] APIs are too broken for application use in general [1].
On Windows platforms, the only reasonable use of wchar_t arrays you can
make is to interface to the Windows API functions, and *nothing else*.
Gnulib will *not* recommend or favour the use of wchar_t[] APIs in
applicative code.

Bruno

[1] https://www.gnu.org/software/libunistring/manual/html_node/The-wchar_005ft-mess.html
Bruno Haible
2018-01-24 09:51:37 UTC
Hi Rafal,
Post by Rafal Luzynski
https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=2239076
This documentation patch is too vague, IMO. It purports to document
"Specify when to use %OB instead of %B." But as a programmer who is not
aware of Polish and Greek grammar, it does not precisely answer the question:

When should I use %OB, and when should I use %B, in strftime?

I would look in time.texi; I find the answer too vague: "as part of a
complete date".
Is 24 January 2018 a complete date? I'd guess yes.
Is 24 January a complete date? I'd guess no.
Is January 2018 a complete date? I'd guess no.
Is January a complete date? I guess you meant no.

Is that what you intended to mean?

Bruno
Rafal Luzynski
2018-01-24 10:20:48 UTC
Post by Bruno Haible
Hi Rafal,
Post by Rafal Luzynski
https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=2239076
This documentation patch is too vague, IMO. It purports to document
"Specify when to use %OB instead of %B." But as a programmer who is not
When should I use %OB, and when should I use %B, in strftime?
Since I am not a native English speaker I have left writing the
documentation to the native English speakers. While I am able to write
documentation which is more or less correct, I am unable to write better
documentation than they did. I think that this document strikes
a balance between being concise and being a book about Slavic or
Indo-European grammar, which would be too long and too boring for
most programmers.

Short answer to all your questions: whatever date format you use
you should make it translatable, like:

strftime (s, max, _("%A, %B %d, %y"), ...

so you leave the choice of the correct format to the translators. This
has always been necessary because local date formats differ not only in
whether the date is full or the month is standalone but
also in things like whether there are dots and/or commas,
day-month order, leading zeros or spaces, etc.
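
A self-contained version of that call might look as follows. This is a
sketch, not code from any of the projects discussed: the helper name
format_full_date is hypothetical, and the _() macro is assumed to be the
usual shorthand for gettext() from <libintl.h>. Without a message catalog
gettext() returns the English format unchanged.

```c
#include <libintl.h>
#include <stddef.h>
#include <time.h>

/* Conventional shorthand; marks the string for translation.  */
#define _(msgid) gettext (msgid)

/* Hypothetical helper: format a full date using a format string that
   translators can replace as a whole (field order, punctuation, and
   the choice between %B and %OB).  Returns what strftime returns.  */
size_t
format_full_date (char *buf, size_t size, const struct tm *tm)
{
  return strftime (buf, size, _("%A, %B %d, %y"), tm);
}
```

Because the whole format is one translatable message, a translation can
reorder the fields or change the punctuation without any change to the
program.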

Indeed, as a programmer you are not obliged to know the native
languages. But it can be useful sometimes when you have to teach
the translators.
Post by Bruno Haible
I would look in time.texi; I find the answer too vague: "as part of a
complete date".
Is 24 January 2018 a complete date? I'd guess yes.
Correct.
Post by Bruno Haible
Is 24 January a complete date? I'd guess no.
Incorrect. Explanation below.
Post by Bruno Haible
Is January 2018 a complete date? I'd guess no.
Correct.
Post by Bruno Haible
Is January a complete date? I guess you meant no.
Correct.
Post by Bruno Haible
Is that what you intended to mean?
So, indeed, you did not understand correctly and it's not
your fault but the fault of the documentation. But did not
the documentation mention that a full date is a date with
the day number included?

The issue is when the day number and the month name appear
together. Some languages require a genitive case here, as
can also be said in English: "24th of January", meaning
"the 24th day of January". It's similar in Spanish:
"24 de enero". But in English and Spanish this is easy:
just insert "of" or "de" everywhere and the problem is
fixed. It is more complex in Catalan: they also require
"de" but it is abbreviated to "d’" if a month name starts
with a vowel, like "abril": "24 d’abril". This is already
too complex for glibc. It is even more complex in Slavic,
Baltic, Greek and a few more languages which feature heavy
declension: in Polish January is "styczeń" (standalone,
the nominative case) but when formatting a date it's obligatory
to say "24 stycznia" (the genitive case). A complete implementation
of this system would be larger than the whole implementation of
strftime(), I suppose. :-)

So, when there is no day number (and nothing similar, like
"the second week of" or "the first Sunday of") the month
name counts as standalone, a nominative case. Also when the
year number is included this still counts as standalone
because we are still talking about a month, not a day of
a month (or another part of a month).
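
The rule above can be illustrated with a small sketch. It assumes a
strftime() that accepts %OB (glibc 2.27 or later, for example); the helper
name format_month and the two format strings are hypothetical, and a real
program would still pass the formats through gettext so that translators
can adjust them.

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: use %B when the month appears together with a
   day of the month, and %OB when the month stands alone, even if a
   year follows.  Assumes the platform's strftime supports %OB.  */
size_t
format_month (char *buf, size_t size, const struct tm *tm, bool with_day)
{
  const char *fmt = with_day ? "%d %B" : "%OB %Y";
  return strftime (buf, size, fmt, tm);
}
```

In English both formats give the same month name; in a Polish locale with
the new data the first is expected to use "stycznia" and the second
"styczeń".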

Sorry if this message is so long. As you can see, it is too
long to put it into the documentation. I think I should write
a blog article about it.

Regards,

Rafal
Bruno Haible
2018-01-24 13:47:13 UTC
Hi Rafal,
Post by Rafal Luzynski
did not
the documentation mention that a full date is a date with
the day number included?
No, I don't see a definition of the term "full date" or "complete date"
there.
Post by Rafal Luzynski
The issue is when the day number and the month name appear
together.
...
So, when there is no day number (and nothing similar, like
"the second week of" or "the first Sunday of") the month
name counts as standalone, a nominative case.
Good. Here you have formulated the precise statement that I sought for.

Can you please update the doc accordingly? Change
"when the month is used as part of a complete date"
to
"when the month appears together with a day-of-month".

(AFAICS strftime does not support week-of-month statements, only
week-of-year.)
Post by Rafal Luzynski
As you can see, it is too
long to put it into the documentation. I think I should write
a blog article about it.
A blog goes away someday; the documentation is there to stay and
to be improved. Please mention the essential statements in the doc;
the extra linguistic explanations with styczeń and stycznia can
indeed go in a blog.
Post by Rafal Luzynski
Short answer to all your questions: whatever date format you use
strftime (s, max, _("%A, %B %d, %y"), ...
so you leave the correct format for the translators.
Ah, but translators will not look in the glibc manual. They read only
the gettext manual. So do we need some text in the gettext manual as
well? In other words, is the %B / %OB distinction something that the
programmer can do, and the translator is not bothered about it? Or
is this distinction different according to language, so the translators
must deal with it?

Bruno
Rafal Luzynski
2018-01-25 01:13:46 UTC
Post by Bruno Haible
Hi Rafal,
[...]
Post by Rafal Luzynski
The issue is when the day number and the month name appear
together.
...
So, when there is no day number (and nothing similar, like
"the second week of" or "the first Sunday of") the month
name counts as standalone, a nominative case.
Good. Here you have formulated the precise statement that I sought for.
Can you please update the doc accordingly? Change
"when the month is used as part of a complete date"
to
"when the month appears together with a day-of-month".
OK, I have posted this suggestion to libc-alpha:

https://sourceware.org/ml/libc-alpha/2018-01/msg00773.html
Post by Bruno Haible
(AFAICS strftime does not support week-of-month statements, only
week-of-year.)
That's true, I was thinking about possible constructions supported
by a native language. You are right, this particular construction
is not supported by strftime().
Post by Bruno Haible
[...]
Post by Rafal Luzynski
Short answer to all your questions: whatever date format you use
strftime (s, max, _("%A, %B %d, %y"), ...
so you leave the correct format for the translators.
Ah, but translators will not look in the glibc manual. They read only
the gettext manual. So do we need some text in the gettext manual as
well?
I'm not sure which gettext manual you are thinking about but
this gettext manual is actually a part of glibc:

https://www.gnu.org/software/libc/manual/html_node/The-Uniforum-approach.html
Post by Bruno Haible
In other words, is the %B / %OB distinction something that the
programmer can do, and the translator is not bothered about it?
I strongly believe that the format strings should be left to
the translators. The programmer's choice of a format string
should be correct for English, but it is seldom correct for other
languages. This is not because of the genitive/nominative month
names but for reasons like:

- English often uses the month-day order, most other languages
use the day-month order;
- many languages require a dot after the day number;
- English requires a comma after the day number if it is followed
by a year number;
- some languages (e.g., East Asian) do not have month names and
use the month numbers instead;
- and many more...
Post by Bruno Haible
Or
is this distinction different according to language, so the translators
must deal with it?
The reasons above are sufficient to say that the translators must
have dealt with it since forever. If you are asking whether the rules
for where to use %OB and where to use %B are universal (so the translators
will not have to decide) or not (different in different languages), I must
say that I am quite unsure how these rules work in the Czech,
Serbian, and Slovak languages. But let's take a look at these
numbers (they may be inaccurate, take them as an approximation):

- there are about 200 languages supported by glibc;
- about 20 of them (10%) need the nominative/genitive distinction,
in the rest of the languages there is no difference between %OB and %B;
- for about 3 of those 20 (1.5% of the total number) the rules for %OB/%B
may be different.

That means that if you (as a programmer) use %OB/%B correctly then
it will work correctly either immediately or with minor changes (reorder,
add/remove punctuation) in about 98.5% of languages. Another piece of good
news is that if you use them incorrectly then 90% of languages
will not see any difference. :-)

Regards,

Rafal
Bruno Haible
2018-01-25 08:10:46 UTC
Hi Rafal,
Post by Rafal Luzynski
Post by Bruno Haible
"when the month is used as part of a complete date"
to
"when the month appears together with a day-of-month".
https://sourceware.org/ml/libc-alpha/2018-01/msg00773.html
Thanks. As Carlos says, the manual can even talk about language specific
things; it already does for the plural forms [1].
Post by Rafal Luzynski
Post by Bruno Haible
Ah, but translators will not look in the glibc manual. They read only
the gettext manual. So do we need some text in the gettext manual as
well?
I'm not sure which gettext manual you are thinking about but
https://www.gnu.org/software/libc/manual/html_node/The-Uniforum-approach.html
I meant the GNU gettext manual [2].
Post by Rafal Luzynski
Post by Bruno Haible
In other words, is the %B / %OB distinction something that the
programmer can do, and the translator is not bothered about it?
I strongly believe that the format strings should be left for
the translators and the programmer's choice of a format string
should be correct for English but this is seldom correct for other
languages. This is not because of the genitive/nominative month
- English often uses the month-day order, most of other languages
use the day-month order;
- many languages require a dot after the day number;
- English requires a comma after the day number if it is followed
by a year number;
- some languages (e.g., East Asian) do not have month names and
use the month numbers instead;
- and many more...
Interesting. Thanks for these thoughts. I have opened a ticket in the
gettext bug tracker to document these things. [3].

Bruno

[1] https://www.gnu.org/software/libc/manual/html_node/Advanced-gettext-functions.html
[2] https://www.gnu.org/software/gettext/manual/
[3] https://savannah.gnu.org/bugs/?52971