preferring ptrdiff_t to size_t for object counts

Discussion:

Paul Eggert

2017-06-05 06:45:38 UTC

GNU Emacs has long been using signed types (typically ptrdiff_t) to count
objects. This has the advantage that signed integer overflow can be detected
automatically on some platforms (unfortunately, size_t arithmetic silently wraps
around). I would like to change the Gnulib modules that GNU Emacs uses, to use
this style. The main effect on these modules' non-Emacs users would be:

* They accept ptrdiff_t counts, not size_t counts. Normally sizes are computed
by new functions like xwgrowalloc. When the caller computes sizes by hand, it is
the caller's responsibility to check for integer overflow.

* They report errors via xwalloc_die, not xalloc_die.
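A minimal sketch of the convention in the two points above. It assumes a hypothetical helper, `grow_count`. This is not the xwgrowalloc interface, whose signature isn't given here. Counts are ptrdiff_t, and the byte size is computed with GCC/Clang's `__builtin_add_overflow` and `__builtin_mul_overflow`, which report overflow instead of silently wrapping the way size_t arithmetic does.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not part of Gnulib: grow the element count *PN
   by about 50% and resize P to hold that many elements of ELT_SIZE
   bytes.  Return the new buffer, or NULL on overflow or allocation
   failure (in which case *PN is unchanged).  Because counts are signed,
   a negative count is detectably invalid rather than a huge size_t.  */
static void *
grow_count (void *p, ptrdiff_t *pn, ptrdiff_t elt_size)
{
  ptrdiff_t n = *pn;
  if (n < 0 || elt_size <= 0)
    return NULL;

  /* New count: n + n/2 + 1, with signed overflow detected.  */
  ptrdiff_t new_n;
  if (__builtin_add_overflow (n, n / 2 + 1, &new_n))
    return NULL;

  /* Byte count; overflow here means the object would exceed
     PTRDIFF_MAX bytes.  */
  ptrdiff_t nbytes;
  if (__builtin_mul_overflow (new_n, elt_size, &nbytes))
    return NULL;

  void *q = realloc (p, nbytes);
  if (q)
    *pn = new_n;
  return q;
}
```

A caller that computes sizes by hand instead would need to perform the same two overflow checks itself.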

I've also changed the modules that GNU grep uses, as a test that this idea works
on non-Emacs applications.

As this is a nontrivial change, I'll post the Gnulib patches first without
installing them, for discussion.

Bruno Haible

2017-06-05 09:57:34 UTC

Hi Paul,

Post by Paul Eggert
GNU Emacs has long been using signed types (typically ptrdiff_t) to count
objects. This has the advantage that signed integer overflow can be detected
automatically on some platforms (unfortunately, size_t arithmetic silently wraps
around).

I have one objection, but a big one: The direct use of ptrdiff_t.

Reasons:

1) Like you, I spend time reviewing code other people have written. In these
code reviews, it is important to know whether a variable is known to always
be >= 0 or not.

For example, when we have
int n = ...;
for (int i = 0; i < n; i++) ...
I always have to spend brain cycles around the question "what if n < 0?
Does the code still achieve its goal in this case?"

Whereas if the type clearly states the intent to store only values >= 0,
there is no issue; no extra brain cycles required.
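To make the "what if n < 0?" question concrete, here is a small illustration. The function names are made up for this example. The loop above is harmless for a negative int count, but an equivalent memset call is not, because the count gets converted to size_t.

```c
#include <string.h>

/* Loop form: if n < 0, the condition i < n is false at once,
   so the function does nothing.  */
static void
zero_by_loop (int *a, int n)
{
  for (int i = 0; i < n; i++)
    a[i] = 0;
}

/* memset form: converting a negative n to size_t would yield a huge
   byte count, so the n < 0 case has to be ruled out explicitly.  */
static void
zero_by_memset (int *a, int n)
{
  if (0 < n)
    memset (a, 0, (size_t) n * sizeof *a);
}
```

A reviewer has to check each such use separately, which is the cost point 1) describes.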

2) Standards change, and the considerations behind 'walloc' may also change.

Do you want, 5 or 10 years from now, to go through hundreds of uses of
'ptrdiff_t' and separate those uses with values >= 0 from those with values
that can be negative? I certainly don't want to.

3) GCC has range types for Ada. I would hope that someday it also has range
types for C or C++. Then, it would be very useful to express the fact that
the values are in the range [0..PTRDIFF_MAX], so that GCC can use it for
optimization.

4) For static analysis tools (gnulib now uses coverity in particular), I can
imagine that an unsigned type is easier to work with than a signed type
(i.e. that the tool can make more inferences and therefore detect more bugs
when using unsigned types).
To this end, it would be useful to use an unsigned type for these counters /
size_t objects, *just* for the static analysis tool.

To address all of these issues, I suggest using a typedef'd type instead. For
example:

typedef ptrdiff_t wsize_t;

And then use wsize_t everywhere.

This solves problems 1), 2), 3), and 4) (the last one through an #ifdef'd
definition of wsize_t).
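A sketch of what the #ifdef'd definition could look like. The macro `WSIZE_UNSIGNED_FOR_ANALYSIS` is hypothetical; a static-analysis build would define it to see an unsigned type, while normal builds get the signed ptrdiff_t. `count_nonzero` is likewise just an illustrative user of the type.

```c
#include <stddef.h>

/* WSIZE_UNSIGNED_FOR_ANALYSIS is a hypothetical configuration macro.  */
#ifdef WSIZE_UNSIGNED_FOR_ANALYSIS
typedef size_t wsize_t;
#else
typedef ptrdiff_t wsize_t;
#endif

/* The type name tells the reader that N and the result are counts
   that are always >= 0.  */
static wsize_t
count_nonzero (const char *buf, wsize_t n)
{
  wsize_t count = 0;
  for (wsize_t i = 0; i < n; i++)
    count += buf[i] != 0;
  return count;
}
```

Code written this way must stay correct under both definitions. For example, it must not rely on a wsize_t value ever becoming negative.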

Yes, it means that people reading the code will have to memorize one more type
identifier. But it is to their benefit: they will know the values are >= 0
(see point 1).

Bruno

Bruno Haible

2017-06-07 21:53:39 UTC

Post by Bruno Haible
typedef ptrdiff_t wsize_t;

'wsize_t' or 'wcount_t'. I don't really mind the name of the type - as
long as it's a typedef.

Bruno

Paul Eggert

2017-06-07 22:12:04 UTC

Post by Bruno Haible
I don't really mind the name of the type - as
long as it's a typedef.

I've been leaning towards a name that doesn't start with 'w', since the
type is not specific to the walloc module family. The name I'm currently
thinking of is 'in_t', short for "index type". That's an
easy-to-remember name (the type is like 'int', but possibly wider).

One other advantage of having our own signed type is that we can
guarantee that it's at least as wide as int (something that is not true
for ptrdiff_t). That way, some of my current code that says 'MIN
(INT_MAX, PTRDIFF_MAX)' can be simplified to the more-natural INT_MAX.
This is helpful for traditional interfaces that use int counters.
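A sketch of how the proposed 'in_t' might be defined so that it is signed and never narrower than int. The definition, the `IN_MAX` macro, and `int_api_count_limit` are assumptions for illustration, not the definition that was eventually adopted.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical index type: ptrdiff_t, unless that is narrower than int.  */
#if PTRDIFF_MAX < INT_MAX
typedef int in_t;
# define IN_MAX INT_MAX
#else
typedef ptrdiff_t in_t;
# define IN_MAX PTRDIFF_MAX
#endif

_Static_assert (INT_MAX <= IN_MAX, "in_t is at least as wide as int");

/* Largest count that an int-based interface can accept.  With plain
   ptrdiff_t this had to be MIN (INT_MAX, PTRDIFF_MAX); with in_t,
   INT_MAX is always representable.  */
static in_t
int_api_count_limit (void)
{
  return INT_MAX;
}
```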

Bruno Haible

2017-06-08 00:36:10 UTC

Permalink

Hi Paul,

Post by Paul Eggert
The name I'm currently
thinking of is 'in_t', short for "index type". That's an
easy-to-remember name (the type is like 'int', but possibly wider).

Fine with me.

It doesn't collide: only very few packages use the identifier 'in_t', and
only in isolated places.

Post by Paul Eggert
One other advantage of having our own signed type is that we can
guarantee that it's at least as wide as int (something that is not true
for ptrdiff_t). That way, some of my current code that says 'MIN
(INT_MAX, PTRDIFF_MAX)' can be simplified to the more-natural INT_MAX.
This is helpful for traditional interfaces that use int counters.

Indeed. (Although portability to Windows 3.1 is not in the focus of gnulib
nor of GNU programs any more.)

Bruno

Bruno Haible

2017-06-05 10:07:15 UTC

Permalink

Hi Paul,

I'd like to understand how much better this "ptrdiff_t world" is.

Post by Paul Eggert
This has the advantage that signed integer overflow can be detected
automatically on some platforms

You mean "-fsanitize=undefined", right?
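For reference, a small illustration of the asymmetry being discussed. Unsigned size_t arithmetic wraps modulo SIZE_MAX + 1 by definition, so a sanitizer has nothing undefined to report. Signed overflow is undefined behavior, which -fsanitize=undefined (via its signed-integer-overflow check) can diagnose at run time. To avoid actually triggering undefined behavior, the sketch uses GCC/Clang's `__builtin_add_overflow` to show where the signed sum would overflow. Both function names are made up.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Well defined: SIZE_MAX + 1 is simply 0, and nothing can flag it.  */
static size_t
size_sum (size_t a, size_t b)
{
  return a + b;
}

/* The same sum in ptrdiff_t.  Evaluating a + b directly when it
   overflows would be undefined behavior; here the overflow is
   detected explicitly and reported by returning false.  */
static bool
ptrdiff_sum (ptrdiff_t a, ptrdiff_t b, ptrdiff_t *sum)
{
  return !__builtin_add_overflow (a, b, sum);
}
```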

Does this also catch the following situations?

a) Pointer subtraction. ISO C11 § J.2 says:
"The behavior is undefined in the following circumstances: ...
The result of subtracting two pointers is not representable in an object
of type ptrdiff_t (6.5.6)."

b) When assigning a 'size_t' value > PTRDIFF_MAX to a 'ptrdiff_t' variable,
is that undefined behaviour? Is that caught by "-fsanitize=undefined"?
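On b): C11 6.3.1.3p3 says that converting a value to a signed integer type that cannot represent it yields an implementation-defined result or raises an implementation-defined signal, so it is not undefined behavior in the standard's sense. Whatever a sanitizer reports, code can sidestep the question by range-checking before converting. In the sketch below, `size_to_ptrdiff` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: if S fits in ptrdiff_t, store it in *P and
   return true; otherwise leave *P unchanged and return false.
   Checking first means the conversion is always value-preserving.  */
static bool
size_to_ptrdiff (size_t s, ptrdiff_t *p)
{
  if (s > PTRDIFF_MAX)
    return false;
  *p = (ptrdiff_t) s;
  return true;
}
```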

Bruno