Discussion:
[PATCH] md5: accepts a new --threads option
Pádraig Brady
2009-10-22 11:09:05 UTC
p.s. I'll look at bypassing stdio on input to see
if I can get at least the 2% back
IMHO, even if it did, it would not be worth it.
Right, a quick test here shows only a 0.8% gain from
bypassing stdio. However I also noticed that the
digest routines in gnulib do fread(4096).
Bumping that up to 32KiB gives a 3% boost.
That would be increased even more if more efficient
checksumming routines are used (which is on the horizon).
Note the coreutils min was bumped to 32KiB recently:
http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commit;h=55efc5f
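For context, the gnulib *_stream digest routines all share roughly this
read loop (a condensed sketch from memory, with partial-read and error
handling elided, not the verbatim source), so BLOCKSIZE directly sets
the fread() granularity:

#include <stdio.h>
#include "md5.h"

#define BLOCKSIZE 4096          /* current value; 32768 is the proposal */

int
md5_stream (FILE *stream, void *resblock)
{
  struct md5_ctx ctx;
  char buffer[BLOCKSIZE + 72];

  md5_init_ctx (&ctx);
  for (;;)
    {
      /* One fread per BLOCKSIZE bytes: a bigger block means fewer
         read() syscalls per megabyte hashed.  */
      size_t n = fread (buffer, 1, BLOCKSIZE, stream);
      if (n == 0)
        break;                  /* EOF or error (elided) */
      md5_process_bytes (buffer, n, &ctx);
    }
  md5_finish_ctx (&ctx, resblock);
  return 0;
}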

Does anyone have any objections to increasing
the stack requirement by 28672 bytes (32 KiB minus the current 4 KiB)?

cheers,
Pádraig.
Eric Blake
2009-10-22 11:58:28 UTC
Post by Pádraig Brady
digest routines in gnulib do fread(4096).
Bumping that up to 32KiB gives a 3% boost.
Well spotted.
Post by Pádraig Brady
Does anyone have any objections to increasing
the stack requirement by 28672 bytes?
None here - support for a stack of at least 1 megabyte is very common
these days. And the faster I/O speed justifies it.

--
Don't work too hard, make some time for fun as well!

Eric Blake ***@byu.net
Jim Meyering
2009-10-22 12:05:30 UTC
Post by Pádraig Brady
p.s. I'll look at bypassing stdio on input to see
if I can get at least the 2% back
IMHO, even if it did, it would not be worth it.
Right, a quick test here shows only a 0.8% gain from
bypassing stdio. However I also noticed that the
digest routines in gnulib do fread(4096).
Bumping that up to 32KiB gives a 3% boost.
Nice.
Post by Pádraig Brady
That would be increased even more if more efficient
checksumming routines are used (which is on the horizon).
http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commit;h=55efc5f
Does anyone have any objections to increasing
the stack requirement by 28672 bytes?
That sounds reasonable to me, too.
No objection here.
Paolo Bonzini
2009-10-22 14:54:33 UTC
Post by Pádraig Brady
p.s. I'll look at bypassing stdio on input to see
if I can get at least the 2% back
IMHO, even if it did, it would not be worth it.
Right, a quick test here shows only a 0.8% gain from
bypassing stdio. However I also noticed that the
digest routines in gnulib do fread(4096).
Bumping that up to 32KiB gives a 3% boost.
That would be increased even more if more efficient
checksumming routines are used (which is on the horizon).
http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commit;h=55efc5f
Does anyone have any objections to increasing
the stack requirement by 28672 bytes?
I do. It's common for threads to not have a >32 kb stack. What I
don't object to is mallocing the buffer. :-)

Paolo
Pádraig Brady
2009-10-23 09:20:24 UTC
Post by Paolo Bonzini
Post by Pádraig Brady
p.s. I'll look at bypassing stdio on input to see
if I can get at least the 2% back
IMHO, even if it did, it would not be worth it.
Right, a quick test here shows only a 0.8% gain from
bypassing stdio. However I also noticed that the
digest routines in gnulib do fread(4096).
Bumping that up to 32KiB gives a 3% boost.
That would be increased even more if more efficient
checksumming routines are used (which is on the horizon).
http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commit;h=55efc5f
Does anyone have any objections to increasing
the stack requirement by 28672 bytes?
I do. It's common for threads to not have a >32 kb stack. What I
don't object to is mallocing the buffer. :-)
Thanks Paolo,

How about the attached?
Per byte it performs the same as the stack method,
per file there is a small overhead.

cheers,
Pádraig.
Jim Meyering
2009-10-23 10:22:16 UTC
Pádraig Brady wrote:
...
This results in a significant decrease in syscall overhead
giving a 3% speedup to the digest utilities for example
(when processing large files from cache).
Storage is moved from the stack to the heap as some
threaded environments for example can have small stacks.
...
+ Use a better IO block size for modern systems
+ * lib/copy-file.c (copy_file_preserving): Used a 32KiB malloced buffer.
+ * lib/md2.c: Likewise.
+ * lib/md4.c: Likewise.
+ * lib/md5.c: Likewise.
+ * lib/sha1.c: Likewise.
+ * lib/sha256.c: Likewise.
+ * lib/sha512.c: Likewise.
...
void
copy_file_preserving (const char *src_filename, const char *dest_filename)
@@ -58,8 +60,7 @@ copy_file_preserving (const char *src_filename, const char *dest_filename)
struct stat statbuf;
int mode;
int dest_fd;
- char buf[4096];
- const size_t buf_size = sizeof (buf);
+ char *buf = xmalloc (IO_SIZE);
Hi Pádraig,

We must not use functions like xmalloc (that can exit)
from within library code.

Instead, you might want to use malloc, and if that fails,
revert to using the buffer on the stack.
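
A minimal sketch of that fallback pattern (the names are illustrative,
not the final patch):

#include <stdio.h>
#include <stdlib.h>

enum { IO_SIZE = 32 * 1024, SMALL_SIZE = 4096 };

/* Copy IN to OUT, preferring a 32KiB heap buffer but falling back to
   a small stack buffer instead of exiting.  */
static void
copy_stream (FILE *in, FILE *out)
{
  char small_buf[SMALL_SIZE];
  char *buf = malloc (IO_SIZE);
  size_t buf_size = IO_SIZE;
  size_t n;

  if (!buf)
    {
      buf = small_buf;          /* degraded, but still correct */
      buf_size = SMALL_SIZE;
    }
  while ((n = fread (buf, 1, buf_size, in)) > 0)
    fwrite (buf, 1, n, out);
  if (buf != small_buf)
    free (buf);
}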
Pádraig Brady
2009-10-23 10:22:05 UTC
Post by Jim Meyering
...
copy_file_preserving (const char *src_filename, const char *dest_filename)
@@ -58,8 +60,7 @@ copy_file_preserving (const char *src_filename, const char *dest_filename)
struct stat statbuf;
int mode;
int dest_fd;
- char buf[4096];
- const size_t buf_size = sizeof (buf);
+ char *buf = xmalloc (IO_SIZE);
Hi Pádraig,
We must not use functions like xmalloc (that can exit)
from within library code.
Instead, you might want to use malloc, and if that fails,
revert to using the buffer on the stack.
copy_file_preserving() already exits on error
(and is documented to do so), which is why
I used xmalloc there?

cheers,
Pádraig.
Jim Meyering
2009-10-23 10:30:57 UTC
Post by Pádraig Brady
Post by Jim Meyering
...
copy_file_preserving (const char *src_filename, const char *dest_filename)
@@ -58,8 +60,7 @@ copy_file_preserving (const char *src_filename, const char *dest_filename)
struct stat statbuf;
int mode;
int dest_fd;
- char buf[4096];
- const size_t buf_size = sizeof (buf);
+ char *buf = xmalloc (IO_SIZE);
Hi Pádraig,
We must not use functions like xmalloc (that can exit)
from within library code.
Instead, you might want to use malloc, and if that fails,
revert to using the buffer on the stack.
copy_file_preserving() already exits on error
(and is documented to do so), which is why
I used xmalloc there?
<blush!> You're right. Don't mind me ;-)
Jim Meyering
2009-10-23 10:34:05 UTC
diff --git a/lib/md2.c b/lib/md2.c
index cb4c63b..f8878c0 100644
--- a/lib/md2.c
+++ b/lib/md2.c
@@ -33,7 +33,7 @@
# include "unlocked-io.h"
#endif
-#define BLOCKSIZE 4096
+#define BLOCKSIZE 32768
#if BLOCKSIZE % 64 != 0
# error "invalid BLOCKSIZE"
#endif
@@ -94,9 +94,12 @@ int
md2_stream (FILE *stream, void *resblock)
{
struct md2_ctx ctx;
- char buffer[BLOCKSIZE + 72];
size_t sum;
+ char* buffer = malloc(BLOCKSIZE + 72);
Spacing:

char *buffer = malloc (BLOCKSIZE + 72);
+ if (!buffer)
+ return 1;
Where is that memory freed?
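
The shape being asked for is a free() on every return path; a sketch,
assuming the usual gnulib md2_* context API:

#include <stdio.h>
#include <stdlib.h>
#include "md2.h"

#define BLOCKSIZE 32768

int
md2_stream (FILE *stream, void *resblock)
{
  struct md2_ctx ctx;
  char *buffer = malloc (BLOCKSIZE + 72);
  if (!buffer)
    return 1;

  md2_init_ctx (&ctx);
  for (;;)
    {
      size_t n = fread (buffer, 1, BLOCKSIZE, stream);
      if (n == 0)
        {
          if (ferror (stream))
            {
              free (buffer);    /* release on the error path... */
              return 1;
            }
          break;                /* EOF */
        }
      md2_process_bytes (buffer, n, &ctx);
    }
  md2_finish_ctx (&ctx, resblock);
  free (buffer);                /* ...and on the success path */
  return 0;
}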
Paolo Bonzini
2009-10-23 10:40:08 UTC
+  char* buffer = malloc(BLOCKSIZE + 72);
    char *buffer = malloc (BLOCKSIZE + 72);
+  if (!buffer)
+    return 1;
Where is that memory freed?
Everything else is fine by me.

Paolo
Pádraig Brady
2009-10-23 11:17:54 UTC
Post by Jim Meyering
diff --git a/lib/md2.c b/lib/md2.c
index cb4c63b..f8878c0 100644
--- a/lib/md2.c
+++ b/lib/md2.c
@@ -33,7 +33,7 @@
# include "unlocked-io.h"
#endif
-#define BLOCKSIZE 4096
+#define BLOCKSIZE 32768
#if BLOCKSIZE % 64 != 0
# error "invalid BLOCKSIZE"
#endif
@@ -94,9 +94,12 @@ int
md2_stream (FILE *stream, void *resblock)
{
struct md2_ctx ctx;
- char buffer[BLOCKSIZE + 72];
size_t sum;
+ char* buffer = malloc(BLOCKSIZE + 72);
char *buffer = malloc (BLOCKSIZE + 72);
+ if (!buffer)
+ return 1;
Where is that memory freed?
LOL. <blush!!>

I also didn't include stdlib.h
I also forgot to update sha{224,384}_stream()
I also messed up the tabbing

Hopefully last version attached.

cheers,
Pádraig.
Jim Meyering
2009-10-23 16:01:12 UTC
...
Post by Pádraig Brady
Post by Jim Meyering
+ char* buffer = malloc(BLOCKSIZE + 72);
char *buffer = malloc (BLOCKSIZE + 72);
+ if (!buffer)
+ return 1;
Where is that memory freed?
LOL. <blush!!>
I also didn't include stdlib.h
I also forgot to update sha{224,384}_stream()
Glad you found those.
Post by Pádraig Brady
I also messed up the tabbing
...
Post by Pádraig Brady
Subject: [PATCH] digests, copy-file: increase the IO buffer size from 4KiB to 32KiB
This results in a significant decrease in syscall overhead
giving a 3% speedup to the digest utilities for example
(when processing large files from cache).
Storage is moved from the stack to the heap as some
threaded environments for example can have small stacks.
...
Post by Pádraig Brady
diff --git a/lib/copy-file.c b/lib/copy-file.c
...
Post by Pádraig Brady
+enum { IO_SIZE = 32*1024 };
Almost there.
I like the enum.
Did you consider making BLOCKSIZE, below, an enum, too?
as long as you're changing it...
Post by Pádraig Brady
diff --git a/lib/md2.c b/lib/md2.c
...
Post by Pádraig Brady
+ char* buffer = malloc (BLOCKSIZE + 72);
Spacing again (search for 'char*'; there are more):

char *buffer = malloc (BLOCKSIZE + 72);

Thanks!
Pádraig Brady
2009-10-23 16:11:59 UTC
Post by Jim Meyering
...
Post by Pádraig Brady
diff --git a/lib/copy-file.c b/lib/copy-file.c
...
Post by Pádraig Brady
+enum { IO_SIZE = 32*1024 };
Almost there.
I like the enum.
Did you consider making BLOCKSIZE, below, an enum, too?
as long as you're changing it...
I did consider it, but that would involve changing the code to
use the compiler, via verify(), to check it was a multiple of 64.
I thought that a little too invasive; it should be part of
a general refactoring that those modules seem to need.
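For reference, the enum form would look something like this with
gnulib's verify module:

#include "verify.h"

enum { BLOCKSIZE = 32768 };
verify (BLOCKSIZE % 64 == 0);   /* replaces the #if ... #error check */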
Post by Jim Meyering
Post by Pádraig Brady
diff --git a/lib/md2.c b/lib/md2.c
...
Post by Pádraig Brady
+ char* buffer = malloc (BLOCKSIZE + 72);
char *buffer = malloc (BLOCKSIZE + 72);
Stupid muscle memory.
Attached.

cheers,
Pádraig.
Jim Meyering
2009-10-23 20:06:13 UTC
Post by Pádraig Brady
Post by Jim Meyering
...
Post by Pádraig Brady
diff --git a/lib/copy-file.c b/lib/copy-file.c
...
Post by Pádraig Brady
+enum { IO_SIZE = 32*1024 };
Almost there.
I like the enum.
Did you consider making BLOCKSIZE, below, an enum, too?
as long as you're changing it...
I did consider it, but that would involve changing the code to
use the compiler, via verify(), to check it was a multiple of 64.
I thought that a little too invasive; it should be part of
a general refactoring that those modules seem to need.
Post by Jim Meyering
Post by Pádraig Brady
diff --git a/lib/md2.c b/lib/md2.c
...
Post by Pádraig Brady
+ char* buffer = malloc (BLOCKSIZE + 72);
char *buffer = malloc (BLOCKSIZE + 72);
Stupid muscle memory.
Attached.
...
Post by Pádraig Brady
Subject: [PATCH] digests, copy-file: increase the IO buffer size from 4KiB to 32KiB
This results in a significant decrease in syscall overhead
giving a 3% speedup to the digest utilities for example
(when processing large files from cache).
Storage is moved from the stack to the heap as some
threaded environments for example can have small stacks.
* lib/copy-file.c (copy_file_preserving): Use a 32KiB malloced buffer
* modules/copy-file: Depend on xalloc
* lib/md2.c: Likewise
* lib/md4.c: Likewise
* lib/md5.c: Likewise
* lib/sha1.c: Likewise
* lib/sha256.c: Likewise
* lib/sha512.c: Likewise
Thanks. I applied that with git am to the
change set 3 or 4 down the stack, rebased it to the top,
and tested with this:

./gnulib-tool --create-testdir --with-tests --test crypto/md2 crypto/md4 \
crypto/md5 crypto/sha1 crypto/sha256 crypto/sha512

It passed, so I pushed it.
Bruno Haible
2009-10-24 14:04:01 UTC
+ * lib/copy-file.c (copy_file_preserving): Used a 32KiB malloced buffer.
Fine with me too. Yes, 32 KB is more than you can safely allocate on the stack
in a multithreaded program: The default thread stack size is:
- glibc i386, x86_64 7.4 MB
- Tru64 5.1 5.2 MB
- Cygwin 1.8 MB
- Solaris 7..10 1 MB
- MacOS X 10.5 460 KB
- AIX 5 98 KB
- OpenBSD 4.0 64 KB
- HP-UX 11 16 KB

And the default stack size for sigaltstack, SIGSTKSZ, is
- only 16 KB on some platforms: IRIX, OSF/1, Haiku.
- only 8 KB on some platforms: glibc, NetBSD, OpenBSD, HP-UX, Solaris.
- only 4 KB on some platforms: AIX.

Bruno


=============== Program for determining the default thread stack size =========
#include <alloca.h>
#include <pthread.h>
#include <stdio.h>

/* Consume the thread's stack 128 bytes at a time until it crashes.  */
void *
threadfunc (void *p)
{
  int n = 0;
  for (;;)
    {
      printf ("Allocated %d bytes\n", n);
      fflush (stdout);
      n += 128;
      /* Touch the new allocation so the stack page is really used.  */
      *((volatile char *) alloca (128)) = 0;
    }
}

int
main ()
{
  pthread_t thread;
  pthread_create (&thread, NULL, threadfunc, NULL);
  for (;;) {}
}
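
For reference: built with something like `cc stack-test.c -lpthread',
the last "Allocated N bytes" line printed before the thread crashes
approximates the default thread stack size.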
Paolo Bonzini
2009-10-24 17:42:56 UTC
Post by Bruno Haible
+       * lib/copy-file.c (copy_file_preserving): Used a 32KiB malloced buffer.
Fine with me too. Yes, 32 KB is more than you can safely allocate on the stack
 - glibc i386, x86_64    7.4 MB
Even when -lrt is added?

Paolo
Bruno Haible
2009-10-25 20:37:46 UTC
Post by Paolo Bonzini
Post by Bruno Haible
 - glibc i386, x86_64    7.4 MB
Even when -lrt is added?
Yes, sure, I get the same results whether the program is linked with -lpthread
or with -lpthread -lrt. librt is not meant to modify the behaviour of
libpthread.

Bruno
Jim Meyering
2009-10-23 16:02:45 UTC
Pádraig Brady wrote:
...
Post by Pádraig Brady
diff --git a/lib/copy-file.c b/lib/copy-file.c
...
Post by Pádraig Brady
+enum { IO_SIZE = 32*1024 };
One more nit. Officially, we prefer to put a space on each side of every
binary operator:

enum { IO_SIZE = 32 * 1024 };
Bruno Haible
2009-10-25 21:41:00 UTC
I went for `core-count'. This is the first version of the new program;
it is a simple wrapper around the gnulib nproc module
This program (and the underlying gnulib 'nproc' module) is IMO too simplistic.

First of all, is the program meant to be a hardware inspection tool (like
"hwinfo --cpu")? Or is meant to be an auxiliary program for helping shell
scripts that want to dispatch tasks onto a maximum number of processors?

If it is meant to be a hardware inspection tool, then it should IMO try to
reflect current multiprocessing hardware architectures. I mean the fact that
a computer's central unit can have several thread execution units, combined
at several different levels:
- Several CPU modules in a single computer,
- Several CPU chips in a single package (multi-chip module [2]),
- Several execution cores on a single chip (multi-core [3]),
- Several thread execution units per CPU core (hyper-threading [4]).
See the first paragraph of [1].

If it is meant as a tool for helping the parallelization of tasks at the
shell script level, then it needs to take into account
1) the fact that the current process may be limited to a certain subset
of the available CPUs. See the Linux/NetBSD function
pthread_setaffinity_np [5][6] and the IRIX notion of a CPU that is
available only to root processes [7].
2) the wish of users to not use all processors at once. Users may want to
save 1 CPU for their GUI interactions. This can most comfortably be
done through an environment variable, such as OMP_NUM_THREADS. [8]
An implementation that considers 1) and 2) can be found in OpenMP. So,
why not simply use the function omp_set_num_threads() that is provided
by GCC in its libgomp library (in a compiler agnostic way through the
AC_OPENMP macro of Autoconf)?
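
For querying rather than setting, the corresponding libgomp call is
omp_get_max_threads(), which honours OMP_NUM_THREADS; a minimal sketch,
built with gcc -fopenmp:

#include <omp.h>
#include <stdio.h>

int
main (void)
{
  /* Prints 2 under "OMP_NUM_THREADS=2 ./a.out"; otherwise the number
     of processors the OpenMP runtime detects.  */
  printf ("%d\n", omp_get_max_threads ());
  return 0;
}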

Bruno


[1] http://en.wikipedia.org/wiki/Multiprocessing
[2] http://en.wikipedia.org/wiki/Multi-chip_module
[3] http://en.wikipedia.org/wiki/Multi-core_(computing)
[4] http://en.wikipedia.org/wiki/Hyper-threading
[5] http://www.kernel.org/doc/man-pages/online/pages/man3/pthread_setaffinity_np.3.html
[6] http://www.daemon-systems.org/man/pthread_getaffinity_np.3.html
[7] http://techpubs.sgi.com/library/tpl/cgi-bin/getdoc.cgi?cmd=getdoc&coll=0650&db=man&fname=2%20sysmp
[8] http://gcc.gnu.org/onlinedocs/libgomp/OMP_005fNUM_005fTHREADS.html
Giuseppe Scrivano
2009-10-25 23:45:53 UTC
Post by Bruno Haible
This program (and the underlying gnulib 'nproc' module) is IMO too simplistic.
First of all, is the program meant to be a hardware inspection tool (like
"hwinfo --cpu")? Or is meant to be an auxiliary program for helping shell
scripts that want to dispatch tasks onto a maximum number of processors?
No, it should not be a hardware inspection tool but a portable tool to
help shell scripts to have an idea of how many processes can be executed
at the same time. If we get too much into details then we lose
portability (at least at a cheap price).
Post by Bruno Haible
If it is meant as a tool for helping the parallelization of tasks at the
shell script level, then it needs to take into account
1) the fact that the current process may be limited to a certain subset
of the available CPUs. See the Linux/NetBSD function
pthread_setaffinity_np [5][6] and the IRIX notion of a CPU that is
available only to root processes [7].
Probably it is better to add an option distinguishing between the
number of "real" and "effective" cores. Where it is not possible to
determine the latter correctly, they'll coincide.
Post by Bruno Haible
2) the wish of users to not use all processors at once. Users may want to
save 1 CPU for their GUI interactions. This can most comfortably be
done through an environment variable, such as
OMP_NUM_THREADS. [8]
What about leaving it to the user to decide to use fewer threads/processes
than core-count reports?

For example, assuming that `sort' will soon get the --threads option and
a user decides to use all cores except one to sort a file, then it can
be done as:

sort --threads="$(($(core-count) - 1))" huge_file


Cheers,
Giuseppe
Gilles Espinasse
2009-10-26 07:07:42 UTC
----- Original Message -----
From: "Giuseppe Scrivano" <***@gnu.org>
To: "Bruno Haible" <***@clisp.org>
Cc: <bug-***@gnu.org>; <bug-***@gnu.org>; "Jim Meyering"
<***@meyering.net>
Sent: Monday, October 26, 2009 12:45 AM
Subject: Re: [PATCH] core-count: A new program to count the number of
cpu cores


...
Post by Giuseppe Scrivano
Post by Bruno Haible
If it is meant as a tool for helping the parallelization of tasks at the
shell script level, then it needs to take into account
1) the fact that the current process may be limited to a certain subset
of the available CPUs. See the Linux/NetBSD function
pthread_setaffinity_np [5][6] and the IRIX notion of a CPU that is
available only to root processes [7].
Probably it is better to add an option distinguishing between the
number of "real" and "effective" cores. Where it is not possible to
determine the latter correctly, they'll coincide.
Post by Bruno Haible
2) the wish of users to not use all processors at once. Users may want to
save 1 CPU for their GUI interactions. This can most comfortably be
done through an environment variable, such as
OMP_NUM_THREADS. [8]
What about leaving it to the user to decide to use fewer threads/processes
than core-count reports?
For example, assuming that `sort' will soon get the --threads option and
a user decides to use all cores except one to sort a file, then it can
sort --threads="$(($(core-count) - 1))" huge_file
Cheers,
Giuseppe
That assumes --threads is able to handle a negative count or 0.
Would it not be more efficient to handle the count in core-count?
That means core-count should always report a minimum of 1 and discount from
the total the number passed as an option.

Probably the first usage of core-count will be to feed make -j

Gilles
Pádraig Brady
2009-10-26 10:33:21 UTC
Post by Gilles Espinasse
----- Original Message -----
Post by Giuseppe Scrivano
For example, assuming that `sort' will soon get the --threads option and
an user decides to use all cores except one to sort a file, then it can
sort --threads="$(($(core-count) - 1))" huge_file
that assume --threads is able to handle negative count or 0
Would not be more efficient to handle the count in core-count?
That mean core-count should always report a minimum of 1 and discount from
total the number passed as an option.
Hmm it's a bit surprising that min()/max() are not available
as $((shell arithmetic)) or in `expr`. Consequently I agree that
adding the option you suggest is useful. What will we call it though?
Maybe:

--ignore=N If possible reduce the count by N
Post by Gilles Espinasse
Probably the first usage of core-count will be to feed make -j
Note for commands that do split the data internally
and process in parallel, I dislike the --threads name
since it implies a particular implementation.
So how about sort -j,--jobs to match `make`?
Though personally I like --mproc ?

cheers,
Pádraig.
Paolo Bonzini
2009-10-26 10:53:34 UTC
Post by Pádraig Brady
So how about sort -j,--jobs to match `make`?
Agreed. However, I think that for coreutils programs it should be the
default to use threads whenever possible.

Paolo
Giuseppe Scrivano
2009-10-26 20:53:07 UTC
Post by Pádraig Brady
Hmm it's a bit surprising that min()/max() are not available
as $((shell arithmetic)) or in `expr`. Consequently I agree that
adding the option you suggest is useful. What will we call it though?
I remember a recent discussion about adding min/max to sort. Is this
feature still desired?
Post by Pádraig Brady
--ignore=N If possible reduce the count by N
Even if it looks a bit ugly, IMO this new option can help to put less
logic in every shell script using it.


Cheers,
Giuseppe
Bruno Haible
2009-10-26 23:31:28 UTC
Post by Pádraig Brady
Hmm it's a bit surprising that min()/max() are not available
as $((shell arithmetic)) or in `expr`. Consequently I agree that
adding the option you suggest is useful.
But min() and max() are available through the 'test'
program or shell built-in command. If the user can write

n=`count-cores`
test ! $n -gt 5 || n=5
test ! $n -lt 2 || n=2

there is no big advantage in additional options that allow him to
write

n=`count-cores --min=2 --max=5`

Bruno
Pádraig Brady
2009-10-27 10:31:31 UTC
Post by Bruno Haible
Post by Pádraig Brady
Hmm it's a bit surprising that min()/max() are not available
as $((shell arithmetic)) or in `expr`. Consequently I agree that
adding the option you suggest is useful.
But min() and max() are available through the 'test'
program or shell built-in command. If the user can write
n=`count-cores`
test ! $n -gt 5 || n=5
test ! $n -lt 2 || n=2
there is no big advantage in additional options that allow him to
write
n=`count-cores --min=2 --max=5`
No big advantage, no. But when you also add in the relative calculation
for getting n-1 for example, then it gets a little too messy in shell
for a very common case.

cheers,
Pádraig.
Bruno Haible
2009-10-26 23:27:40 UTC
Hi Giuseppe,
Post by Giuseppe Scrivano
No, it should not be a hardware inspection tool but a portable
tool to help shell scripts to have an idea of how many
processes can be executed at the same time. If we get too
much into details then we lose portability
Good. This is important info; IMO it belongs in the
coreutils.texi documentation.
Post by Giuseppe Scrivano
Post by Bruno Haible
1) the fact that the current process may be limited to a certain subset
of the available CPUs. See the Linux/NetBSD function
pthread_setaffinity_np [5][6] and the IRIX notion of a CPU that is
available only to root processes [7].
Probably it is better to add an option distinguishing
between the number of "real" and "effective" cores.
Yes, and for the intended use that you described above
the number of "available" cores should the result without
an option, whereas the "installed" cores would require
an option. (I would not want to use the terms "real" and
"effective" here because that brings up wrong associations
with user ids and group ids.)
Post by Giuseppe Scrivano
Post by Bruno Haible
2) the wish of users to not use all processors at once. Users may want to
save 1 CPU for their GUI interactions. This can most comfortably be
done through an environment variable, such as
OMP_NUM_THREADS. [8]
What about leaving it to the user to decide to use fewer
threads/processes than core-count reports?
This is precisely what can be achieved with the
OMP_NUM_THREADS variable. Say, he has a CPU with 4 cores,
wants to reserve 1 for his GUI and 1 for ***@home, then
he will launch the ***@home process with OMP_NUM_THREADS=1
and all other processes with OMP_NUM_THREADS=2.
Post by Giuseppe Scrivano
For example, assuming that `sort' will soon get the --threads
option and a user decides to use all cores except one to sort
sort --threads="$(($(core-count) - 1))" huge_file
Or possibly by:
env OMP_NUM_THREADS="$(($(core-count) - 1))" sort huge_file
no?

Some programs, like 'msgmerge' from GNU gettext, already pay
attention to the OMP_NUM_THREADS variable - a convention shared
by all programs that rely on OpenMP. Can you make the 'sort'
program use the same convention?

Bruno
Giuseppe Scrivano
2009-10-27 08:42:41 UTC
Hi Bruno,
Post by Bruno Haible
Post by Giuseppe Scrivano
No, it should not be a hardware inspection tool but a portable
tool to help shell scripts to have an idea of how many
processes can be executed at the same time. If we get too
much into details then we lose portability
Good. This is important info; IMO it belongs in the
coreutils.texi documentation.
Thanks, I will add this note to the documentation.
Post by Bruno Haible
Yes, and for the intended use that you described above
the number of "available" cores should the result without
an option, whereas the "installed" cores would require
an option. (I would not want to use the terms "real" and
"effective" here because that brings up wrong associations
with user ids and group ids.)
Ok, that makes sense.
Post by Bruno Haible
Post by Giuseppe Scrivano
What about leaving it to the user to decide to use fewer
threads/processes than core-count reports?
This is precisely what can be achieved with the
OMP_NUM_THREADS variable. Say, he has a CPU with 4 cores,
and all other processes with OMP_NUM_THREADS=2.
Post by Giuseppe Scrivano
For example, assuming that `sort' will soon get the --threads
option and a user decides to use all cores except one to sort
sort --threads="$(($(core-count) - 1))" huge_file
env OMP_NUM_THREADS="$(($(core-count) - 1))" sort huge_file
no?
Some programs, like 'msgmerge' from GNU gettext, already pay
attention to the OMP_NUM_THREADS variable - a convention shared
by all programs that rely on OpenMP. Can you make the 'sort'
program use the same convention?
I am not working on the multi-threaded sort, but if somebody asks I can
make it read OMP_NUM_THREADS.
If it is already used by other GNU programs, then it seems a good idea
to use this value when --threads is not specified on the command line.
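
A sketch of that precedence, assuming a hypothetical parsed --threads
value and gnulib's num_processors(); not the actual sort patch:

#include <stdlib.h>

/* gnulib's nproc module (its signature at the time).  */
extern unsigned long num_processors (void);

/* --threads wins; otherwise OMP_NUM_THREADS; otherwise autodetect.  */
static unsigned long
thread_count (long threads_option)
{
  if (threads_option > 0)
    return threads_option;
  else
    {
      const char *env = getenv ("OMP_NUM_THREADS");
      if (env)
        {
          unsigned long n = strtoul (env, NULL, 10);
          if (n > 0)
            return n;
        }
    }
  return num_processors ();
}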


Regards,
Giuseppe
Paolo Bonzini
2009-10-27 09:46:51 UTC
Post by Giuseppe Scrivano
Post by Bruno Haible
Some programs, like 'msgmerge' from GNU gettext, already pay
attention to the OMP_NUM_THREADS variable - a convention shared
by all programs that rely on OpenMP. Can you make the 'sort'
program use the same convention?
I am not working on the multi-threaded sort, but if somebody asks I can
make it read OMP_NUM_THREADS.
If it is already used by other GNU programs, then it seems a good idea
to use this value when --threads is not specified on the command line.
I agree. This also implies that with no OMP_NUM_THREADS and no --threads,
threading _will_ be activated by default.

Of course this should only apply if its effect is not externally
observable; if I have a very small file B and a very large file A, and I
can get

$ md5sum --threads A B
abcdabcdabcdabcdabcdabcdabcdabcd B
12341234123412341234123412341234 A

Then the option would be necessary.

Paolo
Pádraig Brady
2009-10-27 10:36:56 UTC
Post by Paolo Bonzini
Post by Giuseppe Scrivano
Post by Bruno Haible
Some programs, like 'msgmerge' from GNU gettext, already pay
attention to the OMP_NUM_THREADS variable - a convention shared
by all programs that rely on OpenMP. Can you make the 'sort'
program use the same convention?
I am not working on the multi-threaded sort, but if somebody asks I can
make it read OMP_NUM_THREADS.
If it is already used by other GNU programs, then it seems a good idea
to use this value when --threads is not specified on the command line.
I agree. This also implies that with no OMP_NUM_THREADS and no --threads,
threading _will_ be activated by default.
Of course this should only apply if its effect is not externally
observable; if I have a very small file B and a very large file A, and I
can get
$ md5sum --threads A B
abcdabcdabcdabcdabcdabcdabcdabcd B
12341234123412341234123412341234 A
Then the option would be necessary.
Good point Paolo.
I think that's another argument for splitting separate files to different threads.

cheers,
Pádraig.
Pádraig Brady
2009-10-27 10:55:00 UTC
Post by Pádraig Brady
Post by Paolo Bonzini
Of course this should only apply if its effect is not externally
observable; if I have a very small file B and a very large file A, and I
can get
$ md5sum --threads A B
abcdabcdabcdabcdabcdabcdabcdabcd B
12341234123412341234123412341234 A
Then the option would be necessary.
Good point Paolo.
I think that's another argument for splitting separate files to different threads.
Grr. An argument for _not_ splitting.
As per the reasons I gave here:
http://lists.gnu.org/archive/html/bug-coreutils/2009-10/msg00179.html

cheers,
Pádraig.
Paolo Bonzini
2009-10-27 11:20:43 UTC
Post by Paolo Bonzini
$ md5sum --threads A B
abcdabcdabcdabcdabcdabcdabcdabcd B
12341234123412341234123412341234 A
Grr. An argument for _not_ splitting.
It is indeed that way.

In http://lists.gnu.org/archive/html/bug-coreutils/2009-10/msg00179.html
Now it's a different story if the data within a file
could be processed in parallel. I.E. if the digest
algorithms themselves could be parallelized.
And indeed in this case the decision would be a no-brainer. That is the
case for sort, for example.

Maybe we want a --parallel option (too bad -p is taken) for xargs that
forces the creation of the number of processes passed with -P or taken
from nproc (for example by starting "md5sum $1 $5 $9 ...", "md5sum $2 $6
$10 ...", etc.)?

That would be an interesting alternative to this core-count proposal...

Paolo
Pádraig Brady
2009-10-27 12:16:31 UTC
Post by Paolo Bonzini
Maybe we want a --parallel option (too bad -p is taken) for xargs that
forces the creation of the number of processes passed with -P or taken
from nproc (for example by starting "md5sum $1 $5 $9 ...", "md5sum $2 $6
$10 ...", etc.)?
That would be an interesting alternative to this core-count proposal...
I'm not sure what you mean here.
I already suggested to the xargs maintainer that `xargs -P`
should be equivalent to xargs -P$(nproc).
`nproc` as an external command would still be useful though.

cheers,
Pádraig.
Paolo Bonzini
2009-10-27 12:22:12 UTC
Post by Pádraig Brady
I already suggested to the xargs maintainer that `xargs -P`
should be equivalent to xargs -P$(nproc).
I was thinking of an additional option that would automatically decrease
-n so that the requested number of processes is started (then of course
the load may not be well balanced).
Post by Pádraig Brady
`nproc` as an external command would still be useful though.
Possibly, but much less.

Paolo
Pádraig Brady
2009-10-27 16:46:41 UTC
Post by Paolo Bonzini
Post by Pádraig Brady
I already suggested to the xargs maintainer that `xargs -P`
should be equivalent to xargs -P$(nproc).
I was thinking of an additional option that would automatically decrease
-n so that the requested number of processes is started (then of course
the load may not be well balanced).
So you mean, rather than the current situation of:

$ yes . | head -n13 | xargs -n4 -P2
. . . .
. . . .
. . . .
.

xargs could try to distribute like:

$ yes . | head -n13 | xargs -n4 -P2
. . . .
. . . .
. . .
. .

Hmm, it might be able to do that unconditionally?

cheers,
Pádraig.
Paolo Bonzini
2009-10-27 19:00:54 UTC
Post by Pádraig Brady
Post by Paolo Bonzini
I was thinking of an additional option that would automatically decrease
-n so that the requested number of processes is started (then of course
the load may not be well balanced).
$ yes . | head -n13 | xargs -n4 -P2
. . . .
. . . .
. . . .
.
$ yes . | head -n13 | xargs -n4 -P2
. . . .
. . . .
. . .
. .
No, more like

seq 1 13 | xargs --parallel -P4
1 5 9 13
2 6 10
3 7 11
4 8 12

(Note there's no -n). Same for

seq 1 13 | xargs --parallel

on a 4-core machine. This is _by design_ rearranging files, so it
requires an option.

Paolo
Pádraig Brady
2009-11-03 11:27:15 UTC
Post by Paolo Bonzini
Post by Pádraig Brady
Post by Paolo Bonzini
I was thinking of an additional option that would automatically decrease
-n so that the requested number of processes is started (then of course
the load may not be well balanced).
$ yes . | head -n13 | xargs -n4 -P2
. . . .
. . . .
. . . .
.
$ yes . | head -n13 | xargs -n4 -P2
. . . .
. . . .
. . .
. .
No, more like
seq 1 13 | xargs --parallel -P4
1 5 9 13
2 6 10
3 7 11
4 8 12
(Note there's no -n). Same for
seq 1 13 | xargs --parallel
on a 4-core machine. This is _by design_ rearranging files, so it
requires an option.
Right, you're not auto decreasing -n, but when we read all args and
we pass arguments round robin, the args will be distributed evenly to
each parallel process. Does this really require a new option though?
When -P is used, the arguments could be processed in any order anyway.

Passing args round robin means each process would get MIN(max_args,
num_args/nproc). The downside to this is that there would be a bit
more latency introduced as max_args*nproc would need to be read before
starting a process, rather than just max_args. Also interleaving
arguments like this might be undesirable for other reasons?
Both these are minor issues I think. We could of course reduce
max_args to max_args/nproc to address the minor latency issue. Note
currently `find` sets a limit of 128KiB of args to each process which
could be about 2000 files for example:
$ find /usr/share/ | head -n2000 | wc -c
131337

If we did a more invasive change we could help latency a lot I think.
We could set O_NONBLOCK on stdin, and on EWOULDBLOCK, share what we
have out to the available processes and then exec. I.E. auto reduce -n
to num_args/nproc when we block. This would both result in less interleaving
of args and would mean xargs would exec the processes without delay.
This would be beneficial even without -P, like in the following example
where we wouldn't wait for all input before displaying output:
(seq 10; sleep 3; seq 10) | xargs

cheers,
Pádraig.
Paolo Bonzini
2009-11-03 12:41:06 UTC
Post by Pádraig Brady
Post by Paolo Bonzini
seq 1 13 | xargs --parallel -P4
1 5 9 13
2 6 10
3 7 11
4 8 12
(Note there's no -n). Same for
seq 1 13 | xargs --parallel
on a 4-core machine. This is _by design_ rearranging files, so it
requires an option.
Right, you're not auto decreasing -n, but when we read all args and
we pass arguments round robin, the args will be distributed evenly to
each parallel process. Does this really require a new option though?
When -P is used, the arguments could be processed in any order anyway.
Especially since -P is not POSIX, I'd say no. Good catch.

Paolo
Pádraig Brady
2009-11-03 11:35:05 UTC
Post by Pádraig Brady
Post by Paolo Bonzini
Maybe we want a --parallel option (too bad -p is taken) for xargs that
forces the creation of the number of processes passed with -P or taken
from nproc (for example by starting "md5sum $1 $5 $9 ...", "md5sum $2 $6
$10 ...", etc.)?
That would be an interesting alternative to this core-count proposal...
I'm not sure what you mean here.
I already suggested to the xargs maintainer that `xargs -P`
should be equivalent to xargs -P$(nproc).
`nproc` as an external command would still be useful though.
Here's a patch for that.
It needs to be updated to reference the new gnulib
when Bruno's nproc update hits gnulib.

cheers,
Pádraig.
Ralf Wildenhues
2009-11-03 22:11:33 UTC
--- a/doc/find.texi
+++ b/doc/find.texi
@@ -3521,6 +3521,15 @@ Use at most @var{max-args} arguments per command line. Fewer than
With this, `xargs -P0' is ambiguous, and could be `-0 -P' or
`--max-procs=0'.

Cheers,
Ralf
Pádraig Brady
2009-11-04 00:24:10 UTC
Post by Ralf Wildenhues
--- a/doc/find.texi
+++ b/doc/find.texi
@@ -3521,6 +3521,15 @@ Use at most @var{max-args} arguments per command line. Fewer than
With this, `xargs -P0' is ambiguous, and could be `-0 -P' or
`--max-procs=0'.
Ah right, optional args should be avoided if possible.
I even documented this years ago with xargs as an example :)
http://www.pixelbeat.org/talks/iitui/options.html
The existing xargs options with optional args are deprecated.

BTW, it wouldn't be ambiguous to the program, nor would it
be different than the existing meaning, but as you say,
users could mistakenly do -P0 when they meant -0P.
So I'll make the arg mandatory, but what to choose?
"n" is all I can come up with in my half awake state.
I'll sleep on it.

cheers,
Pádraig.
Paolo Bonzini
2009-11-04 00:38:33 UTC
Post by Pádraig Brady
BTW, it wouldn't be ambiguous to the program, nor would it
be different than the existing meaning, but as you say,
users could mistakenly do -P0 when they meant -0P.
So I'll make the arg mandatory, but what to choose?
"n" is all I can come up with in my half awake state.
I'll sleep on it.
I propose that --parallel is the same as -P<num-procs>.

I would go a step further and deprecate --num-procs=NNN while making
--parallel[=NNN] the new "long" version of -P. Long options (unlike
short options) are safer when it comes to optional arguments, so
--parallel's argument could indeed be optional (while -P would keep the
mandatory argument). The name change would be needed however to have an
optional argument.

Paolo
Pádraig Brady
2009-11-04 16:38:38 UTC
Post by Paolo Bonzini
Post by Pádraig Brady
BTW, it wouldn't be ambiguous to the program, nor would it
be different than the existing meaning, but as you say,
users could mistakenly do -P0 when they meant -0P.
So I'll make the arg mandatory, but what to choose?
"n" is all I can come up with in my half awake state.
I'll sleep on it.
I propose that --parallel is the same as -P<num-procs>.
I would go a step further and deprecate --num-procs=NNN while making
--parallel[=NNN] the new "long" version of -P. Long options (unlike
short options) are safer when it comes to optional arguments, so
--parallel's argument could indeed be optional (while -P would keep the
mandatory argument). The name change would be needed however to have an
optional argument.
Right, that would mean that `xargs --parallel 4` would give an explicit error,
while if we change the existing options to have an optional param then
existing scripts with `xargs -P 4` or `xargs --max-procs 4` would now fail.

So there's the new --parallel option above or the alternative
of adding say an "auto" param to the existing --max-procs|-P option.

1. --parallel

advantages
nicer name IMHO
simpler syntax when you want to use all available processors
disadvantages
an optional param could be very slightly confusing
not having an optional param on the -P option is confusing
it's a new option for people to notice/disregard

2. --max-procs=auto|-Pauto

advantages
symmetric less confusing params
an extension of an existing option
disadvantages
syntax is not as nice IMHO

3. --max-procs=$(nproc)

A third option is to leave this out of xargs altogether
so that one would need to do xargs -P$(nproc).
$(nproc) should be at least as available as a new xargs I think.
If we do this then I'll still send a patch that mentions
$(nproc) in the xargs docs.

I'm leaning towards 3. now given the slight awkwardness of 1. and 2.

cheers,
Pádraig.
James Youngman
2011-05-31 00:28:33 UTC
[ CC += bug-findutils, += Paolo, -= bug-coreutils ]
Post by Pádraig Brady
Post by Pádraig Brady
Post by Paolo Bonzini
Maybe we want a --parallel option (too bad -p is taken) for xargs that
forces the creation of the number of processes passed with -P or taken
from nproc (for example by starting "md5sum $1 $5 $9 ...", "md5sum $2 $6
$10 ...", etc.)?
That would be an interesting alternative to this core-count proposal...
I'm not sure what you mean here.
I already suggested to the xargs maintainer that `xargs -P`
should be equivalent to xargs -P$(nproc).
`nproc` as an external command would still be useful though.
Here's a patch for that.
It needs to be updated to reference the new gnulib
when Bruno's nproc update hits gnulib.
I'm sorry, I totally missed this email because although it was sent to
me it got buried under a ton of gnulib stuff.

As I understand things the intent of the patch is to (optionally I
suppose) make xargs keener to keep the CPUs busy (or whatever else
we're parallelising over I guess), even if it needs to launch
processes with short command lines to do so? That is, with -P N, if
there are fewer than N running children and xargs reads an argument it
should launch a new child process? If that's the approximate idea, I
think that there could be use cases for this.

Could we discuss this aspect on bug-findutils instead? Hence I
changed the subject and the CC list...

Thanks,
James.
Pádraig Brady
2011-05-31 09:16:33 UTC
Post by James Youngman
[ CC += bug-findutils, += Paolo, -= bug-coreutils ]
Post by Pádraig Brady
Post by Pádraig Brady
Post by Paolo Bonzini
Maybe we want a --parallel option (too bad -p is taken) for xargs that
forces the creation of the number of processes passed with -P or taken
from nproc (for example by starting "md5sum $1 $5 $9 ...", "md5sum $2 $6
$10 ...", etc.)?
That would be an interesting alternative to this core-count proposal...
I'm not sure what you mean here.
I already suggested to the xargs maintainer that `xargs -P`
should be equivalent to xargs -P$(nproc).
`nproc` as an external command would still be useful though.
Here's a patch for that.
It needs to be updated to reference the new gnulib
when Bruno's nproc update hits gnulib.
I'm sorry, I totally missed this email because although it was sent to
me it got buried under a ton of gnulib stuff.
As I understand things the intent of the patch is to (optionally I
suppose) make xargs keener to keep the CPUs busy (or whatever else
we're parallelising over I guess), even if it needs to launch
processes with short command lines to do so? That is With -P N, if
there are less than N running children and xargs reads an argument it
should launch a new child process? If that's the approximate idea, I
think that there could be use cases for this.
Could we discuss this aspect on bug-findutils instead? Hence I
changed the subject and the CC list...
The patch presented was flawed due to optional parameter processing.
It was just to make a bare `xargs -P` behave as `xargs -P$(nproc)` without needing `nproc`.

Also discussed was changing distribution of input to children.
I.E. by adding a --parallel option which on a machine with 4 available
cores would:

$ seq 1 13 | xargs --parallel
1 5 9 13
2 6 10
3 7 11
4 8 12

Note -P $(nproc) is implicit in the above.
I also discussed a mechanism using non-blocking reading of input,
to more efficiently split the input parameters across children.

memory definitely hazy on this one :)

cheers,
Pádraig.

Bruno Haible
2009-10-27 23:16:39 UTC
Post by Pádraig Brady
Post by Pádraig Brady
Post by Paolo Bonzini
Of course this should only apply if its effect is not externally
observable; if I have a very small file B and a very large file A, and I
can get
$ md5sum --threads A B
abcdabcdabcdabcdabcdabcdabcdabcd B
12341234123412341234123412341234 A
Then the option would be necessary.
Good point Paolo.
I think that's another argument for splitting separate files to different threads.
Grr. An argument for _not_ splitting.
Huh? You can make md5sum handle each file in parallel and still present the
output in the order given on the command line. To achieve this, each thread
will process one file and submit the result to the main thread before exiting.
The main thread will collect the results and output the result for argument
k only after the results for 1...k-1 have been collected and output.
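
A minimal pthreads sketch of that scheme; digest_file() here is a
hypothetical stand-in for the real per-file MD5 work, and joining the
threads in argument order gives exactly this output ordering (link
with -lpthread):

#include <pthread.h>
#include <stdio.h>

struct task { const char *file; char result[64]; };

/* Hypothetical stand-in for computing one file's digest.  */
static void
digest_file (const char *file, char *out)
{
  snprintf (out, 64, "<digest of %s>", file);
}

static void *
worker (void *arg)
{
  struct task *t = arg;
  digest_file (t->file, t->result);
  return NULL;
}

int
main (int argc, char **argv)
{
  pthread_t thr[argc];
  struct task tasks[argc];
  int k;

  for (k = 1; k < argc; k++)
    {
      tasks[k].file = argv[k];
      pthread_create (&thr[k], NULL, worker, &tasks[k]);
    }
  /* Result k is printed only after results 1..k-1, no matter which
     file finishes hashing first.  */
  for (k = 1; k < argc; k++)
    {
      pthread_join (thr[k], NULL);
      printf ("%s  %s\n", tasks[k].result, tasks[k].file);
    }
  return 0;
}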

Bruno
Giuseppe Scrivano
2009-10-27 23:38:38 UTC
Post by Bruno Haible
Post by Pádraig Brady
Post by Pádraig Brady
Post by Paolo Bonzini
Of course this should only apply if its effect is not externally
observable; if I have a very small file B and a very large file A, and I
can get
$ md5sum --threads A B
abcdabcdabcdabcdabcdabcdabcdabcd B
12341234123412341234123412341234 A
Then the option would be necessary.
Good point Paolo.
I think that's another argument for splitting separate files to different threads.
Grr. An argument for _not_ splitting.
Huh? You can make md5sum handle each file in parallel and still present the
output in the order given on the command line. To achieve this, each thread
will process one file and submit the result to the main thread before exiting.
The main thread will collect the results and output the result for argument
k only after the results for 1...k-1 have been collected and output.
The implementation of --threads for md5sum that I posted some days ago
collects the data for every file before flushing it; still, it can be
improved to flush immediately once files 1..k are ready.

Cheers,
Giuseppe
Giuseppe Scrivano
2009-10-31 16:46:25 UTC
Hi,

I included what we have discussed in my patch. I renamed the new
program to `nproc'; it now accepts two options: --available and
--installed.
By default --available is used; if --available is not known then
--installed is used.

I added another test to ensure nproc --available <= nproc --installed.

Any comment?


Cheers,
Giuseppe
Jim Meyering
2009-10-31 17:34:34 UTC
Post by Giuseppe Scrivano
I included what we have discussed in my patch. I renamed the new
program to `nproc'; it now accepts two options: --available and
--installed.
By default --available is used; if --available is not known then
--installed is used.
I added another test to ensure nproc --available <= nproc --installed.
Any comment?
Sure. That didn't apply via "git am FILE", so I applied via patch.
Please rebase against latest, before posting.

Here's a quick and superficial review.

A few changes were required for "make syntax-check" and
./configure --enable-gcc-warnings:

...
Post by Giuseppe Scrivano
+void
+usage (int status)
+{
+ if (status != EXIT_SUCCESS)
+ fprintf (stderr, _("Try `%s --help' for more information.\n"),
+ program_name);
+ else
+ {
+ printf (_("Usage: %s [OPTION]...\n"), program_name);
+ fputs (_("\
+Print the number of online cpu cores.\n\
+\n\
+"), stdout);
+ fputs (_("\
+ --available report the number of processors available to the\n \
s/report/print/
Post by Giuseppe Scrivano
+ current process.\n \
+ --installed return the number of installed processors\n\
s/return/print/
Post by Giuseppe Scrivano
+"), stdout);
+
+ fputs (HELP_OPTION_DESCRIPTION, stdout);
+ fputs (VERSION_OPTION_DESCRIPTION, stdout);
+ emit_ancillary_info ();
+ }
+ exit (status);
+}
+
+/* Compute the number of available processors. Return 0 on error. */
+
+static u_long
u_long? No thanks.
There are no other uses of that type in the coreutils.

This first hunk was for a syntax-check violation.

diff --git a/README b/README
index 7545eab..0951b62 100644
--- a/README
+++ b/README
@@ -11,7 +11,7 @@ The programs that can be built with this package are:
csplit cut date dd df dir dircolors dirname du echo env expand expr
factor false fmt fold groups head hostid hostname id install join kill
link ln logname ls md5sum mkdir mkfifo mknod mktemp mv nice nl nohup
- od paste pathchk pinky pr printenv printf ptx pwd readlink rm rmdir
+ nproc od paste pathchk pinky pr printenv printf ptx pwd readlink rm rmdir
runcon seq sha1sum sha224sum sha256sum sha384sum sha512sum shred shuf
sleep sort split stat stdbuf stty su sum sync tac tail tee test timeout
touch tr true truncate tsort tty uname unexpand uniq unlink uptime users
diff --git a/src/nproc.c b/src/nproc.c
index 2449fc9..fbf2e59 100644
--- a/src/nproc.c
+++ b/src/nproc.c
@@ -73,10 +73,10 @@ Print the number of online cpu cores.\n\

/* Compute the number of available processors. Return 0 on error. */

-static u_long
-nproc_available ()
+static unsigned long
+nproc_available (void)
{
- u_long nproc = 0;
+ unsigned long nproc = 0;

#ifdef CPU_SETSIZE
size_t j;
@@ -100,12 +100,12 @@ nproc_available ()
can't be computed rollback to the installed processors. The result is
guaranteed to be at least 1. */

-static u_long
+static unsigned long
nproc (bool available)
{
if (available)
{
- u_long available_proc = nproc_available ();
+ unsigned long available_proc = nproc_available ();
if (available_proc)
return available_proc;
}
@@ -149,7 +149,7 @@ main (int argc, char **argv)
}
}

- printf ("%u\n", nproc (available));
+ printf ("%lu\n", nproc (available));

exit (EXIT_SUCCESS);
}
Giuseppe Scrivano
2009-10-31 19:46:00 UTC
Hi Jim,

thanks for your quick review.
Post by Jim Meyering
Post by Giuseppe Scrivano
I included what we have discussed in my patch. I renamed the new
program to `nproc'; it now accepts two options: --available and
--installed.
By default --available is used; if --available is not known then
--installed is used.
I added another test to ensure nproc --available <= nproc --installed.
Any comment?
Sure. That didn't apply via "git am FILE", so I applied via patch.
Please rebase against latest, before posting.
I promise to be more careful next time, though it is a good thing that
NEWS was changed in the last week :-)
Post by Jim Meyering
Here's a quick and superficial review.
Sorry, I should have caught these problems before posting.


Regards,
Giuseppe
Jim Meyering
2009-10-31 23:34:40 UTC
Giuseppe Scrivano wrote:
...
Pádraig Brady
2009-11-01 00:41:41 UTC
Thanks for continuing with this.
I'm not sure we agreed on the name but I like nproc at least :)
+Print the number of processors available to the current process. It
+may be less than the number of installed processors.
+If this information is not accessible, then nproc returs the number of
s/returs/returns/
+installed processors. By default --available is used.
+
+Print the number of installed processors.
I went digging for what --available and --installed correspond to..
I thought first that they were respectively:

$ getconf -a | grep NPROC
_NPROCESSORS_CONF 1
_NPROCESSORS_ONLN 1

num_processors() already uses _NPROCESSORS_ONLN (online processors),
so I then wondered how this would differ from that returned by
pthread_getaffinity_np()?

A quick google for cpuset shows:
http://www.kernel.org/doc/man-pages/online/pages/man7/cpuset.7.html

Also this is what sysconf seems to query for the variables above:
$ strace -e open getconf _NPROCESSORS_ONLN
open("/proc/stat"
$ strace -e open getconf _NPROCESSORS_CONF
open("/sys/devices/system/cpu"

So looking at the /proc/stat code:
http://lxr.linux.no/#linux+v2.6.31/fs/proc/stat.c
Shows it calls for_each_online_cpu()
Which according to the following is each CPU available to scheduler:
http://lxr.linux.no/#linux+v2.6.31/include/linux/cpumask.h#L451
However that's system wide and a particular process
could be in a smaller cpuset.

pthread_getaffinity_np instead calls sched_getaffinity which
can return a smaller set as seen here:
http://lxr.linux.no/#linux+v2.6.31/kernel/sched.c#L6484

So in summary, this is a cool addition Giuseppe!

I do wonder though whether it would be better
to have num_processors() try to return this by default?
I.E. can you think of a use case where someone would
want to use the --installed option? BTW "installed"
differs from the terms discussed here, and perhaps
"online" would be better (if we did want to expose it at all).

Also, I'm wondering why you used the pthread interface for this.
I didn't notice pthread_getaffinity_np() in POSIX for example
(is that what the _np represents?), so why not call sched_getaffinity
directly without needing to link with the pthread library?
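
For reference, the direct call needs no -lpthread; a minimal
Linux-only sketch:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int
main (void)
{
  cpu_set_t set;
  unsigned long nproc = 0;

  if (sched_getaffinity (0, sizeof set, &set) == 0)
    {
      unsigned i;
      /* Count the CPUs the scheduler will actually let us run on.  */
      for (i = 0; i < CPU_SETSIZE; i++)
        if (CPU_ISSET (i, &set))
          nproc++;
    }
  printf ("%lu\n", nproc);
  return 0;
}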
Giuseppe Scrivano
2009-11-01 02:40:14 UTC
Hi Pádraig,
Post by Pádraig Brady
I do wonder though whether it would be better
to have num_processors() try to return this by default?
num_processors is going to be used by programs as nproc will be used by
scripts; all considerations we made for nproc can be applied to
num_processors.
Post by Pádraig Brady
I.E. can you think of a use case where someone would
want to use the --installed option? BTW "installed"
differs from the terms discussed here, and perhaps
"online" would be better (if we did want to expose it at all).
I don't see any common use case, except a quick way to see how many
processors can potentially be available.
Post by Pádraig Brady
Also I'm wondering why you used the pthread interface to this?
I didn't notice pthread_getaffinity_np() in POSIX for example
(is that what the _np represents?), so why not call sched_getaffinity
directly without needing to link with the pthread library.
Thanks, I am attaching a new version that uses sched instead of pthread.
The _np suffix means "non portable".
Post by Pádraig Brady
Also perhaps we should be comparing to /proc/stat just in case
grep '^cpu[0-9]' /proc/stat | wc -l
Pádraig Brady
2009-11-01 10:29:29 UTC
Post by Giuseppe Scrivano
Hi Pádraig,
Post by Pádraig Brady
I do wonder though whether it would be better
to have num_processors() try to return this by default?
num_processors is going to be used by programs as nproc will be used by
scripts; all considerations we made for nproc can be applied to
num_processors.
Right. So in that case I would push the sched_getaffinity()
down into num_processors in gnulib.

Whether to drop the --installed (and consequently --available)
options to nproc is debatable. If we leave it in then gnulib
would need a method to select between them. Let's leave it
selectable for now unless others think it's superfluous.
In any case num_processors should return the "available"
count by default.

cheers,
Pádraig.
Bruno Haible
2009-11-01 13:58:42 UTC
Post by Pádraig Brady
num_processors() already uses _NPROCESSORS_ONLN (online processors)
so I then wondered how this would differ from that returned by
pthread_getaffinity_np()?
http://www.kernel.org/doc/man-pages/online/pages/man7/cpuset.7.html
$ strace -e open getconf _NPROCESSORS_ONLN
open("/proc/stat"
$ strace -e open getconf _NPROCESSORS_CONF
open("/sys/devices/system/cpu"
http://lxr.linux.no/#linux+v2.6.31/fs/proc/stat.c
Shows it calls for_each_online_cpu()
http://lxr.linux.no/#linux+v2.6.31/include/linux/cpumask.h#L451
However that's system wide and a particular process
could be in a smaller cpuset.
pthread_getaffinity_np instead calls sched_getaffinity which
http://lxr.linux.no/#linux+v2.6.31/kernel/sched.c#L6484
Thanks for presenting these investigations.
Post by Pádraig Brady
I do wonder though whether it would be better
to have num_processors() try to return this by default?
Certainly, yes. The implementation of omp_get_num_threads() in
GCC's libgomp does the same thing.
Post by Pádraig Brady
Also I'm wondering why you used the pthread interface to this?
I didn't notice pthread_getaffinity_np() in POSIX for example
(is that what the _np represents?), so why not call sched_getaffinity
directly without needing to link with the pthread library.
Pádraig Brady
2009-11-01 23:16:24 UTC
Here is a proposed change to the gnulib 'nproc' module. It will
require changes (simplification) on Giuseppe's side, of course.
Wow, this is great stuff Bruno, thanks!
*** lib/nproc.c.orig 2009-11-01 14:55:37.000000000 +0100
--- lib/nproc.c 2009-11-01 14:54:52.000000000 +0100
+ if (query == NPROC_CURRENT_OVERRIDABLE)
+ {
+ /* Test the environment variable OMP_NUM_THREADS, recognized also by all
+ programs that are based on OpenMP. The OpenMP spec says that the
+ value assigned to the environment variable "may have leading and
+ trailing white space". */
+ const char *envvalue = getenv ("OMP_NUM_THREADS");
+
+ if (envvalue != NULL)
+ {
+ while (*envvalue != '\0' && c_isspace (*envvalue))
+ envvalue++;
A pedantic comment. Could one instead assume strtoul() skips leading whitespace?
+ #if HAVE_PTHREAD_AFFINITY_NP && defined __GLIBC__ && 0
+ #elif HAVE_PTHREAD_AFFINITY_NP && defined __NetBSD__ && 0
If you put the "0" first then vim at least will highlight
the section as a comment.
*** m4/nproc.m4.orig 2009-11-01 14:55:37.000000000 +0100
--- m4/nproc.m4 2009-11-01 14:31:13.000000000 +0100
! AC_CHECK_FUNCS([sched_getaffinity sched_getaffinity_np \
! pstat_getdynamic sysmp sysctl])
])
Will this result in a compile failure on glibc-2.3.[23]
where sched_getaffinity() has a different prototype?
If so it might be nice to not define HAVE_SCHED_GETAFFINITY
in that case, or maybe it's not worth worrying about?

thanks again,
Pádraig.
Bruno Haible
2009-11-02 01:11:59 UTC
Permalink
Post by Pádraig Brady
+ while (*envvalue != '\0' && c_isspace (*envvalue))
+ envvalue++;
A pedantic comment. Could one instead assume strtoul() skips leading whitespace?
But then strtoul would also skip a sign, and a value of, say, "+4" is not
allowed by the OpenMP spec.
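To illustrate, a sketch using plain isspace rather than gnulib's
c_isspace, with a hypothetical function name:

#include <ctype.h>
#include <stdlib.h>

/* Skip leading white space by hand, then accept only a plain digit
   sequence, so "+4" and "-4" are rejected as the OpenMP spec
   requires.  Returns 0 if the value is unusable.  */
static unsigned long
parse_omp_num_threads (const char *value)
{
  while (*value != '\0' && isspace ((unsigned char) *value))
    value++;
  if (*value < '0' || *value > '9')
    return 0;
  return strtoul (value, NULL, 10);
}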
Post by Pádraig Brady
+ #if HAVE_PTHREAD_AFFINITY_NP && defined __GLIBC__ && 0
+ #elif HAVE_PTHREAD_AFFINITY_NP && defined __NetBSD__ && 0
If you put the "0" first then vim at least will highlight
the section as a comment.
Same for kate. But the '&& 0' is the last modification one makes to
the code, to enable or disable it; that's why it is at the end. The same
style is also used in lib/fts.c and lib/wait-process.c.
Post by Pádraig Brady
*** m4/nproc.m4.orig 2009-11-01 14:55:37.000000000 +0100
--- m4/nproc.m4 2009-11-01 14:31:13.000000000 +0100
! AC_CHECK_FUNCS([sched_getaffinity sched_getaffinity_np \
! pstat_getdynamic sysmp sysctl])
])
Will this result in a compile failure on glibc-2.3.[23]
where sched_getaffinity() has a different prototype?
If so it might be nice to not define HAVE_SCHED_GETAFFINITY
in that case
Good point. I'll use HAVE_SCHED_GETAFFINITY_LIKE_GLIBC instead of
HAVE_SCHED_GETAFFINITY, defined as follows:


--- m4/nproc.m4.orig 2009-11-02 02:09:26.000000000 +0100
+++ m4/nproc.m4 2009-11-02 02:07:29.000000000 +0100
@@ -1,4 +1,4 @@
-# nproc.m4 serial 3
+# nproc.m4 serial 4
dnl Copyright (C) 2009 Free Software Foundation, Inc.
dnl This file is free software; the Free Software Foundation
dnl gives unlimited permission to copy and/or distribute it,
@@ -12,6 +12,9 @@
# Prerequisites of lib/nproc.c.
AC_DEFUN([gl_PREREQ_NPROC],
[
+ dnl Persuade glibc <sched.h> to declare CPU_SETSIZE, CPU_ISSET etc.
+ AC_REQUIRE([AC_USE_SYSTEM_EXTENSIONS])
+
AC_CHECK_HEADERS([sys/pstat.h sys/sysmp.h sys/param.h],,,
[AC_INCLUDES_DEFAULT])
dnl <sys/sysctl.h> requires <sys/param.h> on OpenBSD 4.0.
@@ -21,5 +24,30 @@
# include <sys/param.h>
#endif
])
- AC_CHECK_FUNCS([pstat_getdynamic sysmp sysctl])
+
+ AC_CHECK_FUNCS([sched_getaffinity sched_getaffinity_np \
+ pstat_getdynamic sysmp sysctl])
+
+ dnl Test whether sched_getaffinity has the expected declaration.
+ dnl glibc 2.3.[0-2]:
+ dnl int sched_getaffinity (pid_t, unsigned int, unsigned long int *);
+ dnl glibc 2.3.3:
+ dnl int sched_getaffinity (pid_t, cpu_set_t *);
+ dnl glibc >= 2.3.4:
+ dnl int sched_getaffinity (pid_t, size_t, cpu_set_t *);
+ if test $ac_cv_func_sched_getaffinity = yes; then
+ AC_CACHE_CHECK([for glibc compatible sched_getaffinity],
+ [gl_cv_func_sched_getaffinity3],
+ [AC_COMPILE_IFELSE(
+ [AC_LANG_PROGRAM(
+ [[#include <sched.h>]],
+ [[sched_getaffinity (0, 0, (cpu_set_t *) 0);]])],
+ [gl_cv_func_sched_getaffinity3=yes],
+ [gl_cv_func_sched_getaffinity3=no])
+ ])
+ if test $gl_cv_func_sched_getaffinity3 = yes; then
+ AC_DEFINE([HAVE_SCHED_GETAFFINITY_LIKE_GLIBC], [1],
+ [Define to 1 if sched_getaffinity has a glibc compatible declaration.])
+ fi
+ fi
])
Bruno Haible
2009-11-04 08:12:26 UTC
Permalink
There were no further comments except Pádraig's, so I committed the
change:

2009-11-04 Bruno Haible <***@clisp.org>

Make num_processors more flexible and consistent.
* lib/nproc.h (enum nproc_query): New type.
(num_processors): Add a 'query' argument.
* lib/nproc.c: Include <stdlib.h>, <sched.h>, c-ctype.h.
(num_processors): Add a 'query' argument. Test the value of the
OMP_NUM_THREADS environment variable if requested. On Linux, NetBSD,
mingw, count the number of CPUs available for the current process.
* m4/nproc.m4 (gl_PREREQ_NPROC): Require AC_USE_SYSTEM_EXTENSIONS.
Check for sched_getaffinity and sched_getaffinity_np.
* modules/nproc (Depends-on): Add c-ctype, extensions.
* NEWS: Mention the change.

See http://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=4e4fd5b6c34d828725994b7f444e18fcf3f85589
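For callers, the new interface then looks like this (a sketch; the
query values are those from the ChangeLog above):

#include <stdio.h>
#include "nproc.h"

int
main (void)
{
  /* Honor OMP_NUM_THREADS if set; otherwise count the processors
     available to the current process.  */
  printf ("%lu\n", num_processors (NPROC_CURRENT_OVERRIDABLE));
  return 0;
}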
Giuseppe Scrivano
2009-11-04 21:37:27 UTC
Permalink
Post by Bruno Haible
There were no further comments except Pádraig's, so I committed the
Make num_processors more flexible and consistent.
* lib/nproc.h (enum nproc_query): New type.
(num_processors): Add a 'query' argument.
* lib/nproc.c: Include <stdlib.h>, <sched.h>, c-ctype.h.
(num_processors): Add a 'query' argument. Test the value of the
OMP_NUM_THREADS environment variable if requested. On Linux, NetBSD,
mingw, count the number of CPUs available for the current process.
* m4/nproc.m4 (gl_PREREQ_NPROC): Require AC_USE_SYSTEM_EXTENSIONS.
Check for sched_getaffinity and sched_getaffinity_np.
* modules/nproc (Depends-on): Add c-ctype, extensions.
* NEWS: Mention the change.
I have updated the new nproc program to use this change in gnulib.

Thanks to Bruno, nproc now has no logic of its own; it is a mere
wrapper around the gnulib module.

For the new program's options I used the same names as the
`nproc_query' enum, except --overridable instead of
--current-overridable; avoiding the common prefix allows the
shorter forms $(nproc --c) and $(nproc --o).


What do you think?


Giuseppe
Paolo Bonzini
2009-11-04 23:05:17 UTC
Permalink
Post by Giuseppe Scrivano
I have updated the new nproc program to use this change in gnulib.
Thanks to Bruno, nproc now has no logic of its own; it is a mere
wrapper around the gnulib module.
For the new program's options I used the same names as the
`nproc_query' enum, except --overridable instead of
--current-overridable; avoiding the common prefix allows the
shorter forms $(nproc --c) and $(nproc --o).
I think --overridable should be the default, and I'm not sure there
should be an option for it at all.

Paolo
Pádraig Brady
2009-11-04 23:19:49 UTC
Permalink
Subject: [PATCH] nproc: A new program to count the number of processors
s/number of/available/
* AUTHORS: Add my name.
* NEWS: Mention it.
* README: Likewise.
* bootstrap.conf (gnulib_modules): Add nproc.
* doc/coreutils.texi (nproc invocation): Add nproc info.
* po/POTFILES.in: Add src/nproc.c.
* src/Makefile.am (EXTRA_PROGRAMS): Add nproc.
* src/nproc.c: New file.
* man/.gitignore: Exclude nproc.1
* src/.gitignore: Exclude nproc
s/processors/available processors/
+
+
+inspection tool but a portable way to get how many processes
+potentially can be executed in parallel. The result is guaranteed to
@command{nproc} prints the number of processing units available
to the current process. The result is guaranteed to be greater than zero.
+Print the number of installed processors.
Print the number of installed processors on the system which may
be greater than is available to the current process.



I'm always wary about adding options.
How about we drop the following 2 options and just do
NPROC_CURRENT_OVERRIDABLE by default? We can document
here that we honor OMP_NUM_THREADS, and in the unlikely
event that it is set and we want to ignore it, one can always
do `OMP_NUM_THREADS= nproc`.
+Print the number of processors available to the current process. It
+may be less than the number of installed processors.
+If this information is not accessible, then nproc returns the number of
+installed processors. By default --current is used.
+
The other option we discussed that might be worth adding,
because it would be needed often and is awkward
to do in shell, is --ignore (I'm not sure about the name).

@item --ignore=@var{number}
@opindex --ignore
The number of processing units to disregard if possible.
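Here "if possible" would mean clamping, so that the result stays
positive; a hypothetical sketch:

/* Proposed --ignore semantics: subtract IGNORE from the available
   count, but never report fewer than 1 processing unit.  */
static unsigned long
apply_ignore (unsigned long available, unsigned long ignore)
{
  return available > ignore ? available - ignore : 1;
}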

thanks!
Pádraig.
Giuseppe Scrivano
2009-11-05 00:53:49 UTC
Permalink
Pádraig and Paolo, thanks for your comments.

I agree with you; there is no need for both --overridable and
--current, since --current can be emulated by passing an empty
OMP_NUM_THREADS to --overridable. In the version I attached, I dropped
--overridable and moved its semantics to --current.

Also, I added a new --ignore option.

Regards,
Giuseppe
Pádraig Brady
2009-11-05 10:20:47 UTC
Permalink
Post by Giuseppe Scrivano
Pádraig and Paolo, thanks for your comments.
I agree with you; there is no need for both --overridable and
--current, since --current can be emulated by passing an empty
OMP_NUM_THREADS to --overridable. In the version I attached, I dropped
--overridable and moved its semantics to --current.
Also, I added a new --ignore option.
Thanks. Nearly there :)

Sorry about all the interface discussions, but the interface
of coreutils commands is very important given the size of the audience
and the effect it has on all future shell scripts etc.

How about removing the --current option as it's now redundant.
A whole option to do nothing seems wrong. Also I think "available" is
more understandable than "current". Perhaps the various (future) sets
might be better as an option parameter. I.e. how about:

nproc --count={all,available}

with --count=available being the default operation.

cheers,
Pádraig.
Giuseppe Scrivano
2009-11-05 11:22:49 UTC
Permalink
Hi Pádraig,
Post by Pádraig Brady
Sorry about all the interface discussions, but the interface
of coreutils commands is very important given the size of the audience
and the effect it has on all future shell scripts etc.
it is fine; apart from the interface and the program name, there was not
anything to discuss ;-)
Post by Pádraig Brady
How about removing the --current option as it's now redundant.
A whole option to do nothing seems wrong. Also I think "available" is
more understandable than "current". Perhaps the various (future) sets
nproc --count={all,available}
with --count=available being the default operation.
Right, "available" looks better than "current", I'll change it.

I would avoid this interface, as it makes commands longer, whereas
two separate options can be abbreviated by getopt. I don't think
"--count" improves readability (considering that nproc can't do
anything else) enough to justify sacrificing that possibility. What do you
think?


Regards,
Giuseppe
Pádraig Brady
2009-11-05 12:08:27 UTC
Permalink
Post by Giuseppe Scrivano
Hi Pádraig,
Post by Pádraig Brady
How about removing the --current option as it's now redundant.
A whole option to do nothing seems wrong. Also I think "available" is
more understandable than "current". Perhaps the various (future) sets
nproc --count={all,available}
with --count=available being the default operation.
Right, "available" looks better than "current", I'll change it.
cool
Post by Giuseppe Scrivano
I would avoid this interface, as it makes commands longer, whereas
two separate options can be abbreviated by getopt. I don't think
"--count" improves readability (considering that nproc can't do
anything else) enough to justify sacrificing that possibility. What do you
think?
Well --available and --all are mutually exclusive and related.
That fact is obvious if they're parameters to a single option.
But I do take your point that --count is a bit redundant,
and I don't see nproc getting many other options, so OK,
let's leave them as separate options.

I hope to commit this soon.

cheers,
Pádraig.
Giuseppe Scrivano
2009-11-05 22:29:25 UTC
Permalink
Hi Pádraig,
Post by Pádraig Brady
Well --available and --all are mutually exclusive and related.
That fact is obvious if they're parameters to a single option.
But I do take your point that --count is a bit redundant,
and I don't see nproc getting many other options, so OK,
let's leave them as separate options.
I hope to commit this soon.
I made this change and fixed the bug in the texinfo documentation reported
by Paolo. I hope it is fine now.

Cheers,
Giuseppe
Pádraig Brady
2009-11-06 11:36:37 UTC
Permalink
Attached is an updated version in which I:

Removed the --available option.
It just didn't seem to fit when I was documenting it.
The alias argument is a bit of a stretch, since aliases are
disabled in shell scripts and I can't see anyone aliasing
nproc anyway. Also some people may think they need to
specify --available.

Mentioned online processors in the help,
and reworded most of it.

Added a man page.

Diagnosed --ignore="invalid".

Moved tests from nproc/ to misc/.

Removed the cpuinfo test, as it compares against
the number of online processors, which nproc has no
way to determine independently at present. One could
add NPROC_ONLINE to the gnulib interface, corresponding
to an --online option, but I don't think that's required.
Maybe for completeness?

Inverted the "avail" test: available <= all.

cheers,
Pádraig.
Jim Meyering
2009-11-06 12:22:57 UTC
Permalink
Post by Pádraig Brady
Attached is an updated version in which I:
Removed the --available option.
It just didn't seem to fit when I was documenting it.
The alias argument is a bit of a stretch, since aliases are
disabled in shell scripts and I can't see anyone aliasing
nproc anyway. Also some people may think they need to
specify --available.
Mentioned online processors in the help,
and reworded most of it.
Added a man page.
Diagnosed --ignore="invalid".
Moved tests from nproc/ to misc/.
Removed the cpuinfo test, as it compares against
the number of online processors, which nproc has no
way to determine independently at present. One could
add NPROC_ONLINE to the gnulib interface, corresponding
to an --online option, but I don't think that's required.
...

Sounds fine to me.
Thanks again for handling this.
Post by Pádraig Brady
diff --git a/tests/misc/nproc-positive b/tests/misc/nproc-positive
...
Post by Pádraig Brady
+for i in '-1000' '0' '1' ' 2' '1000'; do
Please remove all of these single quotes. They're not needed.
Post by Pádraig Brady
+ procs=$(OMP_NUM_THREADS="$i" nproc)
Same for the double quotes around $i, since
all values are well-behaved.
Post by Pádraig Brady
+ test "$procs" -gt 0 || fail=1
+done
+
+for i in '0' ' 1' '1000'; do
+ procs=$(nproc --ignore="$i")
+ test "$procs" -gt 0 || fail=1
+done
+
+for i in '-1' 'N'; do
+ nproc --ignore="$i" && fail=1
+done
+
+Exit $fail
Giuseppe Scrivano
2009-11-05 22:55:59 UTC
Permalink
Hello,
Why have an option for the default operation at all? If --available is
the same as specifying no option and the only other mode of operation is
--all, only the --all option should be recognised. There is no need for
--available.
it is not a very common case, but it may be useful for overriding an alias:

$ OMP_NUM_THREADS=1 nproc
1

$ alias nproc="nproc --all"

$ OMP_NUM_THREADS=1 nproc
2

$ OMP_NUM_THREADS=1 nproc --available
1


Besides this one, I don't see other cases.

Cheers,
Giuseppe
Erik Auerswald
2009-11-05 17:43:39 UTC
Permalink
Hi,
Post by Pádraig Brady
Well --available and --all are mutually exclusive and related.
That fact is obvious if they're parameters to a single option.
But I do take your point that --count is a bit redundant,
and I don't see nproc getting many other options, so OK,
let's leave them as separate options.
Why have an option for the default operation at all? If --available is
the same as specifying no option and the only other mode of operation is
--all, only the --all option should be recognised. There is no need for
--available.

Br,
Erik
Bruno Haible
2009-11-05 00:54:17 UTC
Permalink
+Print the number of installed processors.
+
+Print the number of processors available to the current process. It
+may be less than the number of installed processors.
+If this information is not accessible, then nproc returns the number of
+installed processors. By default --current is used.
+
Well, probably I didn't explain my intentions clearly. The intent of
having 3 possible behaviours of the gnulib num_processors function
was not that the nproc command would export these 3 behaviours to
the command line. Rather, I expected that you would offer the --all
option and choose between --current and --overridable for the case when
no option is passed. The expectation was that your choice between
NPROC_CURRENT and NPROC_CURRENT_OVERRIDABLE would depend on
- the majority vote among coreutils developers (I proposed to
follow OMP_NUM_THREADS but you may not all be convinced),
- security considerations: for example, when executing as root
you may want to ignore the environment variable, or use its
value only if it is less than the number of installed processors,
or similar considerations (see the sketch below).
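Such a clamp might look like this (a sketch; parse_omp_num_threads is
the hypothetical parser from earlier in the thread):

#include <stdlib.h>
#include "nproc.h"

/* Honor OMP_NUM_THREADS only when it is positive and does not exceed
   the installed count; otherwise use the installed count.  */
static unsigned long
bounded_num_processors (void)
{
  const char *env = getenv ("OMP_NUM_THREADS");
  unsigned long installed = num_processors (NPROC_ALL);
  unsigned long requested = env ? parse_omp_num_threads (env) : 0;
  return (requested != 0 && requested < installed)
         ? requested : installed;
}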

Library APIs often offer more variants and choice than is reasonable
at the command-line level :-)

Bruno
Paolo Bonzini
2009-11-05 09:24:27 UTC
Permalink
Post by Bruno Haible
- the majority vote among coreutils developers (I proposed to
follow OMP_NUM_THREADS but you may not all be convinced),
I think everybody is except possibly Jim who did not pound in.
Cut-and-pasto.

Paolo
Ludovic Courtès
2009-11-05 16:35:39 UTC
Permalink
Hi,
Post by Giuseppe Scrivano
+/* Compute the number of available processors. Return 0 on error. */
+
+static u_long
+nproc_available ()
FWIW the hwloc library (http://www.open-mpi.org/projects/hwloc/) has
code to do that portably on a number of platforms, using either
sysconf(3) or platform-specific techniques.
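For comparison, counting processing units with hwloc looks roughly
like this (a sketch assuming hwloc >= 1.0, where logical processors
are HWLOC_OBJ_PU objects):

#include <stdio.h>
#include <hwloc.h>

int
main (void)
{
  hwloc_topology_t topology;
  int n;

  hwloc_topology_init (&topology);
  hwloc_topology_load (&topology);
  /* Count the processing units (logical processors).  */
  n = hwloc_get_nbobjs_by_type (topology, HWLOC_OBJ_PU);
  printf ("%d\n", n);
  hwloc_topology_destroy (&topology);
  return 0;
}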

Thanks,
Ludo’.