There are a couple of problems with filters trying to sleep. First,
when it is time to shut down nbdkit, we wait until all pending
transactions have had a chance to wind down. But consider what
happens if one or more of those pending transactions are blocked in a
sleep. POSIX says nanosleep is interrupted with EINTR if that thread
handles a signal, but wiring up signal masks just to ensure a specific
thread will get the signal is not pretty, and that means the thread
processing SIGINT is NOT the thread blocked in nanosleep. Couple that
with the fact that if multiple threads are sleeping, only one thread
needs to process the signal, so the rest continue to sleep. Thus,
even though we know the user wants nbdkit to quit, we are blocked
waiting for a potentially long sleep to complete. This problem can be
solved by realizing we already have a pipe-to-self to learn about a
quit request or the end of a particular connection, and check for
activities on those pipes in parallel with a timeout through pselect
or ppoll to break our wait as soon as we know there is no reason to
continue on with the transaction.
Second, if we ever add support for a signal that is not fatal to
nbdkit (such as SIGHUP triggering a config reload), but don't mask
which thread can process that signal, then if a thread servicing
nanosleep gets the signal the sleep will terminate early. But in that
case, we want execution to continue on. This problem can be solved by
either setting a signal mask so as to avoid processing our non-fatal
signal in the same thread as any filter/plugin processing, or by
paying attention to EINTR results and restarting a sleep. And rather
than making each sleeping client reimplement that logic, we might as
well stick it in on our common helper.
Thus, this patch introduces nbdkit_nanosleep, and implements it under
the hood as a ppoll that redirects signals but wakes up as needed.
Including <time.h> for struct timespec in nbdkit-common.h is difficult
(C++ is not required to provide it), so we just take the two arguments
in pieces; we can rely on int being 32-bits, so don't need nsec to be
a long, and in turn can avoid EINVAL for nsec outside the
[0,1000000000) range by updating sec as needed. And sleeping longer
than INT_MAX seconds is indistinguishable from having no timeout, at
least when compared to human lifetimes, so we reject larger values.
Although it is more likely that filters will be the ones actually
trying to slow I/O (and all that uses the new API in this commit), it
is feasible that a third-party plugin may also want access to this
behavior, so I added it in common and documented it for plugins.
Note that ppoll is not yet POSIX [1], so we may have to provide
fallbacks to older interfaces. But rather than write those now, I
just punted on the compilation, and we'll deal with it when we
actually encounter platforms needing it.
[1]
http://austingroupbugs.net/view.php?id=1263
Signed-off-by: Eric Blake <eblake(a)redhat.com>
---
docs/nbdkit-plugin.pod | 28 +++++++++++++++++++
configure.ac | 1 +
include/nbdkit-common.h | 1 +
filters/delay/delay.c | 14 ++--------
filters/rate/rate.c | 10 +++----
server/public.c | 61 +++++++++++++++++++++++++++++++++++++++++
server/nbdkit.syms | 1 +
7 files changed, 99 insertions(+), 17 deletions(-)
diff --git a/docs/nbdkit-plugin.pod b/docs/nbdkit-plugin.pod
index 9510253f..f36778cf 100644
--- a/docs/nbdkit-plugin.pod
+++ b/docs/nbdkit-plugin.pod
@@ -342,6 +342,34 @@ and returns C<NULL>.
The returned string must be freed by the caller.
+=head2 C<nbdkit_nanosleep>
+
+ int nbdkit_nanosleep (unsigned sec, unsigned nsec);
+
+The utility function C<nbdkit_nanosleep> suspends the current thread,
+and returns 0 if it slept at least as many seconds and nanoseconds as
+requested, or -1 after calling C<nbdkit_error> if there is no point in
+continuing the current command. Attempts to sleep more than
+C<INT_MAX> seconds are treated as an error.
+
+This is similar to L<nanosleep(3)>, although you specify the
+components as separate parameters rather than as a C<struct timespec>.
+This wrapper provides two benefits over the system library: in one
+direction, the system library has no easy way to abort a sleep early
+if other information determines that there is no point in finishing
+the sleep (handling a signal in the same thread as the sleep will do
+that, but you don't have full control over the signal masks of other
+threads to ensure that your thread will get the intended interrupting
+signal). In the other direction, the system library has no easy way
+to avoid aborting a sleep early (you can restart the sleep with any
+remaining unslept time, but calculating this gets tedious; or you can
+use signal masks to avoid handling a signal, but risk making your
+thread non-responsive to signals that were important after all,
+stalling a timely shutdown of nbdkit). The system call L<ppoll(2)>
+can solve these issues, but requires access to internal file
+descriptors that the plugin does not need access to, hence this
+function exists to do the work on your behalf.
+
=head2 umask
All plugins will see a L<umask(2)> of C<0022>.
diff --git a/configure.ac b/configure.ac
index 054ad64a..2a34bf8a 100644
--- a/configure.ac
+++ b/configure.ac
@@ -198,6 +198,7 @@ AC_CHECK_FUNCS([\
get_current_dir_name \
mkostemp \
pipe2 \
+ ppoll \
posix_fadvise])
dnl Check whether printf("%m") works
diff --git a/include/nbdkit-common.h b/include/nbdkit-common.h
index 5257d992..e004aa18 100644
--- a/include/nbdkit-common.h
+++ b/include/nbdkit-common.h
@@ -82,6 +82,7 @@ extern int64_t nbdkit_parse_size (const char *str);
extern int nbdkit_parse_bool (const char *str);
extern int nbdkit_read_password (const char *value, char **password);
extern char *nbdkit_realpath (const char *path);
+extern int nbdkit_nanosleep (unsigned sec, unsigned nsec);
struct nbdkit_extents;
extern int nbdkit_add_extent (struct nbdkit_extents *,
diff --git a/filters/delay/delay.c b/filters/delay/delay.c
index 498ab14d..c92a12d7 100644
--- a/filters/delay/delay.c
+++ b/filters/delay/delay.c
@@ -37,7 +37,6 @@
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
-#include <time.h>
#include <nbdkit-filter.h>
@@ -83,16 +82,9 @@ parse_delay (const char *key, const char *value)
static int
delay (int ms, int *err)
{
- if (ms > 0) {
- const struct timespec ts = {
- .tv_sec = ms / 1000,
- .tv_nsec = (ms % 1000) * 1000000,
- };
- if (nanosleep (&ts, NULL) == -1) {
- nbdkit_error ("nanosleep: %m");
- *err = errno;
- return -1;
- }
+ if (ms > 0 && nbdkit_nanosleep (ms / 1000, (ms % 1000) * 1000000) == -1) {
+ *err = errno;
+ return -1;
}
return 0;
}
diff --git a/filters/rate/rate.c b/filters/rate/rate.c
index dbd92ad6..dca5e9fc 100644
--- a/filters/rate/rate.c
+++ b/filters/rate/rate.c
@@ -262,12 +262,10 @@ maybe_sleep (struct bucket *bucket, pthread_mutex_t *lock, uint32_t
count,
bits = bucket_run (bucket, bits, &ts);
}
- if (bits > 0)
- if (nanosleep (&ts, NULL) == -1) {
- nbdkit_error ("nanosleep: %m");
- *err = errno;
- return -1;
- }
+ if (bits > 0 && nbdkit_nanosleep (ts.tv_sec, ts.tv_nsec) == -1) {
+ *err = errno;
+ return -1;
+ }
}
return 0;
}
diff --git a/server/public.c b/server/public.c
index 523a31f9..8f860d73 100644
--- a/server/public.c
+++ b/server/public.c
@@ -47,6 +47,8 @@
#include <limits.h>
#include <termios.h>
#include <errno.h>
+#include <poll.h>
+#include <signal.h>
#include "get-current-dir-name.h"
@@ -297,3 +299,62 @@ nbdkit_realpath (const char *path)
return ret;
}
+
+
+int
+nbdkit_nanosleep (unsigned sec, unsigned nsec)
+{
+#ifndef HAVE_PPOLL
+# error "Please port this to your platform"
+ /* Porting ideas, in order of preference:
+ * - POSIX requires pselect; it's a bit clunkier to set up the poll,
+ * but the same ability to atomically mask all signals and operate
+ * on struct timespec makes it similar to the preferred ppoll interface
+ * - calculate an end time target, then use poll in a loop on EINTR with
+ * a recalculation of the timeout to still reach the end time
+ */
+ nbdkit_error ("nbdkit_nanosleep not yet ported to systems without ppoll");
+ errno = ENOSYS;
+ return -1;
+#else
+ struct timespec ts;
+ struct connection *conn = threadlocal_get_conn ();
+ struct pollfd fds[2] = {
+ [0].fd = quit_fd,
+ [0].events = POLLIN,
+ [1].fd = conn ? conn->status_pipe[0] : -1,
+ [1].events = POLLIN,
+ };
+ sigset_t all;
+
+ if (sec >= INT_MAX - nsec / 1000000000) {
+ nbdkit_error ("sleep request is too long");
+ errno = EINVAL;
+ return -1;
+ }
+ ts.tv_sec = sec + nsec / 1000000000;
+ ts.tv_nsec = nsec % 1000000000;
+
+ /* Block all signals to this thread during the poll, so we don't
+ * have to worry about EINTR
+ */
+ if (sigfillset(&all))
+ abort ();
+ switch (ppoll (fds, 2, &ts, &all)) {
+ case -1:
+ assert (errno != EINTR);
+ nbdkit_error ("poll: %m");
+ return -1;
+ case 0:
+ return 0;
+ }
+
+ /* We don't have to read the pipe-to-self; if poll returned an
+ * event, we know the connection should be shutting down.
+ */
+ assert (quit || (conn && connection_get_status (conn) < 1));
+ nbdkit_error ("aborting sleep to shut down");
+ errno = ESHUTDOWN;
+ return -1;
+#endif
+}
diff --git a/server/nbdkit.syms b/server/nbdkit.syms
index f8eef214..2a024ed5 100644
--- a/server/nbdkit.syms
+++ b/server/nbdkit.syms
@@ -46,6 +46,7 @@
nbdkit_extents_free;
nbdkit_extents_new;
nbdkit_get_extent;
+ nbdkit_nanosleep;
nbdkit_parse_bool;
nbdkit_parse_size;
nbdkit_read_password;
--
2.20.1