On Wed, Mar 15, 2023 at 12:01:55PM +0100, Laszlo Ersek wrote:
While nbd_internal_fork_safe_perror() must indeed call write(), and
arguably justifiedly ignores the return value of write(), we can still
make the write operations slightly more robust and convenient. Let's do
that by introducing xwritel():
- Let the caller pass a list of NUL-terminated strings, via stdarg /
ellipsis notation in the parameter list.
- Handle partial writes.
- Cope with EINTR and EAGAIN errors. (A busy loop around EAGAIN on a
non-blocking file is not great in the general case, but it's good enough
for writing out diagnostics before giving up.)
- In the common case, handle an nbd_internal_fork_safe_perror() call with
a single xwritel() -> writev() call chain, rather than with four
separate write() calls. In practice, this tends to make the error
message written to a regular file contiguous, even if other threads are
writing to the same file. Multiple separate write() calls tend to
interleave chunks of data from different threads.
As a side bonus, remove the path in nbd_internal_fork_safe_perror() where
at least one of the first two write() syscalls fails, and overwrites
"errno", before we get to formatting the error string from "errno".
All nice benefits, even if we don't normally exercise the code.
Thanks to Eric Blake for helping me understand the scope of Austin Group
bug reports.
Signed-off-by: Laszlo Ersek <lersek(a)redhat.com>
---
Notes:
v4:
- Rework with <stdarg.h> and writev().
- Don't split the output into chunks of SSIZE_MAX bytes. In v3, the goal
of that chunking was to avoid implementation-defined behavior.
However, POSIX requires writev() to fail cleanly when more than
SSIZE_MAX bytes would be transfered in a single call. Hence the
original goal (avoiding implementation-defined behavior) is ensured
simply by switching to writev(). The SSIZE_MAX limit is not expected
to be hit in practice (_POSIX_SSIZE_MAX is 32767).
Concur.
- As a "bonus", remove the pre-patch possibility to trample
"errno"
before formatting the error string.
Nice find; and one I missed in my earlier review.
- Refresh the commit message.
- The "contiguous output" from a single xwritel() -> writev() call (as
opposed to the "interleaved output" from multiple xwrite() -> write()
calls in v3) is easily testable in practice (on my end) with the
following small patch, even though this "contiguity" is of course not
guaranteed:
> diff --git a/generator/states-connect-socket-activation.c
b/generator/states-connect-socket-activation.c
> index e4b3b565ae2e..c66c638d331f 100644
> --- a/generator/states-connect-socket-activation.c
> +++ b/generator/states-connect-socket-activation.c
> @@ -179,6 +179,8 @@ CONNECT_SA.START:
> * socket).
> */
> int flags = fcntl (s, F_GETFD, 0);
> + flags = -1;
> + errno = EBADF;
> if (flags == -1) {
> nbd_internal_fork_safe_perror ("fcntl: F_GETFD");
> _exit (126);
It results in the following snippet in
"tests/connect-systemd-socket-activation.log":
> libnbd: debug: nbd1: nbd_connect_systemd_socket_activation: transition:
CONNECT.CONNECTING -> MAGIC.START
> fcntl: F_GETFD: Bad file descriptor
> libnbd: debug: nbd1: nbd_connect_systemd_socket_activation: transition:
MAGIC.START -> MAGIC.RECV_MAGIC
Note that the child process's output is on an isolated line.
Without the patch, it had a high likelihood (but not guarantee) of
interleaving. And writev() is not a bulletproof avoidance of
interleaving (if we hit a short write and retry the tail, interleaving
is possible) - but we are very unlikely to see that in practice.
- Do not pick up R-b tags from Eric and Rich due to significant changes
in v4.
context:-U5
lib/utils.c | 87 +++++++++++++++++++-
1 file changed, 83 insertions(+), 4 deletions(-)
diff --git a/lib/utils.c b/lib/utils.c
index 6df4f14ce9f4..62b4bfdda5c3 100644
--- a/lib/utils.c
+++ b/lib/utils.c
@@ -23,11 +23,14 @@
#include <string.h>
#include <unistd.h>
#include <ctype.h>
#include <errno.h>
#include <fcntl.h>
+#include <stdarg.h>
+#include <sys/uio.h>
+#include "array-size.h"
#include "minmax.h"
#include "internal.h"
void
@@ -179,33 +182,109 @@ nbd_internal_fork_safe_itoa (long v, char *buf, size_t bufsize)
#if defined (__GNUC__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wunused-result"
#endif
+/* "Best effort" function for writing out a list of NUL-terminated strings to
a
+ * file descriptor (without the NUL-terminators). The list is terminated with
+ * (char *)NULL. Partial writes, and EINTR and EAGAIN failures are handled
+ * internally. No value is returned; only call this function for writing
+ * diagnostic data on error paths, when giving up on a higher-level action
+ * anyway.
+ *
+ * No more than 16 strings, excluding the NULL terminator, will be written. (As
+ * of POSIX Issue 7 + TC2, _XOPEN_IOV_MAX is 16.)
+ *
+ * The function is supposed to remain async-signal-safe.
+ *
+ * (The va_*() macros, while not marked async-signal-safe in Issue 7 + TC2, are
+ * considered such, per <
https://www.austingroupbugs.net/view.php?id=711>, which
+ * is binding for Issue 7 implementations via the Interpretations Track.
+ *
+ * Furthermore, writev(), while also not marked async-signal-safe in Issue 7 +
+ * TC2, is considered such, per
+ * <
https://www.austingroupbugs.net/view.php?id=1455>, which is slated for
+ * inclusion in Issue 7 TC3 (if there's going to be a TC3), and in Issue 8.)
Correct summary, and matches our off-list collaboration on determining
what the Austin Group guarantees (while still waiting for their
release of POSIX Issue 8 later this year).
+ */
+static void
+xwritel (int fildes, ...)
+{
Good candidate for __attribute__ ((sentinel)), so that -Wformat marks
callers that fail to supply a trailing NULL argument.
+ /* Open-code the current value of _XOPEN_IOV_MAX, in order to
contain stack
+ * footprint, should _XOPEN_IOV_MAX grow in the future.
+ */
+ struct iovec iovec[16], *filled, *end, *pos;
+ va_list ap;
+ char *arg;
+
+ /* Translate the variable argument list to IO vectors. Note that we cast away
+ * const-ness intentionally.
+ */
+ filled = iovec;
+ end = iovec + ARRAY_SIZE (iovec);
+ va_start (ap, fildes);
+ while (filled < end && (arg = va_arg (ap, char *)) != NULL)
+ *filled++ = (struct iovec){ .iov_base = arg, .iov_len = strlen (arg) };
+ va_end (ap);
+
+ /* Write out the IO vectors. */
+ pos = iovec;
+ while (pos < filled) {
+ ssize_t written;
+
+ /* Skip any empty vectors at the front. */
+ if (pos->iov_len == 0) {
+ ++pos;
+ continue;
+ }
In practice, writev() will handle 0-length iovs on all file types; but
I agree with your effort to skip them since POSIX only guarantees
behavior on regular files (and our typical fd of stderr is not always
a regular file).
+
+ /* Write out the vectors. */
+ do
+ written = writev (fildes, pos, filled - pos);
+ while (written == -1 && (errno == EINTR || errno == EAGAIN));
+
+ if (written == -1)
+ return;
+
+ /* Consume the vectors that have been written out (fully, or in part). Note
+ * that "written" is positive here.
The POSIX wording is a bit tricky to read in this regards, but I think
you are correct that write() (and therefore writev()) will never
return 0 if passed a non-zero length on input: either a short write
happens because of a signal before anything is written (return is -1,
errno is EINTR or EAGAIN), or the short write occurred after partial
write (return must be positive); the only time return can be 0 is if
length was 0 but we don't have that issue.
+ */
+ do {
+ size_t advance;
+
+ advance = MIN (written, pos->iov_len);
+ /* Note that "advance" is positive here iff "pos->iov_len"
is positive. */
+ pos->iov_base = (char *)pos->iov_base + advance;
+ pos->iov_len -= advance;
+ written -= advance;
+
+ /* At least one of "written" and "pos->iov_len" is zero
here. */
+ if (pos->iov_len == 0)
+ ++pos;
+ } while (written > 0);
+ }
+}
Nice! Took me more than one pass to fully understand it, but I agree
that your documented loop invariants hold, and that it does indeed do
a best effort vectored write.
+
/* Fork-safe version of perror. ONLY use this after fork and before
* exec, the rest of the time use set_error().
*/
void
nbd_internal_fork_safe_perror (const char *s)
{
const int err = errno;
const char *m = NULL;
char buf[32];
- write (2, s, strlen (s));
- write (2, ": ", 2);
#ifdef HAVE_STRERRORDESC_NP
m = strerrordesc_np (errno);
#else
#if HAVE_SYS_ERRLIST /* NB Don't use #ifdef */
m = errno >= 0 && errno < sys_nerr ? sys_errlist[errno] : NULL;
#endif
#endif
if (!m)
m = nbd_internal_fork_safe_itoa ((long) errno, buf, sizeof buf);
The bonus bug you fixed here could be independently fixed by
s/errno/err/ without the rest of your patch, but now that nothing else
touches errno prior to assigning to m, I don't see the need to make
that change now.
- write (2, m, strlen (m));
- write (2, "\n", 1);
+ xwritel (STDERR_FILENO, s, ": ", m, "\n", (char *)NULL);
I also like the change of s/2/STDERR_FILENO/ that you snuck in here.
The only change I recommend is the addition of the __attribute__; but
with or without it, I'm happy with:
Reviewed-by: Eric Blake <eblake(a)redhat.com>
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization:
qemu.org |
libvirt.org