[PATCH libnbd v4] lib/errors.c: Fix assert fail in exit path in multi-threaded code
by Richard W.M. Jones
When a highly multi-threaded program such as nbdcopy encounters an
error, there is a race condition in the library which can cause an
assertion failure and thus a core dump:
(1) An error occurs on one of the threads. nbdcopy calls exit(3).
(2) In lib/errors.c, the destructor calls pthread_key_delete.
(3) Another thread which is still running also encounters an error,
and inside libnbd the library calls set_error().
(4) The call to set_error() calls pthread_getspecific which returns
NULL (since the key has already been destroyed in step (2)), and this
causes us to call pthread_setspecific which returns EINVAL because
glibc is able to detect invalid use of a pthread_key_t after it has
been destroyed. In any case, the error message is lost, and any
subsequent call to nbd_get_error() will return NULL.
(5) We enter the %DEAD state, where there is an assertion that
nbd_get_error() is not NULL. This assertion is supposed to be for
checking that the code called set_error() before entering the %DEAD
state. Although we did call set_error(), the error message was lost
at step (4) and nbd_get_error() will always return NULL. This
assertion failure causes a crash.
There aren't any good ways to fix this, so I chose the same method as
used by libvirt in this commit:
https://gitlab.com/libvirt/libvirt/-/commit/8e44e5593eb9b89fbc0b54fde15f1...
(a) Use '-z nodelete' to prevent the library from being unloaded on
dlclose().
(b) Do not call pthread_key_destroy (thus leaking the key).
(c) When threads exit they are still able to call free_errors_key
because it is still present in memory.
Thanks: Daniel P. Berrangé, Eric Blake
---
configure.ac | 9 +++++++++
lib/Makefile.am | 1 +
lib/errors.c | 6 +++++-
3 files changed, 15 insertions(+), 1 deletion(-)
diff --git a/configure.ac b/configure.ac
index b6d60c3df6..f495926eff 100644
--- a/configure.ac
+++ b/configure.ac
@@ -607,6 +607,14 @@ AS_IF([test "x$enable_golang" != "xno"],[
],[GOLANG=no])
AM_CONDITIONAL([HAVE_GOLANG],[test "x$GOLANG" != "xno"])
+AC_MSG_CHECKING([for how to mark DSO non-deletable at runtime])
+NODELETE=
+`$LD --help 2>&1 | grep -- "-z nodelete" >/dev/null` && \
+ NODELETE="-Wl,-z -Wl,nodelete"
+AC_MSG_RESULT([$NODELETE])
+AC_SUBST([NODELETE])
+
+AC_MSG_CHECKING([for how to set DSO symbol versions])
case $host_os in
darwin*)
VERSION_SCRIPT=
@@ -615,6 +623,7 @@ case $host_os in
VERSION_SCRIPT="-Wl,--version-script=${srcdir}/libnbd.syms"
;;
esac
+AC_MSG_RESULT([$VERSION_SCRIPT])
AC_SUBST([VERSION_SCRIPT])
dnl Produce output files.
diff --git a/lib/Makefile.am b/lib/Makefile.am
index 52b525819b..c886be7da0 100644
--- a/lib/Makefile.am
+++ b/lib/Makefile.am
@@ -78,6 +78,7 @@ libnbd_la_LIBADD = \
$(NULL)
libnbd_la_LDFLAGS = \
$(PTHREAD_LIBS) \
+ $(NODELETE) \
$(VERSION_SCRIPT) \
-version-info 0:0:0 \
$(NULL)
diff --git a/lib/errors.c b/lib/errors.c
index 8b77650ef3..6fbfaacd34 100644
--- a/lib/errors.c
+++ b/lib/errors.c
@@ -69,7 +69,11 @@ errors_key_destroy (void)
free (last_error->error);
free (last_error);
}
- pthread_key_delete (errors_key);
+
+ /* We could do this, but that causes a race condition described here:
+ * https://listman.redhat.com/archives/libguestfs/2023-March/031002.html
+ */
+ //pthread_key_delete (errors_key);
}
/* This is called when a thread exits, to free the thread-local data
--
2.39.2
1 year, 9 months
[PATCH libnbd v3] lib/errors.c: Fix assert fail in exit path in multi-threaded code
by Richard W.M. Jones
This version simply removes the call to pthread_key_destroy. It fixes
the problem and allows us to leave the asserts alone so we can still
catch real errors.
Of course this leaks pthread_key_t in the case where you dlclose() the
library. glibc has a limit of 128 thread-specific keys (and the first
32 are somehow "fast" in a way I could quite follow), so that's a
thing.
Rich.
1 year, 9 months
[libnbd PATCH v2] lib/errors.c: Fix assert fail in exit path in multi-threaded code
by Eric Blake
When a highly multi-threaded program such as nbdcopy encounters an
error, there is a race condition in the library which can cause an
assertion failure and thus a core dump:
(1) An error occurs on one of the threads. nbdcopy calls exit(3).
(2) In lib/errors.c, the destructor calls pthread_key_delete.
(3) Another thread which is still running also encounters an error,
and inside libnbd the library calls set_error().
(4) The call to set_error() calls pthread_getspecific. This is
undefined behavior per POSIX, since the key has already been destroyed
in step (2), but glibc handles it by returning NULL with an EINVAL
error (POSIX recommends, but can't mandate, that course of action for
technical reasons). In any case, the error message is lost, and any
subsequent call to nbd_get_error() will return NULL.
(5) We enter the %DEAD state, where there is an assertion that
nbd_get_error() is not NULL. This assertion is supposed to be for
checking that the code called set_error() before entering the %DEAD
state. Although we did call set_error(), the error message was lost
at step (4) and nbd_get_error() will always return NULL. This
assertion failure causes a crash.
There aren't any good ways to fix this. We have proven that the
assertions are too tight (it IS possible for one thread's first use of
libnbd API, which causes the allocation of a thread-local error
context, to occur after another thread has already triggered the
destructor via exit(3)), so remove those. Meanwhile, since POSIX says
any use of a destroyed key is undefined, add some atomic bookkeeping
such that we track a reference count of how many threads have
allocated an error context and subsequently had the per-thread data
destructor called. Only call pthread_key_delete() if all threads had
the chance to exit; leaking small amounts of memory is preferable to
undefined behavior.
Affected tests that now produce the new leak message:
- copy/copy-nbd-error.sh (known issue - we probably want to improve
nbdcopy to gracefully issue NBD_CMD_DISC on error, rather than
outright calling exit(), to work around the assertion failure in
nbdkit 1.32.5)
- various fuse/* (not sure if we can address that; cleaning up FUSE is
tricky)
This reverts a small part of commit 831cbf7ee14c ("generator: Allow
DEAD state actions to run").
Thanks: Richard W.M. Jones <rjones(a)redhat.com>
Signed-off-by: Eric Blake <eblake(a)redhat.com>
---
Taking Rich's analysis and weakening of the assertions, but tweaking
the errors.c to avoid undefined behavior.
generator/state_machine_generator.ml | 5 +--
generator/states.c | 1 -
lib/errors.c | 50 ++++++++++++++++++----------
3 files changed, 34 insertions(+), 22 deletions(-)
diff --git a/generator/state_machine_generator.ml b/generator/state_machine_generator.ml
index 43c1ce4c..9f94bed1 100644
--- a/generator/state_machine_generator.ml
+++ b/generator/state_machine_generator.ml
@@ -449,10 +449,7 @@ let
pr " abort (); /* Should never happen, but keeps GCC happy. */\n";
pr " }\n";
pr "\n";
- pr " if (r == -1) {\n";
- pr " assert (nbd_get_error () != NULL);\n";
- pr " return -1;\n";
- pr " }\n";
+ pr " if (r == -1) return -1;\n";
pr " } while (!blocked);\n";
pr " return 0;\n";
pr "}\n";
diff --git a/generator/states.c b/generator/states.c
index fa0f8d71..58a896ca 100644
--- a/generator/states.c
+++ b/generator/states.c
@@ -192,7 +192,6 @@ STATE_MACHINE {
DEAD:
/* The caller should have used set_error() before reaching here */
- assert (nbd_get_error ());
abort_option (h);
nbd_internal_abort_commands (h, &h->cmds_to_issue);
nbd_internal_abort_commands (h, &h->cmds_in_flight);
diff --git a/lib/errors.c b/lib/errors.c
index 8b77650e..133c752b 100644
--- a/lib/errors.c
+++ b/lib/errors.c
@@ -34,6 +34,8 @@ struct last_error {
/* Thread-local storage of the last error. */
static pthread_key_t errors_key;
+/* Zero if errors_key is invalid, else 1 + threads using errors_key. */
+static _Atomic int key_use_count;
static void free_errors_key (void *vp) LIBNBD_ATTRIBUTE_NONNULL (1);
@@ -51,6 +53,7 @@ errors_key_create (void)
strerror (err));
abort ();
}
+ key_use_count++;
}
/* Destroy the thread-local key when the library is unloaded. */
@@ -59,16 +62,16 @@ static void errors_key_destroy (void) __attribute__ ((destructor));
static void
errors_key_destroy (void)
{
- struct last_error *last_error = pthread_getspecific (errors_key);
-
/* "No destructor functions shall be invoked by
* pthread_key_delete(). Any destructor function that may have been
* associated with key shall no longer be called upon thread exit."
+ * If any threads were still using the key, the memory they allocated
+ * will be leaked; this is unusual enough to diagnose.
*/
- if (last_error != NULL) {
- free (last_error->error);
- free (last_error);
- }
+ if (key_use_count > 1)
+ fprintf (stderr, "%s: %s: %d threads leaking thread-local data\n",
+ "libnbd", "errors_key_destroy", key_use_count - 1);
+ key_use_count = 0;
pthread_key_delete (errors_key);
}
@@ -80,6 +83,8 @@ free_errors_key (void *vp)
{
struct last_error *last_error = vp;
+ assert (key_use_count > 1);
+ key_use_count--;
free (last_error->error);
free (last_error);
}
@@ -87,16 +92,23 @@ free_errors_key (void *vp)
static struct last_error *
allocate_last_error_on_demand (void)
{
- struct last_error *last_error = pthread_getspecific (errors_key);
+ struct last_error *last_error = NULL;
- if (!last_error) {
- last_error = calloc (1, sizeof *last_error);
- if (last_error) {
- int err = pthread_setspecific (errors_key, last_error);
- if (err != 0) {
- /* This is not supposed to happen (XXX). */
- fprintf (stderr, "%s: %s: %s\n", "libnbd", "pthread_setspecific",
- strerror (err));
+ if (key_use_count) {
+ last_error = pthread_getspecific (errors_key);
+ if (!last_error) {
+ last_error = calloc (1, sizeof *last_error);
+ if (last_error) {
+ int err = pthread_setspecific (errors_key, last_error);
+ key_use_count++;
+ if (err != 0) {
+ /* This is not supposed to happen (XXX). */
+ fprintf (stderr, "%s: %s: %s\n", "libnbd", "pthread_setspecific",
+ strerror (err));
+ free (last_error);
+ last_error = NULL;
+ key_use_count--;
+ }
}
}
}
@@ -146,8 +158,10 @@ nbd_internal_get_error_context (void)
const char *
nbd_get_error (void)
{
- struct last_error *last_error = pthread_getspecific (errors_key);
+ struct last_error *last_error = NULL;
+ if (key_use_count)
+ last_error = pthread_getspecific (errors_key);
if (!last_error)
return NULL;
return last_error->error;
@@ -156,8 +170,10 @@ nbd_get_error (void)
int
nbd_get_errno (void)
{
- struct last_error *last_error = pthread_getspecific (errors_key);
+ struct last_error *last_error = NULL;
+ if (key_use_count)
+ last_error = pthread_getspecific (errors_key);
if (!last_error)
return 0;
return last_error->errnum;
--
2.39.2
1 year, 9 months
[PATCH libnbd] lib/errors.c: Fix assert fail in exit path in multi-threaded code
by Richard W.M. Jones
When a highly multi-threaded program such as nbdcopy encounters an
error, there is a race condition in the library which can cause an
assertion failure and thus a core dump:
(1) An error occurs on one of the threads. nbdcopy calls exit(3).
(2) In lib/errors.c, the destructor calls pthread_key_delete.
(3) Another thread which is still running also encounters an error,
and inside libnbd the library calls set_error().
(4) The call to set_error() calls pthread_getspecific which returns
NULL (since the key has already been destroyed in step (2)), and this
causes us to call pthread_setspecific which returns EINVAL because
glibc is able to detect invalid use of a pthread_key_t after it has
been destroyed. In any case, the error message is lost, and any
subsequent call to nbd_get_error() will return NULL.
(5) We enter the %DEAD state, where there is an assertion that
nbd_get_error() is not NULL. This assertion is supposed to be for
checking that the code called set_error() before entering the %DEAD
state. Although we did call set_error(), the error message was lost
at step (4) and nbd_get_error() will always return NULL. This
assertion failure causes a crash.
There aren't any good ways to fix this. I chose to remove the
assertions that might trigger, and ignore pthread_getspecific
returning EINVAL.
This reverts a small part of commit 831cbf7ee14c ("generator: Allow
DEAD state actions to run").
---
generator/state_machine_generator.ml | 5 +----
generator/states.c | 2 --
lib/errors.c | 5 ++++-
3 files changed, 5 insertions(+), 7 deletions(-)
diff --git a/generator/state_machine_generator.ml b/generator/state_machine_generator.ml
index 43c1ce4c14..9f94bed1f3 100644
--- a/generator/state_machine_generator.ml
+++ b/generator/state_machine_generator.ml
@@ -449,10 +449,7 @@ let
pr " abort (); /* Should never happen, but keeps GCC happy. */\n";
pr " }\n";
pr "\n";
- pr " if (r == -1) {\n";
- pr " assert (nbd_get_error () != NULL);\n";
- pr " return -1;\n";
- pr " }\n";
+ pr " if (r == -1) return -1;\n";
pr " } while (!blocked);\n";
pr " return 0;\n";
pr "}\n";
diff --git a/generator/states.c b/generator/states.c
index fa0f8d716e..c0cf5a7f26 100644
--- a/generator/states.c
+++ b/generator/states.c
@@ -191,8 +191,6 @@ STATE_MACHINE {
return 0;
DEAD:
- /* The caller should have used set_error() before reaching here */
- assert (nbd_get_error ());
abort_option (h);
nbd_internal_abort_commands (h, &h->cmds_to_issue);
nbd_internal_abort_commands (h, &h->cmds_in_flight);
diff --git a/lib/errors.c b/lib/errors.c
index 8b77650ef3..9142a0843e 100644
--- a/lib/errors.c
+++ b/lib/errors.c
@@ -93,7 +93,10 @@ allocate_last_error_on_demand (void)
last_error = calloc (1, sizeof *last_error);
if (last_error) {
int err = pthread_setspecific (errors_key, last_error);
- if (err != 0) {
+ /* Ignore EINVAL because that can happen in a race with other
+ * threads when we are exiting.
+ */
+ if (err != 0 && err != EINVAL) {
/* This is not supposed to happen (XXX). */
fprintf (stderr, "%s: %s: %s\n", "libnbd", "pthread_setspecific",
strerror (err));
--
2.39.2
1 year, 9 months
[PATCH common] mlcustomize: Add accessors for block driver priority list
by Richard W.M. Jones
When injecting virtio-win drivers, allow the list of block drivers
that we search to be modified.
---
mlcustomize/inject_virtio_win.ml | 12 +++++++++---
mlcustomize/inject_virtio_win.mli | 10 ++++++++++
2 files changed, 19 insertions(+), 3 deletions(-)
diff --git a/mlcustomize/inject_virtio_win.ml b/mlcustomize/inject_virtio_win.ml
index 4e977b3..2a7c742 100644
--- a/mlcustomize/inject_virtio_win.ml
+++ b/mlcustomize/inject_virtio_win.ml
@@ -49,6 +49,9 @@ type t = {
of libosinfo. Although this behaviour is documented, IMHO it has
always been a bad idea. We should change this in future to allow
the user to select where they want to get drivers from. XXX *)
+
+ mutable block_driver_priority : string list
+ (** List of block drivers *)
}
type block_type = Virtio_blk | IDE
@@ -107,7 +110,11 @@ and get_inspection g root =
{ g; root;
i_arch; i_major_version; i_minor_version; i_osinfo;
i_product_variant; i_windows_current_control_set; i_windows_systemroot;
- virtio_win = ""; was_set = false }
+ virtio_win = ""; was_set = false;
+ block_driver_priority = ["virtio_blk"; "vrtioblk"; "viostor"] }
+
+let get_block_driver_priority t = t.block_driver_priority
+let set_block_driver_priority t v = t.block_driver_priority <- v
let scsi_class_guid = "{4D36E97B-E325-11CE-BFC1-08002BE10318}"
let viostor_legacy_pciid = "VEN_1AF4&DEV_1001&SUBSYS_00021AF4&REV_00"
@@ -176,14 +183,13 @@ let rec inject_virtio_win_drivers ({ g } as t) reg =
else (
(* Can we install the block driver? *)
let block : block_type =
- let filenames = ["virtio_blk"; "vrtioblk"; "viostor"] in
let viostor_driver = try (
Some (
List.find (
fun driver_file ->
let source = driverdir // driver_file ^ ".sys" in
g#exists source
- ) filenames
+ ) t.block_driver_priority
)
) with Not_found -> None in
match viostor_driver with
diff --git a/mlcustomize/inject_virtio_win.mli b/mlcustomize/inject_virtio_win.mli
index 0ced02e..7bd9b9f 100644
--- a/mlcustomize/inject_virtio_win.mli
+++ b/mlcustomize/inject_virtio_win.mli
@@ -64,6 +64,16 @@ val from_environment : Guestfs.guestfs -> string -> string -> t
This should only be used by [virt-v2v] and is considered a legacy method. *)
+val get_block_driver_priority : t -> string list
+val set_block_driver_priority : t -> string list -> unit
+(** Get or set the current block driver priority list. This is
+ a list of virtio-win block driver names (eg. ["viostor"]) that
+ we search until we come to the first [name ^ ".sys"] that
+ we find, and that is the block driver which gets installed.
+
+ This module contains a default priority list which should
+ be suitable for most use cases. *)
+
val inject_virtio_win_drivers : t -> Registry.t -> virtio_win_installed
(** [inject_virtio_win_drivers t reg]
installs virtio drivers from the driver directory or driver
--
2.39.2
1 year, 9 months
[PATCH] docs: Prefer 'cookie' over 'handle'
by Eric Blake
In libnbd, we quickly learned that distinguishing between 'handle'
(verb for acting on an object) and 'handle' (noun describing which
object to act on) could get confusing; we solved it by renaming the
latter to 'cookie'. Copy that approach into the NBD spec, and make it
obvious that a cookie is opaque data from the point of view of the
server. Makes no difference to implementations (other than older code
still using 'handle' may be slightly harder to tie back to the spec).
Suggested-by: Richard W.M. Jones <rjones(a)redhat.com>
Signed-off-by: Eric Blake <eblake(a)redhat.com>
---
doc/proto.md | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/doc/proto.md b/doc/proto.md
index 3a96703..abb23e8 100644
--- a/doc/proto.md
+++ b/doc/proto.md
@@ -301,11 +301,11 @@ may be handled by the server asynchronously), and structured reply
chunks from one request may be interleaved with reply messages from
other requests; however, there may be constraints that prevent
arbitrary reordering of structured reply chunks within a given reply.
-Clients SHOULD use a handle that is distinct from all other currently
-pending transactions, but MAY reuse handles that are no longer in
-flight; handles need not be consecutive. In each reply message
+Clients SHOULD use a cookie that is distinct from all other currently
+pending transactions, but MAY reuse cookies that are no longer in
+flight; cookies need not be consecutive. In each reply message
(whether simple or structured), the server MUST use the same value for
-handle as was sent by the client in the corresponding request. In
+cookie as was sent by the client in the corresponding request. In
this way, the client can correlate which request is receiving a
response.
@@ -349,7 +349,7 @@ The request message, sent by the client, looks as follows:
C: 32 bits, 0x25609513, magic (`NBD_REQUEST_MAGIC`)
C: 16 bits, command flags
C: 16 bits, type
-C: 64 bits, handle
+C: 64 bits, cookie
C: 64 bits, offset (unsigned)
C: 32 bits, length (unsigned)
C: (*length* bytes of data if the request is of type `NBD_CMD_WRITE`)
@@ -366,7 +366,7 @@ follows:
S: 32 bits, 0x67446698, magic (`NBD_SIMPLE_REPLY_MAGIC`; used to be
`NBD_REPLY_MAGIC`)
S: 32 bits, error (MAY be zero)
-S: 64 bits, handle
+S: 64 bits, cookie
S: (*length* bytes of data if the request is of type `NBD_CMD_READ` and
*error* is zero)
@@ -381,7 +381,7 @@ server must initiate a hard disconnect). Second, there is no way to
efficiently skip over portions of a sparse file that are known to
contain all zeroes. Finally, it is not possible to reliably decode
the server traffic without also having context of what pending read
-requests were sent by the client, to see which *handle* values will
+requests were sent by the client, to see which *cookie* values will
have accompanying payload on success. Therefore structured replies
are also permitted if negotiated.
@@ -398,7 +398,7 @@ sending errors via a structured reply, as the error can then be
accompanied by a string payload to present to a human user.
A structured reply MAY occupy multiple structured chunk messages
-(all with the same value for "handle"), and the
+(all with the same value for "cookie"), and the
`NBD_REPLY_FLAG_DONE` reply flag is used to identify the final
chunk. Unless further documented by individual requests below,
the chunks MAY be sent in any order, except that the chunk with
@@ -418,7 +418,7 @@ A structured reply chunk message looks as follows:
S: 32 bits, 0x668e33ef, magic (`NBD_STRUCTURED_REPLY_MAGIC`)
S: 16 bits, flags
S: 16 bits, type
-S: 64 bits, handle
+S: 64 bits, cookie
S: 32 bits, length of payload (unsigned)
S: *length* bytes of payload data (if *length* is nonzero)
--
2.39.2
1 year, 9 months
[cross-project PATCH v2] NBD 64-bit extensions
by Eric Blake
This is a cover letter for a set of multi-project patch series all
designed to implement 64-bit operations in NBD, and demonstrate
interoperability of the new extension between projects.
v1 of the project was attempted nearly a year ago:
https://lists.nongnu.org/archive/html/qemu-devel/2021-12/msg00453.html
Since then, I've addressed a lot of preliminary cleanups in libnbd to
make it easier to test things, and incorporated review comments issued
on v1, including:
- no orthogonality between simple/structured replies and 64-bit mode:
once extended headers are negotiated, all transmission traffic (both
from client to server and server to client) uses just one header
size
- add support for the client to pass a payload on commands to the
server, and demonstrate it by further implementing a way to pass a
flag with NBD_CMD_BLOCK_STATUS that says the client is passing a
payload to request status of just a subset of the negotiated
contexts, rather than all possible contexts that were earlier
negotiated during NBD_OPT_SET_META_CONTEXT
- tweaks to the header layouts: tweak block status to provide 64-bit
flags values (although "base:allocation" will remain usable with
just 32-bit flags); reply with offset rather than padding and with
fields rearranged for maximal sharing between client and server
layouts
- word-smithing of the NBD proposed protocol; split things into
several smaller patches, where we can choose how much to take
- more unit tests added in qemu and libnbd
- the series end with RFC patches on whether to support 64-bit hole
responses to NBD_CMD_READ, even though our current limitations say
servers don't have to accept more than a 32M read request and
therefore will never have that big of a hole to read)
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
1 year, 9 months