[PATCH 0/7] Support Windows BitLocker (RHBZ#1808977).
by Richard W.M. Jones
These commits, along with the associated changes to common:
https://www.redhat.com/archives/libguestfs/2020-March/msg00286.html
support the transparent decryption and inspection of Windows guests
encrypted with BitLocker encryption.
To do the BitLocker decryption requires cryptsetup 2.3.0 (although
cryptsetup 2.3 is not required for existing LUKS use). It also
requires a new-ish Linux kernel, I believe 5.3+
This starts by adding new APIs cryptsetup-open and cryptsetup-close.
As well as dropping the "luks-" prefix, this allows specifying an
optional type parameter, allowing you to select BitLocker encryption.
Although also the new API does not require this parameter, and will
autodetect the correct type of decryption to apply in every known
case.
The main work is then in updating list-filesystems and ensuring that
inspection still works.
This really needs but also lacks tests. See commit message of patch
1/7 for why. I believe it's impossible for us to prove unit testing
right now, but hopefully this situation can improve in future.
Rich.
4 years, 1 month
[PATCH v4 00/34] Configurable policy for handling deprecated interfaces
by Markus Armbruster
This series extends QMP introspection to cover deprecation.
Additionally, new option -compat lets you configure what to do when
deprecated interfaces get used. This is intended for testing users of
the management interfaces. It is experimental.
-compat deprecated-input=<in-policy> configures what to do when
deprecated input is received. Available policies:
* accept: Accept deprecated commands and arguments (default)
* reject: Reject them
* crash: Crash
-compat deprecated-output=<out-policy> configures what to do when
deprecated output is sent. Available output policies:
* accept: Emit deprecated command results and events (default)
* hide: Suppress them
For now, -compat covers only deprecated syntactic aspects of QMP. We
may want to extend it to cover semantic aspects, CLI, and experimental
features.
PATCH 01-04: Documentation fixes
PATCH 05-10: Test improvements
PATCH 11-24: Add feature flags to remaining user-defined types and to
struct members
PATCH 25-26: New special feature 'deprecated', visible in
introspection
PATCH 27-34: New -compat to set policy for handling stuff marked with
feature 'deprecated'
v4:
PATCH 05+07: Temporary memory leak plugged [Marc-André]
PATCH 23: Rewritten [Marc-André]
PATCH 24: Comment typo [Marc-André]
PATCH 30: Memory leaks plugged
v3:
* Rebased, non-trivial conflicts in PATCH 01+26+27+34 due to RST
conversion and code motion
* PATCH 28-29: Old PATCH 28 split up to ease review
* PATCH 30-31: New
* PATCH 32-33: Old PATCH 29 split up to ease review
Comparison to RFC (24 Oct 2019):
* Cover arguments and results in addition to commands and events
* Half-baked "[RFC PATCH 18/19] qapi: Include a warning in the
response to a deprecated command" dropped
See also last item of
Subject: Minutes of KVM Forum BoF on deprecating stuff
Date: Fri, 26 Oct 2018 16:03:51 +0200
Message-ID: <87mur0ls8o.fsf(a)dusky.pond.sub.org>
https://lists.nongnu.org/archive/html/qemu-devel/2018-10/msg05828.html
Cc: Lukáš Doktor <ldoktor(a)redhat.com>
Cc: libguestfs(a)redhat.com
Cc: libvir-list(a)redhat.com
Cc: Daniel P. Berrange <berrange(a)redhat.com>
Cc: Peter Krempa <pkrempa(a)redhat.com>
Cc: Kevin Wolf <kwolf(a)redhat.com>
Markus Armbruster (34):
qemu-doc: Belatedly document QMP command arg & result deprecation
qapi: Belatedly update doc comment for @wait deprecation
docs/devel/qapi-code-gen: Clarify allow-oob introspection
docs/devel/qapi-code-gen: Document 'features' introspection
tests/test-qmp-cmds: Factor out qmp_dispatch() test helpers
tests/test-qmp-cmds: Check responses more thoroughly
tests/test-qmp-cmds: Simplify test data setup
tests/test-qmp-event: Simplify test data setup
tests/test-qmp-event: Use qobject_is_equal()
tests/test-qmp-event: Check event is actually emitted
qapi/schema: Clean up around QAPISchemaEntity.connect_doc()
qapi: Add feature flags to remaining definitions
qapi: Consistently put @features parameter right after @ifcond
qapi/introspect: Rename *qlit* to reduce confusion
qapi/introspect: Factor out _make_tree()
qapi/schema: Change _make_features() to a take feature list
qapi/schema: Reorder classes so related ones are together
qapi/schema: Rename QAPISchemaObjectType{Variant,Variants}
qapi/schema: Call QAPIDoc.connect_member() in just one place
qapi: Add feature flags to struct members
qapi: Inline do_qmp_dispatch() into qmp_dispatch()
qapi: Simplify how qmp_dispatch() deals with QCO_NO_SUCCESS_RESP
qapi: Simplify how qmp_dispatch() gets the request ID
qapi: Replace qmp_dispatch()'s TODO comment by an explanation
qapi: New special feature flag "deprecated"
qapi: Mark deprecated QMP parts with feature 'deprecated'
qemu-options: New -compat to set policy for deprecated interfaces
qapi: Implement deprecated-output=hide for QMP command results
qapi: Implement deprecated-output=hide for QMP events
qapi: Implement deprecated-output=hide for QMP event data
qapi: Implement deprecated-output=hide for QMP introspection
qapi: Implement deprecated-input=reject for QMP commands
qapi: Implement deprecated-input=reject for QMP command arguments
qapi: New -compat deprecated-input=crash
docs/devel/qapi-code-gen.txt | 79 ++-
docs/system/deprecated.rst | 48 +-
tests/qapi-schema/doc-good.texi | 32 ++
qapi/block-core.json | 48 +-
qapi/block.json | 30 +-
qapi/char.json | 1 +
qapi/compat.json | 52 ++
qapi/control.json | 11 +-
qapi/introspect.json | 28 +-
qapi/machine.json | 34 +-
qapi/migration.json | 36 +-
qapi/misc.json | 13 +-
qapi/qapi-schema.json | 1 +
include/qapi/compat-policy.h | 20 +
include/qapi/qmp/dispatch.h | 1 +
include/qapi/qobject-input-visitor.h | 9 +
include/qapi/qobject-output-visitor.h | 9 +
include/qapi/visitor-impl.h | 3 +
include/qapi/visitor.h | 9 +
monitor/monitor-internal.h | 3 -
monitor/misc.c | 2 -
monitor/qmp-cmds-control.c | 102 +++-
qapi/qapi-visit-core.c | 9 +
qapi/qmp-dispatch.c | 149 +++---
qapi/qobject-input-visitor.c | 29 ++
qapi/qobject-output-visitor.c | 20 +
qemu-storage-daemon.c | 2 -
softmmu/vl.c | 17 +
tests/test-qmp-cmds.c | 249 +++++----
tests/test-qmp-event.c | 203 +++-----
qapi/Makefile.objs | 8 +-
qapi/trace-events | 1 +
qemu-options.hx | 22 +
scripts/qapi/commands.py | 20 +-
scripts/qapi/doc.py | 16 +-
scripts/qapi/events.py | 24 +-
scripts/qapi/expr.py | 14 +-
scripts/qapi/introspect.py | 104 ++--
scripts/qapi/schema.py | 488 ++++++++++--------
scripts/qapi/types.py | 8 +-
scripts/qapi/visit.py | 28 +-
tests/Makefile.include | 1 +
tests/qapi-schema/alternate-base.err | 2 +-
tests/qapi-schema/doc-good.json | 22 +-
tests/qapi-schema/doc-good.out | 18 +
.../qapi-schema/features-deprecated-type.err | 2 +
.../qapi-schema/features-deprecated-type.json | 3 +
.../qapi-schema/features-deprecated-type.out | 0
tests/qapi-schema/qapi-schema-test.json | 51 +-
tests/qapi-schema/qapi-schema-test.out | 48 +-
tests/qapi-schema/test-qapi.py | 26 +-
51 files changed, 1393 insertions(+), 762 deletions(-)
create mode 100644 qapi/compat.json
create mode 100644 include/qapi/compat-policy.h
create mode 100644 tests/qapi-schema/features-deprecated-type.err
create mode 100644 tests/qapi-schema/features-deprecated-type.json
create mode 100644 tests/qapi-schema/features-deprecated-type.out
--
2.21.1
4 years, 4 months
[nbdkit PATCH] retry: Tweak error message
by Eric Blake
The retry filter defaults to 5 retries, but when run with verbose
details produces some confusing output:
$ rm -f /tmp/inject; (sleep 5s; touch /tmp/inject)& ./nbdkit -f -v -U - \
null 1G --filter=retry --filter=noextents --filter=error error-rate=100% \
error-file=/tmp/inject --filter=delay rdelay=1 \
--run 'qemu-img convert $nbd out.img'
...
nbdkit: null[1]: debug: noextents: pread count=2097152 offset=14680064
nbdkit: null[1]: debug: error: pread count=2097152 offset=14680064
nbdkit: null[1]: error: injecting EIO error into pread
nbdkit: null[1]: debug: retry 6: original errno = 5
nbdkit: null[1]: debug: could not recover after 5 retries
The code is correct (there were six attempts, but that corresponds to
5 retries after the initial attempt; comments mention that retry=0
turns the filter into a no-op but even that requires an initial
attempt). So all we need to adjust is the output.
Fixes: f0f0ec49
Signed-off-by: Eric Blake <eblake(a)redhat.com>
---
I'm pushing this in response to https://bugzilla.redhat.com/1819240
At the same time, I'm looking at whether qemu-img should be tweaked
to have a mode where it gives up on the first EIO error, rather than
plowing on through the rest of the image even after the first error.
filters/retry/retry.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/filters/retry/retry.c b/filters/retry/retry.c
index 11e5313b..db91d7ca 100644
--- a/filters/retry/retry.c
+++ b/filters/retry/retry.c
@@ -173,7 +173,7 @@ do_retry (struct retry_handle *h,
again:
/* Log the original errno since it will be lost when we retry. */
- nbdkit_debug ("retry %d: original errno = %d", data->retry+1, *err);
+ nbdkit_debug ("retry attempt %d: original errno = %d", data->retry, *err);
if (data->retry >= retries) {
nbdkit_debug ("could not recover after %d retries", retries);
--
2.26.0.rc2
4 years, 5 months
[libnbd PATCH 0/2] fix hangs against nbdkit 1.2
by Eric Blake
nbdkit 1.2 as a server waits for read() to see EOF, even after the
client has sent NBD_CMD_DISC. That was fixed in mbdkit 1.4; and most
modern NBD servers are smarter than this (they close() the write end
of their traffic soon after NBD_CMD_DISC). But it's easy enough to
revert nbdkit commit c70616f8 to get back to a server with the same
behavior as the older nbdkit, at which point both 'cd /path/to/libnbd
&& make check PATH=/path/to/nbdkit' and 'cd /path/to/nbdkit &&
/path/to/libnbd/run make check' will hang without this series.
In short, this series is restoring the shutdown(SHUT_WR) call that got
lost from plugins/nbd/nbd.c when nbdkit switched to libnbd in commit
ab7760fc.
Eric Blake (2):
sockets: Add .shut_writes callback
states: Add state for shutdown/gnutls_bye after NBD_CMD_DISC
lib/internal.h | 4 +++-
generator/state_machine.ml | 19 +++++++++++++++++--
generator/states-issue-command.c | 20 ++++++++++++++++++--
lib/crypto.c | 21 +++++++++++++++++----
lib/socket.c | 12 +++++++++++-
5 files changed, 66 insertions(+), 10 deletions(-)
--
2.26.0.rc2
4 years, 5 months
[nbdkit PATCH v2] nbd: Avoid stuck poll() in nbdplug_close_handle()
by Eric Blake
We have been seeing sporadic hangs on test-nbd-tls-psk.sh and
test-nbd-tls.sh (on my machine, running those two in a loop with
commits 0a76cae4 and 09e34ba2 reverted would fail within 100
attempts), where even though the client to the 'nbdkit nbd' process
has cleanly exited, things are stalled in .close where nbd is trying
to pthread_join() the reader thread, while the reader thread is itself
blocked on a poll() that will never make additional progress. In
hindsight, the problem is obvious: we used the wrong libnbd API
(synchronous instead of async), which often results in:
nbd .close nbd reader server
nbd_shutdown poll
- send NBD_CMD_DISC
receive NBD_CMD_DISC
shutdown(SHUT_WR)
wake up on EOF
nbd_aio_notify_read
- move state to CLOSED
nbd_aio_is_dead -> true
break loop
- poll
pthread_join
close fds
nbd_close
see EOF from client
but sometimes things happen differently:
nbd .close nbd reader server
nbd_shutdown poll
- send NBD_CMD_DISC
- poll
receive NBD_CMD_DISC
shutdown(SHUT_WR)
- wake up on EOF
- move state to CLOSED
pthread_join
stuck
As can be seen, having two threads compete on poll() on the same fd is
unwise. Worse, if the server uses a means that will cause us to see
EOF even though the socket is still open (such as shutdown(SHUT_WR) on
plaintext, or gnutls_bye() with TLS), but waits for us to close the
socket first, we end up with two processes deadlocked on each other.
(nbdkit as server has used gnutls_bye since commit bd9a52e0; qemu
however does not use it.)
Other commits such as 14ba4154, c70616f8, and 430f8141 show that
getting the shutdown sequence race-free has not been trivial.
Fixes: ab7760fc
Signed-off-by: Eric Blake <eblake(a)redhat.com>
---
plugins/nbd/nbd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/plugins/nbd/nbd.c b/plugins/nbd/nbd.c
index ba1e7188..0ea2a4a2 100644
--- a/plugins/nbd/nbd.c
+++ b/plugins/nbd/nbd.c
@@ -532,7 +532,7 @@ nbdplug_open (int readonly)
static void
nbdplug_close_handle (struct handle *h)
{
- if (nbd_shutdown (h->nbd, 0) == -1)
+ if (nbd_aio_disconnect (h->nbd, 0) == -1)
nbdkit_debug ("failed to clean up handle: %s", nbd_get_error ());
if ((errno = pthread_join (h->reader, NULL)))
nbdkit_debug ("failed to join reader thread: %m");
--
2.26.0.rc2
4 years, 5 months
[nbdkit PATCH 0/2] Improve shutdown race in nbd plugin
by Eric Blake
I still need more soak time on testing, to see whether I can:
a) reproduce the hang with patch 2 not applied
b) feel confident that patch 2 is sufficient to fix the race, or else
determine that I also need to augment the loop condition in the reader
thread to additionally break out of the loop when the pipe-to-self
sees EOF even when nbd_aio_is_dead() has not yet been satisfied
I'm also fairly confident that I'll need to patch libnbd to call
shutdown(SHUT_WR) on plaintext and gnutls_bye(GNUTLS_SHUT_WR) on TLS
connections after sending NBD_CMD_DISC, as that is something we lowt
when the nbd plugin switched to libnbd, even after documenting in
commit 430f8141 why it is important for preventing shutdown deadlocks.
Eric Blake (2):
nbd: Don't reference stale errno in reader loop
nbd: Reorder cleanup to avoid getting stuck in poll()
plugins/nbd/nbd.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
--
2.26.0.rc2
4 years, 5 months
Re: [Libguestfs] Anyone seen build hangs (esp armv7, s390x) in Fedora?
by Richard W.M. Jones
[Dropping devel, adding libguestfs]
This can be reproduced on x86-64 so I can reproduce it locally. It
only appears to happen when the tests are run under rpmbuild, not when
I run them as ‘make check’, but I'm unclear why this is.
As Eric described earlier, the test runs two copies of nbdkit and a
client, connected like this:
qemu-img info ===> nbdkit nbd ===> nbdkit example1
[3] [2] [1]
These are started in order [1], [2] then [3]. When the client
(process [3]) completes it exits and then the test harness kills
processes [1] and [2] in that order.
The stack trace of [2] at the hang is:
Thread 3 (Thread 0x7fabbf4f7700 (LWP 3955842)):
#0 0x00007fabc05c0f0f in poll () from /lib64/libc.so.6
#1 0x00007fabc090abba in poll (__timeout=-1, __nfds=2, __fds=0x7fabbf4f6bb0) at /usr/include/bits/poll2.h:46
#2 nbdplug_reader (handle=0x5584020e09b0) at nbd.c:323
#3 0x00007fabc069d472 in start_thread () from /lib64/libpthread.so.0
#4 0x00007fabc05cc063 in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7fabbfcf8700 (LWP 3955793)):
#0 0x00007fabc069eab7 in __pthread_clockjoin_ex () from /lib64/libpthread.so.0
#1 0x00007fabc090af2b in nbdplug_close_handle (h=0x5584020e09b0) at nbd.c:538
#2 0x00005583f90caee0 in backend_close (b=<optimized out>) at backend.c:247
#3 0x00005583f90cdbf1 in free_connection (conn=0x5584020df890) at connections.c:359
#4 handle_single_connection (sockin=<optimized out>, sockout=<optimized out>) at connections.c:230
#5 0x00005583f90d63e8 in start_thread (datav=0x5584020bf1b0) at sockets.c:356
#6 0x00007fabc069d472 in start_thread () from /lib64/libpthread.so.0
#7 0x00007fabc05cc063 in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7fabc002ca40 (LWP 3955770)):
#0 0x00007fabc06a3b02 in pthread_cond_wait@(a)GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00005583f90d6dbf in accept_incoming_connections (socks=0x5584020bf010, nr_socks=1) at sockets.c:501
#2 0x00005583f90ca441 in start_serving () at main.c:936
#3 main (argc=<optimized out>, argv=<optimized out>) at main.c:702
The stack trace of [1] at the hang is:
Thread 2 (Thread 0x7f30ac822700 (LWP 3955798)):
#0 0x00007f30b32a384c in recv () from /lib64/libpthread.so.0
#1 0x00007f30b3342c8f in _gnutls_stream_read (ms=0x7f30ac821a2c, pull_func=0x7f30b336a570 <system_read>, size=5, bufel=<synthetic pointer>, session=0x55c5cf561620) at buffers.c:346
#2 _gnutls_read (ms=0x7f30ac821a2c, pull_func=0x7f30b336a570 <system_read>, size=5, bufel=<synthetic pointer>, session=0x55c5cf561620) at buffers.c:426
#3 _gnutls_io_read_buffered (session=session@entry=0x55c5cf561620, total=5, recv_type=recv_type@entry=4294967295, ms=0x7f30ac821a2c) at buffers.c:582
#4 0x00007f30b3338f3f in recv_headers (ms=<optimized out>, record=0x7f30ac821a80, htype=4294967295, type=GNUTLS_ALERT, record_params=0x55c5cf565f60, session=0x55c5cf561620) at record.c:1172
#5 _gnutls_recv_in_buffers (session=session@entry=0x55c5cf561620, type=type@entry=GNUTLS_ALERT, htype=htype@entry=4294967295, ms=<optimized out>, ms@entry=0) at record.c:1307
#6 0x00007f30b333b555 in _gnutls_recv_int (session=session@entry=0x55c5cf561620, type=type@entry=GNUTLS_ALERT, data=data@entry=0x0, data_size=data_size@entry=0, seq=seq@entry=0x0, ms=0) at record.c:1773
#7 0x00007f30b333b703 in gnutls_bye (session=session@entry=0x55c5cf561620, how=how@entry=GNUTLS_SHUT_RDWR) at record.c:312
#8 0x000055c5c57af171 in crypto_close () at crypto.c:407
#9 0x000055c5c57aea58 in free_connection (conn=0x55c5cf560500) at connections.c:339
#10 handle_single_connection (sockin=<optimized out>, sockout=<optimized out>) at connections.c:230
#11 0x000055c5c57b73e8 in start_thread (datav=0x55c5cf541550) at sockets.c:356
#12 0x00007f30b3299472 in start_thread () from /lib64/libpthread.so.0
#13 0x00007f30b31c8063 in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f30b2c28a40 (LWP 3955740)):
#0 0x00007f30b329fb02 in pthread_cond_wait@(a)GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x000055c5c57b7dbf in accept_incoming_connections (socks=0x55c5cf5413b0, nr_socks=1) at sockets.c:501
#2 0x000055c5c57ab441 in start_serving () at main.c:936
#3 main (argc=<optimized out>, argv=<optimized out>) at main.c:702
It seems like process [2] is hanging in pthread_join, waiting on
thread 3 (the reader loop) to finish. The reader loop is stuck on
poll.
Meanwhile process [1] is trying to do a clean TLS disconnect. This
involves waiting on process [2] to write something, which it is never
going to do.
I don't think this is actually anything to do with libnbd not cleanly
shutting down TLS connections. If libnbd had closed the socket
abruptly then process [1] wouldn't have to wait. I think this might
be a bug in the nbd plugin.
Rich.
--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine. Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/
4 years, 5 months
[nbdkit PATCH] tests: Swap nbdkit process order in test-nbd-tls-psk.sh
by Eric Blake
We're still seeing sporadic failures of 'nbdkit nbd tls=', and I'm
still trying to come up with a root cause fix (it may involve smarter
use of gnutls_bye() in libnbd). In the meantime, here's what we know:
when the hang/failure happens, the 'nbdkit nbd tls=' client process is
stuck in a poll() waiting to see EOF from the server, while the
'nbdkit example1' server process is stuck in a read waiting to see if
the client will do a clean shutdown of the gnutls session. Sending
SIGTERM to the client is not going to break the poll, but if we
instead kill the server, that will cause the client to respond
(perhaps with an error message that we ignore, but better than
hanging).
So, by rearranging the order in which we call our start_nbdkit
function, we change the order in which we send SIGTERM to the two
processes. And in turn, this becomes the first testsuite coverage of
the 'nbd retry=' parameter, added back in commit 0bb76bc7.
Signed-off-by: Eric Blake <eblake(a)redhat.com>
---
My current setup does not seem to be hitting the testsuite
hang/failure as frequently as Rich's setup, so for now I'm posting
this in the hopes that we can see if it reduces the rate of testsuite
failures.
tests/test-nbd-tls-psk.sh | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/tests/test-nbd-tls-psk.sh b/tests/test-nbd-tls-psk.sh
index 7a477da9..547064ab 100755
--- a/tests/test-nbd-tls-psk.sh
+++ b/tests/test-nbd-tls-psk.sh
@@ -1,6 +1,6 @@
#!/usr/bin/env bash
# nbdkit
-# Copyright (C) 2019 Red Hat Inc.
+# Copyright (C) 2019-2020 Red Hat Inc.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
@@ -64,14 +64,14 @@ files="$sock1 $sock2 $pid1 $pid2 nbd-tls-psk.out"
rm -f $files
cleanup_fn rm -f $files
+# Run nbd plugin as intermediary; also test our retry code
+start_nbdkit -P "$pid2" -U "$sock2" --tls=off nbd retry=5 \
+ tls=require tls-psk=keys.psk tls-username=qemu socket="$sock1"
+
# Run encrypted server
start_nbdkit -P "$pid1" -U "$sock1" \
--tls=require --tls-psk=keys.psk example1
-# Run nbd plugin as intermediary
-start_nbdkit -P "$pid2" -U "$sock2" --tls=off \
- nbd tls=require tls-psk=keys.psk tls-username=qemu socket="$sock1"
-
# Run unencrypted client
qemu-img info --output=json -f raw "nbd+unix:///?socket=$sock2" > nbd-tls-psk.out
--
2.26.0.rc2
4 years, 5 months