S3 plugin test case breaks test suite / development workflow...
by Laszlo Ersek
Hi,
I'm writing this about a specific problem and about a general problem.
* The specific problem is that commit 5130c43bc1f9 ("S3 plugin: add
support for accessing multiple objects", 2022-05-12) introduced a
dependency on the "botocore" python module, and now "make check" fails
for me, because this module is unavailable on my system.
> Traceback (most recent call last):
> File "[...]/nbdkit/plugins/S3/nbdkit-S3-plugin", line 43, in <module>
> from botocore.exceptions import ClientError
> ModuleNotFoundError: No module named 'botocore'
Now, while the README file says, from commit 6715c3d8b3e6 ("New plugin:
S3 plugin for accessing disks stored in AWS S3 and Ceph.", 2020-11-14):
> For the Python plugin:
>
> - python interpreter (version 3 only)
>
> - python development libraries
>
> - python unittest to run the test suite
>
> - boto3 is required to run the S3 plugin written in Python
the "boto3" dependency had never been a "hard" one until commit
5130c43bc1f9. The language suggests that running the S3 plugin -- even
for testing -- is optional.
Therefore, commit 5130c43bc1f9 should have updated the test cases
"test-S3.sh" and "test-S3-unit.sh" with a proper "requires" line (I
assume anyway).
FWIW, the following trick in "test-S3.sh":
> # There is a fake boto3 module in test-S3/ which we use as a test
> # harness for the plugin.
Does not work.
... So, after all, the bug in commit 5130c43bc1f9 may have been that it
did not add the ClientError exception type to the fake module in
"tests/test-S3/boto3". I'm unsure; but please fix it anyway.
* The generic problem is that I need to write this separate error report
email, rather than commenting directly under the submission -- on the
mailing list -- or the pull request -- on gitlab -- that ended up as
commit 5130c43bc1f9. For the life of me, I just can't figure out *where*
commit 5130c43bc1f9 was originally reviewed.
I think that's wrong.
... Ultimately, I've found the patch in the MR listing at
<https://gitlab.com/nbdkit/nbdkit/-/merge_requests?scope=all&state=all>,
namely in MR#9 -- <https://gitlab.com/nbdkit/nbdkit/-/merge_requests/9>.
But this merge request has status *closed*, not *merged*. So even though
a commit is given, I can't find the associated discussion because (a)
MR#9 is not listed in the list of *merged* merge requests (only the
"rejected" ones), (b) the commit message itself does not reference MR#9.
I think we should improve our process here.
Thanks,
Laszlo
2 years, 7 months
Communication issues between NBD driver and NBDKit server
by Nikolaus Rath
Hi,
I am observing some strange errors when using the Kernel's NBD driver with NBDkit.
On the kernel side, I see:
May 15 16:16:11 vostro.rath.org kernel: nbd0: detected capacity change from 0 to 104857600
May 15 16:16:11 vostro.rath.org kernel: nbd1: detected capacity change from 0 to 104857600
May 15 16:18:23 vostro.rath.org kernel: block nbd0: Possible stuck request 00000000ae5feee7: control (write@4836316160,32768B). Runtime 30 seconds
May 15 16:18:25 vostro.rath.org kernel: block nbd0: Possible stuck request 000000007094eddc: control (write@5372947456,10240B). Runtime 30 seconds
May 15 16:18:27 vostro.rath.org kernel: block nbd0: Suspicious reply 89 (status 0 flags 0)
May 15 16:18:31 vostro.rath.org kernel: block nbd0: Possible stuck request 0000000075f8b9bc: control (write@8057764864,32768B). Runtime 30 seconds
May 15 16:18:41 vostro.rath.org kernel: block nbd0: Possible stuck request 000000002d1b3e8b: control (write@14499979264,32768B). Runtime 30 seconds
[...]
And userspace ('zfs snapshot" in this instance) is stuck afterwards.
On the NBDkit side, there seemingly are write errors when replying back to the kernel:
$ nbdkit --unix /tmp/tmpi5o59_y_/nbd_socket_sb --foreground --filter=exitlast --filter=stats --threads 16 S3 size=50G bucket=nikratio-backup key=sb statsfile=/tmp/tmpi5o59_y_/stats_sb.txt object-size=32K &
$ nbd-client -unix /tmp/tmpi5o59_y_/nbd_socket_sb /dev/nbd2
Warning: the oldstyle protocol is no longer supported.
This method now uses the newstyle protocol with a default export
Negotiation: ..size = 51200MB
Connected /dev/nbd0
[....]
nbdkit: python.10: error: write reply: NBD_CMD_WRITE: Broken pipe
What's the best way to narrow down who's the culprit here (kernel vs NBD server)?
Best,
-Nikolaus
--
GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F
»Time flies like an arrow, fruit flies like a Banana.«
2 years, 7 months
[nbdkit PATCH] vddk: advise user on obscure thumbprint mismatch error condition
by Laszlo Ersek
If the thumbprint parameter is wrong, it's only reported in
VixDiskLib_Open(), and then with the non-descript VIX_E_FAIL error code.
If the user typed or cut-and-pasted the thumbprint incorrectly, said
"Unkown error" message is not helpful for fixing the nbkit command line.
Hint at the thumbprint as the potential culprit.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1905772
Signed-off-by: Laszlo Ersek <lersek(a)redhat.com>
---
plugins/vddk/vddk-structs.h | 1 +
plugins/vddk/vddk.c | 20 ++++++++++++++++++++
2 files changed, 21 insertions(+)
diff --git a/plugins/vddk/vddk-structs.h b/plugins/vddk/vddk-structs.h
index 799c4aecc5b8..4c7c6fe2e4fc 100644
--- a/plugins/vddk/vddk-structs.h
+++ b/plugins/vddk/vddk-structs.h
@@ -43,6 +43,7 @@
typedef uint64_t VixError;
#define VIX_OK 0
+#define VIX_E_FAIL 1
#define VIX_E_NOT_SUPPORTED 6
#define VIX_ASYNC 25000
diff --git a/plugins/vddk/vddk.c b/plugins/vddk/vddk.c
index 2ea071d641e6..dbd3fdbe09af 100644
--- a/plugins/vddk/vddk.c
+++ b/plugins/vddk/vddk.c
@@ -769,6 +769,26 @@ vddk_open (int readonly)
VDDK_CALL_END (VixDiskLib_Open, 0);
if (err != VIX_OK) {
VDDK_ERROR (err, "VixDiskLib_Open: %s", filename);
+
+ /* Attempt to advise the user on the extremely helpful "Unknown error"
+ * result of VixDiskLib_Open(). The one reason we've seen for this error
+ * mode is a thumbprint mismatch (RHBZ#1905772). Note that:
+ *
+ * (1) The thumbprint (as a part of "h->params") is passed to
+ * VixDiskLib_ConnectEx() above, but the fingerprint mismatch is
+ * detected only inside VixDiskLib_Open().
+ *
+ * (2) "thumb_print" may be NULL -- vddk_config_complete() is correct not to
+ * require a non-NULL "thumb_print" for a remote connection; the sample
+ * program "vixDiskLibSample.cpp" in vddk-7.0.3 explicitly permits
+ * "-thumb" to be absent.
+ */
+ if (is_remote && err == VIX_E_FAIL)
+ nbdkit_error ("Please verify whether the \"thumbprint\" parameter (%s) "
+ "matches the SHA1 fingerprint of the remote VMware "
+ "server. Refer to nbdkit-vddk-plugin(1) section "
+ "\"THUMBPRINTS\" for details.",
+ thumb_print == NULL ? "not specified" : thumb_print);
goto err2;
}
base-commit: 5007409b03486fa4b43526412d3db8de50325efd
--
2.19.1.3.g30247aa5d201
2 years, 7 months
[PATCH v2v] convert: Ignore /dev/mapper/osprober-* devices when trimming
by Richard W.M. Jones
These devices can be left around by either grub2 or the osprober tool.
They are read-only mirrors of existing filesystems and it appears we
can safely ignore them.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2003503
---
convert/convert.ml | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/convert/convert.ml b/convert/convert.ml
index 87fca7252b..997f6b08bd 100644
--- a/convert/convert.ml
+++ b/convert/convert.ml
@@ -194,10 +194,16 @@ and do_fstrim g inspect =
(* Get all filesystems. *)
let fses = g#list_filesystems () in
+ (* Ignore unknown/swap devices. *)
let fses = List.filter_map (
function (_, ("unknown"|"swap")) -> None | (dev, _) -> Some dev
) fses in
+ (* Ignore filesystems left around by osprober (RHBZ#2003503). *)
+ let fses =
+ List.filter (fun dev -> not (String.is_prefix dev "/dev/mapper/osprober-"))
+ fses in
+
(* Trim the filesystems. *)
List.iter (
fun dev ->
--
2.35.1
2 years, 7 months
[PATCH] lib: Disable 5-level page tables when using -cpu max
by Richard W.M. Jones
In https://bugzilla.redhat.com/show_bug.cgi?id=2082806 we've been
tracking an insidious qemu bug which intermittently prevents the
libguestfs appliance from starting. The symptoms are that SeaBIOS
starts and displays its messages, but the kernel isn't reached. We
found that the kernel does in fact start, but when it tries to set up
page tables and jump to protected mode it gets a triple fault which
causes the emulated CPU in qemu to reset (qemu exits).
This seems to only affect TCG (not KVM).
Yesterday I found that this is caused by using -cpu max which enables
the "la57" feature (5-level page tables[0]), and that we can make the
problem go away using -cpu max,la57=off. Note that I still don't
fully understand the qemu bug, so this is only a workaround.
I chose to disable 5-level page tables for both TCG and KVM, partly to
make the patch simpler, and partly because I guess it's not a feature
(ie. 57 bit linear addresses) that is useful for the libguestfs
appliance case, where we have limited physical memory and no need to
run any programs with huge address spaces.
I tested this by running both the direct & libvirt paths overnight. I
expect that this patch will fail with old qemu/libvirt which doesn't
understand the "la57" feature, but this is only intended as a
temporary workaround.
[0] Article about 5-level page tables as background:
https://lwn.net/Articles/717293/
Thanks: Laszlo Ersek
Fixes: https://answers.launchpad.net/ubuntu/+source/libguestfs/+question/701625
---
lib/launch-direct.c | 15 +++++++++++++--
lib/launch-libvirt.c | 7 +++++++
2 files changed, 20 insertions(+), 2 deletions(-)
diff --git a/lib/launch-direct.c b/lib/launch-direct.c
index c07a8d78f..ff0eaeb62 100644
--- a/lib/launch-direct.c
+++ b/lib/launch-direct.c
@@ -518,8 +518,19 @@ launch_direct (guestfs_h *g, void *datav, const char *arg)
} end_list ();
cpu_model = guestfs_int_get_cpu_model (has_kvm && !force_tcg);
- if (cpu_model)
- arg ("-cpu", cpu_model);
+ if (cpu_model) {
+#if defined(__x86_64__)
+ /* Temporary workaround for RHBZ#2082806 */
+ if (STREQ (cpu_model, "max")) {
+ start_list ("-cpu") {
+ append_list (cpu_model);
+ append_list ("la57=off");
+ } end_list ();
+ }
+ else
+#endif
+ arg ("-cpu", cpu_model);
+ }
if (g->smp > 1)
arg_format ("-smp", "%d", g->smp);
diff --git a/lib/launch-libvirt.c b/lib/launch-libvirt.c
index 87da2f40e..03d69e027 100644
--- a/lib/launch-libvirt.c
+++ b/lib/launch-libvirt.c
@@ -1185,6 +1185,13 @@ construct_libvirt_xml_cpu (guestfs_h *g,
else if (STREQ (cpu_model, "max")) {
/* https://bugzilla.redhat.com/show_bug.cgi?id=1935572#c11 */
attribute ("mode", "maximum");
+#if defined(__x86_64__)
+ /* Temporary workaround for RHBZ#2082806 */
+ start_element ("feature") {
+ attribute ("policy", "disable");
+ attribute ("name", "la57");
+ } end_element ();
+#endif
}
else
single_element ("model", cpu_model);
--
2.31.1
2 years, 7 months
[libguestfs PATCH 0/2] daemon/selinux-relabel: tolerate relabeling errors
by Laszlo Ersek
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1794518
In the "selinux-relabel" API, utilize the "-C" option of setfiles(8), if
it is available. (It's going to be part of the policycoreutils-3.4
release.) See patch#2 for a bit longer explanation.
Thanks
Laszlo
Laszlo Ersek (2):
daemon/selinux-relabel: generalize setfiles_has_m_option()
daemon/selinux-relabel: tolerate relabeling errors
daemon/selinux-relabel.c | 36 +++++++++++++-------
1 file changed, 24 insertions(+), 12 deletions(-)
base-commit: 08c4ac90f5a3c08b48444e2faf3d0f58d6ddc206
--
2.19.1.3.g30247aa5d201
2 years, 7 months
[PATCH nbdkit] vddk: Demote another useless phone-home error message to debug
by Richard W.M. Jones
Earlier commit df7957c8b8 ("vddk: Demote useless VMware error message
to a debug statement.") turned an error message from VMware's phone
home anti-feature into a debug message. It turns out there is more
than one of these messages.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2083617
Reported-by: Ming Xie
---
plugins/vddk/vddk.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/plugins/vddk/vddk.c b/plugins/vddk/vddk.c
index 51ef8f33..2ea071d6 100644
--- a/plugins/vddk/vddk.c
+++ b/plugins/vddk/vddk.c
@@ -513,11 +513,15 @@ error_function (const char *fs, va_list args)
trim (str);
- /* VDDK 7 added a useless error message about their "phone home"
- * system called CEIP which only panics users. Demote it to a debug
- * statement. https://bugzilla.redhat.com/show_bug.cgi?id=1834267
+ /* VDDK 7 added some useless error messages about their "phone home"
+ * system called CEIP which only panics users. Demote to a debug
+ * statement.
+ * https://bugzilla.redhat.com/show_bug.cgi?id=1834267
+ * https://bugzilla.redhat.com/show_bug.cgi?id=2083617
*/
- if (strstr (str, "Get CEIP status failed") != NULL) {
+ if (strstr (str, "Get CEIP status failed") != NULL ||
+ strstr (str, "VDDK_PhoneHome: Unable to load configuration "
+ "options from ") != NULL) {
nbdkit_debug ("%s", str);
return;
}
--
2.35.1
2 years, 7 months
[PATCH nbdkit] nbd: Hide some state machine debugging behind a debug flag
by Richard W.M. Jones
When running virt-p2v which uses this plugin, the log file is consumed
by messages about state machine transitions and so on. In a log file
that was shared with me, out of the 135,023 lines in total,
94,653 (70%) were:
nbdkit: debug: polling, dir=1
and 18,676 (14%) were:
nbdkit: debug: cookie X completed state machine, status 0
This commit changes the logging so these state machine transitions are
only printed when you use the debug flag “-D nbd.verbose=1”. I didn't
document this flag because it's likely only of use to developers who
are reading the code already.
There are some debug messages along error paths which are
(a) generally useful and (b) did not appear in the log file, so I left
those alone.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2083498
Reported-by: Ming Xie
---
plugins/nbd/nbd.c | 24 +++++++++++++++++-------
1 file changed, 17 insertions(+), 7 deletions(-)
diff --git a/plugins/nbd/nbd.c b/plugins/nbd/nbd.c
index 45bf05e9..d0d6544b 100644
--- a/plugins/nbd/nbd.c
+++ b/plugins/nbd/nbd.c
@@ -66,6 +66,9 @@
#define USE_VSOCK 1
#endif
+/* Use '-D nbd.verbose=1' for verbose messages about the state machine. */
+NBDKIT_DLL_PUBLIC int nbd_debug_verbose = 0;
+
/* The per-transaction details */
struct transaction {
int64_t cookie;
@@ -421,7 +424,8 @@ nbdplug_reader (void *handle)
{
struct handle *h = handle;
- nbdkit_debug ("nbd: started reader thread");
+ if (nbd_debug_verbose)
+ nbdkit_debug ("nbd: started reader thread");
while (!nbd_aio_is_dead (h->nbd) && !nbd_aio_is_closed (h->nbd)) {
int r;
@@ -433,7 +437,8 @@ nbdplug_reader (void *handle)
unsigned dir;
dir = nbd_aio_get_direction (h->nbd);
- nbdkit_debug ("polling, dir=%d", dir);
+ if (nbd_debug_verbose)
+ nbdkit_debug ("polling, dir=%d", dir);
if (dir & LIBNBD_AIO_DIRECTION_READ)
fds[0].events |= POLLIN;
if (dir & LIBNBD_AIO_DIRECTION_WRITE)
@@ -466,8 +471,11 @@ nbdplug_reader (void *handle)
}
}
- nbdkit_debug ("state machine changed to %s", nbd_connection_state (h->nbd));
- nbdkit_debug ("exiting reader thread");
+ if (nbd_debug_verbose) {
+ nbdkit_debug ("state machine changed to %s",
+ nbd_connection_state (h->nbd));
+ nbdkit_debug ("exiting reader thread");
+ }
return NULL;
}
@@ -481,8 +489,9 @@ nbdplug_notify (void *opaque, int *error)
* updated by nbdplug_register, but it's only an informational
* message.
*/
- nbdkit_debug ("cookie %" PRId64 " completed state machine, status %d",
- trans->cookie, *error);
+ if (nbd_debug_verbose)
+ nbdkit_debug ("cookie %" PRId64 " completed state machine, status %d",
+ trans->cookie, *error);
trans->err = *error;
if (sem_post (&trans->sem)) {
nbdkit_error ("failed to post semaphore: %m");
@@ -514,7 +523,8 @@ nbdplug_register (struct handle *h, struct transaction *trans, int64_t cookie)
return;
}
- nbdkit_debug ("cookie %" PRId64 " started by state machine", cookie);
+ if (nbd_debug_verbose)
+ nbdkit_debug ("cookie %" PRId64 " started by state machine", cookie);
trans->cookie = cookie;
if (write (h->fds[1], &c, 1) == -1 && errno != EAGAIN)
--
2.35.1
2 years, 7 months