[cross-project PATCH v2] NBD 64-bit extensions
by Eric Blake
This is a cover letter for a set of multi-project patch series all
designed to implement 64-bit operations in NBD, and demonstrate
interoperability of the new extension between projects.
v1 of the project was attempted nearly a year ago:
https://lists.nongnu.org/archive/html/qemu-devel/2021-12/msg00453.html
Since then, I've addressed a lot of preliminary cleanups in libnbd to
make it easier to test things, and incorporated review comments issued
on v1, including:
- no orthogonality between simple/structured replies and 64-bit mode:
once extended headers are negotiated, all transmission traffic (both
from client to server and server to client) uses just one header
size
- add support for the client to pass a payload on commands to the
server, and demonstrate it by further implementing a way to pass a
flag with NBD_CMD_BLOCK_STATUS that says the client is passing a
payload to request status of just a subset of the negotiated
contexts, rather than all possible contexts that were earlier
negotiated during NBD_OPT_SET_META_CONTEXT
- tweaks to the header layouts: tweak block status to provide 64-bit
flags values (although "base:allocation" will remain usable with
just 32-bit flags); reply with offset rather than padding and with
fields rearranged for maximal sharing between client and server
layouts
- word-smithing of the NBD proposed protocol; split things into
several smaller patches, where we can choose how much to take
- more unit tests added in qemu and libnbd
- the series end with RFC patches on whether to support 64-bit hole
responses to NBD_CMD_READ, even though our current limitations say
servers don't have to accept more than a 32M read request and
therefore will never have that big of a hole to read)
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
1 year, 8 months
[libnbd PATCH v2 0/3] Improve nbdsh -u handling
by Eric Blake
v1 was here:
https://listman.redhat.com/archives/libguestfs/2022-October/030216.html
Since then, I've incorporated changes based on Rich's feedback:
swap order of patches 2 and 3
less change in patch 1 (including no unsafe eval(%s) for --uri)
in patch 2, include -c in list of snippets to store, and use dict of
lambdas to map back to the desired action
Eric Blake (3):
nbdsh: Refactor handling of -c
nbdsh: Allow -u interleaved with -c
nbdsh: Improve --help and initial banner contents.
python/nbdsh.py | 142 +++++++++++++++++++++++++++------------------
sh/test-context.sh | 26 ++++-----
sh/test-error.sh | 37 +++++++-----
3 files changed, 119 insertions(+), 86 deletions(-)
--
2.38.1
1 year, 9 months
[PATCH v2v] New virt-v2v-inspector tool
by Richard W.M. Jones
In Kubernetes and tools like Kubevirt, it's not possible to create
some disks and then attach to them (in order to populate them with
data) in one step. This makes virt-v2v conversions awkward because
ideally we would like the output mode (-o kubevirt) to both create the
target disks and populate them at the same time.
So to work around this problem, we need a tool which can inspect the
virt-v2v source hypervisor before we do the conversion in order to
find out how many disks are needed and their sizes. Then we can
create the target disks, and then we can run a second container with
virt-v2v attached to the disks to do the conversion and populate the
output.
n
This is a proposed tool to do this. It essentially uses the same -i*
options as virt-v2v (and no -o* options) and outputs various useful
metadata. Example:
$ ./run virt-v2v-inspector --quiet -i disk /var/tmp/fedora-32.img
virt-v2v-inspector: The QEMU Guest Agent will be installed for this guest
at first boot.
virt-v2v-inspector: warning: /files/boot/grub2/device.map/hd0 references
unknown device "vda". You may have to fix this entry manually after
conversion.
<?xml version='1.0' encoding='utf-8'?>
<v2v-inspection>
<!-- generated by virt-v2v-inspector 2.1.9local,libvirt -->
<program>virt-v2v-inspector</program>
<package>virt-v2v</package>
<version>2.1.9</version>
<disks>
<disk index='0'>
<virtual-size>6442450944</virtual-size>
<allocated estimated='true'>1400897536</allocated>
</disk>
</disks>
<operatingsystem>
<name>linux</name>
<distro>fedora</distro>
<osinfo>fedora32</osinfo>
<arch>x86_64</arch>
<major_version>32</major_version>
<minor_version>0</minor_version>
<package_format>rpm</package_format>
<package_management>dnf</package_management>
<product_name>Fedora 32 (Thirty Two)</product_name>
</operatingsystem>
</v2v-inspection>
There should be sufficient information in the <disks> section to
allocate target disks, plus additional information is printed which
might be useful.
Note that we do a full conversion in order to generate this
information. In particular it's not possible to generate the
<allocated/> estimate without this. It's plausible we could have a
--no-convert option, but I'm not sure it's worthwhile: it would only
save a little time, but would make everything less accurate, plus
maybe it is a good idea to find out if conversion is going to work
before we create the target disks?
I chose XML instead of JSON for output. XML allows us to annotate
elements with attributes like "estimated='true'". It also lets us
represent 64 bit number accurately, where JSON cannot represent such
numbers.
One obvious problem is that (without --quiet) the program mixes up
informational output with the final document, which is a bit of a
pain. Ideas here?
Rich.
1 year, 12 months
Failing guestfish command in
by Addison Gourluck
Hey,
I'm trying to use guestfish to run a few lvm-related commands on an image, but I'm getting errors. The error message suggested that I should send the error output to this mailing list.
The simplest version of the command I'm running is as follows:
```
PSEUDO_DISABLED=1 guestfish -v --format=raw -a MY_IMAGE << EOF
run
pvcreate /dev/sda
EOF
```
and the error that I'm getting is quite long, but it seems to start failing about here:
```
supermin: deleting initramfs files
supermin: chroot
Starting /init script ...
mount: only root can use "--types" option (effective UID is 65534)
/init: line 38: /proc/cmdline: No such file or directory
mount: only root can use "--types" option (effective UID is 65534)
mount: only root can use "--options" option (effective UID is 65534)
dd: failed to open '/dev/urandom': No such file or directory
Failed to redirect standard streams to /dev/null: No such file or directory
```
The errors at this point are quite obviously due to the mount commands somehow failing, and most errors are complaining about /dev/ not existing, and other fatal failures.
This is especially unusual, because running this command normally on Ubuntu 20.04 works fine, but when it runs as part of my build process, it fails. The build tool that I'm using is Yocto's bitbake. I don't think it even begins to try to run the LVM commands, as it seems to fail before it even gets there.
Interestingly, if I clean out the files at /tmp/.guestfs-1000/, then run the test tool like this "TMPDIR=/tmp libguestfs-test-tool", then also run my build with "TMPDIR=/tmp" set, it will succeed normally. The files in /tmp are identical in every way (I shasumed them), except for two: "/tmp/.guestfs-1000/appliance.d/initrd" and "/tmp/.guestfs-1000/appliance.d/root". Their permissions and user/group are also identical.
The versions of relevant packages are as follows:
```
supermin 5.1.20
guestfish 1.40.2
QEMU emulator version 4.2.1 (Debian 1:4.2-3ubuntu6.23)
Linux Kernel 5.15.0-53-generic
```
I'm happy to provide more information, or full debug logs, if someone can offer instructions on how best to share them.
Regards,
1 year, 12 months
[hivex PATCH] Add OCaml 5.0 support
by Kate Deplaix
Hi,
Here is a patch to add support to the upcoming OCaml 5.0 to hivex: https://github.com/kit-ty-kate/hivex/commit/e89b4ab014fa764bf392275ba0dfa...
This new release of the compiler will be released early next month.
Note: this patch breaks support for OCaml < 4.07. If you want to keep support I suppose you can add some hacks in the configure.ac file to add "module Stdlib = Pervasives" at the start of the "generator/generator.ml" file but I'm not sure how worth to you it is to support old +5 years old compilers.
Kind regards,
Kate
2 years
[nbdkit PATCH v2 0/2] Allow sh/eval plugins to disconnect
by Eric Blake
I think this addresses all the feedback given on v1. Instead of
packing things into 3 new status values, and using silent/noisy stderr
as the deciding factor between returning OK or ERROR (which in turn
required either code duplication or a nasty goto for control flow),
this version uses 5 new status values. I also ended up adding 'nbdkit
--dump-plugin eval | grep max_known_status' in able to more easily
probe for when nbdkit is new enough to care about the new status
values (which in turn will be handy in a patch I have pending for
libnbd).
Eric Blake (2):
sh: Advertise max known status in --dump-plugin
sh: Add exit status triggers for nbdkit_{shutdown,disconnect}
plugins/eval/nbdkit-eval-plugin.pod | 3 +-
plugins/sh/nbdkit-sh-plugin.pod | 46 +++++-
tests/Makefile.am | 4 +
plugins/sh/call.h | 11 +-
plugins/sh/call.c | 92 +++++------
plugins/sh/methods.c | 6 +-
tests/test-eval-disconnect.sh | 236 ++++++++++++++++++++++++++++
tests/test-eval-dump-plugin.sh | 43 +++++
8 files changed, 383 insertions(+), 58 deletions(-)
create mode 100755 tests/test-eval-disconnect.sh
create mode 100755 tests/test-eval-dump-plugin.sh
--
2.38.1
2 years
[libnbd PATCH] api: Generate flag values as unsigned
by Eric Blake
C says that performing bitwise math on ints can lead to corner-case
undefined values. When we document something as being usable as a
flag with bitwise-OR and similar, we should ensure the values are
unsigned, to avoid those corner-cases in C. Even our own codebase was
potentially falling foul of strict C rules, with code like
tests/errors-server-invalid-offset.c:
uint32_t strict;
strict = nbd_get_strict_mode (nbd) & ~LIBNBD_STRICT_BOUNDS;
In theory, this is an API change that might be detectable by someone
trying to probe which integer promotions occur on a given constant.
In practice, no one writes code that intentionally tries to exploit
whether a flag is signed or unsigned, and none of our values were
large enough to worry about sign-extension effects that occur when
flipping the sign bit of a signed value (we tend to target machines
with twos-complement signed values where the behavior is sane despite
being undefined by C), so this patch is therefore not an ABI change.
Generated code changes as follows:
| --- include/libnbd.h.bak 2022-11-11 11:15:54.929686924 -0600
| +++ include/libnbd.h 2022-11-11 11:15:56.652698919 -0600
| @@ -69,32 +69,32 @@
| #define LIBNBD_SIZE_MAXIMUM 2
| #define LIBNBD_SIZE_PAYLOAD 3
|
| -#define LIBNBD_CMD_FLAG_FUA 0x01
| -#define LIBNBD_CMD_FLAG_NO_HOLE 0x02
| ...
| -#define LIBNBD_ALLOW_TRANSPORT_VSOCK 0x04
| -#define LIBNBD_ALLOW_TRANSPORT_MASK 0x07
| +#define LIBNBD_CMD_FLAG_FUA 0x01U
| +#define LIBNBD_CMD_FLAG_NO_HOLE 0x02U
| ...
| +#define LIBNBD_ALLOW_TRANSPORT_VSOCK 0x04U
| +#define LIBNBD_ALLOW_TRANSPORT_MASK 0x07U
Thanks: Laszlo Ersek <lersek(a)redhat.com>
---
I'll push this along with the LIBNBD_STRICT_PAYLOAD series.
generator/C.ml | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/generator/C.ml b/generator/C.ml
index cfa2964c..f9171996 100644
--- a/generator/C.ml
+++ b/generator/C.ml
@@ -437,11 +437,11 @@ let
List.iter (
fun (flag, i) ->
let flag = sprintf "LIBNBD_%s_%s" flag_prefix flag in
- pr "#define %-40s 0x%02x\n" flag i;
+ pr "#define %-40s 0x%02xU\n" flag i;
mask := !mask lor i
) flags;
let flag = sprintf "LIBNBD_%s_MASK" flag_prefix in
- pr "#define %-40s 0x%02x\n" flag !mask;
+ pr "#define %-40s 0x%02xU\n" flag !mask;
pr "\n"
) all_flags;
List.iter (
--
2.38.1
2 years
[libnbd PATCH 0/2] default to client-side enforcing of max block size
by Eric Blake
I'm getting closer to posting v2 of my 64-bit NBD extension patches.
On the way, I noticed that libnbd was not obeying block-size maximum
constraints, which becomes all the more important when we want to
allow for even larger buffer sizes for servers that support it.
Eric Blake (2):
api: Add LIBNBD_STRICT_PAYLOAD to honor qemu 32M write limit
tests: Test server response to oversize requests
lib/internal.h | 4 +
generator/API.ml | 39 +++--
generator/states-newstyle-opt-export-name.c | 1 +
generator/states-newstyle-opt-go.c | 1 +
generator/states-oldstyle.c | 1 +
lib/flags.c | 25 ++++
lib/rw.c | 8 +-
tests/Makefile.am | 15 +-
tests/errors-client-oversize.c | 18 ++-
tests/errors-server-oversize.c | 151 ++++++++++++++++++++
tests/errors-server-unaligned.c | 2 +
.gitignore | 1 +
12 files changed, 255 insertions(+), 11 deletions(-)
create mode 100644 tests/errors-server-oversize.c
--
2.38.1
2 years
[p2v PATCH] disks.c: skip SCSI floppy drives with no disk inserted
by Laszlo Ersek
A SCSI floppy drive with no disk inserted looks like a normal /dev/sd*
node, but causes the nbdkit file plugin to fail with ENOMEDIUM. Filter out
such devices altogether -- unlike CD-ROMs (for which we create a device
model in the target, albeit with no medium inserted), empty floppies
should not be converted in any way.
https://bugzilla.redhat.com/show_bug.cgi?id=2140997
Signed-off-by: Laszlo Ersek <lersek(a)redhat.com>
---
Notes:
This patch still needs testing; the ISO image is at
<http://lacos.interhost.hu/exclude-floppies-rhbz-2140997/b19895a5acd1/live...>,
sha256 sum:
b0666a9140b03e12829982179bf7da2ac5477737fb53760d2e8c527d8a2bf55a.
disks.c | 58 ++++++++++++++++++++
1 file changed, 58 insertions(+)
diff --git a/disks.c b/disks.c
index 4eb006246d84..aafe467f9e9a 100644
--- a/disks.c
+++ b/disks.c
@@ -20,6 +20,7 @@
#include <dirent.h>
#include <errno.h>
#include <error.h>
+#include <fcntl.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
@@ -67,6 +68,57 @@ partition_parent (dev_t part_dev)
return makedev (parent_major, parent_minor);
}
+/**
+ * Return true if the named device (eg. C<dev == "sda">) is a Removable Media
+ * SCSI Disk with no media inserted. This covers floppy drives, but not CD-ROM
+ * drives (intentionally).
+ */
+static int
+device_has_no_media (const char *dev)
+{
+ int ret;
+ gchar *sysfs_pathname;
+ gchar *sysfs_contents;
+ gsize sysfs_size;
+ gchar *dev_pathname;
+ int dev_fd;
+
+ ret = 0;
+
+ if (!STRPREFIX (dev, "sd"))
+ return ret;
+
+ sysfs_pathname = g_strdup_printf ("/sys/block/%s/removable", dev);
+
+ if (!g_file_get_contents (sysfs_pathname, &sysfs_contents, &sysfs_size, NULL))
+ goto free_sysfs_pathname;
+
+ if (sysfs_size < 2 || sysfs_contents[0] != '1' || sysfs_contents[1] != '\n')
+ goto free_sysfs_contents;
+
+ dev_pathname = g_strdup_printf ("/dev/%s", dev);
+
+ dev_fd = open (dev_pathname, O_RDONLY | O_CLOEXEC);
+ if (dev_fd == -1) {
+ if (errno == ENOMEDIUM)
+ ret = 1;
+
+ goto free_dev_pathname;
+ }
+ close (dev_fd);
+
+free_dev_pathname:
+ g_free (dev_pathname);
+
+free_sysfs_contents:
+ g_free (sysfs_contents);
+
+free_sysfs_pathname:
+ g_free (sysfs_pathname);
+
+ return ret;
+}
+
/**
* Return true if the named device (eg. C<dev == "sda">) contains the
* root filesystem. C<root_device> is the major:minor of the root
@@ -139,6 +191,12 @@ find_all_disks (char ***disks, char ***removable)
STRPREFIX (d->d_name, "ubd") ||
STRPREFIX (d->d_name, "vd")) {
char *p;
+ /* Skip SCSI disk drives with removable media that have no media inserted
+ * -- effectively, empty floppy drives. Note that SCSI CD-ROMs are named
+ * C<sr*> and thus handled on the other branch.
+ */
+ if (device_has_no_media (d->d_name))
+ continue;
/* Skip the device containing the root filesystem. */
if (device_contains (d->d_name, root_device))
2 years