Kernel driver I/O block size hinting
by Richard W.M. Jones
This is a follow-up to this thread:
https://listman.redhat.com/archives/libguestfs/2022-June/thread.html#29210
about getting the kernel client (nbd.ko) to obey block size
constraints sent by the NBD server:
https://github.com/NetworkBlockDevice/nbd/blob/master/doc/proto.md#block-...
I was sent this very interesting design document about the original
intent behind the kernel's I/O limits:
https://people.redhat.com/msnitzer/docs/io-limits.txt
There are four or five kernel block layer settings we could usefully
adjust, and there are three NBD block size constraints, and in my
opinion there's not a very clear mapping between them. But I'll have
a go at what I think we should do.
- - -
(1) Kernel physical_block_size & logical_block_size: The example given
is of a hard disk with 4K physical sectors (AF) which can nevertheless
emulate 512-byte sectors. In this case you'd set physical_block_size
= 4K, logical_block_size = 512b.
Data structures (partition tables, etc) should be aligned to
physical_block_size to avoid unnecessary RMW cycles. But the
fundamental unit of I/O is logical_block_size.
Current behaviour of nbd.ko is that logical_block_size ==
physical_block_size == the nbd-client "-b" option (default: 512 bytes,
contradicting the documentation).
I think we should set logical_block_size == physical_block_size ==
MAX (512, NBD minimum block size constraint); see the sketch below.
What should happen to the nbd-client -b option?
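For concreteness, a rough sketch (not an actual patch) of what this could
look like in nbd.ko, using the usual blk_queue_* helpers; "min_block" is a
placeholder name for however the server's minimum block size constraint
reaches the driver:

    /* Illustrative only: never go below 512 bytes. */
    unsigned int bs = max_t(unsigned int, 512, min_block);

    blk_queue_logical_block_size(nbd->disk->queue, bs);
    blk_queue_physical_block_size(nbd->disk->queue, bs);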
(2) Kernel minimum_io_size: The documentation says this is the
"preferred minimum unit for random I/O".
Current behaviour of nbd.ko is this is not set.
I think NBD's preferred block size constraint should map to minimum_io_size.
(3) Kernel optimal_io_size: The documentation says this is the
"[preferred] streaming I/O [size]".
Current behaviour of nbd.ko is this is not set.
NBD doesn't really have the concept of streaming vs random I/O, so we
could either ignore this or set it to the same value as
minimum_io_size.
I have a kernel patch allowing nbd-client to set both minimum_io_size
and optimal_io_size from userspace.
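As a sketch only (this is not the patch itself, and "pref_block" is a
made-up name for the negotiated preferred block size), the mapping inside
the driver could look something like:

    if (pref_block) {
        blk_queue_io_min(nbd->disk->queue, pref_block);
        /* NBD has no streaming vs random distinction, so reuse the
           same value for the optimal (streaming) I/O size. */
        blk_queue_io_opt(nbd->disk->queue, pref_block);
    }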
(4) Kernel blk_queue_max_hw_sectors: This is documented as: "set max
sectors for a request ... Enables a low level driver to set a hard
upper limit, max_hw_sectors, on the size of requests."
Current behaviour of nbd.ko is that we set this to 65536 (sectors?
blocks?), which for 512b sectors is 32M.
I think we could set this to MIN (32M, NBD maximum block size constraint),
converting the result to sectors.
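Sketch only, with "max_block" standing in for the server's maximum block
size constraint in bytes:

    /* Clamp at the current 32M default, then convert bytes to
       512-byte sectors before telling the block layer. */
    unsigned int max_bytes = 32 * 1024 * 1024;

    if (max_block && max_block < max_bytes)
        max_bytes = max_block;
    blk_queue_max_hw_sectors(nbd->disk->queue, max_bytes >> SECTOR_SHIFT);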
- - -
What do people think?
Rich.
--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html
[v2v PATCH 0/3] add LUKS-on-LVM test
by Laszlo Ersek
Apropos: https://listman.redhat.com/archives/libguestfs/2022-June/029134.html
Port the corresponding commits from guestfs-tools to virt-v2v.
Thanks
Laszlo
Laszlo Ersek (2):
tests: rename "luks" to "lvm-on-luks"
tests: add LUKS-on-LVM test
Richard W.M. Jones (1):
test-data: Replace deprecated luks_open with cryptsetup_open.
.gitignore | 3 +-
test-data/phony-guests/Makefile.am | 15 +++--
test-data/phony-guests/guests.xml.in | 22 ++++++-
test-data/phony-guests/make-fedora-img.pl | 64 ++++++++++++++++++--
tests/Makefile.am | 6 +-
tests/{test-v2v-fedora-luks-conversion.sh => test-v2v-fedora-luks-on-lvm-conversion.sh} | 10 ++-
tests/{test-v2v-fedora-luks-conversion.sh => test-v2v-fedora-lvm-on-luks-conversion.sh} | 2 +-
7 files changed, 104 insertions(+), 18 deletions(-)
copy tests/{test-v2v-fedora-luks-conversion.sh => test-v2v-fedora-luks-on-lvm-conversion.sh} (77%)
rename tests/{test-v2v-fedora-luks-conversion.sh => test-v2v-fedora-lvm-on-luks-conversion.sh} (95%)
--
2.19.1.3.g30247aa5d201
[PATCH virt-v2v 0/3] tests: Add a phony Fedora image for testing
by Richard W.M. Jones
When we split virt-v2v from libguestfs many moons ago, I copied the
test-data/ subdirectory over. I didn't modify it much, and it
contains much test data that is irrelevant to virt-v2v. (This change
does _not_ clean up any of that ...) However we did use the phony
Windows image (test-data/phony-guests/windows.img) to do a semblance
of testing Windows conversions, or as much as can be done without
having the proprietary operating system itself around.
We never used any of the Linux images, and in fact (before this
change) they could not be used. They simply don't appear close enough
to a real guest to work with virt-v2v. In particular they lack
installed kernels, modules, bootloader config and the program needed
to rebuild the initramfs.
This change fixes the Fedora image(s) so they can be used for testing,
and adds conversion of those to the testsuite.
I already pushed the first commit in this series since it was a big
binary blob update to the phony Fedora RPM database that was not
reviewable:
https://github.com/libguestfs/virt-v2v/commit/0d1b2ec1b733db1ca0bebf2e4a9...
Rich.
How to speed up Kernel Client - S3 plugin use-case
by Nikolaus Rath
Hello,
I am trying to improve performance of the scenario where the kernel's
NBD client talks to nbdkit's S3 plugin.
For me, the main bottleneck is currently that the kernel aligns requests
to only 512 B, no matter the block size reported by nbdkit.
Using a 512 B object size is not feasible (due to latency and request
overhead). However, with a larger object size there are two conflicting
objectives:
1. To maximize parallelism (which is important to reduce the effects of
connection latency), it's best to limit the size of the kernel's NBD
requests to the object size.
2. To minimize unaligned writes, it's best to allow arbitrarily large
NBD requests, because the larger the request, the greater the number of
full blocks that are written. Unfortunately this means that all objects
touched by the request are written sequentially.
I see a number of ways to address that:
1. Change the kernel's NBD code to honor the blocksize reported by the
NBD server. This would be ideal, but I don't feel up to making this
happen. Theoretical solution only.
2. Change the S3 plugin to use multiple threads, so that it can upload
multiple objects in parallel even when they're part of the same NBD
request (see the sketch after this list). The disadvantage is that this
adds a second "layer" of threads, in addition to those started by nbdkit
itself.
3. Change nbdkit itself to split up requests *and* distribute them to
multiple threads. I believe this means changes to the core code
because the blocksize filter can't dispatch requests to multiple
threads.
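To illustrate option 2 only (this is not the real S3 plugin, which is
written in Python, and none of these names come from nbdkit): the idea is
to split one large, object-aligned write into object-sized pieces and
upload the pieces from separate threads.

    #include <pthread.h>
    #include <stdint.h>
    #include <stdlib.h>

    #define OBJ_SIZE (1024 * 1024)       /* assumed object size */

    struct piece {
        const char *buf;                 /* data for one object */
        uint64_t offset;                 /* object-aligned offset */
        size_t len;                      /* <= OBJ_SIZE */
    };

    /* Stand-in for the real "PUT one object to S3" call. */
    static void *upload_piece(void *arg)
    {
        struct piece *p = arg;
        /* put_object (p->offset / OBJ_SIZE, p->buf, p->len); */
        (void) p;
        return NULL;
    }

    /* Handle a write whose offset and count are already object-aligned;
       an unaligned head or tail would still need a read-modify-write of
       the first and last objects. */
    static void parallel_write(const char *buf, uint64_t offset, size_t count)
    {
        size_t i, n = count / OBJ_SIZE;
        pthread_t *tid = calloc(n, sizeof *tid);
        struct piece *p = calloc(n, sizeof *p);

        for (i = 0; i < n; i++) {
            p[i].buf = buf + i * OBJ_SIZE;
            p[i].offset = offset + i * OBJ_SIZE;
            p[i].len = OBJ_SIZE;
            pthread_create(&tid[i], NULL, upload_piece, &p[i]);
        }
        for (i = 0; i < n; i++)
            pthread_join(tid[i], NULL);
        free(p);
        free(tid);
    }

A real implementation would use a fixed-size thread pool (or reuse
nbdkit's worker threads) rather than one thread per object, but the
splitting logic is the same.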
What do people think is the best way to proceed? Is there a fourth
option that I might be missing?
Best,
-Nikolaus
--
GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F
»Time flies like an arrow, fruit flies like a Banana.«
[v2v PATCH 0/4] convert_linux: install the QEMU guest agent with a firstboot script
by Laszlo Ersek
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2028764
I'm going to post the prerequisite libguestfs-common and guestfs-tools
patches (one patch for each project) in response to this cover letter,
too.
I'm not sure why we want to perform the installation specifically at
*firstboot* <https://bugzilla.redhat.com/show_bug.cgi?id=2028764#c2>.
Virt-v2v already only supports conversions where host and guest arches
are identical; thus, the appliance kernel could run native guest
binaries (if necessary). I've read
<https://libguestfs.org/guestfs.3.html#running-commands>, but I'm not
overly convinced. Firstboot comes with *lots* of complications. Even
<https://libguestfs.org/virt-builder.1.html#installing-packages>
mentions some of them:
> The downsides are that it will take the guest a lot longer to boot
> first time, and there’s nothing much you can do if package
> installation fails (eg. if a network problem means the guest can't
> reach the package repositories).
Anyway, here goes.
Thanks,
Laszlo
Laszlo Ersek (4):
output/create_libvirt_xml: wire up the QEMU guest agent
windows_virtio: remove "install_linux_tools"
convert_linux: extract qemu-guest-agent package name
convert_linux: install the QEMU guest agent with a firstboot script
common | 2 +-
convert/convert_linux.ml | 102 ++++++++++++++++++--
convert/linux.ml | 35 -------
convert/linux.mli | 11 ---
convert/windows_virtio.ml | 42 --------
convert/windows_virtio.mli | 4 -
output/create_libvirt_xml.ml | 11 +++
tests/test-v2v-i-ova.xml | 4 +
8 files changed, 111 insertions(+), 100 deletions(-)
--
2.19.1.3.g30247aa5d201
[nbdkit PATCH 0/3] Fix some NBDKIT_EMULATE_ bugs
by Eric Blake
Rich pointed out an assertion failure in the luks filter caused by a
bug in backend.c's handling of NBDKIT_EMULATE_ZERO for .can_zero;
while fixing it, I found a different bug in NBDKIT_EMULATE_CACHE.
Eric Blake (3):
server: Fix NBDKIT_ZERO_EMULATE from filters
server: Fix NBDKIT_CACHE_EMULATE
tests: Add regression test for NBDKIT_EMULATE_CACHE fix
docs/nbdkit-filter.pod | 8 ++--
docs/nbdkit-plugin.pod | 3 +-
tests/Makefile.am | 2 +
server/backend.c | 38 +++++++++++++++++-
filters/nozero/nozero.c | 39 ++----------------
tests/test-eval-cache.sh | 85 ++++++++++++++++++++++++++++++++++++++++
tests/test-nozero.sh | 78 ++++++++++++++++--------------------
7 files changed, 167 insertions(+), 86 deletions(-)
create mode 100755 tests/test-eval-cache.sh
--
2.36.1