[libnbd PATCH] api: Speed up nbd_pread_structured when reading holes
by Eric Blake
Commit c7fbc71d missed an optimization: when we guarantee that a pread
buffer is pre-zeroed, we don't need to waste time memset()ing it when
the server sends a hole. But the next commit e0953cb7 lets the user
skip pre-zeroing, at which point the memset is required to avoid data
corruption. Thankfully, we are not leaking bogus bytes when a server
sends a hole, but we CAN speed up the case where we have pre-zeroed
the buffer to avoid a second memset(). Because the user can change
the state of pread_initialize on the fly (including while a pread is
still in flight), we must track the state per-command as it was before
we send the request to the server.
---
lib/internal.h | 1 +
generator/states-reply-structured.c | 5 +++--
lib/rw.c | 18 +++++++++++-------
3 files changed, 15 insertions(+), 9 deletions(-)
diff --git a/lib/internal.h b/lib/internal.h
index 6aaced3..885cee1 100644
--- a/lib/internal.h
+++ b/lib/internal.h
@@ -353,6 +353,7 @@ struct command {
struct command_cb cb;
enum state state; /* State to resume with on next POLLIN */
bool data_seen; /* For read, true if at least one data chunk seen */
+ bool initialized; /* For read, true if getting a hole may skip memset */
uint32_t error; /* Local errno value */
};
diff --git a/generator/states-reply-structured.c b/generator/states-reply-structured.c
index 7001047..12c24f5 100644
--- a/generator/states-reply-structured.c
+++ b/generator/states-reply-structured.c
@@ -1,5 +1,5 @@
/* nbd client library in userspace: state machine
- * Copyright (C) 2013-2019 Red Hat Inc.
+ * Copyright (C) 2013-2022 Red Hat Inc.
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public
@@ -436,7 +436,8 @@ STATE_MACHINE {
* 0-length replies are broken. Still, it's easy enough to support
* them as an extension, and this works even when length == 0.
*/
- memset (cmd->data + offset, 0, length);
+ if (!cmd->initialized)
+ memset (cmd->data + offset, 0, length);
if (CALLBACK_IS_NOT_NULL (cmd->cb.fn.chunk)) {
int error = cmd->error;
diff --git a/lib/rw.c b/lib/rw.c
index c5d041d..f32481a 100644
--- a/lib/rw.c
+++ b/lib/rw.c
@@ -244,14 +244,18 @@ nbd_internal_command_common (struct nbd_handle *h,
if (cb)
cmd->cb = *cb;
- /* For NBD_CMD_READ, cmd->data was pre-zeroed in the prologue
- * created by the generator. Thus, if a (non-compliant) server with
- * structured replies fails to send back sufficient data to cover
- * the whole buffer, we still behave as if it had sent zeroes for
- * those portions, rather than leaking any uninitialized data, and
- * without having to complicate our state machine to track which
- * portions of the read buffer were actually populated.
+ /* For NBD_CMD_READ, cmd->data defaults to being pre-zeroed in the
+ * prologue created by the generator. Thus, if a (non-compliant)
+ * server with structured replies fails to send back sufficient data
+ * to cover the whole buffer, we still behave as if it had sent
+ * zeroes for those portions, rather than leaking any uninitialized
+ * data, and without having to complicate our state machine to track
+ * which portions of the read buffer were actually populated. But
+ * if the user opts in to disabling set_pread_initialize, then we
+ * need to memset zeroes as they are read (and the user gets their
+ * own garbage back in the case of a non-compliant server).
*/
+ cmd->initialized = h->pread_initialize;
/* Add the command to the end of the queue. Kick the state machine
* if there is no other command being processed, otherwise, it will
--
2.36.1
2 years, 5 months
[libguestfs PATCH] appliance, daemon: disable lvm2 devicesfile
by Laszlo Ersek
In guestfs-tools commit 4fe8a03cd2d3 ('sysprep: remove lvm2's default
"system.devices" file', 2022-04-11), we disabled the use of LVM2's new
"devicesfile" feature, which could interfere with the cloning of virtual
machines.
We suspected in
https://bugzilla.redhat.com/show_bug.cgi?id=2072493#c6
that the same lvm2 feature could affect the libguestfs appliance itself,
but decided in
https://bugzilla.redhat.com/show_bug.cgi?id=2072493#c8
https://bugzilla.redhat.com/show_bug.cgi?id=2072493#c10
that this would not be the case, because "appliance/init" already
constructed a pristine LVM_SYSTEM_DIR.
Unfortunately, that's not enough: due to the "use_devicesfile=1" default
(on RHEL9 anyway), some "lvm" invocation, possibly inside the
lvm-set-filter API, *creates* "$LVM_SYSTEM_DIR/devices/system.devices".
And then we get (minimally) warnings such as
> Please remove the lvm.conf global_filter, it is ignored with the devices
> file.
> Please remove the lvm.conf filter, it is ignored with the devices file.
when using the lvm-set-filter API.
Explicitly disable the "devices file" in "appliance/init", and also
whenever we rewrite "lvm.conf" -- that is, in set_filter()
[daemon/lvm-filter.c]. In the former, check for the feature by locating
the devicesfile-related utilities "lvmdevices" and "vgimportdevices". In
the C code, invoke the utilities with the "--help" option instead. (In
"appliance/init", I thought it was best not to call any lvm2 utilities
even with "--help", with our lvm2.conf still under construction there.) If
either utility is available, set "use_devicesfile = 0".
Cc: David Teigland <teigland(a)redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1965941
Signed-off-by: Laszlo Ersek <lersek(a)redhat.com>
---
Notes:
Tested:
- on top of "master" (7afbf5ee4415), using a Fedora 35 host, where the
lvm2 packages copied into the appliance do not have the "devicesfile"
feature (-> regression test);
- on top of "rhel-9.1" (ad24b9f4d695), in the form of a Brew scratch
build, using a nightly RHEL-9.1 install, where the lvm2 packages
copied into the appliance have the "devicesfile" feature (-> bugfix
test).
daemon/lvm-filter.c | 19 +++++++++++++++++++
appliance/init | 11 +++++++++++
2 files changed, 30 insertions(+)
diff --git a/daemon/lvm-filter.c b/daemon/lvm-filter.c
index c6dd35156d37..f75e2198831e 100644
--- a/daemon/lvm-filter.c
+++ b/daemon/lvm-filter.c
@@ -68,6 +68,18 @@ free_lvm_system_dir (void)
free (lvm_system_dir);
}
+static bool devicesfile_feature (void)
+{
+ static bool checked, available;
+
+ if (!checked) {
+ checked = true;
+ available = command (NULL, NULL, "lvmdevices", "--help", NULL) == 0 ||
+ command (NULL, NULL, "vgimportdevices", "--help", NULL) == 0;
+ }
+ return available;
+}
+
/* Rewrite the 'filter = [ ... ]' line in lvm.conf. */
static int
set_filter (char *const *filters)
@@ -88,6 +100,13 @@ set_filter (char *const *filters)
}
fprintf (fp, "devices {\n");
+
+ /* If lvm2 supports a "devices file", we need to disable its use
+ * (RHBZ#1965941).
+ */
+ if (devicesfile_feature ())
+ fprintf (fp, " use_devicesfile = 0\n");
+
for (j = 0; filter_types[j] != NULL; ++j) {
fprintf (fp, " %s = [\n", filter_types[j]);
fprintf (fp, " ");
diff --git a/appliance/init b/appliance/init
index 7076821d2250..19aa151b73aa 100755
--- a/appliance/init
+++ b/appliance/init
@@ -134,6 +134,17 @@ mdadm -As --auto=yes --no-degraded
# Empty LVM configuration file means "all defaults".
mkdir -p /tmp/lvm
touch /tmp/lvm/lvm.conf
+
+# If lvm2 supports a "devices file", we need to disable its use
+# (RHBZ#1965941).
+if command -v lvmdevices || command -v vgimportdevices; then
+ {
+ printf 'devices {\n'
+ printf '\tuse_devicesfile = 0\n'
+ printf '}\n'
+ } >> /tmp/lvm/lvm.conf
+fi
+
LVM_SYSTEM_DIR=/tmp/lvm
export LVM_SYSTEM_DIR
lvmetad
base-commit: 7afbf5ee4415f6fa2553898d3af238e794062096
--
2.19.1.3.g30247aa5d201
2 years, 5 months
[nbdkit PATCH] RFC: blocksize: Add test for sharding behavior
by Eric Blake
Demonstrate the bug where an aligned write can be lost if it races
between the read and write of a RMW unaligned write.
---
Sending this out for review of the test; it fails (which it is
supposed to as long as I don't fix the blocksize filter), but I hope
it starts passing once I also patch the filter. In the process, I ran
into a libnbd bug where we were using PyByteArray_AsString()
incorrectly, and segfaulting (rather than diagnosing type mismatch) on
something as simple as nbd.Buffer.from_bytearray(b'1'*16) that looks
like it should work; so I posted a patch for that.
tests/Makefile.am | 2 +
tests/test-blocksize-sharding.sh | 161 +++++++++++++++++++++++++++++++
2 files changed, 163 insertions(+)
create mode 100755 tests/test-blocksize-sharding.sh
diff --git a/tests/Makefile.am b/tests/Makefile.am
index b6d30c0a..67e7618b 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -1376,11 +1376,13 @@ TESTS += \
test-blocksize.sh \
test-blocksize-extents.sh \
test-blocksize-default.sh \
+ test-blocksize-sharding.sh \
$(NULL)
EXTRA_DIST += \
test-blocksize.sh \
test-blocksize-extents.sh \
test-blocksize-default.sh \
+ test-blocksize-sharding.sh \
$(NULL)
# blocksize-policy filter test.
diff --git a/tests/test-blocksize-sharding.sh b/tests/test-blocksize-sharding.sh
new file mode 100755
index 00000000..09f12051
--- /dev/null
+++ b/tests/test-blocksize-sharding.sh
@@ -0,0 +1,161 @@
+#!/usr/bin/env bash
+# nbdkit
+# Copyright (C) 2018-2022 Red Hat Inc.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met:
+#
+# * Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+#
+# * Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution.
+#
+# * Neither the name of Red Hat nor the names of its contributors may be
+# used to endorse or promote products derived from this software without
+# specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND
+# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
+# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR
+# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
+# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+# SUCH DAMAGE.
+
+# Demonstrate a fix for a bug where blocksize could lose aligned writes
+# run in parallel with unaligned writes
+
+source ./functions.sh
+set -e
+set -x
+
+requires_plugin eval
+requires_nbdsh_uri
+
+files='blocksize-sharding.img blocksize-sharding.tmp'
+rm -f $files
+cleanup_fn rm -f $files
+
+# Script a server that requires 16-byte aligned requests, and which delays
+# 4s after reads if a witness file exists. Couple it with the delay filter
+# that delays 2s before writes. If an unaligned and aligned write overlap,
+# and can execute in parallel, we would have this timeline:
+#
+# T1 aligned write 1's to 0/16 T2 unaligned write 2's to 4/12
+#t=0 blocksize: next->pwrite(0, 16) blocksize: next->pread(0, 16)
+# delay: wait 2s delay: next->pread(0, 16)
+# ... eval: read 0's, wait 4s
+#t=2 delay: next->pwrite(0, 16) ...
+# eval: write 1's ...
+# return ...
+#t=4 return 0's (now stale)
+# blocksize: next->pwrite(0, 16)
+# delay: wait 2s
+#t=6 delay: next->pwrite(0, 16)
+# eval: write stale RMW buffer
+#
+# leaving us with a sharded 0000222222222222 (T1's write is lost).
+# But as long as the blocksize filter detects the overlap, we should end
+# up with either 1111222222222222 (aligned write completed first), or with
+# 1111111111111111 (unaligned write completed first), either taking 8s,
+# but with no sharding.
+#
+# We also need an nbdsh script that kicks off parallel writes.
+export script='
+import os
+import time
+
+witness = os.getenv("witness")
+
+def touch(path):
+ open(path, "a").close()
+
+# First pass: check that two aligned operations work in parallel
+# Total time should be closer to 2 seconds, rather than 4 if serialized
+print("sanity check")
+ba1 = bytearray(b"1"*16)
+ba2 = bytearray(b"2"*16)
+buf1 = nbd.Buffer.from_bytearray(ba1)
+buf2 = nbd.Buffer.from_bytearray(ba2)
+touch(witness)
+start_t = time.time()
+h.aio_pwrite(buf1, 0)
+h.aio_pwrite(buf2, 0)
+
+while h.aio_in_flight() > 0:
+ h.poll(-1)
+end_t = time.time()
+os.unlink(witness)
+
+# assert h.pread(16, 0) in [b"1"*16, b"2"*16]
+print(h.pread(16,0))
+t = end_t - start_t
+print(t)
+assert t <= 3
+
+# Next pass: try to kick off unaligned first
+print("unaligned first")
+h.zero(16, 0)
+ba3 = bytearray(b"3"*12)
+ba4 = bytearray(b"4"*16)
+buf3 = nbd.Buffer.from_bytearray(ba3)
+buf4 = nbd.Buffer.from_bytearray(ba4)
+touch(witness)
+start_t = time.time()
+h.aio_pwrite(buf3, 4)
+h.aio_pwrite(buf4, 0)
+
+while h.aio_in_flight() > 0:
+ h.poll(-1)
+end_t = time.time()
+os.unlink(witness)
+
+# assert h.pread(16, 0) in [b"4"*4 + b"3"*12, b"4"*16]
+print(h.pread(16,0))
+t = end_t - start_t
+print(t)
+
+# Next pass: try to kick off aligned first
+print("aligned first")
+ba5 = bytearray(b"5"*16)
+ba6 = bytearray(b"6"*12)
+buf5 = nbd.Buffer.from_bytearray(ba5)
+buf6 = nbd.Buffer.from_bytearray(ba6)
+h.zero(16, 0)
+touch(witness)
+start_t = time.time()
+h.aio_pwrite(buf5, 0)
+h.aio_pwrite(buf6, 4)
+
+while h.aio_in_flight() > 0:
+ h.poll(-1)
+end_t = time.time()
+os.unlink(witness)
+
+assert h.pread(16, 0) in [b"5"*4 + b"6"*12, b"5"*16]
+t = end_t - start_t
+print(t)
+'
+
+# Now run everything
+truncate --size=16 blocksize-sharding.img
+export witness="$PWD/blocksize-sharding.tmp"
+nbdkit -U - --filter=blocksize --filter=delay eval delay-write=2 \
+ config='ln -sf "$(realpath "$3")" $tmpdir/$2' \
+ img="$PWD/blocksize-sharding.img" tmp="$PWD/blocksize-sharding.tmp" \
+ get_size='echo 16' block_size='echo 16 64K 1M' \
+ thread_model='echo parallel' \
+ zero='dd if=/dev/zero skip=$4 count=$3 iflag=count_bytes,skip_bytes' \
+ pread='
+ dd if=$tmpdir/img skip=$4 count=$3 iflag=count_bytes,skip_bytes
+ if [ -f $tmpdir/tmp ]; then sleep 4; fi ' \
+ pwrite='dd of=$tmpdir/img seek=$4 conv=notrunc oflag=seek_bytes' \
+ --run 'nbdsh -u "$uri" -c "$script"'
--
2.36.1
2 years, 5 months
[libnbd PATCH] python: Accept buffers in nbd.Buffer.from_bytearray()
by Eric Blake
Prior to this patch, the following fails, but at least seems to give a
sensible error:
$ nbdsh -c 'nbd.Buffer.from_bytearray(b"1"*8)'
Traceback (most recent call last):
File "/usr/lib64/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib64/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/usr/lib64/python3.10/site-packages/nbd.py", line 2726, in <module>
nbdsh.shell()
File "/usr/lib64/python3.10/site-packages/nbdsh.py", line 139, in shell
exec(c, d, d)
File "<string>", line 1, in <module>
File "/usr/lib64/python3.10/site-packages/nbd.py", line 132, in from_bytearray
o = libnbdmod.aio_buffer_from_bytearray(ba)
RuntimeError: parameter is not a bytearray
while this version 1 byte longer flat out segfaults:
$ nbdsh -c 'nbd.Buffer.from_bytearray(b"1"*9)'
and this one is subtly different:
$ nbdsh -c 'nbd.Buffer.from_bytearray(h)'
Traceback (most recent call last):
...
File "/usr/lib64/python3.10/site-packages/nbd.py", line 132, in from_bytearray
o = libnbdmod.aio_buffer_from_bytearray(ba)
MemoryError
That's because PyByteArray_AsString() blindly assumes that its
argument is a PyByteArray, and goes haywire when it is not, unless we
got lucky that the incorrectly-typed object behaves similarly enough
(which, for byte literals, is size-dependent).
But Python already has a handy way to convert any object that supports
the buffer interface into a bytearray. Using it, we can now support
many more parameters; passing in b"1"*9 now correctly creates a 9-byte
buffer rather than failing. And the error message for a non-buffer
also improves:
$ ./run nbdsh -c 'nbd.Buffer.from_bytearray(h)'
Traceback (most recent call last):
...
File "/home/eblake/libnbd/python/nbd.py", line 132, in from_bytearray
o = libnbdmod.aio_buffer_from_bytearray(ba)
TypeError: cannot convert 'NBD' object to bytearray
(A reliable TypeError is always better than an unexpected MemoryError
or segfault).
---
python/handle.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/python/handle.c b/python/handle.c
index 9c08dc1..9fe3f8e 100644
--- a/python/handle.c
+++ b/python/handle.c
@@ -1,5 +1,5 @@
/* NBD client library in userspace
- * Copyright (C) 2013-2020 Red Hat Inc.
+ * Copyright (C) 2013-2022 Red Hat Inc.
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public
@@ -159,6 +159,7 @@ PyObject *
nbd_internal_py_aio_buffer_from_bytearray (PyObject *self, PyObject *args)
{
PyObject *obj;
+ PyObject *arr = NULL;
Py_ssize_t len;
void *data;
struct py_aio_buffer *buf;
@@ -169,9 +170,17 @@ nbd_internal_py_aio_buffer_from_bytearray (PyObject *self, PyObject *args)
&obj))
return NULL;
+ if (! PyByteArray_Check (obj)) {
+ arr = PyByteArray_FromObject (obj);
+ if (arr == NULL)
+ return NULL;
+ obj = arr;
+ }
data = PyByteArray_AsString (obj);
if (!data) {
- PyErr_SetString (PyExc_RuntimeError, "parameter is not a bytearray");
+ PyErr_SetString (PyExc_RuntimeError,
+ "parameter is not a bytearray or buffer");
+ Py_XDECREF (arr);
return NULL;
}
len = PyByteArray_Size (obj);
@@ -179,6 +188,7 @@ nbd_internal_py_aio_buffer_from_bytearray (PyObject *self, PyObject *args)
buf = malloc (sizeof *buf);
if (buf == NULL) {
PyErr_NoMemory ();
+ Py_XDECREF (arr);
return NULL;
}
@@ -187,9 +197,11 @@ nbd_internal_py_aio_buffer_from_bytearray (PyObject *self, PyObject *args)
if (buf->data == NULL) {
PyErr_NoMemory ();
free (buf);
+ Py_XDECREF (arr);
return NULL;
}
memcpy (buf->data, data, len);
+ Py_XDECREF (arr);
ret = PyCapsule_New (buf, aio_buffer_name, free_aio_buffer);
if (ret == NULL) {
--
2.36.1
2 years, 5 months