Libguestfs June 2022

guestfs@lists.libguestfs.org

13 participants
64 discussions

libnbd | Failed pipeline for master | 3a1470f7

by GitLab

Pipeline #553165807 has failed! Project: libnbd ( https://gitlab.com/nbdkit/libnbd ) Branch: master ( https://gitlab.com/nbdkit/libnbd/-/commits/master ) Commit: 3a1470f7 ( https://gitlab.com/nbdkit/libnbd/-/commit/3a1470f722970ae2a1930233f683894... ) Commit Message: Change README to use markdown gitlab can parse... Commit Author: Richard W_M_ Jones ( https://gitlab.com/rwmjones ) Pipeline #553165807 ( https://gitlab.com/nbdkit/libnbd/-/pipelines/553165807 ) triggered by Richard W_M_ Jones ( https://gitlab.com/rwmjones ) had 1 failed job. Job #2533064571 ( https://gitlab.com/nbdkit/libnbd/-/jobs/2533064571/raw ) Stage: builds Name: x86_64-opensuse-leap-153 -- You're receiving this email because of your account on gitlab.com.

3 years, 9 months

1
0
0 / 0

libnbd | Failed pipeline for master | dcac6b4f

by GitLab

Pipeline #553166767 has failed! Project: libnbd ( https://gitlab.com/nbdkit/libnbd ) Branch: master ( https://gitlab.com/nbdkit/libnbd/-/commits/master ) Commit: dcac6b4f ( https://gitlab.com/nbdkit/libnbd/-/commit/dcac6b4f06c574df19d4601af12a0bb... ) Commit Message: Fix small whitespace problem in README.md Fixe... Commit Author: Richard W_M_ Jones ( https://gitlab.com/rwmjones ) Pipeline #553166767 ( https://gitlab.com/nbdkit/libnbd/-/pipelines/553166767 ) triggered by Richard W_M_ Jones ( https://gitlab.com/rwmjones ) had 1 failed job. Job #2533069577 ( https://gitlab.com/nbdkit/libnbd/-/jobs/2533069577/raw ) Stage: builds Name: x86_64-opensuse-leap-153 -- You're receiving this email because of your account on gitlab.com.

3 years, 9 months

1
0
0 / 0

[nbdkit PATCH v2] blocksize: Avoid losing aligned writes to RMW race

by Eric Blake

[We are still investigating if a CVE needs to be assigned.] We have a bug where the blocksize filter can lose the effects of an aligned write that happens in parallel with an overlapping unaligned write. In general, a client should be extremely cautious about overlapping writes; it is not NBD's fault which of the two writes lands on the disk (or even if both writes land partially, at block-level granularities on which write landed last). However, a client should still be able to assume that even when two writes partially overlap, the final state of the disk of the non-overlapping portion will be that of the one write explicitly touching that portion, rather than stale data from before either write touched that block. Fix the bug by switching the blocksize filter locking from a mutex to a rwlock. We still need mutual exclusion during all unaligned operations (whether reading or writing to the plugin), because we share the same bounce buffer among all such code, which we get via the mutual exclusion of wrlock. But we now also add in shared exclusion for aligned write-like operations (pwrite, trim, zero); these can operate in parallel with one another, but must not be allowed to fall in the window between when an overlapping unaligned operation has read from the plugin but not yet written, so that we do not end up re-writing stale contents that undo the portion of the aligned write that was outside the unaligned region [it's ironic that we only need this protection on write I/O, but get it by using the rdlock feature of the rwlock]. Read-like operations (pread, extents, cache) don't need to be protected from RMW operations, because it is already the user's problem if they execute parallel overlapping reads while modifying the same portion of the image; and they will still end up seeing either the state before or after the unaligned write, indistinguishable from if we had added more locking. Whether this lock favors rdlock (aka aligned pwrite) or wrlock (aka unaligned access) is implementation-defined by the pthread implementation; if we need to be more precise, we'll have to introduce our own rwlock implementation on top of lower-level primitives (https://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock mentions the use of two mutex to favor readers, or a condvar/mutex to favor writers). This is also not very fine-grained; it may turn out that we could get more performance if instead of locking the entire operation (across all clients, and regardless of whether the offsets overlap), we instead just lock metadata and then track whether a new operation overlaps with an existing unaligned operation, or even add in support for parallel unaligned operations by having more than one bounce buffer. But if performance truly matters, you're better off fixing the client to do aligned access in the first place, rather than needing the blocksize filter. Add a test that fails without the change to blocksize.c. With some carefully timed setups (using the delay filter to stall writes reaching the plugin, and the plugin itself to stall reads coming back from the plugin, so that we are reasonably sure of the overall timeline), we can demonstrate the bug present in the unpatched code, where an aligned write is lost when it lands in the wrong part of an unaligned RMW cycle; the fixed code instead shows that the overlapping operations were serialized. We may need to further tweak the test to be more reliable even under heavy load, but hopefully 2 seconds per stall (where a successful test requires 18 seconds) is sufficient for now. Reported-by: Nikolaus Rath <Nikolaus(a)rath.org> Fixes: 1aadbf9971 ("blocksize: Allow parallel requests", v1.25.3) --- In v2: - Implement the blocksize fix. - Drop RFC; I think this is ready, other than determining if we want to tag it with a CVE number. - Improve the test: print out more details before assertions, to aid in debugging if it ever dies under CI resources tests/Makefile.am | 2 + filters/blocksize/blocksize.c | 26 +++-- tests/test-blocksize-sharding.sh | 168 +++++++++++++++++++++++++++++++ 3 files changed, 187 insertions(+), 9 deletions(-) create mode 100755 tests/test-blocksize-sharding.sh diff --git a/tests/Makefile.am b/tests/Makefile.am index b6d30c0a..67e7618b 100644 --- a/tests/Makefile.am +++ b/tests/Makefile.am @@ -1376,11 +1376,13 @@ TESTS += \ test-blocksize.sh \ test-blocksize-extents.sh \ test-blocksize-default.sh \ + test-blocksize-sharding.sh \ $(NULL) EXTRA_DIST += \ test-blocksize.sh \ test-blocksize-extents.sh \ test-blocksize-default.sh \ + test-blocksize-sharding.sh \ $(NULL) # blocksize-policy filter test. diff --git a/filters/blocksize/blocksize.c b/filters/blocksize/blocksize.c index 03da4971..1c81d5e3 100644 --- a/filters/blocksize/blocksize.c +++ b/filters/blocksize/blocksize.c @@ -51,10 +51,15 @@ #define BLOCKSIZE_MIN_LIMIT (64U * 1024) -/* In order to handle parallel requests safely, this lock must be held - * when using the bounce buffer. +/* Lock in order to handle overlapping requests safely. + * + * Grabbed for exclusive access (wrlock) when using the bounce buffer. + * + * Grabbed for shared access (rdlock) when doing aligned writes. + * These can happen in parallel with one another, but must not land in + * between the read and write of an unaligned RMW operation. */ -static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER; +static pthread_rwlock_t lock = PTHREAD_RWLOCK_INITIALIZER; /* A single bounce buffer for alignment purposes, guarded by the lock. * Size it to the maximum we allow for minblock. @@ -255,7 +260,7 @@ blocksize_pread (nbdkit_next *next, /* Unaligned head */ if (offs & (h->minblock - 1)) { - ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&lock); + ACQUIRE_WRLOCK_FOR_CURRENT_SCOPE (&lock); drop = offs & (h->minblock - 1); keep = MIN (h->minblock - drop, count); if (next->pread (next, bounce, h->minblock, offs - drop, flags, err) == -1) @@ -278,7 +283,7 @@ blocksize_pread (nbdkit_next *next, /* Unaligned tail */ if (count) { - ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&lock); + ACQUIRE_WRLOCK_FOR_CURRENT_SCOPE (&lock); if (next->pread (next, bounce, h->minblock, offs, flags, err) == -1) return -1; memcpy (buf, bounce, count); @@ -306,7 +311,7 @@ blocksize_pwrite (nbdkit_next *next, /* Unaligned head */ if (offs & (h->minblock - 1)) { - ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&lock); + ACQUIRE_WRLOCK_FOR_CURRENT_SCOPE (&lock); drop = offs & (h->minblock - 1); keep = MIN (h->minblock - drop, count); if (next->pread (next, bounce, h->minblock, offs - drop, 0, err) == -1) @@ -321,6 +326,7 @@ blocksize_pwrite (nbdkit_next *next, /* Aligned body */ while (count >= h->minblock) { + ACQUIRE_RDLOCK_FOR_CURRENT_SCOPE (&lock); keep = MIN (h->maxdata, ROUND_DOWN (count, h->minblock)); if (next->pwrite (next, buf, keep, offs, flags, err) == -1) return -1; @@ -331,7 +337,7 @@ blocksize_pwrite (nbdkit_next *next, /* Unaligned tail */ if (count) { - ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&lock); + ACQUIRE_WRLOCK_FOR_CURRENT_SCOPE (&lock); if (next->pread (next, bounce, h->minblock, offs, 0, err) == -1) return -1; memcpy (bounce, buf, count); @@ -371,6 +377,7 @@ blocksize_trim (nbdkit_next *next, /* Aligned body */ while (count) { + ACQUIRE_RDLOCK_FOR_CURRENT_SCOPE (&lock); keep = MIN (h->maxlen, count); if (next->trim (next, keep, offs, flags, err) == -1) return -1; @@ -413,7 +420,7 @@ blocksize_zero (nbdkit_next *next, /* Unaligned head */ if (offs & (h->minblock - 1)) { - ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&lock); + ACQUIRE_WRLOCK_FOR_CURRENT_SCOPE (&lock); drop = offs & (h->minblock - 1); keep = MIN (h->minblock - drop, count); if (next->pread (next, bounce, h->minblock, offs - drop, 0, err) == -1) @@ -428,6 +435,7 @@ blocksize_zero (nbdkit_next *next, /* Aligned body */ while (count >= h->minblock) { + ACQUIRE_RDLOCK_FOR_CURRENT_SCOPE (&lock); keep = MIN (h->maxlen, ROUND_DOWN (count, h->minblock)); if (next->zero (next, keep, offs, flags, err) == -1) return -1; @@ -437,7 +445,7 @@ blocksize_zero (nbdkit_next *next, /* Unaligned tail */ if (count) { - ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&lock); + ACQUIRE_WRLOCK_FOR_CURRENT_SCOPE (&lock); if (next->pread (next, bounce, h->minblock, offs, 0, err) == -1) return -1; memset (bounce, 0, count); diff --git a/tests/test-blocksize-sharding.sh b/tests/test-blocksize-sharding.sh new file mode 100755 index 00000000..177668ed --- /dev/null +++ b/tests/test-blocksize-sharding.sh @@ -0,0 +1,168 @@ +#!/usr/bin/env bash +# nbdkit +# Copyright (C) 2018-2022 Red Hat Inc. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are +# met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in the +# documentation and/or other materials provided with the distribution. +# +# * Neither the name of Red Hat nor the names of its contributors may be +# used to endorse or promote products derived from this software without +# specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND +# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, +# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A +# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR +# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF +# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND +# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT +# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +# SUCH DAMAGE. + +# Demonstrate a fix for a bug where blocksize could lose aligned writes +# run in parallel with unaligned writes + +source ./functions.sh +set -e +set -x + +requires_plugin eval +requires_nbdsh_uri + +files='blocksize-sharding.img blocksize-sharding.tmp' +rm -f $files +cleanup_fn rm -f $files + +# Script a server that requires 16-byte aligned requests, and which delays +# 4s after reads if a witness file exists. Couple it with the delay filter +# that delays 2s before writes. If an unaligned and aligned write overlap, +# and can execute in parallel, we would have this timeline: +# +# T1 aligned write 1's to 0/16 T2 unaligned write 2's to 4/12 +#t=0 blocksize: next->pwrite(0, 16) blocksize: next->pread(0, 16) +# delay: wait 2s delay: next->pread(0, 16) +# ... eval: read 0's, wait 4s +#t=2 delay: next->pwrite(0, 16) ... +# eval: write 1's ... +# return ... +#t=4 return 0's (now stale) +# blocksize: next->pwrite(0, 16) +# delay: wait 2s +#t=6 delay: next->pwrite(0, 16) +# eval: write stale RMW buffer +# +# leaving us with a sharded 0000222222222222 (T1's write is lost). +# But as long as the blocksize filter detects the overlap, we should end +# up with either 1111222222222222 (aligned write completed first), or with +# 1111111111111111 (unaligned write completed first), either taking 8s, +# but with no sharding. +# +# We also need an nbdsh script that kicks off parallel writes. +export script=' +import os +import time + +witness = os.getenv("witness") + +def touch(path): + open(path, "a").close() + +# First pass: check that two aligned operations work in parallel +# Total time should be closer to 2 seconds, rather than 4 if serialized +print("sanity check") +ba1 = bytearray(b"1"*16) +ba2 = bytearray(b"2"*16) +buf1 = nbd.Buffer.from_bytearray(ba1) +buf2 = nbd.Buffer.from_bytearray(ba2) +touch(witness) +start_t = time.time() +h.aio_pwrite(buf1, 0) +h.aio_pwrite(buf2, 0) + +while h.aio_in_flight() > 0: + h.poll(-1) +end_t = time.time() +os.unlink(witness) + +out = h.pread(16,0) +print(out) +t = end_t - start_t +print(t) +assert out in [b"1"*16, b"2"*16] +assert t >= 2 and t <= 3 + +# Next pass: try to kick off unaligned first +print("unaligned first") +h.zero(16, 0) +ba3 = bytearray(b"3"*12) +ba4 = bytearray(b"4"*16) +buf3 = nbd.Buffer.from_bytearray(ba3) +buf4 = nbd.Buffer.from_bytearray(ba4) +touch(witness) +start_t = time.time() +h.aio_pwrite(buf3, 4) +h.aio_pwrite(buf4, 0) + +while h.aio_in_flight() > 0: + h.poll(-1) +end_t = time.time() +os.unlink(witness) + +out = h.pread(16,0) +print(out) +t = end_t - start_t +print(t) +assert out in [b"4"*4 + b"3"*12, b"4"*16] +assert t >= 8 + +# Next pass: try to kick off aligned first +print("aligned first") +ba5 = bytearray(b"5"*16) +ba6 = bytearray(b"6"*12) +buf5 = nbd.Buffer.from_bytearray(ba5) +buf6 = nbd.Buffer.from_bytearray(ba6) +h.zero(16, 0) +touch(witness) +start_t = time.time() +h.aio_pwrite(buf5, 0) +h.aio_pwrite(buf6, 4) + +while h.aio_in_flight() > 0: + h.poll(-1) +end_t = time.time() +os.unlink(witness) + +out = h.pread(16,0) +print(out) +t = end_t - start_t +print(t) +assert out in [b"5"*4 + b"6"*12, b"5"*16] +assert t >= 8 +' + +# Now run everything +truncate --size=16 blocksize-sharding.img +export witness="$PWD/blocksize-sharding.tmp" +nbdkit -U - --filter=blocksize --filter=delay eval delay-write=2 \ + config='ln -sf "$(realpath "$3")" $tmpdir/$2' \ + img="$PWD/blocksize-sharding.img" tmp="$PWD/blocksize-sharding.tmp" \ + get_size='echo 16' block_size='echo 16 64K 1M' \ + thread_model='echo parallel' \ + zero='dd if=/dev/zero of=$tmpdir/img skip=$4 count=$3 \ + iflag=count_bytes,skip_bytes' \ + pread=' + dd if=$tmpdir/img skip=$4 count=$3 iflag=count_bytes,skip_bytes + if [ -f $tmpdir/tmp ]; then sleep 4; fi ' \ + pwrite='dd of=$tmpdir/img seek=$4 conv=notrunc oflag=seek_bytes' \ + --run 'nbdsh -u "$uri" -c "$script"' -- 2.36.1

3 years, 9 months

3
5
0 / 0

[nbdkit PATCH] linuxdisk: Reduce size of test

by Eric Blake

Using all of ../plugins as the contents for the linuxdisk gets progressively bigger over time with incremental builds. Before I cleaned it up, my plugins/rust/target/debug had accumulated over 6G of cruft, causing test-linuxdisk-copy-out.sh to fail due to ENOSPC during mke2fs. Pick a smaller directory tree to export, but still test that we expose a subdirectory. Fixes: 5da0f2fc9 ("Add linuxdisk plugin.") --- plugins/linuxdisk/subdir/.gitignore | 0 tests/test-linuxdisk-copy-out.sh | 18 +++++++++--------- 2 files changed, 9 insertions(+), 9 deletions(-) create mode 100644 plugins/linuxdisk/subdir/.gitignore diff --git a/plugins/linuxdisk/subdir/.gitignore b/plugins/linuxdisk/subdir/.gitignore new file mode 100644 index 00000000..e69de29b diff --git a/tests/test-linuxdisk-copy-out.sh b/tests/test-linuxdisk-copy-out.sh index 3a38fb58..f926a55c 100755 --- a/tests/test-linuxdisk-copy-out.sh +++ b/tests/test-linuxdisk-copy-out.sh @@ -1,6 +1,6 @@ #!/usr/bin/env bash # nbdkit -# Copyright (C) 2018-2020 Red Hat Inc. +# Copyright (C) 2018-2022 Red Hat Inc. # # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions are @@ -49,25 +49,25 @@ cleanup_fn rm -f $files nbdkit -f -v -U - \ --filter=partition \ - linuxdisk $srcdir/../plugins partition=1 label=ROOT \ + linuxdisk $srcdir/../plugins/linuxdisk partition=1 label=ROOT \ --run 'nbdcopy "$uri" linuxdisk-copy-out.img' # Check the disk content. guestfish --ro -a linuxdisk-copy-out.img -m /dev/sda <<EOF # Check some known files and directories exist. ll / - ll /linuxdisk - is-dir /linuxdisk - is-file /linuxdisk/Makefile.am + ll /subdir + is-dir /subdir + is-file /Makefile.am # This reads out all the directory entries and all file contents. tar-out / - | cat >/dev/null # Download some files and compare to local copies. - download /linuxdisk/Makefile linuxdisk-copy-out.test1 - download /linuxdisk/Makefile.am linuxdisk-copy-out.test2 - download /linuxdisk/nbdkit-linuxdisk-plugin.pod linuxdisk-copy-out.test3 - download /linuxdisk/filesystem.c linuxdisk-copy-out.test4 + download /Makefile linuxdisk-copy-out.test1 + download /Makefile.am linuxdisk-copy-out.test2 + download /nbdkit-linuxdisk-plugin.pod linuxdisk-copy-out.test3 + download /filesystem.c linuxdisk-copy-out.test4 EOF # Compare downloaded files to local versions. -- 2.36.1

3 years, 9 months

3
3
0 / 0

← Newer
1
2
3
4
5
6
7
Older →

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

Libguestfs June 2022