Testing virt-v2v with slow input and rhv-upload keep alive changes
by Nir Soffer
I tested again these patches:
https://listman.redhat.com/archives/libguestfs/2021-December/msg00187.html
(without patch 2: v2v/lib/util.ml: Get disk allocation from input - not needed)
Short version:
import from horribly slow input source works now
Long version:
Patches tested on top current master:
commit 66b0a40406effd1c63c0ab47a271aa5240a77638
input: Remove incorrect text about passwords in vCenter over HTTPS mode
To test slow input, I used this temporary patch:
diff --git a/input/input_disk.ml b/input/input_disk.ml
index 2b21950a..3dabd3c9 100644
--- a/input/input_disk.ml
+++ b/input/input_disk.ml
@@ -99,6 +99,8 @@ let rec disk_source dir options args =
match input_format with
| "raw" ->
let cmd = Nbdkit.create "file" in
+ Nbdkit.add_filter cmd "delay";
+ Nbdkit.add_arg cmd "delay-extents" "30";
Nbdkit.add_filter cmd "cow";
Nbdkit.add_arg cmd "file" disk;
if Nbdkit.version nbdkit_config >= (1, 22, 0) then
Tested on ovirt 4.5 system, which gives virt-v2v 3600 seconds inactivity_timeout
so it does not close the connection after 60 seconds of inactivity like current
imageio release. To make it behave like release version, I change:
/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/backends/__init__.py
# Authorized connections get a longer timeout from the ticket.
#req.set_connection_timeout(ticket.inactivity_timeout)
# Simulate imageio 2.3, using hard coded 60 seconds timeout.
req.set_connection_timeout(60)
Testing import of fedora-35 disk with 3g random data, created based on Richard
instructions posted a few weeks ago.
virt-v2v command:
$ cat ~/v2v/v2v.sh
#!/bin/sh
vm_name=${1:-v2v}
shift
./run virt-v2v \
-i disk /var/tmp/fedora-35-data.raw \
-o rhv-upload \
-oc https://engine.local/ovirt-engine/api \
-op ~/.config/ovirt/engine.local/password \
-on $vm_name \
-os alpine-nfs-01 \
-of qcow2 \
-oo rhv-cafile=/etc/pki/vdsm/certs/cacert.pem \
-oo rhv-cluster=el8 \
-oo rhv-direct=true \
"$@"
$ ~/v2v/v2v.sh v2v-delay -v >v2v.log 2>&1
█ 100% [****************************************]
Interesting events from v2v.log:
$ grep '^\[' v2v.log | tail -4
[ 43.4] Setting up the destination: -o rhv-upload -oc
https://engine.local/ovirt-engine/api -os alpine-nfs-01
[ 55.0] Copying disk 1/1
[1138.5] Creating output metadata
[1144.7] Finishing off
Copying took 1083 seconds.
Interesting events from the rhv-uplod plugin:
nbdkit: debug: python: load
nbdkit: debug: python: config key=script,
value=/tmp/v2v.uotkG8/rhv-upload-plugin.py
nbdkit: debug: module requested API_VERSION 2
nbdkit: debug: python: config key=size, value=6442450944
nbdkit: debug: python: config key=url,
value=https://host4:54322/images/624751e7-8f1c-45fb-8d6d-3635c2b677b6
nbdkit: debug: python: config key=cafile, value=/etc/pki/vdsm/certs/cacert.pem
nbdkit: debug: python: config key=insecure, value=true
nbdkit: debug: python: config key=is_ovirt_host, value=true
nbdkit: debug: python: config_complete
nbdkit: debug: using thread model: parallel
nbdkit: debug: python: get_ready
nbdkit: debug: bound to unix socket /tmp/v2v.c7O006/out0
nbdkit: debug: written pidfile /run/user/1000/v2vnbdkit.oFxaw7/nbdkit5.pid
nbdkit: debug: python: after_fork
nbdkit: debug: creating https connection host=host4 port=54322
nbdkit: debug: imageio features: flush=True zero=True
unix_socket='\x00/org/ovirt/imageio' max_readers=8 max_writers=8
nbdkit: debug: creating http pool connections=4
nbdkit: debug: creating unix http connection socket='\x00/org/ovirt/imageio'
nbdkit: debug: creating unix http connection socket='\x00/org/ovirt/imageio'
nbdkit: debug: creating unix http connection socket='\x00/org/ovirt/imageio'
nbdkit: debug: creating unix http connection socket='\x00/org/ovirt/imageio'
nbdkit: debug: poolkeeper: started
nbdkit: debug: accepted connection
...
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: poolkeeper: flushing idle connection
nbdkit: debug: python: cleanup
nbdkit: debug: cleaning up
nbdkit: debug: closing http pool
nbdkit: debug: python: unload plugin
Interesting events from image log:
$ grep '\[images\]' daemon.log | grep -A1 -B1 FLUSH
2022-02-10 19:23:57,557 DEBUG (Thread-312) [images] [local] ZERO
size=127074304 offset=141361152 flush=False
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:24:11,544 INFO (Thread-311) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:24:41,554 INFO (Thread-311) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:24:41,558 INFO (Thread-313) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:24:41,562 INFO (Thread-314) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:24:41,566 INFO (Thread-312) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:24:57,605 DEBUG (Thread-311) [images] [local] ZERO
size=2097152 offset=536870912 flush=False
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
--
2022-02-10 19:24:57,609 DEBUG (Thread-314) [images] [local] ZERO
size=117899264 offset=553189376 flush=False
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:25:11,571 INFO (Thread-312) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:25:27,626 DEBUG (Thread-312) [images] [local] ZERO
size=983040 offset=65536 flush=False
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
--
2022-02-10 19:32:02,484 DEBUG (Thread-314) [images] [local] WRITE
size=262144 offset=4735107072 flush=False close=False
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:32:41,591 INFO (Thread-311) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:32:41,601 INFO (Thread-312) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:32:41,604 INFO (Thread-314) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:32:41,608 INFO (Thread-313) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:33:11,613 INFO (Thread-311) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:33:11,617 INFO (Thread-312) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:33:11,621 INFO (Thread-314) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:33:11,625 INFO (Thread-313) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:33:41,630 INFO (Thread-311) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:33:41,634 INFO (Thread-312) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:33:41,639 INFO (Thread-314) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:33:41,643 INFO (Thread-313) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:34:11,648 INFO (Thread-311) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:34:11,653 INFO (Thread-312) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:34:11,657 INFO (Thread-314) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:34:11,662 INFO (Thread-313) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:34:41,667 INFO (Thread-311) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:34:41,672 INFO (Thread-312) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:34:41,676 INFO (Thread-314) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:34:41,680 INFO (Thread-313) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:35:11,684 INFO (Thread-311) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:35:11,685 INFO (Thread-312) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:35:11,686 INFO (Thread-314) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:35:11,688 INFO (Thread-313) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:35:32,843 DEBUG (Thread-311) [images] [local] WRITE
size=262144 offset=2266497024 flush=False close=False
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
--
2022-02-10 19:35:33,522 DEBUG (Thread-314) [images] [local] WRITE
size=262144 offset=2536243200 flush=False close=False
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:36:11,690 INFO (Thread-312) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:36:11,693 INFO (Thread-313) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:36:11,694 INFO (Thread-311) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:36:11,695 INFO (Thread-314) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:36:41,699 INFO (Thread-312) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:36:41,703 INFO (Thread-313) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:36:41,707 INFO (Thread-311) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:36:41,711 INFO (Thread-314) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:37:02,667 DEBUG (Thread-312) [images] [local] WRITE
size=262144 offset=4735369216 flush=False close=False
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
--
2022-02-10 19:38:07,953 DEBUG (Thread-314) [images] [local] WRITE
size=4096 offset=6442446848 flush=False close=False
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:38:41,719 INFO (Thread-313) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:38:41,730 INFO (Thread-312) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:38:41,734 INFO (Thread-311) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:38:41,738 INFO (Thread-314) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:39:11,743 INFO (Thread-313) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:39:11,747 INFO (Thread-312) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:39:11,751 INFO (Thread-311) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:39:11,755 INFO (Thread-314) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:39:41,760 INFO (Thread-313) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:39:41,764 INFO (Thread-312) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:39:41,768 INFO (Thread-311) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:39:41,772 INFO (Thread-314) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:40:11,777 INFO (Thread-313) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:40:11,781 INFO (Thread-312) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:40:11,785 INFO (Thread-311) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:40:11,789 INFO (Thread-314) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:40:41,793 INFO (Thread-313) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:40:41,794 INFO (Thread-312) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:40:41,795 INFO (Thread-311) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:40:41,796 INFO (Thread-314) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:41:11,800 INFO (Thread-313) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:41:11,804 INFO (Thread-312) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:41:11,808 INFO (Thread-311) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:41:11,812 INFO (Thread-314) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:41:30,160 DEBUG (Thread-311) [images] [local] WRITE
size=262144 offset=3341025280 flush=False close=False
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
--
2022-02-10 19:41:30,947 DEBUG (Thread-313) [images] [local] WRITE
size=262144 offset=3890806784 flush=False close=False
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:41:30,953 INFO (Thread-311) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:41:30,955 INFO (Thread-312) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:41:30,956 INFO (Thread-314) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:41:30,956 INFO (Thread-313) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:41:30,957 INFO (Thread-311) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:41:30,958 INFO (Thread-312) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:41:30,959 INFO (Thread-314) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:41:30,959 INFO (Thread-313) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:41:30,960 INFO (Thread-311) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:41:30,961 INFO (Thread-312) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:41:30,962 INFO (Thread-314) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:41:30,963 INFO (Thread-313) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:41:30,964 INFO (Thread-311) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:41:30,965 INFO (Thread-312) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:41:30,966 INFO (Thread-314) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
2022-02-10 19:41:30,967 INFO (Thread-313) [images] [local] FLUSH
transfer=ce5a1653-004b-4e59-b119-67f8d42e0323
Stats from imageio logs:
$ grep '\[http\] CLOSE' daemon.log | tail -4
2022-02-10 19:41:30,973 INFO (Thread-312) [http] CLOSE
connection=312 client=local [connection 1 ops, 1083.454034 s]
[dispatch 4057 ops, 17.015172 s] [zero 39 ops, 0.035068 s, 656.44 MiB,
18.28 GiB/s] [zero.zero 39 ops, 0.034430 s, 656.44 MiB, 18.62 GiB/s]
[flush 20 ops, 0.014281 s] [write 3998 ops, 10.133743 s, 994.21 MiB,
98.11 MiB/s] [write.read 3998 ops, 1.623026 s, 994.21 MiB, 612.57
MiB/s] [write.write 3998 ops, 8.352089 s, 994.21 MiB, 119.04 MiB/s]
2022-02-10 19:41:30,973 INFO (Thread-313) [http] CLOSE
connection=313 client=local [connection 1 ops, 1083.453773 s]
[dispatch 4002 ops, 17.051927 s] [write 3944 ops, 10.005780 s, 979.06
MiB, 97.85 MiB/s] [write.read 3944 ops, 1.611647 s, 979.06 MiB, 607.49
MiB/s] [write.write 3944 ops, 8.234478 s, 979.06 MiB, 118.90 MiB/s]
[zero 39 ops, 0.036099 s, 307.39 MiB, 8.32 GiB/s] [zero.zero 39 ops,
0.035290 s, 307.39 MiB, 8.51 GiB/s] [flush 19 ops, 0.015969 s]
2022-02-10 19:41:30,979 INFO (Thread-314) [http] CLOSE
connection=314 client=local [connection 1 ops, 1083.459459 s]
[dispatch 4062 ops, 17.000945 s] [write 3995 ops, 10.088226 s, 992.67
MiB, 98.40 MiB/s] [write.read 3995 ops, 1.628156 s, 992.67 MiB, 609.69
MiB/s] [write.write 3995 ops, 8.299689 s, 992.67 MiB, 119.60 MiB/s]
[zero 48 ops, 0.038225 s, 629.38 MiB, 16.08 GiB/s] [zero.zero 48 ops,
0.037418 s, 629.38 MiB, 16.43 GiB/s] [flush 19 ops, 0.008533 s]
2022-02-10 19:41:33,749 INFO (Thread-441) [http] CLOSE
connection=441 client=local [connection 1 ops, 0.000969 s] [dispatch 1
ops, 0.000326 s]
Attaching the full logs for reference.
Nir
2 years, 9 months
[libnbd PATCH 0/3] libnbd hardening against nbd_pread bugs
by Eric Blake
In documenting the recent CVE-2022-0485 bug in nbdcopy, I pointed out
that the severity of the flaw was server-dependent (a server with
structured replies caused nbdcopy to write zeroes, but a server
without structured replies caused nbdcopy to leak heap contents). In
fact, this series demonstrates that the severity of ignoring read
errors has had server-dependent behavior in ALL stable released
versins of libnbd, predating the nbdcopy bug.
While the core developers were aware of that fact more than a week
ago, it wasn't until this week that the Red Hat secalert team had
finally decided that publicizing this fact does not constitute a
second CVE fix, but is merely a data hardening technique, and
therefore it is not as essential to backport to stable branches as was
the nbdcopy bug fix. Other distros may disagree, so I intentionally
separated this series with an eye towards easy backporting.
Eric Blake (3):
api: Drop server control of memset() prior to NBD_CMD_READ
api: Guarantee sanitized buffer on pread failure
api: Add new API nbd_set_pread_initialize()
lib/internal.h | 5 +-
generator/API.ml | 87 +++++++++++++++++++---
generator/C.ml | 12 ++-
lib/handle.c | 17 ++++-
lib/rw.c | 18 ++---
python/t/110-defaults.py | 3 +-
python/t/120-set-non-defaults.py | 4 +-
ocaml/tests/test_110_defaults.ml | 4 +-
ocaml/tests/test_120_set_non_defaults.ml | 5 +-
tests/errors.c | 34 ++++++++-
golang/libnbd_110_defaults_test.go | 10 ++-
golang/libnbd_120_set_non_defaults_test.go | 12 +++
12 files changed, 179 insertions(+), 32 deletions(-)
--
2.34.1
2 years, 9 months
[guestfs-tools PATCH] virt-resize: replace "wrap" calls with calls to "info"
by Laszlo Ersek
Utilities shouldn't directly call "Std_utils.wrap" for printing
informative messages; the "Tools_utils.info" function handles that better.
Because "info" prints a trailing newline automatically, strip one newline
from the originally wrapped messages. While at it, rewrap (in the source
code) the "Resize operation completed with no errors" message, for better
readability.
The patch introduces some visible (but, arguably, correct) changes to the
output:
> virt-resize: /dev/sda1: This partition will be resized from 1023.9M to
> 2.0G. The filesystem ext4 on /dev/sda1 will be expanded using the
> ‘resize2fs’ method.
> [...]
> virt-resize: Resize operation completed with no errors. Before deleting
> the old disk, carefully check that the resized disk boots and works
> correctly.
These messages now carry the "virt-resize: " prefix, and they are printed
in magenta.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1820221
Signed-off-by: Laszlo Ersek <lersek(a)redhat.com>
---
resize/resize.ml | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/resize/resize.ml b/resize/resize.ml
index dad453ff99b7..b77e680d49c8 100644
--- a/resize/resize.ml
+++ b/resize/resize.ml
@@ -938,7 +938,7 @@ read the man page virt-resize(1).
""
) in
- wrap (text ^ "\n\n") in
+ info "%s" (text ^ "\n") in
List.iter print_summary partitions;
@@ -969,7 +969,7 @@ read the man page virt-resize(1).
""
) in
- wrap (text ^ "\n\n")
+ info "%s" (text ^ "\n")
) lvs;
if surplus > 0L then (
@@ -983,7 +983,7 @@ read the man page virt-resize(1).
) else
s_" The surplus space will be ignored. Run a partitioning program in the guest to partition this extra space if you want." in
- wrap (text ^ "\n\n")
+ info "%s" (text ^ "\n")
);
printf "**********\n";
@@ -1440,7 +1440,9 @@ read the man page virt-resize(1).
if not (quiet ()) then (
print_newline ();
- wrap (s_"Resize operation completed with no errors. Before deleting the old disk, carefully check that the resized disk boots and works correctly.\n");
+ info "%s" (s_"Resize operation completed with no errors. Before deleting \
+ the old disk, carefully check that the resized disk boots and \
+ works correctly.");
)
let () = run_main_and_handle_errors main
base-commit: 7d5d5e921d3d483a997f40566c1ccabf8a689a8a
--
2.19.1.3.g30247aa5d201
2 years, 9 months
Performance regression in -o rhv-upload questions
by Richard W.M. Jones
Hi Nir,
https://bugzilla.redhat.com/show_bug.cgi?id=2039255#c4
I'm looking for some advice/help with a performance regression in
virt-v2v between 1.44.2 and the latest version. It's very obvious and
reproducible when I do a conversion from a local disk to local RHV
using -o rhv-upload, specifically:
$ time ./run virt-v2v -i disk /var/tmp/fedora-35.qcow2 -o rhv-upload -oc https://ovirt4410.home.annexia.org/ovirt-engine/api -op /tmp/ovirt-passwd -oo rhv-direct -os ovirt-data -on test5 -of raw
The guest is an ordinary Fedora guest containing some junk data.
Now that I've been able to reproduce the problem locally, it turns out
to be not at all what I thought it was going to be. The timings show
that:
- new virt-v2v is much faster doing the copy, but
- new virt-v2v takes ages finalizing the transfer (about 10x longer
than the old version)
virt-v2v 1.44.2:
Complete log: http://oirase.annexia.org/tmp/virt-v2v-1.44.2-rhv-upload.log
[ 63.1] Copying disk 1/1
...
transfer 7aecd359-3706-49f5-8a0c-c8799f7b100a is finalizing_success
transfer 7aecd359-3706-49f5-8a0c-c8799f7b100a is finished_success
transfer 7aecd359-3706-49f5-8a0c-c8799f7b100a finalized in 6.105 seconds
[ 89.2] Creating output metadata
[ 89.8] Finishing off
virt-v2v 1.45.97:
Complete log: http://oirase.annexia.org/tmp/virt-v2v-1.45.97-rhv-upload.log
[ 71.0] Copying disk 1/1
[ 82.7] Creating output metadata
...
transfer 6ea3d724-16f9-4bda-a33e-69a783480abc is finalizing_success
transfer 6ea3d724-16f9-4bda-a33e-69a783480abc is finished_success
transfer 6ea3d724-16f9-4bda-a33e-69a783480abc finalized in 61.552 seconds
[ 144.9] Finishing off
It's not a completely like-for-like comparison because the rhv-upload
backend changed a lot between these versions. In particular if I was
going to pick some suspicious change it would be my refactoring here
which was supposed to be neutral but maybe isn't:
https://github.com/libguestfs/virt-v2v/commit/143a22860216b94d3a817061930...
Unfortunately this commit is almost impossible to revert because of
deep restructuring of surrounding code.
Both branches have your earlier fix to finalization, so it shouldn't
be this:
https://github.com/libguestfs/virt-v2v/commit/79702b28329d15a7485801ed7e9...
Another idea:
Old virt-v2v uses qemu-img convert which does not flush by default.
New virt-v2v uses nbdcopy with the --flush option, so it will call
imageio PATCH /path ... "op":"flush" at the end. However removing
nbdcopy --flush didn't help very much (only a couple of seconds off).
Do you have any other ideas?
What exactly does imageio do during the finalizing_success state?
Rich.
--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW
2 years, 9 months
[PATCH v2v] output: -o rhv-upload: Kill nbdkit instances before finalizing
by Richard W.M. Jones
If nbdkit instance(s) are still running then they will hold open some
http connections to imageio. In some versions of imageio, starting
finalization in this state causes a 60 second timeout.
See-also: https://listman.redhat.com/archives/libguestfs/2022-February/msg00111.html
Thanks: Nir Soffer
---
output/output_rhv_upload.ml | 43 +++++++++++++++++++++++++++----------
1 file changed, 32 insertions(+), 11 deletions(-)
diff --git a/output/output_rhv_upload.ml b/output/output_rhv_upload.ml
index 4d8dc1c135..1fefed123c 100644
--- a/output/output_rhv_upload.ml
+++ b/output/output_rhv_upload.ml
@@ -280,12 +280,26 @@ e command line has to match the number of guest disk images (for this guest: %d)
ignore (Python_script.run_command cancel_script json_params [])
in
- (* Set up an at-exit handler so we delete the orphan disks on failure. *)
+ (* Set up an at-exit handler to perform some cleanups.
+ * - Kill nbdkit PIDs (only before finalization).
+ * - Delete the orphan disks (only on conversion failure).
+ *)
+ let nbdkit_pids = ref [] in
On_exit.f (
fun () ->
+ (* Kill the nbdkit PIDs. *)
+ List.iter (
+ fun pid ->
+ try
+ kill pid Sys.sigterm
+ with exn -> debug "%s" (Printexc.to_string exn)
+ ) !nbdkit_pids;
+ nbdkit_pids := [];
+
(* virt-v2v writes v2vdir/done on success only. *)
- let success = Sys.file_exists (dir // "done") in
- if not success then (
+ let on_failure = not (Sys.file_exists (dir // "done")) in
+
+ if on_failure then (
if disk_uuids <> [] then
cancel !transfer_ids disk_uuids
)
@@ -351,11 +365,7 @@ e command line has to match the number of guest disk images (for this guest: %d)
if is_ovirt_host then
Nbdkit.add_arg cmd "is_ovirt_host" "true";
let _, pid = Nbdkit.run_unix ~socket cmd in
-
- (* --exit-with-parent should ensure nbdkit is cleaned
- * up when we exit, but it's not supported everywhere.
- *)
- On_exit.kill pid;
+ List.push_front pid nbdkit_pids
) (List.combine disks disk_uuids);
(* Stash some data we will need during finalization. *)
@@ -363,7 +373,7 @@ e command line has to match the number of guest disk images (for this guest: %d)
let t = (disk_sizes : int64 list), disk_uuids, !transfer_ids,
finalize_script, createvm_script, json_params,
rhv_storagedomain_uuid, rhv_cluster_uuid,
- rhv_cluster_cpu_architecture, rhv_cluster_name in
+ rhv_cluster_cpu_architecture, rhv_cluster_name, nbdkit_pids in
t
and rhv_upload_finalize dir source inspect target_meta
@@ -374,7 +384,8 @@ and rhv_upload_finalize dir source inspect target_meta
(disk_sizes, disk_uuids, transfer_ids,
finalize_script, createvm_script, json_params,
rhv_storagedomain_uuid, rhv_cluster_uuid,
- rhv_cluster_cpu_architecture, rhv_cluster_name) =
+ rhv_cluster_cpu_architecture, rhv_cluster_name,
+ nbdkit_pids) =
(* Check the cluster CPU arch matches what we derived about the
* guest during conversion.
*)
@@ -386,6 +397,16 @@ and rhv_upload_finalize dir source inspect target_meta
rhv_cluster_name target_meta.guestcaps.gcaps_arch arch
);
+ (* We must kill all our nbdkit instances before finalizing the
+ * transfer. See:
+ * https://listman.redhat.com/archives/libguestfs/2022-February/msg00111.html
+ *
+ * We want to fail here if the kill fails because nbdkit
+ * died already, as that would be unexpected.
+ *)
+ List.iter (fun pid -> kill pid Sys.sigterm) !nbdkit_pids;
+ nbdkit_pids := []; (* Don't kill them again in the On_exit handler. *)
+
(* Finalize all the transfers. *)
let json_params =
let ids = List.map (fun id -> JSON.String id) transfer_ids in
@@ -442,7 +463,7 @@ module RHVUpload = struct
type t = int64 list * string list * string list *
Python_script.script * Python_script.script *
JSON.field list * string option * string option *
- string option * string
+ string option * string * int list ref
let to_string options =
"-o rhv-upload" ^
--
2.32.0
2 years, 9 months
[libnbd PATCH] copy: Fail nbdcopy if NBD read or write fails
by Eric Blake
FIXME: This is CVE-2022-XXXXX (still awaiting assignment of the CVE number).
nbdcopy has a nasty bug when performing multi-threaded copies using
asynchronous nbd calls - it was blindly treating the completion of an
asynchronous command as successful, rather than checking the *error
parameter. This can result in the silent creation of a corrupted
image in two different ways: when a read fails, we blindly wrote
garbage to the destination; when a write fails, we did not flag that
the destination was not written.
Since nbdcopy already calls exit() on a synchronous read or write
failure to a file, doing the same for an asynchronous op to an NBD
server is the simplest solution. A nicer solution, but more invasive
to write and thus not done here, might be to allow up to N retries of
the transaction (in case the read or write failure was transient), or
even having a mode where as much data is copied as possible (portions
of the copy that failed would be logged on stderr, and nbdcopy would
still fail with a non-zero exit status, but this would copy more than
just stopping at the first error).
Note that since we rely on auto-retiring and do NOT call
nbd_aio_command_completed, our completion callbacks must always return
1 (if they do not exit() first), even when acting on *error. As such,
there is no sane way to return an error to a manual caller of the
callback, and therefore we can drop dead code that exit()s if the
callback "fails". It is also worth documenting the contract on when
we must manually call the callback during the asynch_zero callback, so
that we do not leak or double-free the command.
And since we are now exit()ing in the callback if *error is set, we
must be careful that we do not leak an unknown value of errno on paths
where we did not encounter a failure.
Reported-by: Nir Soffer <nsoffer(a)redhat.com>
Fixes: bc896eec4d ("copy: Implement multi-conn, multiple threads, multiple requests in flight.", v1.5.6)
Fixes: https://bugzilla.redhat.com/2046194
---
Once the CVE is assigned, I will also update docs/libnbd-security.pod,
and backport this to affected stable branches.
I'm not sure if I like passing &errno in the manual calls to
cb.callback(); better might be 'int dummy = 0; cb.callback(...,
&dummy)', but changing that could be a separate patch.
I verified that the copy-nbd-error.sh test fails (in both the read and
write directions, by modifying the filter arguments) without this
patch.
TODO | 1 +
copy/Makefile.am | 4 ++-
copy/copy-nbd-error.sh | 54 +++++++++++++++++++++++++++++++++++++
copy/file-ops.c | 21 +++++----------
copy/multi-thread-copying.c | 18 ++++++++++++-
copy/nbdcopy.h | 7 ++---
copy/null-ops.c | 12 +++------
7 files changed, 88 insertions(+), 29 deletions(-)
create mode 100755 copy/copy-nbd-error.sh
diff --git a/TODO b/TODO
index da157942..7c9c15e2 100644
--- a/TODO
+++ b/TODO
@@ -33,6 +33,7 @@ nbdcopy:
- Better page cache usage, see nbdkit-file-plugin options
fadvise=sequential cache=none.
- Consider io_uring if there are performance bottlenecks.
+ - Configurable retries in response to read or write failures.
nbdfuse:
- If you write beyond the end of the virtual file, it returns EIO.
diff --git a/copy/Makefile.am b/copy/Makefile.am
index f2100853..85989798 100644
--- a/copy/Makefile.am
+++ b/copy/Makefile.am
@@ -1,5 +1,5 @@
# nbd client library in userspace
-# Copyright (C) 2020 Red Hat Inc.
+# Copyright (C) 2020-2022 Red Hat Inc.
#
# This library is free software; you can redistribute it and/or
# modify it under the terms of the GNU Lesser General Public
@@ -33,6 +33,7 @@ EXTRA_DIST = \
copy-nbd-to-small-nbd-error.sh \
copy-nbd-to-sparse-file.sh \
copy-nbd-to-stdout.sh \
+ copy-nbd-error.sh \
copy-progress-bar.sh \
copy-sparse.sh \
copy-sparse-allocated.sh \
@@ -124,6 +125,7 @@ TESTS += \
copy-stdin-to-nbd.sh \
copy-stdin-to-null.sh \
copy-nbd-to-stdout.sh \
+ copy-nbd-error.sh \
copy-progress-bar.sh \
copy-sparse.sh \
copy-sparse-allocated.sh \
diff --git a/copy/copy-nbd-error.sh b/copy/copy-nbd-error.sh
new file mode 100755
index 00000000..50723195
--- /dev/null
+++ b/copy/copy-nbd-error.sh
@@ -0,0 +1,54 @@
+#!/usr/bin/env bash
+# nbd client library in userspace
+# Copyright (C) 2022 Red Hat Inc.
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+
+. ../tests/functions.sh
+
+set -e
+set -x
+
+requires nbdkit --exit-with-parent --version
+
+pidfile=copy-nbd-error.pid
+sock=$(mktemp -u /tmp/libnbd-test-copy.XXXXXX)
+cleanup_fn rm -f $pidfile $sock
+
+# Run an nbdkit server that randomly fails.
+nbdkit --exit-with-parent -f -v -P $pidfile -U $sock \
+ --filter=error --filter=noextents \
+ memory size=5M error-pread-rate=0.5 error-pwrite-rate=0.5 &
+# Wait for the pidfile to appear.
+for i in {1..60}; do
+ if test -f $pidfile; then
+ break
+ fi
+ sleep 1
+done
+if ! test -f $pidfile; then
+ echo "$0: nbdkit did not start up"
+ exit 1
+fi
+
+fail=0
+
+# Failure to read should be fatal
+$VG nbdcopy -- "nbd+unix:///?socket=$sock" null: && fail=1
+
+# Failure to write should be fatal
+$VG nbdcopy -- [ nbdkit --exit-with-parent -v pattern 5M ] "nbd+unix:///?socket=$sock" && fail=1
+
+exit $fail
diff --git a/copy/file-ops.c b/copy/file-ops.c
index 84704341..eb30f924 100644
--- a/copy/file-ops.c
+++ b/copy/file-ops.c
@@ -1,5 +1,5 @@
/* NBD client library in userspace.
- * Copyright (C) 2020 Red Hat Inc.
+ * Copyright (C) 2020-2022 Red Hat Inc.
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public
@@ -589,11 +589,8 @@ file_asynch_read (struct rw *rw,
{
file_synch_read (rw, slice_ptr (command->slice),
command->slice.len, command->offset);
- errno = 0;
- if (cb.callback (cb.user_data, &errno) == -1) {
- perror (rw->name);
- exit (EXIT_FAILURE);
- }
+ errno = 0; /* safe, since file_synch_read exits on error */
+ cb.callback (cb.user_data, &errno);
}
static void
@@ -603,11 +600,8 @@ file_asynch_write (struct rw *rw,
{
file_synch_write (rw, slice_ptr (command->slice),
command->slice.len, command->offset);
- errno = 0;
- if (cb.callback (cb.user_data, &errno) == -1) {
- perror (rw->name);
- exit (EXIT_FAILURE);
- }
+ errno = 0; /* safe, since file_synch_write exits on error */
+ cb.callback (cb.user_data, &errno);
}
static bool
@@ -617,10 +611,7 @@ file_asynch_zero (struct rw *rw, struct command *command,
if (!file_synch_zero (rw, command->offset, command->slice.len, allocate))
return false;
errno = 0;
- if (cb.callback (cb.user_data, &errno) == -1) {
- perror (rw->name);
- exit (EXIT_FAILURE);
- }
+ cb.callback (cb.user_data, &errno);
return true;
}
diff --git a/copy/multi-thread-copying.c b/copy/multi-thread-copying.c
index b17ca598..c578c4bc 100644
--- a/copy/multi-thread-copying.c
+++ b/copy/multi-thread-copying.c
@@ -1,5 +1,5 @@
/* NBD client library in userspace.
- * Copyright (C) 2020 Red Hat Inc.
+ * Copyright (C) 2020-2022 Red Hat Inc.
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public
@@ -376,6 +376,13 @@ finished_read (void *vp, int *error)
{
struct command *command = vp;
+ /* XXX - is it worth retrying a failed command? */
+ if (*error) {
+ errno = *error;
+ perror("read");
+ exit (EXIT_FAILURE);
+ }
+
if (allocated || sparse_size == 0) {
/* If sparseness detection (see below) is turned off then we write
* the whole command.
@@ -475,6 +482,7 @@ finished_read (void *vp, int *error)
/* Free the original command since it has been split into
* subcommands and the original is no longer needed.
*/
+ errno = 0;
free_command (command, &errno);
}
@@ -537,6 +545,7 @@ fill_dst_range_with_zeroes (struct command *command)
free (data);
free_and_return:
+ errno = 0;
free_command (command, &errno);
}
@@ -546,6 +555,13 @@ free_command (void *vp, int *error)
struct command *command = vp;
struct buffer *buffer = command->slice.buffer;
+ /* XXX - is it worth retrying a failed command? */
+ if (*error) {
+ errno = *error;
+ perror("write");
+ exit (EXIT_FAILURE);
+ }
+
if (buffer != NULL) {
if (--buffer->refs == 0) {
free (buffer->data);
diff --git a/copy/nbdcopy.h b/copy/nbdcopy.h
index e7fe1eab..c070f8d7 100644
--- a/copy/nbdcopy.h
+++ b/copy/nbdcopy.h
@@ -1,5 +1,5 @@
/* NBD client library in userspace.
- * Copyright (C) 2020-2021 Red Hat Inc.
+ * Copyright (C) 2020-2022 Red Hat Inc.
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public
@@ -157,7 +157,8 @@ struct rw_ops {
bool allocate);
/* Asynchronous I/O operations. These start the operation and call
- * 'cb' on completion.
+ * 'cb' on completion. 'cb' will return 1, for auto-retiring with
+ * asynchronous libnbd calls.
*
* The file_ops versions are actually implemented synchronously, but
* still call 'cb'.
@@ -173,7 +174,7 @@ struct rw_ops {
nbd_completion_callback cb);
/* Asynchronously zero. command->slice.buffer is not used. If not possible,
- * returns false.
+ * returns false. 'cb' must be called only if returning true.
*/
bool (*asynch_zero) (struct rw *rw, struct command *command,
nbd_completion_callback cb, bool allocate);
diff --git a/copy/null-ops.c b/copy/null-ops.c
index a38666d6..924eaf1e 100644
--- a/copy/null-ops.c
+++ b/copy/null-ops.c
@@ -1,5 +1,5 @@
/* NBD client library in userspace.
- * Copyright (C) 2020-2021 Red Hat Inc.
+ * Copyright (C) 2020-2022 Red Hat Inc.
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public
@@ -118,10 +118,7 @@ null_asynch_write (struct rw *rw,
nbd_completion_callback cb)
{
errno = 0;
- if (cb.callback (cb.user_data, &errno) == -1) {
- perror (rw->name);
- exit (EXIT_FAILURE);
- }
+ cb.callback (cb.user_data, &errno);
}
static bool
@@ -129,10 +126,7 @@ null_asynch_zero (struct rw *rw, struct command *command,
nbd_completion_callback cb, bool allocate)
{
errno = 0;
- if (cb.callback (cb.user_data, &errno) == -1) {
- perror (rw->name);
- exit (EXIT_FAILURE);
- }
+ cb.callback (cb.user_data, &errno);
return true;
}
--
2.34.1
2 years, 9 months
[libnbd PATCH v2 0/5] CVE-2022-0485 fix: nbdcopy silent corruption
by Eric Blake
Here's the summary of the diff to v1:
Key:
[----] : patches are identical
[####] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively
001/5:[down] 'python: tests: Fix error handling'
002/5:[down] 'ocaml: tests: Fix error handling'
003/5:[0083] [FC] 'docs: Clarify how callbacks should handle errors'
004/5:[down] 'copy: Pass in dummy variable rather than &errno to callback'
005/5:[down] 'copy: CVE-2022-0485: Fail nbdcopy if NBD read or write fails'
except that my version of the git backport-diff script doesn't handle
renamed patches well.
Patch 3 was originally at:
https://listman.redhat.com/archives/libguestfs/2022-February/msg00037.html
Patch 5 was originally at:
https://listman.redhat.com/archives/libguestfs/2022-February/msg00039.html
the rest are indeed new to this posting. Enough has changed that I
didn't carry forward any positive reviews from Rich, Nir, or Laszlo.
At the bottom of the patch, I'll include the full diff of how v2
improves over v1.
Eric Blake (5):
python: tests: Fix error handling
ocaml: tests: Fix error handling
docs: Clarify how callbacks should handle errors
copy: Pass in dummy variable rather than &errno to callback
copy: CVE-2022-0485: Fail nbdcopy if NBD read or write fails
docs/libnbd.pod | 69 +++++++++++++++++++++-------
generator/API.ml | 24 +++++++---
python/t/590-aio-copy.py | 4 +-
ocaml/tests/test_590_aio_copy.ml | 8 ++--
TODO | 1 +
copy/Makefile.am | 4 +-
copy/copy-nbd-error.sh | 78 ++++++++++++++++++++++++++++++++
copy/file-ops.c | 28 +++++-------
copy/multi-thread-copying.c | 23 ++++++++--
copy/nbdcopy.h | 7 +--
copy/null-ops.c | 18 +++-----
11 files changed, 204 insertions(+), 60 deletions(-)
create mode 100755 copy/copy-nbd-error.sh
diff --git c/docs/libnbd.pod w/docs/libnbd.pod
index 006d530c..15bdf0a8 100644
--- c/docs/libnbd.pod
+++ w/docs/libnbd.pod
@@ -826,11 +826,15 @@ This can be used to free associated C<user_data>. For example:
(nbd_chunk_callback) { .callback = my_fn,
.user_data = my_data,
.free = free },
- NBD_NULL_CALLBACK(completion),
+ NBD_NULL_COMPLETION,
0);
-will call L<free(3)> on C<my_data> after the last time that the
-S<C<chunk.callback = my_fn>> function is called.
+will call L<free(3)> once on C<my_data> after the point where it is
+known that the S<C<chunk.callback = my_fn>> function can no longer be
+called, regardless of how many times C<my_fn> was actually called. If
+both a mid-command and completion callback are supplied, the functions
+will be reached in this order: mid-function callbacks, completion
+callback, mid-function free, and finally completion free.
The free function is only accessible in the C API as it is not needed
in garbage collected programming languages.
@@ -858,14 +862,16 @@ same nbd object, as it would cause deadlock.
=head2 Completion callbacks
-All of the low-level commands have a completion callback variant that
-registers a callback function used right before the command is marked
-complete.
+All of the asychronous commands have an optional completion callback
+function that is used right before the command is marked complete,
+after any mid-command callbacks have finished, and before any free
+functions.
When the completion callback returns C<1>, the command is
automatically retired (there is no need to call
-L<nbd_aio_command_completed(3)>); for any other return value, the command
-still needs to be retired.
+L<nbd_aio_command_completed(3)>); for any other return value, the
+command still needs to be manually retired (otherwise, the command
+will tie up resources until L<nbd_close(3)> is eventually reached).
=head2 Callbacks with C<int *error> parameter
@@ -874,37 +880,42 @@ L<nbd_block_status(3)>) involve the use of a callback function invoked
by the state machine at appropriate points in the server's reply
before the overall command is complete. These callback functions,
along with all of the completion callbacks, include a parameter
-C<error> containing the value of any error detected so far. If a
-callback function fails and wants to change the resulting error of the
-overall command visible later in the API sequence, it should assign
-back into C<error> and return C<-1>. Assignments into C<error> are
-ignored for any other return value; similarly, assigning C<0> into
-C<error> does not have an effect.
+C<error> which is a pointer containing the value of any error detected
+so far. If a callback function fails and wants to change the
+resulting error of the overall command visible later in the API
+sequence, it should assign back into C<error> and return C<-1> in the
+C API. Assignments into C<error> are ignored for any other return
+value; similarly, assigning C<0> into C<error> does not have an
+effect. In other language bindings, reporting callback errors is
+generally done by raising an exception rather than by return value.
Note that a mid-command callback might never be reached, such as if
libnbd detects that the command was invalid to send (see
-L<nbd_set_strict_mode(3)>) or if the server reports a failure. It is
-safe for a mid-command callback to ignore non-zero C<error>: all the
-other parameters to the mid-command callback will still be valid
-(corresponding to the current portion of the server's reply), and the
-overall command will still fail (at the completion callback or
-L<nbd_aio_command_completed(3)> for an asynchronous command, or as the
-result of the overall synchronous command). Returing C<-1> from a
-mid-command callback does not prevent that callback from being reached
-again, if the server sends more mid-command replies that warrant
-another use of that callback. A mid-command callback may be reached
-more times than expected if the server is non-compliant.
+L<nbd_set_strict_mode(3)>) or if the server reports a failure that
+concludes the command. It is safe for a mid-command callback to
+ignore non-zero C<error>: all the other parameters to the mid-command
+callback will still be valid (corresponding to the current portion of
+the server's reply), and the overall command will still fail (at the
+completion callback or L<nbd_aio_command_completed(3)> for an
+asynchronous command, or as the result of the overall synchronous
+command). Returing C<-1> from a mid-command callback does not prevent
+that callback from being reached again, if the server sends more
+mid-command replies that warrant another use of that callback. A
+mid-command callback may be reached more times than expected if the
+server is non-compliant.
On the other hand, if a completion callback is supplied (only possible
with asynchronous commands), it will always be reached exactly once,
-and the completion callback must not ignore the value of C<error>. In
-particular, the content of a buffer passed to L<nbd_aio_pread(3)> or
-L<nbd_aio_pread_structured(3)> is unspecified if C<error> is set on
-entry to the completion callback. It is recommended that if your code
-uses automatic command retirement, then the completion function should
-return C<1> on all control paths, even when handling errors (note that
-with automatic retirement, assigning into C<error> is pointless as
-there is no later API to see that value).
+and the completion callback must not ignore the value pointed to by
+C<error>. In particular, the content of a buffer passed to
+L<nbd_aio_pread(3)> or L<nbd_aio_pread_structured(3)> is undefined
+if C<*error> is non-zero on entry to the completion callback. It is
+recommended that if you choose to use automatic command retirement
+(where the completion callback returns C<1> to avoid needing to call
+L<nbd_aio_command_completed(3)> later), your completion function
+should return C<1> on all control paths, even when handling errors
+(note that with automatic retirement, assigning into C<error> is
+pointless as there is no later API to see that value).
=head1 COMPILING YOUR PROGRAM
diff --git c/generator/API.ml w/generator/API.ml
index bdb8e7c8..012016bc 100644
--- c/generator/API.ml
+++ w/generator/API.ml
@@ -1831,7 +1831,7 @@ "pread", {
The C<flags> parameter must be C<0> for now (it exists for future NBD
protocol extensions).
-Note that if this command fails, the contents of C<buf> are unspecified."
+Note that if this command fails, the contents of C<buf> are undefined."
^ strict_call_description;
see_also = [Link "aio_pread"; Link "pread_structured";
Link "get_block_size"; Link "set_strict_mode"];
@@ -1918,7 +1918,7 @@ "pread_structured", {
this, see L<nbd_can_df(3)>). Libnbd does not validate that the server
actually obeys the flag.
-Note that if this command fails, the contents of C<buf> are unspecified."
+Note that if this command fails, the contents of C<buf> are undefined."
^ strict_call_description;
see_also = [Link "can_df"; Link "pread";
Link "aio_pread_structured"; Link "get_block_size";
@@ -2459,7 +2459,7 @@ "aio_pread", {
as described in L<libnbd(3)/Completion callbacks>.
Note that you must ensure C<buf> is valid until the command has
-completed. Furthermore, the contents of C<buf> are unspecified if the
+completed. Furthermore, the contents of C<buf> are undefined if the
C<error> parameter to C<completion_callback> is set, or if
L<nbd_aio_command_completed(3)> reports failure. Other parameters behave
as documented in L<nbd_pread(3)>."
@@ -2487,7 +2487,7 @@ "aio_pread_structured", {
as described in L<libnbd(3)/Completion callbacks>.
Note that you must ensure C<buf> is valid until the command has
-completed. Furthermore, the contents of C<buf> are unspecified if the
+completed. Furthermore, the contents of C<buf> are undefined if the
C<error> parameter to C<completion_callback> is set, or if
L<nbd_aio_command_completed(3)> reports failure. Other parameters behave
as documented in L<nbd_pread_structured(3)>."
diff --git c/python/t/590-aio-copy.py w/python/t/590-aio-copy.py
index 6cde5934..861fa6c8 100644
--- c/python/t/590-aio-copy.py
+++ w/python/t/590-aio-copy.py
@@ -1,5 +1,5 @@
# libnbd Python bindings
-# Copyright (C) 2010-2019 Red Hat Inc.
+# Copyright (C) 2010-2022 Red Hat Inc.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
@@ -36,6 +36,7 @@ def asynch_copy(src, dst):
# This callback is called when any pread from the source
# has completed.
def read_completed(buf, offset, error):
+ assert not error
global bytes_read
bytes_read += buf.size()
wr = (buf, offset)
@@ -46,6 +47,7 @@ def asynch_copy(src, dst):
# This callback is called when any pwrite to the destination
# has completed.
def write_completed(buf, error):
+ assert not error
global bytes_written
bytes_written += buf.size()
# By returning 1 here we auto-retire the pwrite command.
diff --git c/ocaml/tests/test_590_aio_copy.ml w/ocaml/tests/test_590_aio_copy.ml
index 11d89256..a0339c8b 100644
--- c/ocaml/tests/test_590_aio_copy.ml
+++ w/ocaml/tests/test_590_aio_copy.ml
@@ -1,6 +1,6 @@
(* hey emacs, this is OCaml code: -*- tuareg -*- *)
(* libnbd OCaml test case
- * Copyright (C) 2013-2019 Red Hat Inc.
+ * Copyright (C) 2013-2022 Red Hat Inc.
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public
@@ -45,7 +45,8 @@ let
* next iteration of the loop.
*)
let writes = ref [] in
- let read_completed buf offset _ =
+ let read_completed buf offset err =
+ assert (!err = 0);
bytes_read := !bytes_read + NBD.Buffer.size buf;
(* Get ready to issue a write command. *)
writes := (buf, offset) :: !writes;
@@ -56,7 +57,8 @@ let
(* This callback is called when any pwrite to the destination
* has completed.
*)
- let write_completed buf _ =
+ let write_completed buf err =
+ assert (!err = 0);
bytes_written := !bytes_written + NBD.Buffer.size buf;
(* By returning 1 here we auto-retire the pwrite command. *)
1
diff --git c/copy/Makefile.am w/copy/Makefile.am
index 85989798..e729f86a 100644
--- c/copy/Makefile.am
+++ w/copy/Makefile.am
@@ -33,7 +33,7 @@ EXTRA_DIST = \
copy-nbd-to-small-nbd-error.sh \
copy-nbd-to-sparse-file.sh \
copy-nbd-to-stdout.sh \
- copy-nbd-error.sh \
+ copy-nbd-error.sh \
copy-progress-bar.sh \
copy-sparse.sh \
copy-sparse-allocated.sh \
@@ -125,7 +125,7 @@ TESTS += \
copy-stdin-to-nbd.sh \
copy-stdin-to-null.sh \
copy-nbd-to-stdout.sh \
- copy-nbd-error.sh \
+ copy-nbd-error.sh \
copy-progress-bar.sh \
copy-sparse.sh \
copy-sparse-allocated.sh \
diff --git c/copy/copy-nbd-error.sh w/copy/copy-nbd-error.sh
index 50723195..89f0a2f1 100755
--- c/copy/copy-nbd-error.sh
+++ w/copy/copy-nbd-error.sh
@@ -16,6 +16,9 @@
# License along with this library; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+# Tests several scenarios of handling NBD server errors
+# Serves as a regression test for the CVE-2022-0485 fix.
+
. ../tests/functions.sh
set -e
@@ -23,32 +26,53 @@ set -x
requires nbdkit --exit-with-parent --version
-pidfile=copy-nbd-error.pid
-sock=$(mktemp -u /tmp/libnbd-test-copy.XXXXXX)
-cleanup_fn rm -f $pidfile $sock
-
-# Run an nbdkit server that randomly fails.
-nbdkit --exit-with-parent -f -v -P $pidfile -U $sock \
- --filter=error --filter=noextents \
- memory size=5M error-pread-rate=0.5 error-pwrite-rate=0.5 &
-# Wait for the pidfile to appear.
-for i in {1..60}; do
- if test -f $pidfile; then
- break
- fi
- sleep 1
-done
-if ! test -f $pidfile; then
- echo "$0: nbdkit did not start up"
- exit 1
-fi
-
fail=0
+# Failure to get block status should not be fatal, but merely downgrade to
+# reading the entire image as if data
+echo "Testing extents failures on source"
+$VG nbdcopy -- [ nbdkit --exit-with-parent -v --filter=error pattern 5M \
+ error-extents-rate=1 ] null: || fail=1
+
# Failure to read should be fatal
-$VG nbdcopy -- "nbd+unix:///?socket=$sock" null: && fail=1
+echo "Testing read failures on non-sparse source"
+$VG nbdcopy -- [ nbdkit --exit-with-parent -v --filter=error pattern 5M \
+ error-pread-rate=0.5 ] null: && fail=1
-# Failure to write should be fatal
-$VG nbdcopy -- [ nbdkit --exit-with-parent -v pattern 5M ] "nbd+unix:///?socket=$sock" && fail=1
+# However, reliable block status on a sparse image can avoid the need to read
+echo "Testing read failures on sparse source"
+$VG nbdcopy -- [ nbdkit --exit-with-parent -v --filter=error null 5M \
+ error-pread-rate=1 ] null: || fail=1
+
+# Failure to write data should be fatal
+echo "Testing write data failures on arbitrary destination"
+$VG nbdcopy -- [ nbdkit --exit-with-parent -v pattern 5M ] \
+ [ nbdkit --exit-with-parent -v --filter=error --filter=noextents \
+ memory 5M error-pwrite-rate=0.5 ] && fail=1
+
+# However, writing zeroes can bypass the need for normal writes
+echo "Testing write data failures from sparse source"
+$VG nbdcopy -- [ nbdkit --exit-with-parent -v null 5M ] \
+ [ nbdkit --exit-with-parent -v --filter=error --filter=noextents \
+ memory 5M error-pwrite-rate=1 ] || fail=1
+
+# Failure to write zeroes should be fatal
+echo "Testing write zero failures on arbitrary destination"
+$VG nbdcopy -- [ nbdkit --exit-with-parent -v null 5M ] \
+ [ nbdkit --exit-with-parent -v --filter=error memory 5M \
+ error-zero-rate=1 ] && fail=1
+
+# However, assuming/learning destination is zero can skip need to write
+echo "Testing write failures on pre-zeroed destination"
+$VG nbdcopy --destination-is-zero -- \
+ [ nbdkit --exit-with-parent -v null 5M ] \
+ [ nbdkit --exit-with-parent -v --filter=error memory 5M \
+ error-pwrite-rate=1 error-zero-rate=1 ] || fail=1
+
+# Likewise, when write zero is not advertised, fallback to normal write works
+echo "Testing write zeroes to destination without zero support"
+$VG nbdcopy -- [ nbdkit --exit-with-parent -v null 5M ] \
+ [ nbdkit --exit-with-parent -v --filter=nozero --filter=error memory 5M \
+ error-zero-rate=1 ] || fail=1
exit $fail
diff --git c/copy/file-ops.c w/copy/file-ops.c
index eb30f924..aaf04ade 100644
--- c/copy/file-ops.c
+++ w/copy/file-ops.c
@@ -587,10 +587,12 @@ file_asynch_read (struct rw *rw,
struct command *command,
nbd_completion_callback cb)
{
+ int dummy = 0;
+
file_synch_read (rw, slice_ptr (command->slice),
command->slice.len, command->offset);
- errno = 0; /* safe, since file_synch_read exits on error */
- cb.callback (cb.user_data, &errno);
+ /* file_synch_read called exit() on error */
+ cb.callback (cb.user_data, &dummy);
}
static void
@@ -598,20 +600,23 @@ file_asynch_write (struct rw *rw,
struct command *command,
nbd_completion_callback cb)
{
+ int dummy = 0;
+
file_synch_write (rw, slice_ptr (command->slice),
command->slice.len, command->offset);
- errno = 0; /* safe, since file_synch_write exits on error */
- cb.callback (cb.user_data, &errno);
+ /* file_synch_write called exit() on error */
+ cb.callback (cb.user_data, &dummy);
}
static bool
file_asynch_zero (struct rw *rw, struct command *command,
nbd_completion_callback cb, bool allocate)
{
+ int dummy = 0;
+
if (!file_synch_zero (rw, command->offset, command->slice.len, allocate))
return false;
- errno = 0;
- cb.callback (cb.user_data, &errno);
+ cb.callback (cb.user_data, &dummy);
return true;
}
diff --git c/copy/multi-thread-copying.c w/copy/multi-thread-copying.c
index c578c4bc..2aa74d63 100644
--- c/copy/multi-thread-copying.c
+++ w/copy/multi-thread-copying.c
@@ -28,6 +28,7 @@
#include <errno.h>
#include <assert.h>
#include <sys/stat.h>
+#include <inttypes.h>
#include <pthread.h>
@@ -378,8 +379,8 @@ finished_read (void *vp, int *error)
/* XXX - is it worth retrying a failed command? */
if (*error) {
- errno = *error;
- perror("read");
+ fprintf (stderr, "read at offset 0x%" PRIx64 "failed: %s\n",
+ command->offset, strerror (*error));
exit (EXIT_FAILURE);
}
@@ -400,6 +401,7 @@ finished_read (void *vp, int *error)
bool last_is_hole = false;
uint64_t i;
struct command *newcommand;
+ int dummy = 0;
/* Iterate over whole blocks in the command, starting on a block
* boundary.
@@ -482,8 +484,7 @@ finished_read (void *vp, int *error)
/* Free the original command since it has been split into
* subcommands and the original is no longer needed.
*/
- errno = 0;
- free_command (command, &errno);
+ free_command (command, &dummy);
}
return 1; /* auto-retires the command */
@@ -510,6 +511,7 @@ fill_dst_range_with_zeroes (struct command *command)
{
char *data;
size_t data_size;
+ int dummy = 0;
if (destination_is_zero)
goto free_and_return;
@@ -545,8 +547,7 @@ fill_dst_range_with_zeroes (struct command *command)
free (data);
free_and_return:
- errno = 0;
- free_command (command, &errno);
+ free_command (command, &dummy);
}
static int
@@ -557,8 +558,8 @@ free_command (void *vp, int *error)
/* XXX - is it worth retrying a failed command? */
if (*error) {
- errno = *error;
- perror("write");
+ fprintf (stderr, "write at offset 0x%" PRIx64 "failed: %s\n",
+ command->offset, strerror (*error));
exit (EXIT_FAILURE);
}
diff --git c/copy/null-ops.c w/copy/null-ops.c
index 924eaf1e..5f1fda50 100644
--- c/copy/null-ops.c
+++ w/copy/null-ops.c
@@ -117,16 +117,18 @@ null_asynch_write (struct rw *rw,
struct command *command,
nbd_completion_callback cb)
{
- errno = 0;
- cb.callback (cb.user_data, &errno);
+ int dummy = 0;
+
+ cb.callback (cb.user_data, &dummy);
}
static bool
null_asynch_zero (struct rw *rw, struct command *command,
nbd_completion_callback cb, bool allocate)
{
- errno = 0;
- cb.callback (cb.user_data, &errno);
+ int dummy = 0;
+
+ cb.callback (cb.user_data, &dummy);
return true;
}
2 years, 9 months
[PATCH libnbd] golang: make-dist.sh: Generate the list file
by Nir Soffer
Generated the list file when creating the distribution. Since the Go
tool treat the list file on the proxy server as the source of truth, we
do the same. The new list file is created by downloading the current
list file, sorting it, and appending the current version.
Creating a distribution tarball requires now access to
download.libguestfs.org.
With this change the distribution tarball can be extract on the server
without any additional manual process.
Signed-off-by: Nir Soffer <nsoffer(a)redhat.com>
---
golang/make-dist.sh | 16 +++++-----------
1 file changed, 5 insertions(+), 11 deletions(-)
diff --git a/golang/make-dist.sh b/golang/make-dist.sh
index a590e6c6..5fe006ff 100755
--- a/golang/make-dist.sh
+++ b/golang/make-dist.sh
@@ -86,48 +86,42 @@ rm -rf libguestfs.org
#
# libguestfs.org
# └── libnbd
# ├── @latest
# └── @v
# ├── list
# ├── v1.11.4.info
# ├── v1.11.4.mod
# └── v1.11.4.zip
#
-# We create @latest and @v/*{.info,mod,zip} here.
-#
-# The "@v/list" file must be created on the web server after uploading
-# a new release:
-#
-# $ cd libguestfs.org/libnbd/@v
-# $ ls -1 v*.info | awk -F.info '{print $1}' > list
-# $ cat list
-# v1.11.3
-# v1.11.4
-#
# See https://golang.org/ref/mod#serving-from-proxy
module_dir=libguestfs.org/libnbd
v_dir=$module_dir/@v
mkdir -p $v_dir
# Go wants a string in RFC 3339 format, git strict ISO 8601 format is
# compatible.
info="{
\"Version\": \"$version\",
\"Time\": \"$(git show -s --format=%cI)\"
}"
echo "$info" > $module_dir/@latest
echo "$info" > $v_dir/$version.info
cp go.mod $v_dir/$version.mod
mv $version.zip $v_dir
+# Create the list file by amending the curent file on the server.
+list_url=https://download.libguestfs.org/libnbd/golang/libguestfs.org/libnbd/@v/list
+curl --silent --show-error "$list_url" | sort > $v_dir/list
+grep -q "$version" $v_dir/list || echo "$version" >> $v_dir/list
+
# Create tarball to upload and extract on the webserver. It should be
# extracted in the directory pointed by the "go-import" meta tag.
output=$PWD/libnbd-golang-$version.tar.gz
tar czf $output libguestfs.org
rm -rf libguestfs.org
echo output written to $output
--
2.34.1
2 years, 9 months
LIBNBD SECURITY: Silent image corruption with nbdcopy
by Eric Blake
We have discovered a security flaw with potential moderate impact in
libnbd.
Lifecycle
---------
Reported: 2022-01-24 Fixed: 2022-02-04 Published: 2022-01-27
This has been assigned CVE-2022-0485.
Credit
------
Reported by Nir Soffer <nsoffer(a)redhat.com> and patched by Eric Blake
<eblake(a)redhat.com>
Description
-----------
libnbd is a Network Block Device (NBD) client library. nbdcopy is an
application shipped as part of the libnbd project designed to copy
files (typically disk images) where at least one of the source or
destination locations is an NBD server.
A flaw in the nbdcopy program silently ignored any asynchronous read
or write errors reported by an NBD server. With no error messages and
no change to the exit status, a client of the nbdcopy application has
no indication that the corruption occurred, where the loss of data
integrity may have further implications in a denial of service attack
against disk images. What's more, in some NBD client/server
configurations, the data that nbdcopy writes to the destination image
could come from uninitialized heap data in the nbdcopy process, where
the leaked information may aid in mounting a more sophisticated attack
against a machine using a long-running nbdcopy.
If TLS is not in use, a meddler-in-the-middler attacker can easily
corrupt the destination image by faking error replies to NBD read and
write commands. But even when TLS is in use, the nature of the flaw
means that nbdcopy cannot detect legitimate error reports from the
server. Furthermore, even though nbdcopy attempts to detect abrupt
disconnection of a TLS connection, the nature of asynchronous commands
means that we were unable to rule out whether a well-timed MitM
attacker could cause undetected corruption near the end of a copy
process by aborting a TLS stream at the right moment, even without
having access to the content of the TLS stream.
Test if libnbd is vulnerable
----------------------------
With nbdkit 1.14 or later installed, the following serves as an easy
test of the vulnerability:
$ nbdkit -U - --filter=error pattern size=1M error-pread-rate=1 \
--run 'nbdcopy $uri null:' && echo vulnerable
The error messages printed from nbdkit demonstrate the failed NBD
transactions. Where a vulnerable version of nbdcopy ignores those
errors and results in 'vulnerable' being printed to stdout, a fixed
version will diagnose a read failure to stderr.
Workarounds
-----------
Use of 'nbdcopy --synchronous' will avoid undected data corruption,
but comes at a potential performance cost by avoiding the speed
benefits of asynchronous operations. If your version of nbdcopy lacks
the '--synchronous' command-line option (all 1.4.x releases), it is
not vulnerable.
Connecting a vulnerable version of nbdcopy to an NBD server that
supports structured replies by default (such as qemu-nbd 2.11 or
nbdkit 1.12 and newer) does not eliminate the risk of data corruption
on an error, but is sufficient to guarantee that any corruptions will
either read as all zeroes or else as prior contents of the destination
image. This is safer than connecting nbdcopy to an NBD server that
lacks structured replies (such as nbd-server 3.23, current as of this
notice), where the data corruption could also leak heap contents from
the nbdcopy process.
Although the use of TLS does not avoid the bug, it is still
recommended to use TLS when utilizing nbdcopy to copy files across
machines, so that the NBD server can be trusted to not be malicious.
It is recommended to apply the fix or upgrade to a fixed version.
Fixes
-----
The flaw was introduced in libnbd 1.5.6 (commit bc896eec4d), when
nbdcopy gained the ability to default to using asynchronous NBD reads
and writes. A fix for the missing error detection is available for
all affected stable branches, and the current development branch.
However, note that the stable-1.6 branch of libnbd has other known
data corruption bugs when communicating with an NBD server that
supports trim operations, which were repaired for 1.8 but not
backported to any 1.6.x release. While the trim bug did not amount to
a CVE, if your reason for upgrading libnbd is to get a version of
nbdcopy that avoids known data corruption issues, it is recommended
that you use a newer release.
* development branch (1.11)
https://gitlab.com/nbdkit/libnbd/-/commit/8d444b41d09a700c7ee6f9182a649f3...
or use libnbd >= 1.11.8 from
http://download.libguestfs.org/libnbd/1.7-development/
* stable branch 1.10
https://gitlab.com/nbdkit/libnbd/-/commit/9219d2e70c770d8efb98d6e8eaf68e8...
or use libnbd >= 1.10.4 from
http://download.libguestfs.org/libnbd/1.10-stable/
* stable branch 1.8
https://gitlab.com/nbdkit/libnbd/-/commit/7ed8072a90922372dd1ba32bddd5526...
or use libnbd >= 1.8.7 from
http://download.libguestfs.org/libnbd/1.8-stable/
* stable branch 1.6
https://gitlab.com/nbdkit/libnbd/-/commit/6c8f2f859926b82094fb5e85c446ea0...
or use libnbd >= 1.6.6 from
http://download.libguestfs.org/libnbd/1.6-stable/
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org
2 years, 9 months