On Wed, May 08, 2024 at 09:16:19AM +0100, Richard W.M. Jones wrote:
Hi Eric,
----- Forwarded message from notify(a)aur.archlinux.org -----
solsticedhiver [1] added the following comment to nbdkit [2]:
There is still one test failing now test-blocksize-sharding.sh. I
don't know if it is expected or not
The full test-suite.log
https://0x0.st/X8TI.log
[1]
https://aur.archlinux.org/account/solsticedhiver/
[2]
https://aur.archlinux.org/pkgbase/nbdkit/
----- End forwarded message -----
I've attached the fragment of the failure log that they sent. However
I don't understand the test enough to see where the error happens.
Additionally I think there's a problem with the various 'print'-
statements in the Python code. They probably need to flush stdout
otherwise they don't appear coherent with error messages. Something
like this thoroughout:
-print("sanity check")
+print("sanity check", flush=True)
That sounds like a useful change no matter what. It may also be worth
adding some more timestamps into the sanity check portion, to make
sure that the plugin is responding as fast as we expect, and not under
heavy load.
# First pass: check that two aligned operations work in parallel
# Total time should be closer to 2 seconds, rather than 4 if serialized
print("sanity check")
ba1 = bytearray(b"1"*16)
ba2 = bytearray(b"2"*16)
buf1 = nbd.Buffer.from_bytearray(ba1)
buf2 = nbd.Buffer.from_bytearray(ba2)
touch(witness)
start_t = time.time()
h.aio_pwrite(buf1, 0)
h.aio_pwrite(buf2, 0)
while h.aio_in_flight() > 0:
h.poll(-1)
end_t = time.time()
os.unlink(witness)
out = h.pread(16,0)
print(out)
t = end_t - start_t
print(t)
assert out in [b"1"*16, b"2"*16]
assert t >= 2 and t <= 3
Based on the later message, it looks like t is higher than expected,
and that this was the assertion that is failing...
libnbd: debug: nbdsh: nbd_connect_uri: leave: ret=0
libnbd: debug: nbdsh: nbd_aio_pwrite: enter: buf="
0000: 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 |1111111111111111|
" count=16 offset=0 completion=NULL flags=0x0
libnbd: debug: nbdsh: nbd_aio_pwrite: leave: ret=1
libnbd: debug: nbdsh: nbd_aio_pwrite: enter: buf="
0000: 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 |2222222222222222|
" count=16 offset=0 completion=NULL flags=0x0
libnbd: debug: nbdsh: nbd_aio_pwrite: leave: ret=2
libnbd: debug: nbdsh: nbd_aio_in_flight: enter:
libnbd: debug: nbdsh: nbd_aio_in_flight: leave: ret=2
We definitely kicked off two parallel aio_pwrite commands...
libnbd: debug: nbdsh: nbd_poll: enter: timeout=-1
libnbd: debug: nbdsh: nbd_poll: poll start: events=1
libnbd: debug: nbdsh: nbd_poll: poll end: r=1 revents=1
libnbd: debug: nbdsh: nbd_poll: leave: ret=1
libnbd: debug: nbdsh: nbd_aio_in_flight: enter:
libnbd: debug: nbdsh: nbd_aio_in_flight: leave: ret=1
libnbd: debug: nbdsh: nbd_poll: enter: timeout=-1
libnbd: debug: nbdsh: nbd_poll: poll start: events=1
libnbd: debug: nbdsh: nbd_poll: poll end: r=1 revents=1
libnbd: debug: nbdsh: nbd_poll: leave: ret=1
libnbd: debug: nbdsh: nbd_aio_in_flight: enter:
libnbd: debug: nbdsh: nbd_aio_in_flight: leave: ret=0
...and we were able to poll until both completed, but lack timestamps
to see how long we polled each time...
libnbd: debug: nbdsh: nbd_pread: enter: buf=<buf> count=16
offset=0 flags=0x0
libnbd: debug: nbdsh: nbd_pread: poll start: events=1
libnbd: debug: nbdsh: nbd_pread: poll end: r=1 revents=1
libnbd: debug: nbdsh: nbd_pread: leave: ret=0
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/usr/lib/python3.12/site-packages/nbd.py", line 3986, in
<module>
nbdsh.shell()
File "/usr/lib/python3.12/site-packages/nbdsh.py", line 149, in shell
do_snippet[act](arg)
File "/usr/lib/python3.12/site-packages/nbdsh.py", line 140, in
<lambda>
"command": lambda arg: exec(arg, d, d),
^^^^^^^^^^^^^^^
File "<string>", line 32, in <module>
AssertionError
sanity check
bytearray(b'2222222222222222')
5.224142789840698
...we then checked the contents (it looks like the write with contents
"2"*16 landed last), and overall we spent 5.2 seconds doing all this
work, when we expected between 2 and 3 seconds. Given that the server
is set up with the delay filter adding 2 seconds before each pwrite,
it seems like we would expect > 2 seconds no matter what, but that we
would be > 4 seconds only if the two writes are serialized or else
heavy load gets in the way of fast execution (we should have < 4
seconds if the two writes are done in parallel).
The assert that t <= 3 is probably stricter than necessary; checking
whether t < 4 is going to add a bit more leeway.
So how about re-running the test with these enhancements, to see if we
can learn more about the situation?
diff --git i/tests/test-blocksize-sharding.sh w/tests/test-blocksize-sharding.sh
index 7e926ad1..3711c49c 100755
--- i/tests/test-blocksize-sharding.sh
+++ w/tests/test-blocksize-sharding.sh
@@ -80,9 +80,13 @@ witness = os.getenv("witness")
def touch(path):
open(path, "a").close()
+def progress(prefix, start):
+ t = time.time() - start
+ print("%s: %g" % (prefix, t), flush=True)
+
# First pass: check that two aligned operations work in parallel
# Total time should be closer to 2 seconds, rather than 4 if serialized
-print("sanity check")
+print("sanity check", flush=True)
ba1 = bytearray(b"1"*16)
ba2 = bytearray(b"2"*16)
buf1 = nbd.Buffer.from_bytearray(ba1)
@@ -91,21 +95,24 @@ touch(witness)
start_t = time.time()
h.aio_pwrite(buf1, 0)
h.aio_pwrite(buf2, 0)
+progress("after writes", start_t)
while h.aio_in_flight() > 0:
h.poll(-1)
+ progress("after poll", start_t)
end_t = time.time()
os.unlink(witness)
out = h.pread(16,0)
-print(out)
+progress("after read", start_t)
+print(out, flush=True)
t = end_t - start_t
-print(t)
+print("elapsed write time: %g" % t, flush=True)
assert out in [b"1"*16, b"2"*16]
assert t >= 2 and t <= 3
# Next pass: try to kick off unaligned first
-print("unaligned first")
+print("unaligned first", flush=True)
h.zero(16, 0)
ba3 = bytearray(b"3"*12)
ba4 = bytearray(b"4"*16)
@@ -122,14 +129,14 @@ end_t = time.time()
os.unlink(witness)
out = h.pread(16,0)
-print(out)
+print(out, flush=True)
t = end_t - start_t
-print(t)
+print(t, flush=True)
assert out in [b"4"*4 + b"3"*12, b"4"*16]
assert t >= 8
# Next pass: try to kick off aligned first
-print("aligned first")
+print("aligned first", flush=True)
ba5 = bytearray(b"5"*16)
ba6 = bytearray(b"6"*12)
buf5 = nbd.Buffer.from_bytearray(ba5)
@@ -146,9 +153,9 @@ end_t = time.time()
os.unlink(witness)
out = h.pread(16,0)
-print(out)
+print(out, flush=True)
t = end_t - start_t
-print(t)
+print(t, flush=True)
assert out in [b"5"*4 + b"6"*12, b"5"*16]
assert t >= 8
'
--
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:
qemu.org |
libguestfs.org