On Sat, Oct 23, 2021 at 1:25 AM Nir Soffer <nirsof(a)gmail.com> wrote:
I see a rare random failure when calling BlockStatus via Go binding:
block_status: nbd_block_status: poll: Interrupted system call
I could not reproduce this with "nbdinfo --map", even after modifying it
to call nbd_block_status() for every 128 MiB.
Fixing this in nbd_unlock_poll() avoids this issue in the entire
library, when we wait for command completion. This seems more useful
that fixing it in all libnbd clients.
An alternative is to fix this in the Go binding. Looking at related fix
in go standard library, Go code do not expect to handle EINTR:
https://go-review.googlesource.com/c/go/+/232862/
I looked briefly in the generator - looks like we need to change this
code in generator/GoLang.ml:
pr " ret = nbd_%s " name;
C.print_arg_list ~wrap:true ~handle:true ~types:false args optargs;
pr ";\n";
(match errcode with
| None -> ()
| Some errcode ->
pr " if (ret == %s)\n" errcode;
pr " save_error (err);\n";
to add a retry loop for all wrappers.
Tested using a go client listing all extents in an image, calling
BlockStatus for every 128m with fedora 34 qcow2 image. Without this fix,
this was always failing.
$ hyperfine -r1000 --show-output "./client nbd+unix://?socket=/tmp/nbd.sock >
/dev/null"
Benchmark 1: ./client nbd+unix://?socket=/tmp/nbd.sock > /dev/null
Time (mean ± σ): 31.6 ms ± 3.1 ms [User: 8.8 ms, System: 7.2 ms]
Range (min … max): 26.1 ms … 52.3 ms 1000 runs
Signed-off-by: Nir Soffer <nsoffer(a)redhat.com>
---
lib/poll.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/lib/poll.c b/lib/poll.c
index edfcc59..df01d94 100644
--- a/lib/poll.c
+++ b/lib/poll.c
@@ -57,8 +57,11 @@ nbd_unlocked_poll (struct nbd_handle *h, int timeout)
* would allow other threads to close file descriptors which we have
* passed to poll.
*/
- r = poll (fds, 1, timeout);
- debug (h, "poll end: r=%d revents=%x", r, fds[0].revents);
+ do {
+ r = poll (fds, 1, timeout);
+ debug (h, "poll end: r=%d revents=%x", r, fds[0].revents);
+ } while (r == -1 && errno == EINTR);
+
if (r == -1) {
set_error (errno, "poll");
return -1;
--
2.31.1