On 6/30/19 12:54 PM, Richard W.M. Jones wrote:
> On Thu, Jun 27, 2019 at 10:18:30PM -0500, Eric Blake wrote:
> > +  /* Queue up a write command so large that we block on POLLIN, then queue
> > +   * multiple disconnects. XXX The last one should fail.
> > +   */
> > +  if (nbd_aio_pwrite (nbd, buf, 2 * 1024 * 1024, 0, 0) == -1) {
> > +    fprintf (stderr, "%s: %s\n", argv[0], nbd_get_error ());
> > +    exit (EXIT_FAILURE);
> > +  }
> > +  if ((nbd_aio_get_direction (nbd) & LIBNBD_AIO_DIRECTION_WRITE) == 0) {
> > +    fprintf (stderr, "%s: test failed: "
> > +             "expect to be blocked on write\n",
> > +             argv[0]);
> > +    exit (EXIT_FAILURE);
> > +  }
>
> This test fails when run under valgrind.  An abbreviated log shows
> what's happening:
>
> libnbd: debug: nbd_aio_pwrite: event CmdIssue: READY -> ISSUE_COMMAND.START
> libnbd: debug: nbd_aio_pwrite: transition: ISSUE_COMMAND.START -> ISSUE_COMMAND.SEND_REQUEST
> libnbd: debug: nbd_aio_pwrite: transition: ISSUE_COMMAND.SEND_REQUEST -> ISSUE_COMMAND.PREPARE_WRITE_PAYLOAD
> libnbd: debug: nbd_aio_pwrite: transition: ISSUE_COMMAND.PREPARE_WRITE_PAYLOAD -> ISSUE_COMMAND.SEND_WRITE_PAYLOAD
> libnbd: debug: nbd_aio_pwrite: transition: ISSUE_COMMAND.SEND_WRITE_PAYLOAD -> ISSUE_COMMAND.FINISH
> libnbd: debug: nbd_aio_pwrite: transition: ISSUE_COMMAND.FINISH -> READY
> /home/rjones/d/libnbd/tests/.libs/lt-errors: test failed: expect to be blocked on write
>
> It seems as if this is caused by valgrinded code running more slowly,
> rather than an actual valgrind/memory error.
Or it may even be that valgrind's interception of send()/recv() buffers
things differently than we get by default from the kernel.  I don't know
whether running strace on top of valgrind is a sensible way to observe
the actual syscall behavior.
 
> I wonder if we could remove the race using a custom nbdkit-sh-plugin
> which would block on writes until (eg) a local trigger file was
> touched?  Even that seems as if it would depend on the amount of data
> that the kernel is able to buffer.
I don't know how to make an nbdkit plugin stop the code in nbdkit/server
from read()ing from the client (the plugin code doesn't get to run until
the core has learned that the client wants a command serviced).  But it
may be possible to send back-to-back write requests: even if the first
write request gets sent completely, the plugin can delay its reply to
that first write, and we can use --filter=noparallel to prevent the
second command from reaching nbdkit.  I'll play with that, both to see
whether I can reproduce the valgrind race and to see whether back-to-back
write commands make it likely enough that nbdkit never consumes the
second command.
-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org