On Wed, Sep 27, 2023 at 04:22:48PM -0500, Eric Blake wrote:
On Fri, Sep 22, 2023 at 04:03:19PM -0500, Eric Blake wrote:
> On Fri, Sep 22, 2023 at 09:47:55PM +0100, Richard W.M. Jones wrote:
> > On Fri, Sep 22, 2023 at 12:33:36PM -0500, Eric Blake wrote:
> > > There are a number of ways in which gathering information can fail.
> > > But when possible, it makes sense to let the server know that we are
> > > done talking, as it minimizes the likelihood that nbdinfo's error
> > > message will be obscured by an accompanying error message by the
> > > server complaining about an unclean disconnect.
> > >
> > > For example, with a one-off qemu temporarily patched as mentioned in
> > > commit 0f8ee8c6 to advertise sizes larger than 2^63, kicking off
> > > 'qemu-nbd -f raw -r file &' produces:
> > >
> > > pre-patch:
> > >
> > > $ ./run nbdinfo --size nbd://localhost
> > > /home/eblake/libnbd/info/.libs/nbdinfo: nbd_get_size: server claims size
9223372036854781440 which does not fit in signed result: Value too large for defined data
type
> > > qemu-nbd: option negotiation failed: Failed to read opts magic: Unexpected
end-of-file before all data were read
> >
> > This doesn't necessarily seem like a bug?
>
> It's a quality of service thing - qemu is just being noisy that the
> client closed the socket abruptly without giving any reason why. It
> doesn't change libnbd's behavior (nbdinfo has already reported its
> error message), but may confuse people reading qemu-nbd logs.
>
> It may also be a case where we could patch qemu-nbd to not output a
> complaint if the client exited at a clean point in the back-and-forth
> transactions. We still want to be noisy if the socket closes after
> the first byte has been read, but if there are zero bytes available,
> announcing that the client no longer cares does not add much value.
>
> >
> > This feels like quite a significant change in behaviour to me. In
> > particular I'm worried if it interacts in some subtle way with the
> > forking done by the "[ ... ]" syntax for running servers on the
> > command line (or any of the other ways that libnbd might fork/exec).
>
> Interesting observation. atexit() handlers are not preserved across
> exec, and I think all our fork() paths end in either exec or _exit
> (also where atexit handlers are ignored), so I don't think we are
> risking calling the handler twice.
So there's a couple of other things I don't like about this patch
after further reflection:
(1) It causes nbdinfo to do extra I/O on the error path. While I
think we're only writing (NBD_CMD_DISC?) and not expecting a reply,
there might be some situation where a server (especially a local one)
is able to block progress by refusing to read, causing nbdinfo to
hang on exit. I'm not sure how realistic that is.
(2) It hints that the best way to use the API is to always issue a
shutdown before close, but in reality that's not required.
I think it's also trying to solve something which isn't a problem.
Rich.
> >
> > Can we hold this patch until after 1.18 is released and then put it
> > in? Should only be a week or two.
>
> Sure, being conservative on this one is fine by me. I'll delay it
> until after 1.18.
>
> >
> > Provisionally ACKed for post-1.18
Now in as commit fd4f3fea, so we can start getting some soak time under CI.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:
qemu.org |
libguestfs.org
--
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones
Read my programming and virtualization blog:
http://rwmj.wordpress.com
nbdkit - Flexible, fast NBD server with plugins
https://gitlab.com/nbdkit/nbdkit