On Sun, Jun 27, 2021 at 05:00:00PM +0530, Abhay Raj Singh wrote:
I ran into a problem while working on receiving data from nbd source
(Reply to NBD_CMD_READ)
As you know we need to parse the error in Reply Header before we can
proceed reading the data.
Let's say an error occurred so instead of
HEADER1_DATA1..._HEADER2_DATA2... we will
get HEADER1_HEADER2_ DATA2... (as far as I know) so submitting a recv
request to io_uring
with length = sizeof(HEADER1+DATA1) would cause problem as it won't
detect NBD packet boundaries
and will give us as many bytes we ask it (I may be wrong here that's
what I read till now).
Two problems with requesting length = sizeof(HEADER1+DATA1):
First, as you pointed out, if the server errors out, you will only get
HEADER1 bytes back (including the error indication), and no data
bytes.
Second, once you start issuing out-of-order requests to the server
rather than synchronous waiting for a reply before beginning the next
request, then you are also at risk that the server might answer
HEADER2 prior to answering HEADER1.
A remedy to this would be just submit 'header reads' to io_uring when
we get a read and if header says there were no errors
we can be sure there is length bytes ready to be read in the
buffer(rest of the NBD packet) and read won't block.
Yes, you really DO have to submit a read request for JUST a header,
and then based on what that header tells you are you finally able to
decipher what to expect next on the wire (another header, or WHICH
read you are getting a reply to). As Rich said, processing the
headers via user-space copies is probably fine, where the real savings
come into play when processing the data payloads.
Now, as far as I can tell this would work as I expect but our main
concern is avoiding copy_user_enhanced_fast_string
so this won't be nice.
Also attaching metadata (Operation) to read SQE doesn't make sense
because As far as I know io_uring won't be able to tell
the difference the read is for which io_uring request, Reply Header's
handle will tell us which operation in operations vector
does this NBD packet belong to.
Yeah, because of the out-of-order potential of the NBD protocol, you
will have to be careful that you are processing headers before knowing
where to send payloads.
Another solution would be opening multiple sockets one for each slot
in operations vector, only one NBD operation runs on a socket
i.e. only one inflight request per socket, that too sounds like a bad idea.
To some extent, multiple sockets is what Rich mentioned in the
multi-conn approach, but having one socket per parallel operation is
going to be slower than properly handling out-of-order traffic on one
socket (there may still be savings by having multiple sockets, but you
also want to be sure to handle multiple in-flight commands per
socket).
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization:
qemu.org |
libvirt.org