[adding the NBD list into cc]
On Mon, Aug 23, 2021 at 09:26:34PM +0530, Abhay Raj Singh wrote:
> I had an idea for optimizing my current approach; it's good in some
> ways, but it could be faster with some breaking changes to the
> protocol.
> Currently, we read one request at a time (from the socket connected
> to the source). The simple flow looks like `read_header(io_uring)
> ---- success ---> recv(data) --- success ---> send(data) & queue
> another read header`, but it's not as efficient as it could be; at
> best it's a hack.
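For concreteness, that chain might look roughly like the following
with liburing; the completion tags and fixed sizes are illustrative,
not the actual code:

#include <liburing.h>
#include <sys/socket.h>
#include <unistd.h>

#define HDR_TAG  ((void *)1)   /* completion belongs to a header read */
#define DATA_TAG ((void *)2)   /* completion belongs to a payload read */

/* Queue a recv for the next header (ring assumed set up with
 * io_uring_queue_init()). */
static void queue_header_read(struct io_uring *ring, int fd,
                              void *hdr, size_t hdr_len)
{
    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
    io_uring_prep_recv(sqe, fd, hdr, hdr_len, MSG_WAITALL);
    io_uring_sqe_set_data(sqe, HDR_TAG);
    io_uring_submit(ring);
}

/* read_header -> recv(data) -> send(data) & queue another header. */
static void run(struct io_uring *ring, int src_fd, int dst_fd,
                void *hdr, size_t hdr_len, void *data, size_t data_len)
{
    struct io_uring_cqe *cqe;

    queue_header_read(ring, src_fd, hdr, hdr_len);
    while (io_uring_wait_cqe(ring, &cqe) == 0 && cqe->res > 0) {
        if (io_uring_cqe_get_data(cqe) == HDR_TAG) {
            /* Header arrived: queue the payload recv (in reality
             * data_len comes from the header just parsed). */
            struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
            io_uring_prep_recv(sqe, src_fd, data, data_len, MSG_WAITALL);
            io_uring_sqe_set_data(sqe, DATA_TAG);
            io_uring_submit(ring);
        } else {
            /* Payload arrived: send it on, then wait for the next
             * header.  One packet in flight at a time. */
            write(dst_fd, data, (size_t)cqe->res);
            queue_header_read(ring, src_fd, hdr, hdr_len);
        }
        io_uring_cqe_seen(ring, cqe);
    }
}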
> Another approach I am thinking about is a large buffer into which we
> read all of the socket's pending data, then process packets out of
> that buffer once the I/O is done. This minimizes the number of read
> requests to the kernel, as we do one read for multiple NBD packets.
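A rough sketch of that large-buffer loop (parse_len() and
handle_packet() are hypothetical helpers, and a single fixed header
size is exactly the simplification under discussion):

#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define BUF_SIZE (1 << 20)   /* 1 MiB staging buffer */
#define HDR_SIZE 20          /* assumed fixed header size */

/* Hypothetical helpers: pull the payload length out of a header, and
 * consume one complete packet. */
uint32_t parse_len(const uint8_t *hdr);
void handle_packet(uint8_t *pkt, size_t len);

void pump(int fd)
{
    static uint8_t buf[BUF_SIZE];
    size_t have = 0;

    for (;;) {
        ssize_t n = read(fd, buf + have, BUF_SIZE - have);
        if (n <= 0)
            return;                 /* EOF or error */
        have += (size_t)n;

        /* One read() may have delivered many NBD packets: peel off
         * every complete one. */
        size_t off = 0;
        while (have - off >= HDR_SIZE) {
            size_t pkt = HDR_SIZE + parse_len(buf + off);
            if (have - off < pkt)
                break;              /* partial packet: need more data */
            handle_packet(buf + off, pkt);
            off += pkt;
        }

        /* Keep any trailing fragment for the next read(). */
        memmove(buf, buf + off, have - off);
        have -= off;
    }
}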
> Further optimization requires changing the NBD protocol a bit. The
> current protocol:
>
> 1. Memory representation of a response (20-byte header + data):
>    HHHHH_DDDDDDDDD...
> 2. Memory representation of a request (28-byte header + data):
>    HHHHHHH_DDDDDDDDD...
>
> Each H and D represents 4 bytes; _ represents 0 bytes (header and
> data are adjacent).
You are correct that requests are currently a 28-byte header plus any
payload (where a payload currently occurs only in NBD_CMD_WRITE). But
responses come in two different lengths: simple responses are 16 bytes
+ payload (payload only for NBD_CMD_READ, and only if structured
replies were not negotiated), while structured responses are 20 bytes
+ payload (and while NBD_CMD_READ and NBD_CMD_BLOCK_STATUS require
structured replies, a compliant server can still send simple replies
to other commands). So it's even trickier than you represent here, as
always reading 20-byte reply headers is not going to do the right
thing.
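In struct form, the three on-the-wire layouts look like this (field
widths per the NBD spec; the struct names are mine):

#include <stdint.h>

/* Request: 28 bytes + optional payload (NBD_CMD_WRITE). */
struct __attribute__((packed)) nbd_request {
    uint32_t magic;    /* NBD_REQUEST_MAGIC, 0x25609513 */
    uint16_t flags;    /* command flags */
    uint16_t type;     /* NBD_CMD_* */
    uint64_t cookie;   /* a.k.a. handle */
    uint64_t offset;
    uint32_t length;
};

/* Simple reply: 16 bytes + optional payload (NBD_CMD_READ only). */
struct __attribute__((packed)) nbd_simple_reply {
    uint32_t magic;    /* NBD_SIMPLE_REPLY_MAGIC, 0x67446698 */
    uint32_t error;
    uint64_t cookie;
};

/* Structured reply chunk: 20 bytes + payload. */
struct __attribute__((packed)) nbd_structured_reply {
    uint32_t magic;    /* NBD_STRUCTURED_REPLY_MAGIC, 0x668e33ef */
    uint16_t flags;    /* NBD_REPLY_FLAG_* */
    uint16_t type;     /* NBD_REPLY_TYPE_* */
    uint64_t cookie;
    uint32_t length;   /* length of this chunk's payload */
};

_Static_assert(sizeof(struct nbd_request) == 28, "request header");
_Static_assert(sizeof(struct nbd_simple_reply) == 16, "simple reply");
_Static_assert(sizeof(struct nbd_structured_reply) == 20,
               "structured reply");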
> With the large buffer approach, we read data into a large buffer,
> then copy the NBD packet's data to a new buffer, strap a new header
> to it, and send it.
> This copying is what we wanted to avoid in the first place.
> If the response header were 28 bytes, or the first 8 bytes of data
> were useless, we could have just overwritten the header part and
> sent the data directly from the large buffer, thereby avoiding the
> copy.
>
> What are your thoughts?
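(For what it's worth, the in-place rewrite being wished for would be
something like the following, under the hypothetical equal-size
headers; write_new_header() is a made-up helper.)

#include <stdint.h>
#include <unistd.h>

#define HDR_SIZE 28   /* hypothetical: same size in both directions */

void write_new_header(uint8_t *hdr);   /* fills HDR_SIZE bytes */

/* Overwrite the stale header inside the staging buffer and send
 * header + payload straight from it: no payload copy. */
void forward_in_place(int out_fd, uint8_t *pkt, uint32_t payload_len)
{
    write_new_header(pkt);
    write(out_fd, pkt, HDR_SIZE + payload_len);
}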
There are already discussions about what it would take to extend the
NBD protocol to support 64-bit requests (not that we'd want to go
beyond current server restrictions of 32M or 64M maximum NBD_CMD_READ
and NBD_CMD_WRITE, but more so that we can permit quick image zeroing
via a 64-bit NBD_CMD_WRITE_ZEROES). Your observation that equally
sized request and response headers would allow more efficient handling
is worth considering when making such a protocol extension. Of
necessity, it would have to be via an NBD_OPT_* option requested by
the client during negotiation and responded to affirmatively by the
server, before both sides then use the new-size packets in both
directions after NBD_OPT_GO (and a client would still have to be
prepared to fall back to the unequal-sized headers if the server
doesn't understand the option).
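Client-side, such an option would slot into the existing option
haggling roughly like this; the option name and its numeric value are
made up, while the framing, magics, and NBD_REP_ACK come from the
current protocol:

#include <endian.h>
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>

#define NBD_OPTS_MAGIC  0x49484156454F5054ULL   /* "IHAVEOPT" */
#define NBD_REP_MAGIC   0x0003e889045565a9ULL
#define NBD_REP_ACK     1
#define NBD_OPT_FIXED_SIZE_HEADER 100           /* made-up value */

struct __attribute__((packed)) nbd_opt_req {
    uint64_t magic;     /* NBD_OPTS_MAGIC */
    uint32_t option;
    uint32_t length;    /* no option data in this sketch */
};

struct __attribute__((packed)) nbd_opt_rep {
    uint64_t magic;     /* NBD_REP_MAGIC */
    uint32_t option;
    uint32_t type;      /* NBD_REP_ACK on success */
    uint32_t length;
};

/* Returns true if the server acked fixed-size headers; on false the
 * client falls back to the unequal-sized headers. */
static bool negotiate_fixed_headers(int fd)
{
    struct nbd_opt_req req = {
        .magic  = htobe64(NBD_OPTS_MAGIC),
        .option = htobe32(NBD_OPT_FIXED_SIZE_HEADER),
        .length = htobe32(0),
    };
    struct nbd_opt_rep rep;

    if (write(fd, &req, sizeof req) != sizeof req ||
        read(fd, &rep, sizeof rep) != sizeof rep ||
        be64toh(rep.magic) != NBD_REP_MAGIC)
        return false;
    /* Anything other than ACK (e.g. NBD_REP_ERR_UNSUP) means the
     * server doesn't understand the option. */
    return be32toh(rep.type) == NBD_REP_ACK;
}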
For that matter, is there a benefit to having cache-line-optimized
sizing, where all headers are exactly 32 bytes (both requests and
responses, and both simple and structured replies)? I'm thinking
maybe NBD_OPT_FIXED_SIZE_HEADER might be a sane name for such an
option.
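Concretely, a fixed 32-byte header might be laid out like this (purely
a sketch of the sizing, not a proposed wire format):

#include <stdint.h>

struct __attribute__((packed)) nbd_fixed_header {
    uint32_t magic;     /* distinguishes request/simple/structured */
    uint16_t flags;
    uint16_t type;
    uint64_t cookie;    /* a.k.a. handle */
    uint64_t offset;    /* requests; replies could reuse this space */
    uint32_t length;
    uint32_t padding;   /* pad to 32 bytes: two headers per 64-byte
                           cache line */
};

_Static_assert(sizeof(struct nbd_fixed_header) == 32,
               "headers stay cache-line friendly");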
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization:  qemu.org | libvirt.org