I had an idea for optimizing my current approach, it's good in some
ways but can be faster with some breaking changes to the protocol.
Currently, we read (from socket connected to source) one request at a time
the simple flow looks like `read_header(io_uring) ---- success --->
recv(data) --- success ---> send(data) & queue another read header`
but it's not as efficient as it could be at best it's a hack.
Another approach I am thinking about is a large buffer
where we can read all of the socket's data and process packets from
that buffer as all the I/O is handled.
this minimizes the number of read requests to the kernel as we do 1
read for multiple NBD packets.
Further optimization requires changing the NBD protocol a bit
Current protocol
1. Memory representation of a response (20-byte header + data)
2. Memory representation of a request (28-byte header + data)
HHHHH_DDDDDDDDD...
HHHHHHH_DDDDDDDDD...
H and D represent 4 bytes, _ represents 0 bytes
With the large buffer approach, we read data into a large buffer, then
copy the NBD packet's data to a new buffer, strap a new header to it
and send it.
This copying is what we wanted to avoid in the first place.
If the response header was 28 bytes or the first 8-bytes of data were
useless we could have just overwritten the header part and sent data
directly from the large buffer, therefore avoiding the copy.
What are your thoughts?
Thanks and Regards.
Abhay