Thanks, sir, the suggestions were very insightful.
 For that matter is there a benefit to having cache-line-optimized
 sizing, where all headers are exactly 32 bytes (both requests and
 responses, and both simple and structured replies)?  I'm thinking
 maybe NBD_OPT_FIXED_SIZE_HEADER might be a sane name for such an
 option.
 
32 bytes sounds good. You also mentioned upgrading the length field
from the current 32 bits to 64 bits, which will increase the size of
RequestHeader from 28 to 32 bytes. We also currently need to add
__attribute__((__packed__)) to the request header (on x64 at least),
which is not the best thing for speed.
So, if I understand correctly, I need to implement this in nbdkit
too to make it work.
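To make the size arithmetic concrete, here is a rough sketch of what I
mean. The struct names are made up for illustration and are not the
ones used in libnbd or nbdkit; the field order follows the current NBD
request layout, and the 64-bit length variant is only my assumption
of how the wider header could look.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Current request layout: 4 + 2 + 2 + 8 + 8 + 4 = 28 bytes on the wire.
 * Without the packed attribute, an x86-64 compiler pads the struct to a
 * multiple of 8 bytes, so sizeof would no longer match the wire size. */
struct request_packed {
  uint32_t magic;
  uint16_t flags;
  uint16_t type;
  uint64_t handle;
  uint64_t offset;
  uint32_t length;   /* 32-bit length, as today */
} __attribute__((__packed__));

/* Hypothetical layout with a 64-bit length: 4 + 2 + 2 + 8 + 8 + 8 = 32.
 * Every field sits at its natural alignment, so no packed attribute is
 * needed to get exactly 32 bytes. */
struct request_wide {
  uint32_t magic;
  uint16_t flags;
  uint16_t type;
  uint64_t handle;
  uint64_t offset;
  uint64_t length;   /* assumed 64-bit length */
};

/* Compile-time checks of the sizes discussed above. */
_Static_assert (sizeof (struct request_packed) == 28,
                "packed request header should be 28 bytes");
_Static_assert (sizeof (struct request_wide) == 32,
                "wide request header should be 32 bytes");
```

This only covers the in-memory layout; byte order on the wire still
has to be handled separately.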
I gave the idea a bit more thought, and I think there will be a sweet
spot for how large our NBD packets should be, because:
1. Small payloads (<1MB) might help us process more packets and hence
should be better for a large buffer.
2. But more packets might stress our NBD state machine, making it
more CPU intensive. That might not be very significant; I am
just putting it out there.
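As a back-of-the-envelope illustration of the two points above, the
helpers below count how many requests a buffer needs for a given
payload size and how many header bytes that costs. The function names
are hypothetical, and the 32-byte header is the fixed size discussed
earlier, not something that exists in the protocol today.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed fixed header size per request, in bytes. */
#define HEADER_BYTES 32u

/* Number of requests needed to move `total` bytes in chunks of at most
 * `payload` bytes (rounded up). `payload` must be non-zero. */
static uint64_t
requests_needed (uint64_t total, uint64_t payload)
{
  return (total + payload - 1) / payload;
}

/* Total header bytes sent for those requests. */
static uint64_t
header_overhead (uint64_t total, uint64_t payload)
{
  return requests_needed (total, payload) * HEADER_BYTES;
}
```

For example, a 64 MiB buffer takes 64 requests at 1 MiB each but 256
requests at 256 KiB each, so the state machine handles four times as
many packets while the extra header bytes stay negligible.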
Anyway, this is a bit different from my previous approach, so I need to
redesign nbdcpy to make it wait some time to collect packets before
reading. I will update you on further developments.
Thanks & Regards,
Abhay