Re: [Libguestfs] [PATCH 1/2] spec: Recommend cap on NBD_REPLY_TYPE_BLOCK_STATUS length

Friday, 8 April 2022

Hi Eric,

On Thu, Apr 07, 2022 at 04:37:19PM -0500, Eric Blake wrote:
...
 The spec was silent on how many extents a server could reply with.
 However, both qemu and nbdkit (the two server implementations known to
 have implemented the NBD_CMD_BLOCK_STATUS extension) implement a hard
 cap, and will truncate the amount of extents in a reply to avoid
 sending a client a reply larger than the maximum NBD_CMD_READ response
 they are willing to tolerate:

 When qemu first implemented NBD_CMD_BLOCK_STATUS for the
 base:allocation context (qemu commit e7b1948d51, Mar 2018), it behaved
 as if NBD_CMD_FLAG_REQ_ONE were always passed by the client, and never
 responded with more than one extent.  Later, when adding its
 qemu:dirty-bitmap:XXX context extension (qemu commit 3d068aff16, Jun
 2018), it added a cap to 128k extents (1M+4 bytes), and that cap was
 applied to base:allocation once qemu started sending multiple extents
 for that context as well (qemu commit fb7afc797e, Jul 2018).  Qemu
 extents are never smaller than 512 bytes (other than an exception at
 the end of a file whose size is not aligned to 512), but even so, a
 request for just under 4G of block status could produce 8M extents,
 resulting in a reply of 64M if it were not capped smaller.

 When nbdkit first implemented NBD_CMD_BLOCK_STATUS (nbdkit 4ca66f70a5,
 Mar 2019), it did not impose any restriction on the number of extents
 in the reply chunk.  But because it allows extents as small as one
 byte, it is easy to write a server that can amplify a client's request
 of status over 1M of the image into a reply over 8M in size, and it
 was very easy to demonstrate that a hard cap was needed to avoid
 crashing clients or otherwise killing the connection (a bad server
 impacting the client negatively); unique to nbdkit's situation is the
 fact that because it is designed for plugin server implementations,
 not capping the length of extent also posed a problem to nbdkit as the
 server (a client requesting large block status could cause the server
 to run out of memory depending on the plugin providing the server
 callbacks).  So nbdkit enforced a bound of 1M extents (8M+4 bytes,
 nbdkit commit 6e0dc839ea, Jun 2019).

 Since the limit chosen by these two implementations is different, and
 since nbdkit has versions that were not limited, add this as a SHOULD
 NOT instead of MUST NOT constraint on servers implementing block
 status.  It does not matter that qemu picked a smaller limit that it
 truncates to, since we have already documented that the server may
 truncate for other reasons (such as it being inefficient to collect
 that many extents in the first place).  But documenting the limit now
 becomes even more important in the face of a future addition of 64-bit
 requests, where a client's request is no longer bounded to 4G and
 could thereby produce even more than 8M extents for the corner case
 when every 512 bytes is a new extent, if it were not for this
 recommendation. 
It feels backwards to me to make this a restriction on the server side.
You're saying there are server implementations that will be inefficient
if there are more than 2^20 extents, and therefore no server should send
more than those, even if it can do so efficiently.

Isn't it up to the server implementation to decide what can be done
efficiently?

Perhaps we can make the language about possibly reducing length of
extens a bit stronger; but I don't think adding explicit limits for a
server's own protection is necessary.

...
 ---
  doc/proto.md | 17 +++++++++++++----
  1 file changed, 13 insertions(+), 4 deletions(-)

 diff --git a/doc/proto.md b/doc/proto.md
 index bacccfa..c3c7cd9 100644
 --- a/doc/proto.md
 +++ b/doc/proto.md
 @@ -1815,6 +1815,12 @@ MUST initiate a hard disconnect.
    the different contexts need not have the same number of extents or
    cumulative extent length.

 +  Servers SHOULD NOT send more than 2^20 extents in a single reply
 +  chunk; in other words, the maximum size of
 +  `NBD_REPLY_TYPE_BLOCK_STATUS` should not be more than 4 + 8*2^20
 +  (8,388,612 bytes), even if this requires that the server truncate
 +  the response in relation to the *length* requested by the client.
 +
    Even if the client did not use the `NBD_CMD_FLAG_REQ_ONE` flag in
    its request, the server MAY return fewer descriptors in the reply
    than would be required to fully specify the whole range of requested
 @@ -2181,10 +2187,13 @@ The following request types exist:
      multiple descriptors, and the final descriptor MAY extend beyond
      the original requested size if the server can determine a larger
      length without additional effort.  On the other hand, the server MAY
 -    return less data than requested. However the server MUST return at
 -    least one status descriptor (and since each status descriptor has
 -    a non-zero length, a client can always make progress on a
 -    successful return).  The server SHOULD use different *status*
 +    return less data than requested.  In particular, a server SHOULD NOT
 +    send more than 2^20 status descriptors in a single chunk.
 +
 +    However the server MUST return at least one status descriptor,
 +    and since each status descriptor has a non-zero length, a client
 +    can always make progress on a successful return.  The server SHOULD
 +    use different *status*
      values between consecutive descriptors where feasible, although
      the client SHOULD be prepared to handle consecutive descriptors
      with the same *status* value.  The server SHOULD use descriptor
 -- 
 2.35.1

-- 
     w(a)uter.{be,co.za}
wouter(a){grep.be,fosdem.org,debian.org}

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

Re: [Libguestfs] [PATCH 1/2] spec: Recommend cap on NBD_REPLY_TYPE_BLOCK_STATUS length