On Sat, May 21, 2022 at 05:37:10PM +0100, Nikolaus Rath wrote:
> On May 21 2022, "Richard W.M. Jones" <rjones(a)redhat.com> wrote:
> > On Sat, May 21, 2022 at 01:21:11PM +0100, Nikolaus Rath wrote:
> >> Hi,
> >>
> >> How does the blocksize filter take into account writes that end up
> >> overlapping due to read-modify-write cycles?
> >>
> >> Specifically, suppose there are two non-overlapping writes handled
> >> by two different threads that, due to blocksize requirements,
> >> overlap when expanded. I think there is a risk that one thread may
> >> partially undo the work of the other here.
> >>
> >> Looking at the code, it seems that writes of unaligned heads and
> >> tails are protected with a global lock, but writes of aligned data
> >> can occur concurrently.
> >
> > I agree.
> >
> > Assuming the underlying plugin is NBDKIT_THREAD_MODEL_PARALLEL and no
> > other filters impose thread model limits, the blocksize filter does
> > not limit the thread model, so the thread model of nbdkit would also
> > be NBDKIT_THREAD_MODEL_PARALLEL.
> >
> > That means that two writes either on different connections or
> > pipelined on the same connection could happen at the same time.
> > “blocksize_pwrite” would be called concurrently for the two requests.
> >
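(Aside for anyone not familiar with the filter API: a filter that wanted to
rule this out could narrow the thread model itself, roughly as in the sketch
below.  This is illustrative only; the filter name is made up and the exact
callback is documented in nbdkit-filter(3).  The point is just that blocksize
does no such thing, so the plugin's PARALLEL model is what nbdkit ends up
using.)

#include <nbdkit-filter.h>

/* Purely illustrative, not part of the blocksize filter: a filter can
 * narrow the thread model like this. */
static int
example_thread_model (void)
{
  return NBDKIT_THREAD_MODEL_SERIALIZE_ALL_REQUESTS;
}

static struct nbdkit_filter filter = {
  .name         = "serialize-example",   /* made-up name */
  .thread_model = example_thread_model,
};

NBDKIT_REGISTER_FILTER (filter)
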
> >> However, does this not miss the case where there is one unaligned
> >> write that overlaps with an aligned one?
> >>
> >> For example, with blocksize 10, we could have:
> >>
> >> Thread 1: receives write request for offset=4, size=16
> >> Thread 2: receives write request for offset=0, size=10
> >> Thread 1: acquires lock, reads bytes 0-4
> >> Thread 2: does aligned write (no locking needed), writes bytes 0-10
> >> Thread 1: writes bytes 0-10, overwriting data from Thread 2
> >
> > I believe this analysis is correct. (CC'd to Eric who knows a lot
> > more about this.)
> >
> > However I don't think it's a bug. If a client doesn't want writes to
> > squash each other, then it shouldn't send overlapping requests. I bet
> > the same thing happens with an SSD.
>
> But the requests are not overlapping from the client point of view.
> They only become overlapping when the server applies its
> read-modify-write operation to align them to the blocksize.

I'm going to leave this one to Eric who's an expert on this ("write
tearing", I think).
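
To make the interleaving concrete, the write path being discussed has
roughly this shape.  This is only a sketch from memory, not the actual
filter source; next_pread/next_pwrite, the bounce buffer and the fixed
BLOCKSIZE are stand-ins for the real filter machinery, and error handling
is omitted:

#include <pthread.h>
#include <stdint.h>
#include <string.h>

#define BLOCKSIZE 4096        /* stand-in for the configured blocksize */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static char bounce[BLOCKSIZE];

/* Stand-ins for the calls into the next layer (assumptions). */
extern int next_pread (char *buf, uint32_t count, uint64_t offset);
extern int next_pwrite (const char *buf, uint32_t count, uint64_t offset);

static int
sketch_pwrite (const char *buf, uint32_t count, uint64_t offset)
{
  /* Unaligned head: read-modify-write of the enclosing block, done
   * while holding the global lock. */
  if (offset % BLOCKSIZE != 0) {
    uint64_t blkstart = offset - offset % BLOCKSIZE;
    uint32_t drop = BLOCKSIZE - offset % BLOCKSIZE;
    if (drop > count) drop = count;

    pthread_mutex_lock (&lock);
    next_pread (bounce, BLOCKSIZE, blkstart);          /* read whole block */
    memcpy (bounce + (offset - blkstart), buf, drop);  /* merge new bytes */
    next_pwrite (bounce, BLOCKSIZE, blkstart);         /* write whole block */
    pthread_mutex_unlock (&lock);

    buf += drop; offset += drop; count -= drop;
  }

  /* Aligned middle: written directly with NO lock held, so it can run
   * at the same time as another request's locked head/tail above. */
  if (count >= BLOCKSIZE) {
    uint32_t keep = count - count % BLOCKSIZE;
    next_pwrite (buf, keep, offset);
    buf += keep; offset += keep; count -= keep;
  }

  /* Unaligned tail: another locked read-modify-write. */
  if (count > 0) {
    pthread_mutex_lock (&lock);
    next_pread (bounce, BLOCKSIZE, offset);
    memcpy (bounce, buf, count);
    next_pwrite (bounce, BLOCKSIZE, offset);
    pthread_mutex_unlock (&lock);
  }
  return 0;
}

Because the aligned path never takes the lock, it can land between another
request's locked pread and pwrite of the same block, which is exactly the
sequence in your example.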

> I think you elsewhere said that the blocksize reported by the NBD
> server is only a preferred blocksize, so I'd be surprised if not
> following this "preference" results in data corruption.

This is true for NBD at the moment, but I think everyone accepts it's
a mistake in the protocol. Eric was looking into this too.
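
For reference, a client can see what the server advertises during
negotiation; with libnbd it is something like the following (the URI is
just an example, and it is worth double-checking nbd_get_block_size(3)
rather than trusting my memory):

#include <stdio.h>
#include <inttypes.h>
#include <libnbd.h>

int
main (void)
{
  struct nbd_handle *nbd = nbd_create ();
  if (nbd == NULL) {
    fprintf (stderr, "%s\n", nbd_get_error ());
    return 1;
  }

  /* Example URI only: point this at the server under test. */
  if (nbd_connect_uri (nbd, "nbd://localhost") == -1) {
    fprintf (stderr, "%s\n", nbd_get_error ());
    return 1;
  }

  /* 0 means the server did not advertise that constraint. */
  int64_t min  = nbd_get_block_size (nbd, LIBNBD_SIZE_MINIMUM);
  int64_t pref = nbd_get_block_size (nbd, LIBNBD_SIZE_PREFERRED);
  int64_t max  = nbd_get_block_size (nbd, LIBNBD_SIZE_MAXIMUM);
  printf ("min=%" PRIi64 " preferred=%" PRIi64 " max=%" PRIi64 "\n",
          min, pref, max);

  nbd_shutdown (nbd, 0);
  nbd_close (nbd);
  return 0;
}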

> > NBD_CMD_FLAG_FUA is provided for clients that wish to ensure that a
> > write has been committed before sending another request.
> >
> > Do you have an example of a client which sends overlapping requests
> > and depends on particular behaviour of the server? You may be able to
> > get it to work by using nbdkit-noparallel-filter which can be used to
> > serialize nbdkit.
>
> I'm working with the kernel's NBD client, and it would explain all the
> mysterious data corruption issues that I've seen with the S3 plugin.
> But I have not yet definitively confirmed that this is the root cause.
>
> For now, I'll avoid the blocksize filter and instead do the
> read-modify-write in the plugin with proper locking. If that fixes it,
> then I think we can conclude that the kernel is sending such requests
> (but, as I said above, I would not consider them overlapping, nor would
> I consider this a bug).
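
If it is useful while you experiment: the usual shape of plugin-side RMW
is to hold one mutex across both the read and the write-back so nothing
can slip in between.  A rough sketch, where s3_read_block/s3_write_block
and BLKSIZE are made-up stand-ins for whatever the plugin really uses:

#include <pthread.h>
#include <stdint.h>
#include <string.h>

#define BLKSIZE 4096
/* Hypothetical backend calls, for illustration only. */
extern int s3_read_block (uint64_t blkno, char *buf);
extern int s3_write_block (uint64_t blkno, const char *buf);

static pthread_mutex_t rmw_lock = PTHREAD_MUTEX_INITIALIZER;

/* Write an unaligned range by read-modify-write of one block.  The
 * whole cycle happens under rmw_lock, so a concurrent writer cannot
 * slip in between the read and the write-back.  The caller is assumed
 * to have split the request so it fits inside a single block. */
static int
rmw_pwrite (const char *buf, uint32_t count, uint64_t offset)
{
  uint64_t blkno = offset / BLKSIZE;
  uint32_t off_in_blk = offset % BLKSIZE;
  char block[BLKSIZE];
  int r = 0;

  pthread_mutex_lock (&rmw_lock);
  if (s3_read_block (blkno, block) == -1)
    r = -1;
  else {
    memcpy (block + off_in_blk, buf, count);
    if (s3_write_block (blkno, block) == -1)
      r = -1;
  }
  pthread_mutex_unlock (&rmw_lock);
  return r;
}

Note that aligned writes touching the same block would need to take the
same lock (or a per-block lock), otherwise the problem just moves from
the filter into the plugin.
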
Rich.
--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
nbdkit - Flexible, fast NBD server with plugins
https://gitlab.com/nbdkit/nbdkit