On May 25 2022, Eric Blake <eblake(a)redhat.com> wrote:
> On Tue, May 24, 2022 at 08:45:02PM +0100, Nikolaus Rath wrote:
> > > However, you are worried that a third possibility occurs:
> > >
> > > T2 sees that it needs to do RMW, grabs the lock, and reads 0x00-0x0f
> > > for the unaligned head (it only needs 0x00-0x03, but we have to read a
> > > block at a time), to populate its buffer with IIIIBBBBBBBBBBBB.
> > >
> > > T1 now writes 0x00-0x0f with AAAAAAAAAAAAAAAA, without any lock
> > > blocking it.
> > >
> > > T2 now writes 0x00-0x0f using the contents of its buffer, resulting in:
> > >
> > > 0   0   0   0   1   1   1   1
> > > 0...4...8...c...0...4...8...c...
> > > end3: IIIIBBBBBBBBBBBBBBBBBBBBBBIIIIII
> > >
> > > which does NOT reflect either of the possibilities where T1 and T2
> > > write atomically. Basically, we have become the victim of sharding.
> >
> > Yes, this is the scenario that I am worried about.
> >
> > I think this is a data corruption problem no matter whether we assume
> > that writes should be atomic or not.
> Writes to a single block should be atomic. Writes larger than a block
> need not be atomic. But when we advertise a particular block size,
> clients should be safe in assuming that anything smaller than that
> block size is inadvisable (either it will fail as too small, or it
> will incur a RMW penalty), but that something at the block size should
> not need client-side protection when no other parallel requests are
> overlapping that block. And that is where our blocksize filter is
> currently failing.
Agreed.
> > > You have just demonstrated that our current approach (grabbing a
> > > single semaphore, only around the unaligned portions), does not do
> > > what we hoped. So what WOULD protect us, while still allowing as much
> > > parallelism as possible?
> >
> > How about a per-block lock as implemented for the S3 plugin in
> > https://gitlab.com/nbdkit/nbdkit/-/merge_requests/10?
> That feels pretty heavyweight. It may allow more parallelism, but at
> the expense of more resources. I'm hoping that a single rwlock will
> do, although it may be dominated by starvation time when toggling
> between unaligned actions (serialized) vs aligned actions (parallel).
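If I follow, the single-rwlock idea would look roughly like this (a
sketch only; Python for brevity, and the RWLock class and method names
are mine, not nbdkit code). Aligned writes take the shared side and run
in parallel; an unaligned RMW takes the exclusive side, so the T1/T2
interleaving above cannot occur:

```python
import threading

class RWLock:
    """Minimal reader-writer lock: many aligned requests (readers) may
    proceed in parallel; an unaligned RMW (writer) runs exclusively."""
    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0      # aligned requests in flight
        self._writer = False   # an unaligned RMW in flight

    def acquire_shared(self):
        with self._cond:
            while self._writer:            # wait out any RMW
                self._cond.wait()
            self._readers += 1

    def release_shared(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()    # let a waiting RMW proceed

    def acquire_exclusive(self):
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()          # drain all aligned requests
            self._writer = True

    def release_exclusive(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()
```

This also makes the starvation concern visible: every unaligned request
drains all in-flight aligned requests before it can start, and vice
versa, so workloads that alternate between the two serialize on the
toggle.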
The resource usage is a single shared variable with an underlying mutex,
and a set (probably best implemented as a hash) containing at most as
many elements as blocks that are being written to concurrently. This
does not sound like much to me. Am I missing something?
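Concretely, the scheme I have in mind is roughly the following
(illustrative Python sketch, not the actual code from the merge request;
the names are mine). One mutex plus condition variable guards a set of
in-flight block numbers; a thread wanting block N waits until N leaves
the set, claims it, does its RMW, then releases it:

```python
import threading

class BlockLocks:
    """Per-block locking via a single mutex/condvar and a set holding
    only the block numbers currently being written."""
    def __init__(self):
        self._cond = threading.Condition()
        self._busy = set()   # block numbers with an RMW in flight

    def acquire(self, blknum):
        with self._cond:
            while blknum in self._busy:
                self._cond.wait()       # another thread owns this block
            self._busy.add(blknum)

    def release(self, blknum):
        with self._cond:
            self._busy.discard(blknum)
            self._cond.notify_all()     # wake threads waiting on any block
```

Memory stays proportional to the number of concurrent writers, and
requests touching different blocks never block each other; only threads
contending for the same block serialize.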
Best,
-Nikolaus
--
GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F
»Time flies like an arrow, fruit flies like a Banana.«