On Wed, Aug 31, 2022 at 10:13:08AM +0100, Richard W.M. Jones wrote:
On Wed, Aug 31, 2022 at 09:45:45AM +0800, Ming Lei wrote:
> On Tue, Aug 30, 2022 at 05:13:46PM +0100, Richard W.M. Jones wrote:
> > On Tue, Aug 30, 2022 at 11:29:26PM +0800, Ming Lei wrote:
> > > On Tue, Aug 30, 2022 at 03:38:50PM +0100, Richard W.M. Jones wrote:
> > > > On Tue, Aug 30, 2022 at 03:12:23PM +0800, Ming Lei wrote:
> > > > > The patch sent in last email may cause io hang on MQ, and follows the
> > > > > fixed version:
> > > >
> > > > I split this into two commits and cleaned them up and posted them here:
> > > >
> > > > https://gitlab.com/rwmjones/libnbd/-/commits/nbdublk/
> > > >
> > > > Unfortunately this doesn't work for me. When I do various filesystem
> > > > operations like git clone and a compile I see some subtle disk errors
> > > > and eventually it deadlocks, so I guess there is some problem.
> > >
> > > OK, care to provide more details about the reproducer? Like how backend
> > > is setup, MQ/SQ is used, disk size, ...
> >
> > My test script is attached. $1 == "ublk".
> >
> > It basically just clones a Linux repo and compiles it. It hangs
> > either during the clone or early in the build, and there are various
> > "scary messages" from git which might indicate disk corruption.
> >
> > The NBD server is:
> >
> > nbdkit -f memory 24G
> >
> > running on the hypervisor ("nbd://pick").
> >
> > > I have cloned linux kernel source tree on nbdublk disk and built it with
> > > fedora 36 config for ~20min, so far so good. In my setting, backend is
> > > 'nbdkit file /dev/sda(virtio-scsi)', nbdublk is single queue.
> >
> > Can you see if you can reproduce a hang with the source from:
> >
> > https://gitlab.com/rwmjones/libnbd/-/commits/nbdublk/
> >
> > I may have made a mistake when rebasing your patch or fixing it up to
> > remove compiler warnings.
>
> My test used your tree directly, and I compared it with
> my native tree; they are basically the same.
>
> Today I will set up & run the test by your approach.

I tried it again now and it definitely deadlocks under load.

I can reproduce it. Please try the top patch in the aio branch, which fixed
the hang in my reproducer with your test setting.