On Fri, Sep 02, 2022 at 09:06:27AM +0100, Richard W.M. Jones wrote:
> On Thu, Sep 01, 2022 at 12:14:40PM +0100, Richard W.M. Jones wrote:
> > On Thu, Sep 01, 2022 at 04:01:39PM +0800, Ming Lei wrote:
> > > Maybe you should use one nbd disk, which has the closest code path
> > > with nbdublk.
> >
> > Good idea - I have started a heavy test using nbd.ko as the
> > backing. Let's see what happens after 12 hours or more.
>
> This has been running overnight (using nbd.ko) and there are no
> problems.
>
> Since I have seen 2 failures with ublk, I think there is a problem
> with the kernel driver, although it happens very rarely.
>
> My tests were done using 6.0.0-rc3 (dcf8e5633e).

I believe it is still too early to say this is a ublk driver issue.

The log shows that the following VM_BUG_ON() is triggered. The ublk
driver is far away from this code, and zspage->isolated being zero
looks more like a zsmalloc issue, so I highly suggest reporting it to
the zsmalloc/zram community:

static void dec_zspage_isolation(struct zspage *zspage)
{
        VM_BUG_ON(zspage->isolated == 0);
        zspage->isolated--;
}

The effect from ublk could be that swapping is triggered more
frequently or more easily, since the ublk io code path needs to pin
pages from ublksrv's vm space, and the nbd socket could consume more
memory too.

BTW, I did run the same kernel build workload on ublk-loop for ~20
hours, and it survived without triggering any issue. All ublk
userspace block devices share exactly the same ublk driver io code
path.

Also, maybe you can try to run the same workload with the zram swap
disk replaced by another swap disk, such as virtio-scsi/..., and see
if it can survive.

Thanks,
Ming