On Mon, May 02, 2022 at 03:36:33PM +0100, Nikolaus Rath wrote:
> On May 02 2022, Laszlo Ersek <lersek@redhat.com> wrote:
>> On 05/01/22 18:35, Nikolaus Rath wrote:
>>> Hi,
>>>
>>> I am developing a new nbdkit plugin, and occasionally I am getting
>>> errors like this:
>>>
>>> nbdkit: s3backer.8: error: write reply: NBD_CMD_WRITE: Broken pipe
>>> nbdkit: s3backer.15: error: write reply: NBD_CMD_WRITE: Broken pipe
>>>
>>> (where "s3backer" is the plugin name).
>>>
>>> I am not sure what to make of these. Can someone advise?
>>>
>>> Looking at the nbdkit source, it looks to me like these are generated
>>> when there is a problem sending a reply to the nbd client. On the other
>>> hand, I am using the standard 'nbd-client' program through a Unix
>>> socket, so I'd think this should not result in errors...?
So firstly, yes, we should interoperate correctly with the kernel
nbd.ko client and nbd-client. If there's a bug in interop, it's a bug
in nbdkit.
However, in this case these errors could be normal if the client
disconnects suddenly. It's easy enough to simulate this even using
only the userspace client from libnbd. If we initiate NBD_CMD_WRITE
but disconnect before it finishes (nbd_aio_pwrite only queues the
command, so nbdsh exits and closes the socket while the write is
still in flight) then:
$ nbdkit -U - -fv memory 1M \
--run 'nbdsh -u $uri -c "b = nbd.Buffer.from_bytearray(bytearray(512));
h.aio_pwrite(b, 0)"'
...
nbdkit: memory.0: error: write reply: NBD_CMD_WRITE: Broken pipe
Note that this isn't a problem for nbdkit. It prints the error
because it cannot send the reply on this connection, but continues
processing other connections as normal. Data is potentially lost, but
there's nothing nbdkit can do about that if the client goes away
suddenly. Clients that care about data integrity should issue
NBD_CMD_FLUSH and wait for the reply before declaring that data has
been committed.
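
For contrast, the careful pattern with the synchronous libnbd calls
would look something like this (untested sketch): h.pwrite blocks
until the server replies to NBD_CMD_WRITE, and h.flush sends
NBD_CMD_FLUSH and waits for its reply:

$ nbdkit -U - -fv memory 1M \
    --run 'nbdsh -u $uri -c "h.pwrite(bytes(512), 0); h.flush()"'

Both replies arrive before nbdsh exits, so this should not produce
the "Broken pipe" error, and only once h.flush has returned can the
client claim the data has been committed.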
>> If your plugin managed to crash nbd-client remotely, that would be
>> consistent with this symptom.
>
> So I tried to reproduce this, and noticed something odd. It seems I can
> disconnect the nbd device (nbd-client -d) while there are still requests
> in flight:
>
> May 02 15:20:50 vostro.rath.org kernel: nbd1: detected capacity change from 0 to 52428800
> May 02 15:20:50 vostro.rath.org kernel: block nbd1: NBD_DISCONNECT
> May 02 15:20:50 vostro.rath.org kernel: block nbd1: Disconnected due to user request.
> May 02 15:20:50 vostro.rath.org kernel: block nbd1: shutting down sockets
> May 02 15:20:50 vostro.rath.org kernel: I/O error, dev nbd1, sector 776 op 0x0:(READ) flags 0x80700 phys_seg 29 prio class 0
> May 02 15:20:50 vostro.rath.org kernel: I/O error, dev nbd1, sector 776 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> May 02 15:20:50 vostro.rath.org kernel: Buffer I/O error on dev nbd1, logical block 97, async page read
> May 02 15:20:50 vostro.rath.org kernel: block nbd1: Attempted send on invalid socket
> May 02 15:20:50 vostro.rath.org kernel: I/O error, dev nbd1, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0
> May 02 15:20:50 vostro.rath.org kernel: block nbd1: Attempted send on invalid socket
> May 02 15:20:50 vostro.rath.org kernel: I/O error, dev nbd1, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0
>
> This was generated by running:
>
> $ nbd-client localhost /dev/nbd1 && mkfs.ext4 /dev/nbd1 && nbd-client -d /dev/nbd1
>
> Is that expected behavior?
It's a bit unexpected to me. Adding Wouter to the thread - he might
have an idea here, especially if there's a way to have "nbd-client -d"
wait for pending requests to finish before disconnecting.
I don't use the kernel client very much myself. We mostly use either
libnbd or the qemu client.
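
A possible workaround (untested, and it doesn't answer whether
"nbd-client -d" itself ought to wait) would be to flush the kernel's
dirty pages for the device before requesting the disconnect,
something like:

$ nbd-client localhost /dev/nbd1 && mkfs.ext4 /dev/nbd1 && \
    blockdev --flushbufs /dev/nbd1 && nbd-client -d /dev/nbd1

blockdev --flushbufs issues the BLKFLSBUF ioctl, which should force
out any dirty page cache for /dev/nbd1 before the NBD_DISCONNECT is
sent.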
> I would have thought that nbd-client will block until any dirty data
> has been written.
>
> Curiously enough, in this case I did *not* get the above warnings from
> nbdkit itself.
Rich.
--
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones
Read my programming and virtualization blog:
http://rwmj.wordpress.com
virt-top is 'top' for virtual machines. Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top