On Sat, Nov 07, 2015 at 12:21:29AM -0600, Jason Pepas wrote:
Hi,
So I've been hacking together an nbdkit plugin (similar to the "file"
plugin, but it splits the file up into chunks):
https://github.com/pepaslabs/nbdkit-chunks-plugin
I got it to the point of being a working prototype. Then I threw it
onto a raspberry pi, which it turns out only has a 50/50 shot of
fallocate() working correctly.
I'm checking the return code of fallocate(), and my chunks_pwrite()
returns -1 if it fails. No problems there.
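(For reference, the error path looks roughly like this. It's a minimal
sketch rather than the plugin's actual code; ensure_chunk(), the path
handling, and CHUNK_SIZE are illustrative assumptions:)

/* Sketch of creating a chunk file on first write.  fallocate(2)
 * returns -1 and sets errno on failure; kernels or filesystems
 * without support fail with EOPNOTSUPP.  (posix_fallocate(3) would
 * fall back to writing zeroes instead.) */
#define _GNU_SOURCE
#include <stdio.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

#define CHUNK_SIZE 262144        /* 256 KiB, as in the ls output below */

static int
ensure_chunk (const char *path)
{
  int fd = open (path, O_RDWR | O_CREAT, 0600);
  if (fd == -1)
    return -1;

  if (fallocate (fd, 0, 0, CHUNK_SIZE) == -1) {
    int err = errno;
    fprintf (stderr, "error: Unable to fallocate '%s'\n", path);
    close (fd);
    errno = err;                 /* preserve errno for the caller */
    return -1;                   /* the pwrite callback then returns -1 */
  }

  close (fd);
  return 0;
}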
When I run mkfs.ext2 /dev/nbd0 on the client, I see this on the nbd-server:
nbdkit: chunks[1]: error: Unable to fallocate
'/home/cell/nbds/default/chunks/00000000000000030723'
nbdkit: chunks[1]: error: Unable to fallocate
'/home/cell/nbds/default/chunks/00000000000000030724'
nbdkit: chunks[1]: error: Unable to fallocate
'/home/cell/nbds/default/chunks/00000000000000030725'
nbdkit: chunks[1]: error: Unable to fallocate
'/home/cell/nbds/default/chunks/00000000000000030726'
nbdkit: chunks[1]: error: Unable to fallocate
'/home/cell/nbds/default/chunks/00000000000000030727'
nbdkit: chunks[1]: error: Unable to fallocate
'/home/cell/nbds/default/chunks/00000000000000030728'
nbdkit: chunks[1]: error: Unable to fallocate
'/home/cell/nbds/default/chunks/00000000000000031232'
Indeed, there is definitely a problem with fallocate, as some of the
chunks are the correct size (256k), and some are zero length:
cell@pi1$ pwd
/home/cell/nbds/default/chunks
cell@pi1$ ls -l | tail
-rw------- 1 cell cell 262144 Nov 7 06:01 00000000000000032256
-rw------- 1 cell cell 262144 Nov 7 06:01 00000000000000032257
-rw------- 1 cell cell 262144 Nov 7 06:01 00000000000000032258
-rw------- 1 cell cell 262144 Nov 7 06:01 00000000000000032259
-rw------- 1 cell cell 262144 Nov 7 06:01 00000000000000032260
-rw------- 1 cell cell 262144 Nov 7 06:01 00000000000000032261
-rw------- 1 cell cell 262144 Nov 7 06:01 00000000000000032262
-rw------- 1 cell cell 262144 Nov 7 06:01 00000000000000032263
-rw------- 1 cell cell 0 Nov 7 06:01 00000000000000032264
-rw------- 1 cell cell 0 Nov 7 06:01 00000000000000032767
But the fallocate failure itself isn't my real concern. The problem is
that, alarmingly, mkfs.ext2 isn't fazed by this at all:
root@debian:~# nbd-client pi1 10809 /dev/nbd0
Negotiation: ..size = 8192MB
bs=1024, sz=8589934592 bytes
root@debian:~# mkfs.ext2 /dev/nbd0
mke2fs 1.42.12 (29-Aug-2014)
Creating filesystem with 2097152 4k blocks and 524288 inodes
Filesystem UUID: 2230269c-6d2a-4927-93df-d9dd9f4fa40c
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632
Allocating group tables: done
Writing inode tables: done
Writing superblocks and filesystem accounting information: done
root@debian:~#
However, the nbd-client's dmesg is chock full of errors:
[ 9832.409219] block nbd0: Other side returned error (22)
[ 9832.457401] block nbd0: Other side returned error (22)
[ 9832.503100] block nbd0: Other side returned error (22)
[ 9832.542457] block nbd0: Other side returned error (22)
[ 9832.590394] block nbd0: Other side returned error (22)
[ 9832.642393] block nbd0: Other side returned error (22)
[ 9832.681455] block nbd0: Other side returned error (22)
[ 9832.721355] block nbd0: Other side returned error (22)
[ 9832.722676] quiet_error: 15129 callbacks suppressed
[ 9832.722679] Buffer I/O error on device nbd0, logical block 6293248
[ 9832.724274] lost page write due to I/O error on nbd0
[ 9832.724282] Buffer I/O error on device nbd0, logical block 6293249
[ 9832.725110] lost page write due to I/O error on nbd0
[ 9832.725110] Buffer I/O error on device nbd0, logical block 6293250
[ 9832.725110] lost page write due to I/O error on nbd0
[ 9832.725110] Buffer I/O error on device nbd0, logical block 6293251
[ 9832.725110] lost page write due to I/O error on nbd0
[ 9832.725110] Buffer I/O error on device nbd0, logical block 6293252
[ 9832.725110] lost page write due to I/O error on nbd0
[ 9832.725110] Buffer I/O error on device nbd0, logical block 6293253
[ 9832.725110] lost page write due to I/O error on nbd0
[ 9832.725110] Buffer I/O error on device nbd0, logical block 6293254
[ 9832.725110] lost page write due to I/O error on nbd0
[ 9832.725110] Buffer I/O error on device nbd0, logical block 6293255
[ 9832.725110] lost page write due to I/O error on nbd0
[ 9832.725110] Buffer I/O error on device nbd0, logical block 6293256
[ 9832.725110] lost page write due to I/O error on nbd0
[ 9832.725110] Buffer I/O error on device nbd0, logical block 6293257
[ 9832.725110] lost page write due to I/O error on nbd0
[ 9832.743111] block nbd0: Other side returned error (22)
[ 9832.744420] blk_update_request: 125 callbacks suppressed
[ 9832.744422] end_request: I/O error, dev nbd0, sector 12587008
[ 9832.758203] block nbd0: Other side returned error (22)
[ 9832.759513] end_request: I/O error, dev nbd0, sector 12845056
[ 9832.777635] block nbd0: Other side returned error (22)
[ 9832.779511] end_request: I/O error, dev nbd0, sector 12849160
[ 9832.805950] block nbd0: Other side returned error (22)
[ 9832.810278] end_request: I/O error, dev nbd0, sector 12849416
[ 9832.846880] block nbd0: Other side returned error (22)
So, my question/concern is: how is it that the nbd-client's kernel is
correctly detecting massive I/O errors (error 22 is EINVAL), but
apparently not passing them through to mkfs.ext2?
It's definitely not good, but I don't think it can be nbdkit, since
nbd-client is seeing the errors.
Or perhaps mkfs.ext2 doesn't check for I/O errors? That's a bit hard
to believe...
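[Aside: one possible explanation -- an assumption here, not something
verified in this thread -- is that mkfs writes through the page cache,
so each write() "succeeds" immediately and the device error only
surfaces during asynchronous writeback (hence the "lost page write"
messages above); an application only sees it by checking the return of
fsync()/close(). A minimal sketch of that behavior, assuming /dev/nbd0
is backed by a failing server:]

/* Buffered write() to a block device usually just dirties the page
 * cache and returns success; a deferred writeback error is reported
 * at fsync() or close(), typically as EIO. */
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>

int
main (void)
{
  char buf[4096];
  memset (buf, 0xaa, sizeof buf);

  int fd = open ("/dev/nbd0", O_WRONLY);
  if (fd == -1) { perror ("open"); return 1; }

  if (write (fd, buf, sizeof buf) == -1)  /* usually "succeeds" */
    perror ("write");

  if (fsync (fd) == -1)          /* deferred error shows up here */
    perror ("fsync");
  if (close (fd) == -1)
    perror ("close");
  return 0;
}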
How about running 'strace mkfs.ext2 ..' to see if any system calls are
returning errors? That would show you whether nbd-client is throwing
the errors away, or whether mkfs is getting the errors and ignoring
them (seems pretty unlikely, but you never know).
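Something like this (the exact flag set is illustrative):

  strace -f -e trace=write,pwrite64,fsync,close mkfs.ext2 /dev/nbd0 2>&1 | grep '= -1'

Any error that actually reaches mkfs would show up as a syscall
returning -1 with EIO (or similar).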
After that, it'd be down to tracing where the errors end up in the
kernel.
Rich.
Anyway, I'm sure someone on this list has run into similar issues, so I
thought I'd reach out before I go too far down a rabbit hole.
Thanks,
Jason Pepas
--
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones
Read my programming and virtualization blog:
http://rwmj.wordpress.com
virt-top is 'top' for virtual machines. Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top