On Sat, Nov 7, 2015 at 5:03 AM, Richard W.M. Jones <rjones(a)redhat.com> wrote:
> How about 'strace mkfs.ext2 ..' and see if any system calls are
> returning errors. That would show you whether nbd-client is throwing
> errors away, or whether mkfs is getting the errors and ignoring them
> (seems pretty unlikely, but you never know).
>
> After that, it'd be down to tracing where the errors end up in the
> kernel.

Thanks for the tip!
The results are interesting. It looks like all of mkfs's pwrite()
calls succeed, but its final fsync() calls do actually fail:
root@debian:~# strace mkfs.ext2 /dev/nbd0 2>&1 | tee strace.out
root@debian:~# cat strace.out | grep pwrite
pwrite(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 32768, 8187379712) = 32768
pwrite(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 32768, 8187412480) = 32768
pwrite(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 32768, 8187445248) = 32768
...
root@debian:~# cat strace.out | grep fsync
fsync(3) = -1 EIO (Input/output error)
fsync(3) = -1 EIO (Input/output error)
The fsync() calls happen just before mkfs exits with success:
root@debian:~# cat strace.out | tail
pwrite(3, "\1\2\0\0\2\2\0\0\3\2\0\0\367{\365\37\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 6576672768) = 4096
fsync(3) = -1 EIO (Input/output error)
pwrite(3, "\0\0\10\0\0\0 \0\231\231\1\0qm\37\0\365\377\7\0\0\0\0\0\2\0\0\0\2\0\0\0"..., 1024, 1024) = 1024
fsync(3) = -1 EIO (Input/output error)
close(3) = 0
write(1, "done\n\n", 6done
) = 6
exit_group(0) = ?
+++ exited with 0 +++
root@debian:~#
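Rather than grepping for one syscall at a time, the whole trace can be scanned for failed calls in one pass. This is a minimal sketch, assuming strace's default output format where a failed call ends in "= -1 ERRNO (message)"; the name failed_syscalls is made up for illustration and is not part of strace or e2fsprogs.

```python
import re

# Matches lines such as: fsync(3) = -1 EIO (Input/output error)
# Assumption: default strace formatting, one complete call per line.
FAILED_CALL = re.compile(
    r"^(?P<name>\w+)\(.*\)\s+=\s+-1\s+(?P<errno>E[A-Z0-9]+)\b"
)


def failed_syscalls(lines):
    """Return (syscall, errno) pairs for every failed call in strace output."""
    failures = []
    for line in lines:
        match = FAILED_CALL.match(line.strip())
        if match:
            failures.append((match.group("name"), match.group("errno")))
    return failures


if __name__ == "__main__":
    # Reads the file produced by the tee command shown earlier.
    with open("strace.out") as trace:
        for name, err in failed_syscalls(trace):
            print(name, err)
```

Calls that strace or the mail client wrapped across several lines will not match, so this works best on the raw strace.out file.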
I did manage to find two calls to fsync in the e2fsprogs source whose
return values are not checked:
https://github.com/tytso/e2fsprogs/blob/956b0f18a5ddb6815a9dff4f10a1e3125...
https://github.com/tytso/e2fsprogs/blob/956b0f18a5ddb6815a9dff4f10a1e3125...
I'll see about submitting a patch there.
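The general shape of such a fix is to look at what fsync reports and pass the failure up instead of dropping it. The e2fsprogs code is C; the following is only a rough Python sketch of the same pattern, not the actual patch, and sync_checked is a hypothetical helper name.

```python
import errno
import os
import tempfile


def sync_checked(fd):
    """Flush fd to stable storage.

    Returns None on success, or the errno value if fsync fails,
    so the caller can report the error and exit non-zero.
    """
    try:
        os.fsync(fd)
    except OSError as err:
        return err.errno
    return None


if __name__ == "__main__":
    with tempfile.TemporaryFile() as scratch:
        scratch.write(b"data")
        scratch.flush()
        result = sync_checked(scratch.fileno())
        if result is not None:
            # A failure such as EIO would land here rather than being ignored.
            raise SystemExit("fsync failed: " + errno.errorcode.get(result, str(result)))
```

With a check like this in place, the EIO seen in the trace above would turn into a visible error and a non-zero exit status.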
I'm not sure where to start hunting down why mkfs's pwrite() calls
aren't failing. Should I look at the kernel source for that?
-jason