On Mon, Jul 30, 2018 at 6:01 PM Eric Blake <eblake(a)redhat.com> wrote:
...
> @@ -42,6 +43,8 @@
#include <sys/stat.h>
> #include <errno.h>
> #include <linux/falloc.h> /* For FALLOC_FL_* on RHEL, glibc < 2.18 */
> +#include <sys/ioctl.h>
> +#include <linux/fs.h>
Does this need a configure-time probe to see if it exists, since it will
break compilation on BSD systems? Same question to linux/falloc.h.
Actually, linux/falloc.h doesn't see any use in the current nbdkit.git;
does this email depend on another thread being applied first?
Yes, this depends on
https://www.redhat.com/archives/libguestfs/2018-July/msg00083.html
I plan to protect both imports with #if defined(__linux__). Any reason to
use configure instead?
+
> +#ifdef FALLOC_FL_PUNCH_HOLE
> + /* If we can punch hole but may not trim, we can combine punching
hole and
> + fallocate to zero a range. This is much more efficient than
writing zeros
> + manually. */
s/is/can be/ (it's two syscalls instead of one, and may not be as
efficient as we'd like - but does indeed stand a chance of being more
efficient than manual efforts)
"can be" is better, but I really mean "is typically", or "is
expected to
be".
For example in imageio same change improved upload throughout by
450% with my poor NFS server, see:
https://gerrit.ovirt.org/c/92871/11//COMMIT_MSG
> + if (h->can_punch_hole && h->can_fallocate) {
> + r = do_fallocate (h->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
> + offset, count);
> + if (r == 0) {
> + r = do_fallocate(h->fd, 0, offset, count);
> + if (r == 0)
> + return 0;
> +
> + if (errno != EOPNOTSUPP) {
> + nbdkit_error ("zero: %m");
> + return r;
> + }
> +
> + h->can_fallocate = false;
> + } else {
> + if (errno != EOPNOTSUPP) {
> + nbdkit_error ("zero: %m");
> + return r;
> + }
> +
> + h->can_punch_hole = false;
> + }
> + }
> +#endif
> +
> + /* For block devices, we can use BLKZEROOUT.
> + NOTE: count and offset must be aligned to logical block size. */
> + if (h->is_block_device) {
> + uint64_t range[2] = {offset, count};
Is it worth attempting the ioctl only when you have aligned values?
I think it does, but this requires getting the logical sector size
and keeping it in the handle.
Looking in plugin_zero, if we find that the offset and count are not
aligned and return EOPNOTSUPP, we will fallback to manual
zeroing for this call.
But this worries me:
549 int can_zero = 1; /* TODO cache this per-connection? */
Once can_zero is cached per connection, failing once because of
single unaligned call will prevent efficient zero for the rest of the image.
+
> + r = ioctl(h->fd, BLKZEROOUT, &range);
This portion of the code be conditional on whether BLKZEROOUT is defined.
Right.
...
Nir