On Mon, Jul 30, 2018 at 6:01 PM Eric Blake <eblake@redhat.com> wrote:
... 
> @@ -42,6 +43,8 @@
>   #include <sys/stat.h>
>   #include <errno.h>
>   #include <linux/falloc.h>   /* For FALLOC_FL_* on RHEL, glibc < 2.18 */
> +#include <sys/ioctl.h>
> +#include <linux/fs.h>

Does this need a configure-time probe to see if it exists, since it will
break compilation on BSD systems?  Same question to linux/falloc.h.
Actually, linux/falloc.h doesn't see any use in the current nbdkit.git;
does this email depend on another thread being applied first?

Yes, this depends on 
https://www.redhat.com/archives/libguestfs/2018-July/msg00083.html 

I plan to protect both imports with #if defined(__linux__). Any reason to
use configure instead?

> +
> +#ifdef FALLOC_FL_PUNCH_HOLE
> +  /* If we can punch hole but may not trim, we can combine punching hole and
> +     fallocate to zero a range. This is much more efficient than writing zeros
> +     manually. */

s/is/can be/ (it's two syscalls instead of one, and may not be as
efficient as we'd like - but does indeed stand a chance of being more
efficient than manual efforts)

"can be" is better, but I really mean "is typically", or "is expected to be".

For example in imageio same change improved upload throughout by
450% with my poor NFS server, see:
https://gerrit.ovirt.org/c/92871/11//COMMIT_MSG
 
> +  if (h->can_punch_hole && h->can_fallocate) {
> +    r = do_fallocate (h->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
> +                      offset, count);
> +    if (r == 0) {
> +      r = do_fallocate(h->fd, 0, offset, count);
> +      if (r == 0)
> +        return 0;
> +
> +      if (errno != EOPNOTSUPP) {
> +        nbdkit_error ("zero: %m");
> +        return r;
> +      }
> +
> +      h->can_fallocate = false;
> +    } else {
> +      if (errno != EOPNOTSUPP) {
> +        nbdkit_error ("zero: %m");
> +        return r;
> +      }
> +
> +      h->can_punch_hole = false;
> +    }
> +  }
> +#endif
> +
> +  /* For block devices, we can use BLKZEROOUT.
> +     NOTE: count and offset must be aligned to logical block size. */
> +  if (h->is_block_device) {
> +    uint64_t range[2] = {offset, count};

Is it worth attempting the ioctl only when you have aligned values?

I think it does, but this requires getting the logical sector size
and keeping it in the handle.

Looking in plugin_zero, if we find that the offset and count are not
aligned and return EOPNOTSUPP, we will fallback to manual
zeroing for this call.

But this worries me:

549   int can_zero = 1; /* TODO cache this per-connection? */

Once can_zero is cached per connection, failing once because of
single unaligned call will prevent efficient zero for the rest of the image.

> +
> +    r = ioctl(h->fd, BLKZEROOUT, &range);

This portion of the code be conditional on whether BLKZEROOUT is defined.

Right.
...

Nir