On 07/29/2018 07:04 AM, Nir Soffer wrote:
If we may not trim, we tried ZERO_RANGE, but this is not well supported
yet; for example, it is not available on NFS 4.2. ZERO_RANGE and
PUNCH_HOLE are now supported on block devices, but not on RHEL 7, so we
fall back to slow manual zeroing there.
Change the logic to support block devices on RHEL 7, and file systems
that do not support ZERO_RANGE.
The new logic:
- If we may trim, try PUNCH_HOLE
- If we can zero range, try ZERO_RANGE
- If we can punch hole and fallocate, try fallocate(PUNCH_HOLE) followed
by fallocate(0).
- If underlying file is a block device, try ioctl(BLKZEROOUT)
- Otherwise fall back to manual zeroing
The handle now keeps the underlying file capabilities, so once we
discover that an operation is not supported, we never try it again.
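A minimal sketch of the capability caching described above, not the code
from the patch. The names try_cached, enum attempt, and the three stub
operations are hypothetical and exist only for illustration. Each strategy
is attempted only while its flag is set, and the flag is cleared the first
time the operation reports EOPNOTSUPP, so later requests skip it.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for one zeroing strategy.  Returns 0 on success,
   or -1 with errno set.  'calls' counts how often it was invoked. */
typedef int (*zero_fn) (int *calls);

/* Outcome of one attempt at a strategy. */
enum attempt { ATTEMPT_OK, ATTEMPT_UNSUPPORTED, ATTEMPT_FAILED };

/* Try 'fn' only if the cached capability flag still allows it.  On
   EOPNOTSUPP, clear the flag so the strategy is never tried again. */
static enum attempt
try_cached (bool *can_use, zero_fn fn, int *calls)
{
  if (!*can_use)
    return ATTEMPT_UNSUPPORTED;   /* known unsupported, skip the call */

  if (fn (calls) == 0)
    return ATTEMPT_OK;

  if (errno != EOPNOTSUPP)
    return ATTEMPT_FAILED;        /* real error, report it to the caller */

  *can_use = false;               /* remember that this is unsupported */
  return ATTEMPT_UNSUPPORTED;
}

/* Stub strategies used to exercise the helper. */
static int
unsupported_op (int *calls)
{
  (*calls)++;
  errno = EOPNOTSUPP;
  return -1;
}

static int
working_op (int *calls)
{
  (*calls)++;
  return 0;
}

static int
failing_op (int *calls)
{
  (*calls)++;
  errno = EIO;
  return -1;
}
```

A caller would walk the strategies in the order listed above and move to
the next one whenever a call returns ATTEMPT_UNSUPPORTED.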
Issues:
- ioctl(BLKZEROOUT) will fail if offset or count are not aligned to
logical sector size. I'm not sure if nbdkit or qemu-img ensure this.
qemu-img tends to default to 512-byte alignment, but can be told to
follow 4k alignment instead. nbdkit includes a filter that can force 4k
alignment on top of any plugin, regardless of client alignment.
Someday, I'd like to enhance nbdkit to support block size advertisement
(qemu-img already knows how to honor such advertisements). It's on my
todo queue, but lower in priority than getting incremental backups
working in libvirt.
- Need testing with NFS
---
plugins/file/file.c | 126 ++++++++++++++++++++++++++++++++++++--------
1 file changed, 103 insertions(+), 23 deletions(-)
+++ b/plugins/file/file.c
@@ -33,6 +33,7 @@
#include <config.h>
+#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
@@ -42,6 +43,8 @@
#include <sys/stat.h>
#include <errno.h>
#include <linux/falloc.h> /* For FALLOC_FL_* on RHEL, glibc < 2.18 */
+#include <sys/ioctl.h>
+#include <linux/fs.h>
Does this need a configure-time probe to see if it exists, since it will
break compilation on BSD systems? Same question for linux/falloc.h.
Actually, linux/falloc.h doesn't see any use in the current nbdkit.git;
does this email depend on another thread being applied first?
+
+#ifdef FALLOC_FL_PUNCH_HOLE
+ /* If we can punch hole but may not trim, we can combine punching hole and
+ fallocate to zero a range. This is much more efficient than writing zeros
+ manually. */
s/is/can be/ (it's two syscalls instead of one, and may not be as
efficient as we'd like - but does indeed stand a chance of being more
efficient than manual efforts)
+ if (h->can_punch_hole && h->can_fallocate) {
+ r = do_fallocate (h->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
+ offset, count);
+ if (r == 0) {
+ r = do_fallocate(h->fd, 0, offset, count);
+ if (r == 0)
+ return 0;
+
+ if (errno != EOPNOTSUPP) {
+ nbdkit_error ("zero: %m");
+ return r;
+ }
+
+ h->can_fallocate = false;
+ } else {
+ if (errno != EOPNOTSUPP) {
+ nbdkit_error ("zero: %m");
+ return r;
+ }
+
+ h->can_punch_hole = false;
+ }
+ }
+#endif
+
+ /* For block devices, we can use BLKZEROOUT.
+ NOTE: count and offset must be aligned to logical block size. */
+ if (h->is_block_device) {
+ uint64_t range[2] = {offset, count};
Is it worth attempting the ioctl only when you have aligned values?
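Something along these lines is what I have in mind; this is only a sketch.
The helper name is_sector_aligned is hypothetical, and the sector size is
passed in by the caller (on Linux it could presumably be queried once at
open time with ioctl(BLKSSZGET), which is an assumption not covered by
this patch).

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true if both offset and count are multiples of sector_size.
   sector_size must be a nonzero power of two (512, 4096, ...);
   anything else is treated as "not aligned" so the caller falls back. */
static bool
is_sector_aligned (uint64_t offset, uint64_t count, uint64_t sector_size)
{
  if (sector_size == 0 || (sector_size & (sector_size - 1)) != 0)
    return false;

  /* A value is aligned when none of its low bits are set; OR-ing the
     two values checks both at once. */
  return ((offset | count) & (sector_size - 1)) == 0;
}
```

Unaligned requests would then skip the ioctl and go straight to the
write fallback instead of failing.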
+
+ r = ioctl(h->fd, BLKZEROOUT, &range);
This portion of the code should be conditional on whether BLKZEROOUT is
defined.
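A rough sketch of such a guard, under stated assumptions: the wrapper name
zero_block_range is hypothetical, and the __linux__ test is only a
stand-in for whatever configure-time header probe nbdkit ends up using.
When BLKZEROOUT is not defined, the wrapper reports EOPNOTSUPP so the
caller can fall back to writing zeros.

```c
#include <errno.h>
#include <stdint.h>

#ifdef __linux__   /* stand-in for a configure-time header check */
#include <sys/ioctl.h>
#include <linux/fs.h>
#endif

/* Zero [offset, offset+count) on a block device.  Returns 0 on success,
   or -1 with errno set; errno is EOPNOTSUPP when the ioctl is not
   available at compile time. */
static int
zero_block_range (int fd, uint64_t offset, uint64_t count)
{
#ifdef BLKZEROOUT
  uint64_t range[2] = { offset, count };

  return ioctl (fd, BLKZEROOUT, &range);
#else
  (void) fd;
  (void) offset;
  (void) count;
  errno = EOPNOTSUPP;
  return -1;
#endif
}
```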
+ if (r == 0)
+ return 0;
+
+ nbdkit_error("zero: %m");
+ return r;
+ }
+
/* Trigger a fall back to writing */
errno = EOPNOTSUPP;
-#endif
return r;
}
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization:  qemu.org | libvirt.org