On 08/02/2018 02:05 PM, Nir Soffer wrote:
fallocate(FALLOC_FL_ZERO_RANGE) is supportd for block devices with
modern kernel, but when it is not, we fall back to manual zeroing.
Check if the underlying file is a block device when opening the file,
and fall back to ioctl(BLKZEROOUT) for aligned zero requests for a
block device.
+++ b/plugins/file/file.c
@@ -45,6 +45,7 @@
#if defined(__linux__) && !defined(FALLOC_FL_PUNCH_HOLE)
#include <linux/falloc.h> /* For FALLOC_FL_*, glibc < 2.18 */
+#include <linux/fs.h> /* For BLKZEROOUT */
Will this pick up BLKZEROOUT in all cases where it is needed? Or do we
need to relax the !defined(FALLOC_FL_PUNCH_HOLE), and just blindly
include both of these headers for all Linux compilations?
+static bool
+is_aligned(struct handle *h, uint64_t n)
+{
+ return n % h->sector_size == 0;
Since we know (but the compiler doesn't) that sector_size is a power of
2, it is slightly faster to use bitwise math:
return !(n & (h->sector_size - 1))
+#ifdef BLKSSZGET
+ if (h->is_block_device) {
+ if (ioctl (h->fd, BLKSSZGET, &h->sector_size)) {
+ nbdkit_error ("ioctl(BLKSSZGET): %s: %m", filename);
+ free (h);
+ return NULL;
If the ioctl() fails, would it be better to just fall back...
+ }
+ }
+#else
+ h->sector_size = 4096; /* Safe guess */
...to the safe guess, instead of giving up entirely? (Might matter on a
system with newer headers that have the macro, but where the kernel does
not support the ioctl).
@@ -329,6 +361,20 @@ file_zero (void *handle, uint32_t count,
uint64_t offset, int may_trim)
}
#endif
+#ifdef BLKZEROOUT
+ /* For aligned range and block devices, we can use BLKZEROOUT. */
+ if (h->is_block_device && is_aligned (h, offset) && is_aligned (h,
count)) {
Since alignment is a power of 2, you can compress this as:
if (h->is_block_device && is_aligned (h, offset | count)) {
+ uint64_t range[2] = {offset, count};
+
+ r = ioctl (h->fd, BLKZEROOUT, &range);
+ if (r == 0)
+ return r;
+
+ nbdkit_error ("zero: %m");
+ return r;
Are we sure that treating ALL errors as fatal is worthwhile, or should
we still attempt to trigger a fall back to writing?
+ }
+#endif
+
/* Trigger a fall back to writing */
errno = EOPNOTSUPP;
return r;
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization:
qemu.org |
libvirt.org