On 2/25/21 11:34 AM, Richard W.M. Jones wrote:
When writing to a file or block device, we are always writing new
(i.e. previously uncached) data. This commit ensures that very little
of that data will be in the page cache after nbdcopy finishes by
evicting it as we go along. This ensures that the host page cache is
largely unchanged for other host processes.
This uses Linus's technique described here:
https://stackoverflow.com/a/3756466
but instead of using 2 windows, it uses a configurable larger number
of windows (in this case 8).
Here you state configurable...
Before this commit:
$ rm /var/tmp/pattern ; sync ; time ./run nbdcopy [ nbdkit pattern 32G ] /var/tmp/pattern && cachestats /var/tmp/pattern
real 0m34.852s
user 0m18.368s
sys 0m33.117s
pages in cache: 7090389/8388608 (84.5%) [filesize=33554432.0K, pagesize=4K]
Notice that the newly written file ends up in the cache, thus trashing
the page cache on the host.
After this commit:
$ rm /var/tmp/pattern ; sync ; time ./run nbdcopy [ nbdkit pattern 32G ] /var/tmp/pattern && cachestats /var/tmp/pattern
real 0m38.721s
user 0m18.837s
sys 0m40.654s
pages in cache: 65536/8388608 (0.8%) [filesize=33554432.0K, pagesize=4K]
The newly written file does not disturb the page cache. However, there
is about an 11% slowdown.
I suspect that is because we end up waiting longer for flushing actions
to complete before evicting things from cache. Do we want this to be an
opt-in/out knob on the command line? If so, which way should we lean
for the default value of that knob?
@@ -159,7 +165,60 @@ page_cache_evict (struct rw_file *rwf, uint64_t orig_offset, size_t orig_len)
len -= n;
}
}
-#endif
+#endif /* PAGE_CACHE_MAPPING */
+
+#ifdef EVICT_WRITES
+/* Prepare to evict file contents from the page cache when writing.
+ * We cannot do this directly (as for reads above) because we have to
+ * wait for Linux to finish writing the pages to disk. Therefore the
+ * strategy is to (1) tell Linux to begin writing asynchronously and
+ * (2) evict the previous pages, which have hopefully been written
+ * already by the time we get here. We have to maintain window(s) per
+ * thread.
+ *
+ * For more information see https://stackoverflow.com/a/3756466 and
+ * the links to Linus's advice from that entry.
+ */
I'm less familiar with this interface (having never used it before), but
your usage patterns appear to match the man page and reference materials.
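
For anyone following along, the two-step strategy in the comment above
can be sketched roughly as follows. This is only an illustration under
stated assumptions, not the code from the patch: the names
write_window, evict_ring, ring_push and evict_after_write are
hypothetical, the ring is assumed to be per thread, and the calls
(sync_file_range and posix_fadvise) are the ones described in the
Stack Overflow answer, so it is Linux-only.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_WINDOWS 8            /* same value as the #define in the patch */

/* Hypothetical bookkeeping types, not taken from the patch. */
struct write_window {
  uint64_t offset;
  size_t len;                   /* 0 means the slot is unused */
};

struct evict_ring {
  struct write_window w[NR_WINDOWS];
  unsigned next;                /* slot holding the oldest window */
};

/* Record a newly written range.  If the slot being reused held an
 * older range, copy it to *old and return true so that the caller
 * can evict it.
 */
bool
ring_push (struct evict_ring *r, uint64_t offset, size_t len,
           struct write_window *old)
{
  bool have_old = r->w[r->next].len > 0;

  assert (len > 0);
  if (have_old)
    *old = r->w[r->next];
  r->w[r->next].offset = offset;
  r->w[r->next].len = len;
  r->next = (r->next + 1) % NR_WINDOWS;
  return have_old;
}

/* Call after each successful write of [offset, offset+len).
 * Returns 0 on success, -1 on error with errno set.
 */
int
evict_after_write (int fd, struct evict_ring *r, uint64_t offset, size_t len)
{
  struct write_window old;
  int err;

  /* (1) Ask the kernel to start writing the new range asynchronously. */
  if (sync_file_range (fd, offset, len, SYNC_FILE_RANGE_WRITE) == -1)
    return -1;

  if (!ring_push (r, offset, len, &old))
    return 0;                   /* ring not full yet, nothing to evict */

  /* (2) Wait for the oldest range to reach disk, then drop it from
   * the page cache.  With more windows this wait is usually shorter.
   */
  if (sync_file_range (fd, old.offset, old.len,
                       SYNC_FILE_RANGE_WAIT_BEFORE |
                       SYNC_FILE_RANGE_WRITE |
                       SYNC_FILE_RANGE_WAIT_AFTER) == -1)
    return -1;
  err = posix_fadvise (fd, old.offset, old.len, POSIX_FADV_DONTNEED);
  if (err != 0) {
    errno = err;                /* posix_fadvise returns the error number */
    return -1;
  }
  return 0;
}
```

In this sketch, ranges still sitting in the ring at exit are never
evicted, which matches point (b) in the NR_WINDOWS comment below.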
+
+/* Increasing the number of windows gives better performance since
+ * writes are given more time to make it to disk before we have to
+ * pause to do the page cache eviction. But a larger number of
+ * windows means less success overall since (a) more page cache is
+ * used as the program runs, and (b) we don't evict any writes which
+ * are still pending when the program exits.
+ */
+#define NR_WINDOWS 8
...but here you have a #define. Are you missing a command line option,
or saving it for a later patch on top?
Otherwise it looks reasonable, once you decide what command-line tuning
it might need (as the choice between speed vs. cache clobbering may be
something users want to make).
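
To make the question concrete, here is a rough, untested sketch of how
such a knob might be parsed. Everything in it is an assumption for
illustration: the option name --evict-windows, the helper
parse_evict_windows, the meaning of 0 as "do not evict writes", and
the upper bound are all hypothetical and not part of nbdcopy.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define DEFAULT_NR_WINDOWS 8    /* the value hard-coded in the patch */
#define MAX_NR_WINDOWS 1024     /* arbitrary cap chosen for this sketch */

/* Parse the argument of a hypothetical --evict-windows=N option.
 * N = 0 would disable eviction of writes (faster, but the page cache
 * is clobbered); larger N trades cache usage for speed.
 * Returns 0 and stores N in *nr_windows on success, -1 if the
 * argument is not a decimal number in [0, MAX_NR_WINDOWS].
 */
int
parse_evict_windows (const char *arg, unsigned *nr_windows)
{
  char *end;
  unsigned long v;

  /* strtoul accepts leading whitespace and signs, so insist on a digit. */
  if (arg == NULL || *arg < '0' || *arg > '9')
    return -1;

  errno = 0;
  v = strtoul (arg, &end, 10);
  if (errno != 0 || *end != '\0' || v > MAX_NR_WINDOWS)
    return -1;

  *nr_windows = (unsigned) v;
  return 0;
}
```

The caller would start from DEFAULT_NR_WINDOWS and overwrite it only
when the option is given, so which way the default leans stays a
one-line decision.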
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization:  qemu.org | libvirt.org