On 2/25/21 11:34 AM, Richard W.M. Jones wrote:
When reading from a local file we can take advantage of the page
cache
(ie. not having to read the file from disk if a copy is present in
memory), while at the same time not disturbing the state of the page
cache. Disturbing the page cache can have bad consequences for other
processes running on the host since they will have their working set
evicted so it is something we should generally avoid.
Even more so when dealing with disk images that are likely to be larger
in size than available memory, which tends to be a rather common NBD
usage ;)
This requires Linux APIs, using the technique described here:
https://insights.oetiker.ch/linux/fadvise/
This change only affects reads, since doing the same for writes is
even more complicated.
Notice there is only a small increase in the amount of file which is
cached, the elapsed time is about the same, but there is an increase
in system time (presumably the overhead of POSIX_FADV_DONTNEED).
The improved cache pressure at almost no elapsed time penalty is worth it.
+/* If we are going to attempt page cache mapping which tries not to
+ * disturb the page cache when reading a file. Only do this on Linux
Works, but took me a couple reads to decide it wasn't a sentence
fragment. Would it be any better as s/If/Whether/?
+ * systems where we understand how the page cache behaves. Since
we
+ * need to mmap the whole file, also restrict this to 64 bit systems.
+ */
+#ifdef __linux__
+#ifdef __SIZEOF_POINTER__
+#if __SIZEOF_POINTER__ == 8
+#define PAGE_CACHE_MAPPING 1
+#endif
+#endif
+#endif
+
+#ifdef PAGE_CACHE_MAPPING
+static long page_size;
+
+static void page_size_init (void) __attribute__((constructor));
+static void
+page_size_init (void)
+{
+ page_size = sysconf (_SC_PAGE_SIZE);
Technically, sysconf() can fail; but if it fails on _SC_PAGE_SIZE,
you've got bigger problems ;)
+/* Test if a single page of the file was cached before nbdcopy ran.
*/
+static inline bool
+page_was_cached (struct rw_file *rwf, uint64_t offset)
+{
+ uint64_t page = offset / page_size;
+ if (page < rwf->cached_pages.size)
+ return (rwf->cached_pages.ptr[page] & 1) != 0;
+ else
+ /* This path is taken if we didn't manage to map the input file
+ * for any reason. In this case assume that pages were mapped so
+ * we will not evict them: essentially fall back to doing nothing.
+ */
+ return true;
Nice fallback.
+}
+
+/* Evict file contents from the page cache if they were not present in
+ * the page cache before.
+ */
+static inline void
+page_cache_evict (struct rw_file *rwf, uint64_t orig_offset, size_t orig_len)
+{
+ uint64_t offset, n;
+ size_t len;
+
+ if (rwf->cached_pages.size == 0) return;
+
+ /* Only bother with whole pages. */
+ offset = ROUND_UP (orig_offset, page_size);
+ len = orig_len - (offset - orig_offset);
+ len = ROUND_DOWN (len, page_size);
What assurance do we have that the rest of nbdcopy is favoring a stride
that is a multiple of page_size, so that we aren't stranding partial
pages everywhere?
Looks good.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization:
qemu.org |
libvirt.org