Re: [Libguestfs] [PATCH nbdkit 8/8] file: Implement extents.

Monday, 25 March 2019

On Sat, Mar 23, 2019 at 02:45:44PM -0500, Eric Blake wrote:
...
 On 3/20/19 5:11 PM, Richard W.M. Jones wrote:
 > This uses lseek SEEK_DATA/SEEK_HOLE to search for allocated data and
 > holes in the underlying file.
 > ---
 >  plugins/file/file.c | 139 ++++++++++++++++++++++++++++++++++++++++----
 >  1 file changed, 129 insertions(+), 10 deletions(-)
 > 

 > -int file_debug_zero;            /* to enable: -D file.zero=1 */
 > +/* Any callbacks using lseek must be protected by this lock. */
 > +static pthread_mutex_t lseek_lock = PTHREAD_MUTEX_INITIALIZER;
 > +
 > +/* to enable: -D file.zero=1 */
 > +int file_debug_zero;
 >  
 >  static void
 >  file_unload (void)
 > @@ -220,6 +227,21 @@ file_close (void *handle)
 >  
 >  #define THREAD_MODEL NBDKIT_THREAD_MODEL_PARALLEL
 >  
 > +/* For block devices, stat->st_size is not the true size. */
 > +static int64_t
 > +block_device_size (int fd)
 > +{
 > +  off_t size;
 > +
 > +  size = lseek (fd, 0, SEEK_END);

 Calling lseek without the lock? I'm not sure if you can guarantee that
 .size won't be called in parallel with some other .extents.
 /me reads ahead
 Oh, the caller has the lock. I might add a comment to this function that
 it expects the caller to grab the lock. 
Yes I'll add a comment as it wasn't even obvious to me when I looked
back at the code.

...
 > +#ifdef SEEK_HOLE
 > +/* Extents. */
 > +
 > +static int
 > +file_can_extents (void *handle)
 > +{
 > +  struct handle *h = handle;
 > +  off_t r;
 > +
 > +  /* A simple test to see whether SEEK_HOLE etc is likely to work on
 > +   * the current filesystem.
 > +   */
 > +  pthread_mutex_lock (&lseek_lock);
 > +  r = lseek (h->fd, 0, SEEK_HOLE);
 > +  pthread_mutex_unlock (&lseek_lock);
 > +  if (r == -1) {
 > +    nbdkit_debug ("extents disabled: lseek: SEEK_HOLE: %m");
 > +    return 0;

 In fact, if lseek() succeeds but returns the current file size, you know
 there are no holes in the file at all, in which case, it might be more
 efficient to store a variable stating that .extents should avoid lseek()
 altogether and just report the entire file as data.

 Hmm - if we just make .extents return false in this case, then nbdkit
 won't advertise block status to the guest even though we have valid
 status available (knowing there are no holes is nicer than merely
 assuming there are no holes because block status was unavailable). But
 if we return true, then WE have to do that optimization ourselves. Maybe
 .can_extent should be a tri-state return, just like .can_fua, where a
 plugin can choose NBDKIT_EXTENT_NONE to force no block status
 advertisement, NBDKIT_EXTENT_EMULATE (default if .extents is missing) to
 let nbdkit do the work of reporting the entire disk as data without
 calling .extents, and NBDKIT_EXTENT_NATIVE (requires .extents, and lets
 the plugin do all the work). Then, when lseek(SEEK_HOLE) returns EOF, we
 can return NBDKIT_EXTENT_EMULATE instead of having to track our
 optimization of skipping lseek in .extents ourselves. 
I'm not really sure I understand this.  However in the latest
iteration of the series can_extents is disconnected from what we
advertise back to clients.  It is solely used to decide if the server
calls .extents or bypasses it (returning fully allocated).

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

Re: [Libguestfs] [PATCH nbdkit 8/8] file: Implement extents.