On Thu, Oct 14, 2021 at 05:32:01PM +0200, Laszlo Ersek wrote:
> On 10/13/21 12:06, Richard W.M. Jones wrote:
> > @@ -594,6 +616,18 @@ curl_open (int readonly)
> >      goto err;
> >    }
> >
> > +  if (effectiveurl) {
> > +    ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&url_effective_lock);
> > +
> > +    if (url_effective == NULL) {
> > +      r = curl_easy_getinfo (h->c, CURLINFO_EFFECTIVE_URL, &url_effective);
> > +      if (r != CURLE_OK) {
> > +        display_curl_error (h, r, "could not get effective URL");
> > +        goto err;
> > +      }
> > +    }
> > +  }
> > +
> >  #ifdef HAVE_CURLINFO_CONTENT_LENGTH_DOWNLOAD_T
> >    r = curl_easy_getinfo (h->c, CURLINFO_CONTENT_LENGTH_DOWNLOAD_T, &o);
> >    if (r != CURLE_OK) {
>
> Asking just for my own education:
>
> My understanding (or, well, assumption) from this code is that multiple
> threads may run curl_open() concurrently.  The first code snippet sets
> the URL -- namely, in the thread that runs it first, it sets the "main"
> URL, and then the second code snippet grabs the redirected URL.  The
> rest of the threads are then supposed to set the previously redirected
> URL as their primary URL (and skip the second code snippet).
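
For reference, the earlier hunk you're referring to does roughly this
-- a paraphrase, not the exact patch text:

  /* Paraphrase of the earlier hunk: pick the already-resolved URL if a
   * previous open recorded one, otherwise the original URL from the
   * command line.
   */
  {
    ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&url_effective_lock);
    curl_easy_setopt (h->c, CURLOPT_URL,
                      url_effective ? url_effective : url);
  }
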
There are a couple of things going on.  Firstly there's the lifecycle
of nbdkit, explained in this diagram:

https://libguestfs.org/nbdkit-plugin.3.html#Callback-lifecycle

Then there's the thread model, which only affects what happens in the
preconnect...close section of that diagram.  The thread model is
determined by the plugin, and for the curl plugin it is set to
NBDKIT_THREAD_MODEL_SERIALIZE_REQUESTS, which means that requests are
serialized *within a client* but not across clients.  So curl_open can
indeed be called in parallel.

The first section of the diagram is the configuration phase, which is
when “const char *url” is set from the command line.  That happens
before any threads are created, and (in this plugin) “url” never
changes after that.
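
The plugin-side declarations that this relies on look roughly like the
following -- a simplified sketch, not the exact plugins/curl/curl.c
source:

  #include <string.h>
  #include <nbdkit-plugin.h>

  /* Requests are serialized within a connection, but separate
   * connections may call curl_open etc in parallel.
   */
  #define THREAD_MODEL NBDKIT_THREAD_MODEL_SERIALIZE_REQUESTS

  /* Set once during the configuration phase, before any threads
   * exist, and never modified afterwards.
   */
  static const char *url = NULL;

  static int
  curl_config (const char *key, const char *value)
  {
    if (strcmp (key, "url") == 0) {
      url = strdup (value);     /* keep our own copy of the value string */
      if (url == NULL) {
        nbdkit_error ("strdup: %m");
        return -1;
      }
      return 0;
    }
    /* other parameters elided from this sketch */
    nbdkit_error ("unknown parameter '%s'", key);
    return -1;
  }
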
While "url_effective" is indeed protected by the mutex,
don't we have
(at least) a TOCTOU issue here? If two threads start at the top of
curl_open() in close succession, they'll both pass the original URL as
CURLOPT_URL to their own curl contexts -- "url_effective_lock" will
serialize them, but, by the time the second thread sets CURLOPT_URL, the
first thread to exit the upper code snippet may not have reached the
second code snippet (with CURLINFO_EFFECTIVE_URL). Meaning that both
CURL contexts could end up with different effective URLs. My concern is
not that the second code snippet will corrupt anything in that case (it
will not), but that one of the resolved (effective) URLs is going to be
thrown away. It's just not intuitive.
I believe that is correct, yes - I should have made the mutex scope
cover the whole function.
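
Something like this, where the lock is held for the whole of curl_open
so that choosing the URL and recording the effective URL become one
atomic step.  This is a hypothetical, heavily abbreviated sketch of the
rework, not the committed code:

  static char *url_effective;   /* shared; protected by the lock */
  static pthread_mutex_t url_effective_lock = PTHREAD_MUTEX_INITIALIZER;

  static void *
  curl_open (int readonly)
  {
    ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&url_effective_lock);
    struct curl_handle *h;
    CURLcode r;

    /* ... allocate h, create h->c and set the usual options here ... */

    /* Use the redirect target recorded by an earlier connection if
     * there is one, otherwise the original URL from the command line.
     */
    curl_easy_setopt (h->c, CURLOPT_URL,
                      url_effective ? url_effective : url);

    /* ... the initial request which probes the size runs here ... */

    if (url_effective == NULL) {
      char *u = NULL;
      r = curl_easy_getinfo (h->c, CURLINFO_EFFECTIVE_URL, &u);
      if (r != CURLE_OK) {
        display_curl_error (h, r, "could not get effective URL");
        goto err;
      }
      /* The string returned by curl_easy_getinfo belongs to the easy
       * handle, so take a copy before sharing it across connections.
       */
      url_effective = strdup (u);
      if (url_effective == NULL) {
        nbdkit_error ("strdup: %m");
        goto err;
      }
    }

    /* ... rest of curl_open, plus the err: cleanup path ... */
  }
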
> In other words, the (CURLOPT_URL, CURLINFO_EFFECTIVE_URL) *pair* needs
> to be atomic, and until one thread completes *both* steps, no other
> threads must fetch any ranges (because they don't know what URL to use).
>
> In my opinion this technical detail just strengthens my point that the
> redirect should be resolved outside of nbdkit.

Agreed.
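
For example, a tiny standalone program using plain libcurl (a sketch,
nothing nbdkit-specific) could resolve the redirect once up front, and
the URL it prints could then be passed as the url= parameter of the
curl plugin:

  /* Sketch: resolve redirects once, outside nbdkit.  Prints the final
   * (effective) URL after following any redirects with a HEAD request.
   */
  #include <stdio.h>
  #include <curl/curl.h>

  int
  main (int argc, char *argv[])
  {
    CURL *c;
    char *final_url = NULL;
    int ret = 1;

    if (argc != 2) {
      fprintf (stderr, "usage: %s URL\n", argv[0]);
      return 1;
    }

    curl_global_init (CURL_GLOBAL_DEFAULT);
    c = curl_easy_init ();
    if (c) {
      curl_easy_setopt (c, CURLOPT_URL, argv[1]);
      curl_easy_setopt (c, CURLOPT_FOLLOWLOCATION, 1L); /* follow redirects */
      curl_easy_setopt (c, CURLOPT_NOBODY, 1L);         /* HEAD request only */

      if (curl_easy_perform (c) == CURLE_OK &&
          curl_easy_getinfo (c, CURLINFO_EFFECTIVE_URL, &final_url) == CURLE_OK &&
          final_url != NULL) {
        printf ("%s\n", final_url);
        ret = 0;
      }
      curl_easy_cleanup (c);
    }
    curl_global_cleanup ();
    return ret;
  }
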
The problem I was actually thinking about is that we need to change
plugins/curl/scripts.c so it uses “url_effective” instead of “url”.
That also needs the lock (or at least it's a bit unclear to me whether
it needs the lock).
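
One way to do that -- a hypothetical sketch which assumes “url”,
“url_effective” and the lock are made visible to scripts.c, which is
not how the code is structured today -- would be a small accessor that
scripts.c calls instead of reading “url” directly:

  /* Hypothetical accessor for plugins/curl/scripts.c: return the URL
   * that scripts should see, preferring the resolved redirect target.
   * Taking the lock ensures readers see a fully written pointer; once
   * url_effective has been set it is never freed or changed, so it is
   * safe to return it after the lock is dropped.
   */
  const char *
  curl_get_script_url (void)
  {
    ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&url_effective_lock);
    return url_effective ? url_effective : url;
  }
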
I'm going to look at retrying requests, which seems like a worthwhile
change that might be made in a general filter.
Rich.
--
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones
Read my programming and virtualization blog:
http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines. Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v