On Wed, Oct 13, 2021 at 5:06 AM Richard W.M. Jones <rjones@redhat.com> wrote:
Probably best to read this first:
https://bugzilla.redhat.com/show_bug.cgi?id=2013000

This adds an effective-url=true|false flag to nbdkit-curl-plugin.  If
true, the first time we fetch the URL, we fetch the "effective" URL
(ie. the URL after all redirects were resolved) and use that for all
future connections within the current nbdkit instance.

The idea behind this patch is to do with the Fedora mirror system
which is flakey.  Some mirrors return errors or 404s.  And the mirror
system will give you a new redirect each time (within your
geographical region).  This results in transfers sometimes failing
because a single read went to a bad mirror.

After implementing this patch I'm not very happy with it.  It is
incomplete because we pass the original URL to cookie-script and
header-script, so plugin/curl/scripts.c will also need to be modified.
Once those modifications are done the change becomes quite invasive.
Also I'm not convinced that it really solves any problem.  In the
manual change I wrote:

    Note use of this feature in long-lived nbdkit instances can cause
    subtle problems:

    •   The effective URL persists across connections for the lifetime
        of the nbdkit instance.  If nbdkit is used for a long time then
        it is possible for the redirected URL to become stale.

    •   It will defeat some mirror load-balancing techniques.

    •   If the mirror service sometimes redirects to a broken URL and
        it happens that the URL you fetch first is broken then nbdkit
        will no longer recover on subsequent connections (instead you
        will need to restart nbdkit).

I suggested another way to solve this by using curl APIs to fetch the
effective URL up front and passing that URL to nbdkit (see
https://bugzilla.redhat.com/show_bug.cgi?id=2013000#c1), but
apparently that solution isn't acceptable for unclear reasons.


The discussion is about who should be responsible for determining the effective URL: our side or the nbdkit side. In an ideal world no one would care, because we would never hit a broken mirror and would not be asking for any of this. However, we get a bug report about a failed import roughly once a month, and when we investigate it turns out that a bad mirror somewhere caused the failure.

So one solution is to determine the effective URL and just use that. The question is who should determine it: the application that starts nbdkit, or nbdkit itself. IMO it boils down to who can best deal with the failures. The application that starts nbdkit knows whether nbdkit is short-lived, so a stale URL is not an issue there: the application knows when it should restart nbdkit. Defeating the mirror load balancing happens regardless of who does the pre-fetching. And if the first redirect is to a broken mirror, nbdkit will just fail and the calling application can recreate it.
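
To make the application-side option concrete, here is a rough sketch of what I mean. It uses Python's urllib rather than the curl APIs mentioned in the bug, and the helper names (resolve_effective_url, nbdkit_command) are made up for illustration; the only nbdkit details assumed are the -r (read-only) flag and the curl plugin's url= parameter.

```python
import urllib.request


def resolve_effective_url(url, timeout=30):
    """Follow any redirects and return the URL we finally landed on.

    A HEAD request is sent and the body is never read, so this is cheap
    even when the URL points at a large disk image.
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.geturl()


def nbdkit_command(effective_url):
    """Build the argv for a read-only nbdkit curl instance (hypothetical helper)."""
    return ["nbdkit", "-r", "curl", f"url={effective_url}"]
```

The caller would run resolve_effective_url() once per import, hand the result to nbdkit, and start over with the original URL if that nbdkit instance fails.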

Personally, I think the calling application knows how to handle these scenarios well enough that it should figure out the effective URL ahead of time. But I believe the objection from my colleagues is that this is using a hammer to solve the problem: nbdkit knows its own internals and could come up with a more effective mechanism for dealing with these mirror failures. Is it possible to retry at the byte-range level? That is, if a particular range fails, retry it and possibly get a new mirror that works, instead of failing the entire import at that point.
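
To illustrate what I mean by a byte-range retry, here is a minimal sketch in Python. This is not how nbdkit or its curl plugin works internally; fetch_range and read_with_retry are hypothetical names, and the idea is only that every attempt starts again from the original (redirecting) URL, so a retry has a chance of landing on a different mirror.

```python
import http.client
import time
import urllib.request


def fetch_range(url, offset, count, timeout=30):
    """Read exactly `count` bytes starting at `offset` with an HTTP Range request."""
    headers = {"Range": f"bytes={offset}-{offset + count - 1}"}
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        if response.status != 206:
            raise OSError(f"expected 206 Partial Content, got {response.status}")
        data = response.read()
    if len(data) != count:
        raise OSError(f"short read: wanted {count} bytes, got {len(data)}")
    return data


def read_with_retry(url, offset, count, attempts=3, delay=1.0, fetch=fetch_range):
    """Retry one byte range, re-resolving redirects from `url` on every attempt."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(url, offset, count)
        except (OSError, http.client.HTTPException) as exc:
            # urllib's URLError and HTTPError are OSError subclasses.
            last_error = exc
            if attempt + 1 < attempts:
                time.sleep(delay)
    raise last_error
```

Only the failing range is retried, so one bad mirror costs a few extra requests rather than the whole import.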
 
I can't think of any other way to solve this in the context of nbdkit
(maybe have it detect when redirection is happening and retry the
redirection on error?).  So here's the patch.

Rich.