On 10/13/21 12:06, Richard W.M. Jones wrote:
Probably best to read this first:
https://bugzilla.redhat.com/show_bug.cgi?id=2013000
This adds an effective-url=true|false flag to nbdkit-curl-plugin. If
true, the first time we fetch the URL, we fetch the "effective" URL
(i.e. the URL after all redirects were resolved) and use that for all
future connections within the current nbdkit instance.
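To make the caching behaviour concrete, here is a minimal C sketch of "resolve once, reuse for every later connection". It is not the actual patch. `connection_url`, `resolve_fn` and the stand-in resolver `demo_resolve` are hypothetical names. A real plugin would presumably perform a request with libcurl and read back `CURLINFO_EFFECTIVE_URL`. The sketch also ignores locking, which a multi-threaded plugin would need.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical resolver: follows redirects from `url` and returns a
   malloc'd copy of the effective URL, or NULL on failure. */
typedef char *(*resolve_fn) (const char *url);

/* Effective URL cached for the lifetime of the process.
   Not thread-safe: a real multi-threaded plugin would need a lock. */
static char *cached_url = NULL;

/* Return the URL that a new connection should use.  The first
   successful resolution is cached and reused by all later calls. */
const char *
connection_url (const char *url, resolve_fn resolve)
{
  if (cached_url == NULL) {
    char *eff = resolve (url);
    if (eff == NULL)
      return url;             /* resolution failed: use the original URL */
    cached_url = eff;
  }
  return cached_url;
}

/* Illustrative stand-in resolver: pretends every request is redirected
   to the same mirror and counts how often it is called. */
static int resolve_calls = 0;

char *
demo_resolve (const char *url)
{
  static const char mirror[] = "https://mirror.example.org/disk.img";
  char *p;

  (void) url;
  resolve_calls++;
  p = malloc (sizeof mirror);
  if (p != NULL)
    memcpy (p, mirror, sizeof mirror);
  return p;
}
```

The first resolution sticks. If `demo_resolve` had returned a broken mirror, every later connection would reuse it. That is the "will no longer recover" caveat quoted from the manual text.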
The idea behind this patch has to do with the Fedora mirror system,
which is flaky. Some mirrors return errors or 404s. And the mirror
system will give you a new redirect each time (within your
geographical region). This results in transfers sometimes failing
because a single read went to a bad mirror.
After implementing this patch I'm not very happy with it. It is
incomplete because we pass the original URL to cookie-script and
header-script, so plugin/curl/scripts.c will also need to be modified.
Once those modifications are done the change becomes quite invasive.
Also I'm not convinced that it really solves any problem. In the
manual page change I wrote:
Note use of this feature in long-lived nbdkit instances can cause
subtle problems:
• The effective URL persists across connections for the lifetime
of the nbdkit instance. If nbdkit is used for a long time then
it is possible for the redirected URL to become stale.
• It will defeat some mirror load-balancing techniques.
• If the mirror service sometimes redirects to a broken URL and
it happens that the URL you fetch first is broken then nbdkit
will no longer recover on subsequent connections (instead you
will need to restart nbdkit).
I suggested another way to solve this by using curl APIs to fetch the
effective URL up front and passing that URL to nbdkit (see
https://bugzilla.redhat.com/show_bug.cgi?id=2013000#c1), but
apparently that solution isn't acceptable for unclear reasons.
I can't think of any other way to solve this in the context of nbdkit
(maybe have it detect when redirection is happening and retry the
redirection on error?). So here's the patch.
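The alternative floated above was to detect that a redirect happened and redo it when a request fails. It could look roughly like the following minimal C sketch. It assumes hypothetical `resolve` and `read_range` hooks rather than any real nbdkit or libcurl API. It retries only once, so a persistently broken mirror network still surfaces as an error instead of looping.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical hooks (not real nbdkit or libcurl APIs):
   mirror_resolve_fn follows redirects and returns a malloc'd URL or NULL;
   range_read_fn performs one Range request and returns 0 on success. */
typedef char *(*mirror_resolve_fn) (const char *url);
typedef int (*range_read_fn) (const char *url, long offset, long count);

struct conn {
  const char *orig_url;       /* URL supplied by the user */
  char *eff_url;              /* current effective URL (malloc'd) */
};

/* Read one range.  If the read fails and a redirect was involved
   (effective URL != original URL), follow the redirect again from the
   original URL, which may pick another mirror, and retry exactly once. */
int
conn_read (struct conn *c, long offset, long count,
           mirror_resolve_fn resolve, range_read_fn read_range)
{
  char *fresh;

  if (read_range (c->eff_url, offset, count) == 0)
    return 0;
  if (strcmp (c->eff_url, c->orig_url) == 0)
    return -1;                /* no redirect: retrying would hit the same server */
  fresh = resolve (c->orig_url);
  if (fresh == NULL)
    return -1;
  free (c->eff_url);
  c->eff_url = fresh;
  return read_range (c->eff_url, offset, count);
}

/* Illustrative stand-ins: the first redirect lands on a broken mirror,
   later ones on a working mirror; reads fail on the broken mirror. */
static int mirror_resolves = 0;

char *
demo_mirror_resolve (const char *url)
{
  const char *m = mirror_resolves++ == 0
    ? "https://bad.example.org/disk.img"
    : "https://good.example.org/disk.img";
  size_t n = strlen (m) + 1;
  char *p = malloc (n);

  (void) url;
  if (p != NULL)
    memcpy (p, m, n);
  return p;
}

int
demo_read (const char *url, long offset, long count)
{
  (void) offset;
  (void) count;
  return strstr (url, "://bad.") != NULL ? -1 : 0;
}
```

The `strcmp` check makes URLs that were never redirected fail immediately, because re-resolving them would only reach the same server again.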
Given that I've been CC'd, here's my opinion:
I strongly dislike transparent mirror selection (redirects) as a
principle (based on experience). Precisely with the Fedora mirror
system, I frequently see "dnf update" download some packages at
lightning speed, then get stuck *completely* at some other package
(potentially after spewing a bunch of "broken mirror" messages at me),
so that I have to Ctrl-C the whole command, and re-issue it. Transparent
redirects seem to want to hide the "mess" from the user, but they fail
to do that quite frequently, IMO.
The Cygwin experience is better, IMO. When you start the Cygwin package
installer (whether you do it for initial installation or for updating
packages), a fresh list of mirrors is fetched from some central location
(I think?), but then you, the user, have to pick a *specific* mirror
from that list. I always just go for fsn.hu, which I know to be a
rock-stable mirror (for many OS distros, including Fedora) in my
location. No bad surprises using that mirror, ever.
With that in mind, I wouldn't complicate *any* application to deal with
redirects / mirror selection transparently. Whatever the application
does, the user will not be happy, and will want to tweak the logic. Just
let the user pick an effective URL themselves, and stick with that forever.
This may not be great for "load balancing", but AIUI the pain point here
is failed *individual* imports. I think it should be OK to stick with a
particular fixed URL for the duration of an import.
(I should actually update my DNF repo files on Fedora to use fsn.hu as
well, I just always get discouraged by the "metalink" stuff in there,
and the hard-to-read variables (such as "$releasever", "$basearch",
...).)
From Alexander's description, it's clear that the reliability of the
mirror network is the core issue here, and we're now pushing around the
unwanted job of hiding it from the user, from one application to the
other. I'd say let the *user* deal with it, *once*, in their
configuration, and neither application (= neither nbdkit nor the app
that starts nbdkit) should struggle with redirects.
In particular, tolerating (following) redirects on every single Range
request looks incredibly inefficient to me (not to mention brittle).
(Sorry if my opinion is too naive.)
Thanks
Laszlo