On Tue, Jun 06, 2023 at 08:06:50PM +0100, Richard W.M. Jones wrote:
Michael Henriksen pointed out an issue with this approach.
If the web server is actually generating the content on the fly then
it may send it as chunked encoding, and in HTTP/1.1 it's not required
that the Content-Length field is present (since it may not be known
when the server begins sending the content):
https://www.rfc-editor.org/rfc/rfc2616#section-4.4
The only way to know the true length would be to download all the
content, which we definitely don't want to do.
I'm not totally clear if Content-Length, if present, must be valid.
Curl source code seems to imply not:
https://github.com/curl/curl/blob/6e4fedeef7ff2aefaa4841a4710d6b8a37c6ae7...
but maybe they are just being over-cautious? The RFC is a bit
confusing.
AWS itself _does_ send Content-Length and it appears to be valid ...
So one approach might be to assume it is valid, which I believe is
what the current patch series does.
After looking at the code this morning I'm fairly sure this is a
non-problem, or to be more specific it is an existing (but not
serious) problem in nbdkit-curl-plugin.
Curl already deals with the case where a web server sends the
Transfer-encoding: chunked header with a Content-Length header (which
is apparently wrong) by setting the size to -1:
https://github.com/curl/curl/blob/cd18e5c4645e3a3f8ca0feecefe5b1150405c8f...
which in curl-speak means the size is unknown:
https://curl.se/libcurl/c/CURLINFO_CONTENT_LENGTH_DOWNLOAD_T.html
and we already deal with that. Since it is already possible for a web
server to set headers like this in a HEAD request, we're already
dealing with this case (by rejecting the URL in the plugin).
Adding this GET fallback method, which uses the common code path for
fetching the size, does not make this any worse or better.
I added a comment to this effect to the code, and split up the patch a
lot more, and have pushed it in commits 51b86cff3..4b111535c
Rich.
--
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones
Read my programming and virtualization blog:
http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html