On 6/6/23 21:06, Richard W.M. Jones wrote:
> Michael Henriksen pointed out an issue with this approach.
> If the web server is actually generating the content on the fly then
> it may send it as chunked encoding, and in HTTP/1.1 it's not required
> that the Content-Length field is present (since it may not be known
> when the server begins sending the content):
> https://www.rfc-editor.org/rfc/rfc2616#section-4.4
>
> The only way to know the true length would be to download all the
> content, which we definitely don't want to do.
>
> I'm not totally clear if Content-Length, if present, must be valid.
> Curl source code seems to imply not:
> https://github.com/curl/curl/blob/6e4fedeef7ff2aefaa4841a4710d6b8a37c6ae7...
> but maybe they are just being over-cautious? The RFC is a bit
> confusing.
>
> AWS itself _does_ send Content-Length and it appears to be valid ...
> So one approach might be to assume it is valid, which I believe is
> what the current patch series does.
I've been aware that some servers don't send Content-Length; you can
notice such servers even in simple downloads with Firefox: the
download manager can't estimate how much time is left until the
download completes, so you don't get a nice progress bar.
I think it's fine if nbdkit's curl plugin just doesn't support such web
servers. We can simply require a (valid) Content-Length field.
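
As a quick illustration (just a sketch I put together, not the
plugin's actual code), rejecting such servers could look like this
with libcurl: issue a HEAD request and treat a missing Content-Length,
which libcurl reports as -1, as a fatal error.

/* probe.c: minimal sketch, not nbdkit code.  Build: cc probe.c -lcurl */
#include <stdio.h>
#include <stdlib.h>
#include <curl/curl.h>

static curl_off_t
probe_size (const char *url)
{
  CURL *c = curl_easy_init ();
  curl_off_t cl = -1;

  if (!c)
    return -1;
  curl_easy_setopt (c, CURLOPT_URL, url);
  curl_easy_setopt (c, CURLOPT_NOBODY, 1L);        /* HEAD request */
  curl_easy_setopt (c, CURLOPT_FOLLOWLOCATION, 1L);
  if (curl_easy_perform (c) == CURLE_OK)
    /* libcurl reports -1 when the server did not send Content-Length,
     * e.g. because it generates the content on the fly and uses
     * chunked encoding. */
    curl_easy_getinfo (c, CURLINFO_CONTENT_LENGTH_DOWNLOAD_T, &cl);
  curl_easy_cleanup (c);
  return cl;
}

int
main (int argc, char *argv[])
{
  curl_off_t size;

  if (argc != 2) {
    fprintf (stderr, "usage: %s URL\n", argv[0]);
    exit (EXIT_FAILURE);
  }
  curl_global_init (CURL_GLOBAL_DEFAULT);
  size = probe_size (argv[1]);
  if (size < 0) {
    fprintf (stderr, "no usable Content-Length, refusing to serve\n");
    exit (EXIT_FAILURE);
  }
  printf ("size = %" CURL_FORMAT_CURL_OFF_T "\n", size);
  return 0;
}

Whether a HEAD request is the right probe (as opposed to a ranged GET)
is a separate question; some servers answer HEAD differently from GET.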
I think that requirement would be similar to another one we already
place: our curl plugin depends on the server supporting byte ranges
(the Range request header). Without that, we have no random access to
the file on the web server, and it's not possible to expose it as a
"block device". Right?
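
For completeness, here is roughly what a random-access read over HTTP
looks like (again only a sketch; the function name pread_http and the
buffer handling are mine, not the plugin's): set CURLOPT_RANGE and
insist on a 206 Partial Content reply, since a server without range
support answers a ranged GET with a plain 200 and the full body.

/* range.c: rough sketch, not nbdkit's implementation.
 * Build: cc -c range.c (link with -lcurl) */
#include <stdio.h>
#include <string.h>
#include <curl/curl.h>

struct buffer { char *data; size_t used, size; };

static size_t
collect (char *ptr, size_t size, size_t nmemb, void *opaque)
{
  struct buffer *buf = opaque;
  size_t n = size * nmemb;

  /* If the server ignored the Range header it sends the whole file;
   * cap at the buffer size.  (A real implementation would abort the
   * transfer instead of draining it.) */
  if (buf->used + n > buf->size)
    n = buf->size - buf->used;
  memcpy (buf->data + buf->used, ptr, n);
  buf->used += n;
  return size * nmemb;
}

static int
pread_http (const char *url, char *data, size_t count, size_t offset)
{
  CURL *c = curl_easy_init ();
  struct buffer buf = { .data = data, .used = 0, .size = count };
  char range[64];
  long code = 0;
  int r = -1;

  if (!c)
    return -1;
  snprintf (range, sizeof range, "%zu-%zu", offset, offset + count - 1);
  curl_easy_setopt (c, CURLOPT_URL, url);
  curl_easy_setopt (c, CURLOPT_RANGE, range);
  curl_easy_setopt (c, CURLOPT_WRITEFUNCTION, collect);
  curl_easy_setopt (c, CURLOPT_WRITEDATA, &buf);
  if (curl_easy_perform (c) == CURLE_OK) {
    curl_easy_getinfo (c, CURLINFO_RESPONSE_CODE, &code);
    /* 206 = Partial Content; anything else means no random access. */
    if (code == 206 && buf.used == count)
      r = 0;
  }
  curl_easy_cleanup (c);
  return r;
}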
Laszlo