On Wed, Nov 21, 2018 at 04:05:10PM +0000, Richard W.M. Jones wrote:
> The xz plugin includes a block cache, and now so does the filter.
> (The plugin long predates our addition of filters to nbdkit.)
> I suppose there is a case for removing the block cache code from the
> xz plugin and relying on the cache filter instead. I'll test that
> out to see if it makes a difference. It would certainly simplify the
> xz filter code if we did that, but at the cost of making it a bit more
> complex to use.
Actually I think we are going to need to retain the block cache. It
solves a slightly different problem from placing the cache filter on
top (in fact both are useful).
Let's say you have an XZ file with a 100,000 byte block size. Then
two reads, covering bytes 0-1000 and 1000-2000, would each result in
reading and uncompressing the same whole 100,000 byte block, so the
block is uncompressed twice. The block cache in the xz plugin/filter
avoids this; the cache filter on top does not.
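To make the example concrete, here is a rough C sketch (not the actual
xz plugin code) of why a block cache at the uncompression layer helps.
decompress_block(), read_range() and the single-entry cache are
hypothetical names; the real plugin uses liblzma and its own cache,
whereas this stand-in just fills each block with a pattern and counts
how many times a whole block gets uncompressed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical block size, matching the 100,000 byte example above. */
#define BLOCK_SIZE 100000

/* Counts how many times a whole block had to be uncompressed. */
static unsigned decompress_calls = 0;

/* Stand-in for the expensive step: uncompress the entire block
 * number 'blk' into 'out'.  A real implementation would call liblzma;
 * here we only fill the block with a deterministic pattern. */
static void
decompress_block (uint64_t blk, unsigned char *out)
{
  decompress_calls++;
  for (size_t i = 0; i < BLOCK_SIZE; ++i)
    out[i] = (unsigned char) ((blk * BLOCK_SIZE + i) & 0xff);
}

/* Single-entry block cache: remembers the last uncompressed block. */
static unsigned char cache_buf[BLOCK_SIZE];
static int64_t cache_blk = -1;          /* -1 means the cache is empty */

/* Read 'count' bytes at 'offset' into 'buf', uncompressing a block
 * only when it is not already held in the cache. */
static void
read_range (void *buf, uint32_t count, uint64_t offset)
{
  unsigned char *p = buf;

  while (count > 0) {
    uint64_t blk = offset / BLOCK_SIZE;
    uint64_t off_in_blk = offset % BLOCK_SIZE;
    uint32_t n = (uint32_t) (BLOCK_SIZE - off_in_blk);
    if (n > count)
      n = count;
    assert (n > 0);

    /* Cache miss: pay for uncompressing the whole block once. */
    if (cache_blk != (int64_t) blk) {
      decompress_block (blk, cache_buf);
      cache_blk = (int64_t) blk;
    }
    memcpy (p, cache_buf + off_in_blk, n);

    p += n;
    offset += n;
    count -= n;
  }
}
```

Reading 0-1000 and then 1000-2000 through read_range() leaves
decompress_calls at 1; without the cache_blk check it would be 2.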
Interesting factoid: www.mirrorsite.org rapidly throttles any
connection that makes repeated range requests ... However, if you open
a new connection it is unaffected by the throttling on the existing
connection (I thought it would throttle based on IP address). Anyway
this, combined with the large block size in the Fedora Cloud image,
makes xz + curl virtually unusable.
I also think the new filter would be better if it made larger reads.
The plugin makes 8K reads (BUFSIZ) which is likely reasonable for
reading from a local file. But the overhead of reading from the curl
plugin probably makes much larger reads sensible. I wonder if the
filter can intuit a good block size to use somehow?
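One possible way to "intuit" a read size is a readahead-style
heuristic: start at 8K and double the request size while reads stay
sequential, up to some cap, and drop back to 8K on a seek. This is only
an assumption about what might work, not something the filter does
today. MIN_READ, MAX_READ, reader_fetch() and plugin_pread() are
hypothetical, and plugin_pread() only counts requests instead of
calling into curl.

```c
#include <assert.h>
#include <stdint.h>

#define MIN_READ 8192                 /* roughly BUFSIZ on glibc */
#define MAX_READ (4 * 1024 * 1024)    /* hypothetical upper bound */

struct adaptive_reader {
  uint64_t next_off;   /* offset just past the previous request */
  uint32_t size;       /* size to use for the next request */
  unsigned requests;   /* number of requests issued to the plugin */
};

static void
reader_init (struct adaptive_reader *r)
{
  r->next_off = UINT64_MAX;   /* no previous request yet */
  r->size = MIN_READ;
  r->requests = 0;
}

/* Stand-in for one request to the underlying plugin (for curl, one
 * HTTP range request).  Here it only counts the request. */
static void
plugin_pread (struct adaptive_reader *r, uint64_t off, uint32_t n)
{
  (void) off;
  assert (n > 0);
  r->requests++;
}

/* Fetch 'len' bytes of compressed data starting at 'off', growing
 * the request size while access stays sequential. */
static void
reader_fetch (struct adaptive_reader *r, uint64_t off, uint64_t len)
{
  while (len > 0) {
    if (off == r->next_off) {
      if (r->size < MAX_READ)
        r->size *= 2;                 /* sequential: read bigger */
    }
    else
      r->size = MIN_READ;             /* seek: start small again */

    uint32_t n = len < r->size ? (uint32_t) len : r->size;
    plugin_pread (r, off, n);
    off += n;
    len -= n;
    r->next_off = off;
  }
}
```

Fetching 1 MiB sequentially this way takes 8 plugin requests (8K, 16K,
..., 512K, then the final 8K) instead of the 128 that fixed 8K reads
would need, while a random seek still costs only a small read.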
Rich.
--
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones
Read my programming and virtualization blog:
http://rwmj.wordpress.com