KubeVirt is a custom resource (a kind of plugin) for Kubernetes which
adds support for running virtual machines. As a result they have the
same problem as everyone else: how to import large disk images into
the system for pets, templates, etc.
As part of the project they've defined a format for embedding a disk
image into a container (it's unclear why; perhaps so these can be
distributed using the existing container registry systems?):
https://github.com/kubevirt/containerized-data-importer/blob/master/doc/i...
An example of such a disk-in-a-container is here:
https://hub.docker.com/r/kubevirt/fedora-cloud-container-disk-demo
We've been asked if we can help with tools to efficiently import these
disk images, and I have suggested a few things with nbdkit and have
written a couple of filters (tar, gzip) to support this.
This email is my thoughts on further development work in this area.
----------------------------------------------------------------------
(1) Accessing the disk image directly from the Docker Hub.
When you get down to it, what this actually is:
* There is a disk image in qcow2 format.
* It is embedded as "./disk/downloaded" in a gzip-compressed tar
file. (This is a container with a single layer).
* This tarball is uploaded to (in this case) the Docker Hub and can
be accessed over a URL. The URL can be constructed using a few
JSON requests.
* The URL is served by nginx and this supports HTTP range requests.
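As a rough illustration of what "a few JSON requests" might look like (this is not the attached script), here is a Python sketch. It assumes the Docker Registry HTTP API v2 endpoints that Docker Hub exposes: an anonymous pull token from `auth.docker.io/token`, then `/v2/<repo>/manifests/<tag>`, then `/v2/<repo>/blobs/<digest>`. It also assumes the tag resolves to a single-architecture v2 manifest. The function names are hypothetical, and the URL building is kept separate from the network calls so it can be checked offline.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

REGISTRY = "https://registry-1.docker.io"
AUTH = "https://auth.docker.io/token"
MANIFEST_V2 = "application/vnd.docker.distribution.manifest.v2+json"


def token_url(repo: str) -> str:
    """URL returning a JSON document with an anonymous pull token for repo."""
    query = urllib.parse.urlencode({
        "service": "registry.docker.io",
        "scope": f"repository:{repo}:pull",
    })
    return f"{AUTH}?{query}"


def manifest_url(repo: str, tag: str = "latest") -> str:
    return f"{REGISTRY}/v2/{repo}/manifests/{tag}"


def layer_urls(repo: str, manifest: dict) -> list[str]:
    """Blob URLs for each layer listed in a v2 image manifest."""
    return [f"{REGISTRY}/v2/{repo}/blobs/{layer['digest']}"
            for layer in manifest["layers"]]


def fetch_json(url: str, headers: dict | None = None) -> dict:
    req = urllib.request.Request(url, headers=headers or {})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def resolve_layer_url(repo: str, tag: str = "latest") -> tuple[str, str]:
    """Return (layer blob URL, bearer token).  Requires network access."""
    token = fetch_json(token_url(repo))["token"]
    manifest = fetch_json(manifest_url(repo, tag), {
        "Authorization": f"Bearer {token}",
        "Accept": MANIFEST_V2,
    })
    # A container disk has a single layer, so expect exactly one URL.
    (url,) = layer_urls(repo, manifest)
    return url, token
```

A client would then read byte ranges of the returned URL by sending a `Range:` header together with the same bearer token. Docker Hub may answer blob requests with a redirect to another host, so the HTTP client has to follow redirects.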
I encapsulated all of this in the attached script. This is an
existence proof that it is possible to access the image with nbdkit.
One problem is that the auth token only lasts for a limited time
(it seems to be 5 minutes in my test) and isn't automatically renewed
as you download the layer, so if the download takes longer than 5
minutes you'll suddenly get unrecoverable authorization failures.
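If the token reply includes `expires_in` (the Docker token authentication spec defines it in seconds and says it defaults to 60 when absent; a value of 300 would be consistent with the 5 minutes observed), a client could renew before expiry instead of waiting for a failure. A minimal Python sketch, assuming a caller-supplied `fetch` function that returns the parsed token JSON. The class name and the 30-second safety margin are hypothetical choices.

```python
import time
from typing import Callable, Optional


class TokenCache:
    """Hand out a bearer token, renewing it shortly before it expires."""

    def __init__(self, fetch: Callable[[], dict], margin: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch      # returns e.g. {"token": ..., "expires_in": 300}
        self._margin = margin    # renew this many seconds before expiry
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            reply = self._fetch()
            self._token = reply["token"]
            # Assume the spec's 60 second default if expires_in is missing.
            self._expires_at = now + float(reply.get("expires_in", 60))
        return self._token

    def invalidate(self) -> None:
        """Force the next get() to fetch a new token (e.g. after a 401)."""
        self._token = None
```

Injecting the clock keeps the renewal logic testable without waiting five minutes.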
There seem to be two possible ways to solve this:
(a) Write a new nbdkit-container-plugin which does the authorization
(essentially hiding most of the details in the attached script
from the user). It could deal with renewing the token as
required.
(b) Modify nbdkit-curl-plugin so the user could provide a script for
renewing authorization. This would expose the full gory details
to the end user, but on the other hand might be useful in other
situations that require authorization.
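Option (b) boils down to re-running a user-supplied command whenever the server rejects the current credentials. Here is a minimal Python sketch of that idea. It assumes that the script prints a complete Authorization header value (for example `Bearer ...`) on stdout and that an expired token is signalled by HTTP 401. `RenewingReader` and its `pread` method are hypothetical names, not the nbdkit-curl-plugin API. The same retry logic would also live inside a container plugin under option (a).

```python
import subprocess
import urllib.error
import urllib.request


class RenewingReader:
    """Ranged HTTP reads that re-run an auth script when a token expires."""

    def __init__(self, url, script, opener=urllib.request.urlopen):
        self.url = url
        self.script = script    # command list, e.g. ["./get-token"] (hypothetical)
        self.opener = opener
        self.auth = None        # cached Authorization header value

    def _renew(self):
        out = subprocess.run(self.script, check=True,
                             capture_output=True, text=True)
        self.auth = out.stdout.strip()

    def pread(self, count, offset):
        """Return count bytes starting at offset, retrying once on 401."""
        if self.auth is None:
            self._renew()
        for attempt in range(2):
            req = urllib.request.Request(self.url, headers={
                "Authorization": self.auth,
                "Range": f"bytes={offset}-{offset + count - 1}",
            })
            try:
                with self.opener(req) as resp:
                    return resp.read()
            except urllib.error.HTTPError as err:
                # Only an auth failure on the first attempt is retried.
                if err.code != 401 or attempt == 1:
                    raise
                self._renew()
```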
(2) nbdkit-tar-filter exportname and listing files.
This has already been covered by an email from Nir Soffer, so I'll
simply link to that:
https://lists.gnu.org/archive/html/qemu-discuss/2020-06/msg00058.html
It basically requires a fairly simple change to nbdkit-tar-filter to
map the tar filenames into export names, and a deeper change to the
nbdkit core server to allow listing all export names. The end result
would be that an NBD client could query the list of files (ie. exports)
in the tarball and choose one to download.
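Python's `tarfile` module can illustrate the filename-to-export mapping. This is a sketch of the idea, not nbdkit-tar-filter's code. For each regular file it records where the member's data starts in the uncompressed tarball and how long it is, which is all a server needs to serve that member as its own export. `list_exports` is a hypothetical name.

```python
import tarfile
from typing import BinaryIO


def list_exports(fileobj: BinaryIO) -> dict[str, tuple[int, int]]:
    """Map each regular file in an uncompressed tarball to (offset, size).

    A read of n bytes at position pos within an export corresponds to a
    read of the tarball at offset + pos.  Sparse members would need extra
    handling, because their data is not stored contiguously.
    """
    exports = {}
    with tarfile.open(fileobj=fileobj, mode="r:") as tar:
        for member in tar:
            if member.isfile():
                exports[member.name] = (member.offset_data, member.size)
    return exports
```

The keys of the returned dict are what an export-listing request would return to the NBD client.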
(3) gzip & tar require full downloads - why not “docker/podman save/export”?
Stepping back to get the bigger picture: because the OCI standard uses
gzip for compression (https://stackoverflow.com/a/9213826), which does
not allow random access, and because tar has no central index (each
file's header is interspersed with the file data), you always need to
download the whole container layer before you can access the disk
image inside. Currently nbdkit-gzip-filter hides this from the end
user, but it still downloads the whole thing to a temporary file.
There's no way round that unless OCI can be persuaded to use a better
format.
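To make the cost concrete: with a gzip-compressed tarball, the only way to reach a member is to decompress the stream from the start and walk the tar headers in order. Python's `tarfile` streaming mode (`r|gz`) works exactly that way, so this sketch (hypothetical helper `find_member`) has to decompress and skip every byte that precedes the wanted member.

```python
import tarfile
from typing import BinaryIO


def find_member(fileobj: BinaryIO, wanted: str) -> bytes:
    """Return the contents of `wanted` from a .tar.gz stream.

    "r|gz" treats the archive as a forward-only stream: gzip offers no
    random access and tar has no central index, so every header and every
    preceding member is decompressed and skipped in order.
    """
    with tarfile.open(fileobj=fileobj, mode="r|gz") as tar:
        for member in tar:
            if member.name == wanted:
                f = tar.extractfile(member)
                if f is None:
                    raise ValueError(f"{wanted} is not a regular file")
                return f.read()
    raise KeyError(wanted)
```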
But docker and podman already have a way to export container layers,
ie. the save and export commands. They also have the advantage of
caching the downloaded layers between runs. So why aren't we using
them?
In this world, nbdkit-container-plugin would simply use docker/podman
save (or export?) to grab the container as a tar file, and we would
use the tar filter as above to expose the contents as an NBD endpoint
for further consumption. IOW:
nbdkit --filter=tar container \
docker.io/kubevirt/fedora-cloud-container-disk-demo \
tar-entry=./disk/downloaded
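One wrinkle with `save`: in the docker-archive layout the output is a tar whose top-level `manifest.json` lists one tarball per layer, so the disk image is nested one level deeper than a single tar entry. `export` of a created container instead produces one flattened filesystem tar. This Python sketch shows the extra hop. It assumes the docker-archive layout (a `manifest.json` list whose `Layers` entries name the layer tarballs inside the archive) and ignores layer whiteouts. `disk_from_saved_image` is a hypothetical name.

```python
import json
import tarfile
from typing import BinaryIO


def disk_from_saved_image(fileobj: BinaryIO,
                          entry: str = "disk/downloaded") -> bytes:
    """Pull `entry` out of a `docker save` style archive.

    The outer tar holds manifest.json plus one tarball per layer, and the
    disk image lives inside a layer tarball, so two tar lookups are needed.
    """
    with tarfile.open(fileobj=fileobj, mode="r:") as outer:
        manifest = json.load(outer.extractfile("manifest.json"))
        # Search the newest layer first so the latest copy of entry wins.
        for layer_path in reversed(manifest[0]["Layers"]):
            # Open the layer directly from the outer archive (it is seekable);
            # "r:*" also copes with compressed layer blobs.
            with tarfile.open(fileobj=outer.extractfile(layer_path),
                              mode="r:*") as layer:
                # Layer tarballs may store names with or without "./".
                for name in (entry, "./" + entry):
                    try:
                        member = layer.getmember(name)
                    except KeyError:
                        continue
                    return layer.extractfile(member).read()
    raise KeyError(entry)
```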
Rich.
--
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones
Read my programming and virtualization blog:
http://rwmj.wordpress.com