On Thu, Mar 2, 2023 at 10:46 AM Richard W.M. Jones <rjones(a)redhat.com> wrote:
On Mon, Feb 27, 2023 at 07:09:33PM +0200, Nir Soffer wrote:
> On Mon, Feb 27, 2023 at 6:41 PM Richard W.M. Jones <rjones(a)redhat.com> wrote:
> > I think it would be more useful if (or in addition) it could compute
> > the checksum of a stream which is being converted with 'qemu-img
> > convert'. Extra points if it can compute the checksum over either the
> > input or output stream.
>
> I thought about this; it could be a filter that you add in the graph
> that gives you a checksum as a side effect of copying. But this requires
> disabling unordered writes, which is pretty bad for performance.
>
> But even if you compute the checksum during a transfer, you want to
> verify it by reading the transferred data from storage. Once you have
> computed the checksum you can keep it for verifying the same image in
> the future.
The use-case I have in mind is being able to verify a download when
you already know the checksum and are copying / converting the image
in flight.
For example, you are asked to download
https://example.com/distro-cloud.qcow2
with some published checksum, and you will download and convert it to
raw on the fly, but want to verify the checksum (of the qcow2) during
the conversion step. (Or at some point, but doing it during the convert
avoids having to spool the image locally.)
I'm thinking about the same flow. I think the best way to verify is:
1. The remote server publishes a block-checksum of the image
2. The system gets the block-checksum from the server (from an HTTP header?)
3. The system pulls data from the server and pushes it to the target disk
in the wanted format
4. The system computes a checksum of the target disk
This way you verify the entire pipeline including the storage. If we
compute a checksum during the conversion, we verify only that we got
the correct data from the server.
If we care only about verifying the transfer from the server, we can
compute the checksum during the download, which is likely to be
sequential (so easy to integrate with blkhash).
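A minimal sketch of the sequential case, assuming the download is
available as a file-like object read in order. It uses Python's hashlib
with a plain SHA-256 digest rather than blkhash, and the name
copy_with_checksum is hypothetical, only for illustration:

```python
import hashlib


def copy_with_checksum(src, dst, algorithm="sha256", chunk_size=1024 * 1024):
    """Copy src to dst sequentially and return the hex digest of the data.

    src and dst are file-like objects; the digest covers the bytes read
    from src, so it verifies the transfer, not what ends up on storage.
    """
    h = hashlib.new(algorithm)
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        h.update(chunk)   # hash the data in stream order
        dst.write(chunk)  # then pass it on unchanged
    return h.hexdigest()
```

The caller compares the returned digest with the published one after
the copy completes. This works only because the chunks arrive in order.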
If we want to validate nbdcopy, it will be much harder to compute a checksum
inside nbdcopy because it does not stream the data in order.
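To illustrate why ordering matters, here is a simplified sketch of a
block-based checksum that accepts data in any order. This is not
blkhash's actual algorithm or output format, and it is not something
nbdcopy provides; the class BlockChecksum, the 64 KiB block size, and
the requirement that writes are block-aligned are all assumptions made
for the example:

```python
import hashlib

BLOCK_SIZE = 64 * 1024  # assumed block size, for illustration only


class BlockChecksum:
    """Hash each block separately, then hash the block digests in order."""

    def __init__(self, image_size, block_size=BLOCK_SIZE):
        self.block_size = block_size
        nblocks = (image_size + block_size - 1) // block_size
        self.digests = [None] * nblocks

    def update(self, offset, data):
        # Assumption: offset is block-aligned and data covers whole
        # blocks (only the last block of the image may be shorter).
        if offset % self.block_size:
            raise ValueError("offset must be block-aligned")
        for i in range(0, len(data), self.block_size):
            index = (offset + i) // self.block_size
            block = data[i:i + self.block_size]
            self.digests[index] = hashlib.sha256(block).digest()

    def hexdigest(self):
        if any(d is None for d in self.digests):
            raise ValueError("not all blocks were hashed")
        outer = hashlib.sha256()
        for d in self.digests:  # combine digests in block order
            outer.update(d)
        return outer.hexdigest()
```

Because each block digest is stored by index, the result does not
depend on the order in which update() is called. A plain streaming hash
has no such property, which is what makes out-of-order copying hard to
checksum.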
Nir