On Tue, Feb 28, 2023 at 12:24:04PM +0100, Laszlo Ersek wrote:
 On 2/27/23 17:44, Richard W.M. Jones wrote:
 > On Mon, Feb 27, 2023 at 08:42:23AM -0600, Eric Blake wrote:
 >> Or intentionally choose a hash that can be computed out-of-order, such
 >> as a Merkle Tree.  But we'd need a standard setup for all parties to
 >> agree on how the hash is to be computed and checked, if it is going to
 >> be anything more than just a linear hash of the entire guest-visible
 >> contents.
 > 
 > Unfortunately I suspect that by far the easiest way for people who
 > host images to compute checksums is to run 'shaXXXsum' on them or sign
 > them with a GPG signature, rather than engaging in a novel hash
 > function.  Indeed that's what is happening now:
 > 
 > https://alt.fedoraproject.org/en/verify.html
 
 If the output is produced with unordered writes, but the complete output
 needs to be verified with a hash *chain*, that still allows for some
 level of asynchrony. The start of the hashing need not be delayed until
 after the end of output, only after the start of output.
 
 For example, nbdcopy could maintain the highest offset up to which the
 output is contiguous, and on a separate thread, it could be hashing the
 output up to that offset.
 
 Considering a gigantic output, as yet unassembled blocks could likely
 not be buffered in memory (that's why the writes are unordered in the
 first place!), so the hashing thread would have to re-read the output
 via NBD. Whether that would cause performance to improve or to
 deteriorate is undecided IMO. If the far end of the output network block
 device can accommodate a reader that is independent of the writers, then
 this level of overlap is beneficial. Otherwise, this extra reader thread
 would just add more thrashing, and we'd be better off with a separate
 read-through once writing is complete. 
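
To make the contiguous-offset idea above concrete, here is a rough
Python sketch. It is not nbdcopy code: ContiguousTracker and
copy_and_hash are hypothetical names, the "output" is an in-memory
bytearray standing in for the NBD destination, extents are assumed not
to overlap, and the hashing runs inline rather than on a separate
thread to keep the example deterministic.

```python
import hashlib
import random


class ContiguousTracker:
    """Tracks the highest offset up to which the output is contiguous."""

    def __init__(self):
        self.watermark = 0
        # start offset -> end offset of extents written ahead of the watermark
        self.pending = {}

    def complete(self, offset, length):
        """Record a finished write and return the new watermark."""
        self.pending[offset] = offset + length
        # Advance while the next extent starts exactly at the watermark.
        while self.watermark in self.pending:
            self.watermark = self.pending.pop(self.watermark)
        return self.watermark


def copy_and_hash(src, block_size):
    """Copy src in random block order, hashing the contiguous prefix."""
    dst = bytearray(len(src))      # stand-in for the output device
    tracker = ContiguousTracker()
    hasher = hashlib.sha256()
    hashed = 0                     # bytes already fed to the hasher

    offsets = list(range(0, len(src), block_size))
    random.shuffle(offsets)        # simulate unordered writes

    for off in offsets:
        chunk = src[off:off + block_size]
        dst[off:off + len(chunk)] = chunk
        mark = tracker.complete(off, len(chunk))
        if mark > hashed:
            # Re-read the newly contiguous range from the output.
            hasher.update(bytes(dst[hashed:mark]))
            hashed = mark

    return hasher.hexdigest()


src = bytes(range(256)) * 10
print(copy_and_hash(src, 64) == hashlib.sha256(src).hexdigest())
```

The result matches a plain linear SHA-256 of the source, so it stays
compatible with what 'sha256sum' would print. The cost is the
re-read of each range from the output, which is exactly the trade-off
discussed in the quoted text.
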

I'm wondering whether there's any mathematical result that lets you
combine each hash(block_i) into the final hash(block[1..N]) without
needing to compute the hash of each block in order.

(This is what blkhash solves, but unfortunately its output isn't
compatible with standard hashes.)

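
To illustrate the incompatibility, here is a minimal Python sketch of
a two-level (Merkle-style) digest. This is only an illustration and
not blkhash's actual algorithm; BLOCK_SIZE, split_blocks and
two_level_digest are hypothetical names chosen for the example.

```python
import hashlib
import random

BLOCK_SIZE = 16  # hypothetical; tiny so the demo has several blocks


def split_blocks(data, size=BLOCK_SIZE):
    """Split data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def two_level_digest(data):
    """Hash each block in arbitrary order, then hash the ordered digests."""
    blocks = split_blocks(data)
    digests = [None] * len(blocks)

    order = list(range(len(blocks)))
    random.shuffle(order)  # block hashes can be computed in any order
    for i in order:
        digests[i] = hashlib.sha256(blocks[i]).digest()

    # Only this cheap outer pass over the block digests is ordered.
    outer = hashlib.sha256()
    for d in digests:
        outer.update(d)
    return outer.hexdigest()


data = b"0123456789abcdef" * 4
linear = hashlib.sha256(data).hexdigest()

# Stable regardless of the order in which blocks were hashed ...
print(two_level_digest(data) == two_level_digest(data))
# ... but not the value a linear hash of the same bytes produces.
print(two_level_digest(data) == linear)
```

The per-block work is order-independent, but the final value is a hash
of digests rather than of the image bytes, so it cannot be checked
against a published 'sha256sum'.
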
Rich.
-- 
Richard Jones, Virtualization Group, Red Hat 
http://people.redhat.com/~rjones