On 10/28/21 16:24, Richard W.M. Jones wrote:
 This version incorporates many of the changes suggested by Laszlo
and
 Nir in their reviews (thanks).
 
 We now require VDDK 6.5 because it turned out that 6.0 is broken.
 There is a reliable way to detect VDDK 6.0 by the absence of
 VixDiskLib_Wait, so we now give an accurate error message if someone
 tries to use libvixDiskLib.so.6 and it's < 6.5.
 
 Although the 5th patch is large, I didn't split it into two parts,
 because that's hard work. 
*chuckle* :)
 
 About checking the return value from pthread_mutex_init and
 pthread_cond_init.  I checked the implementation of these (in the only
 implementation we really care about, glibc), and my reading is these
 cannot realistically fail, especially if the attr parameter is NULL.
https://sourceware.org/git/?p=glibc.git;a=blob;f=nptl/pthread_mutex_init....
https://sourceware.org/git/?p=glibc.git;a=blob;f=nptl/pthread_cond_init.c...
OK!
 
 cmd->error / cmd->completed have been combined into a single
 cmd->status enum. 
Thanks.
 
 Laszlo raised a good point about abstraction (11c here:
 
https://listman.redhat.com/archives/libguestfs/2021-October/msg00218.html)
 which I broadly agree with.  I addressed it in a slightly different
 way.  I think there was likely a problem with the old code in that it
 set cmd->error outside the mutex.  I couldn't prove that this was
 definitely wrong, but it seemed either unsafe or a source of future
 problems. 
It was safe.
The synchronization point was the "completed" field, which was properly
protected by the mutex. Whatever you data you set/expose, before
"publishing" through the mutex-protected "completed" field, will be
safely consumable in the consumer, assuming that one too goes through
the mutex-protected "completed" field. There are various consistency
models (see Paolo's article series on LWN
<
https://lwn.net/Articles/844224/>), and posix thread mutexes implement
about the heaviest-weight of them.
In this regard, the READ opcode's handling is not much different from
"error". After all, the command consumer thread populates the buffer
(from the vmware disk image) outside of the mutex. Then the mutex is
acquired, the "completed" field is updated, the original command
producer is signaled, and the mutex is released. At this point the
original command producer (the submitter of the READ opcode) is supposed
to process not just the "error" field, but the buffer populated by the
actual vmware disk read operation. The safety of that too depends on
posix thread mutexes implementing publish-subscribe.
 Instead the code now only updates the new cmd->status flag
 while holding the lock, which in turn solves the abstraction problem
 because there are now only a couple of places which update
 cmd->status. 
Thanks,
Laszlo