On 10/28/21 16:24, Richard W.M. Jones wrote:
This version incorporates many of the changes suggested by Laszlo
and
Nir in their reviews (thanks).
We now require VDDK 6.5 because it turned out that 6.0 is broken.
There is a reliable way to detect VDDK 6.0 by the absence of
VixDiskLib_Wait, so we now give an accurate error message if someone
tries to use libvixDiskLib.so.6 and it's < 6.5.
Although the 5th patch is large, I didn't split it into two parts,
because that's hard work.
*chuckle* :)
About checking the return value from pthread_mutex_init and
pthread_cond_init. I checked the implementation of these (in the only
implementation we really care about, glibc), and my reading is these
cannot realistically fail, especially if the attr parameter is NULL.
https://sourceware.org/git/?p=glibc.git;a=blob;f=nptl/pthread_mutex_init....
https://sourceware.org/git/?p=glibc.git;a=blob;f=nptl/pthread_cond_init.c...
OK!
cmd->error / cmd->completed have been combined into a single
cmd->status enum.
Thanks.
Laszlo raised a good point about abstraction (11c here:
https://listman.redhat.com/archives/libguestfs/2021-October/msg00218.html)
which I broadly agree with. I addressed it in a slightly different
way. I think there was likely a problem with the old code in that it
set cmd->error outside the mutex. I couldn't prove that this was
definitely wrong, but it seemed either unsafe or a source of future
problems.
It was safe.
The synchronization point was the "completed" field, which was properly
protected by the mutex. Whatever you data you set/expose, before
"publishing" through the mutex-protected "completed" field, will be
safely consumable in the consumer, assuming that one too goes through
the mutex-protected "completed" field. There are various consistency
models (see Paolo's article series on LWN
<
https://lwn.net/Articles/844224/>), and posix thread mutexes implement
about the heaviest-weight of them.
In this regard, the READ opcode's handling is not much different from
"error". After all, the command consumer thread populates the buffer
(from the vmware disk image) outside of the mutex. Then the mutex is
acquired, the "completed" field is updated, the original command
producer is signaled, and the mutex is released. At this point the
original command producer (the submitter of the READ opcode) is supposed
to process not just the "error" field, but the buffer populated by the
actual vmware disk read operation. The safety of that too depends on
posix thread mutexes implementing publish-subscribe.
Instead the code now only updates the new cmd->status flag
while holding the lock, which in turn solves the abstraction problem
because there are now only a couple of places which update
cmd->status.
Thanks,
Laszlo