On 02/02/22 09:23, Laszlo Ersek wrote:
On 02/01/22 20:44, Eric Blake wrote:
> These are the trivial patches modeled after Nir's commit 7ba6ef67,
> fixing further places that fail to check *error during callbacks.
I find these interfaces very difficult to use.
https://libguestfs.org/nbd_aio_block_status.3.html
We have two callbacks here, each takes a *error, and I have zero idea of
the order the callbacks are supposed to be called in.
The last paragraph of the commit message speaks about this, and the code
updates seem to suggest some ordering, but I'm still unsure.
... In fact, I'm now uncertain whether ignoring "err" in virt-v2v's
"lib/utils.ml", function "get_disk_allocated", is safe. It's a
"sync"
(not aio) nbd_block_status() call, but the documentation
<
https://libguestfs.org/nbd_block_status.3.html> says:
"On entry to the callback, the error parameter contains the errno value
of any previously detected error."
So what should we do then, in "get_disk_allocated"? Return -1
immediately?
From patch#4, it seems that we need exactly this. Here's the
function:
let get_disk_allocated ~dir ~disknr =
let socket = sprintf "%s/out%d" dir disknr
and alloc_ctx = "base:allocation" in
with_nbd_connect_unix ~socket ~meta_contexts:[alloc_ctx]
(fun nbd ->
if NBD.can_meta_context nbd alloc_ctx then (
(* Get the list of extents, using a 2GiB chunk size as hint. *)
let size = NBD.get_size nbd
and allocated = ref 0_L
and fetch_offset = ref 0_L in
while !fetch_offset < size do
let remaining = size -^ !fetch_offset in
let fetch_size = min 0x8000_0000_L remaining in
NBD.block_status nbd fetch_size !fetch_offset
(fun ctx offset entries err ->
assert (ctx = alloc_ctx);
for i = 0 to Array.length entries / 2 - 1 do
let len = entries.(i * 2)
and typ = entries.(i * 2 + 1) in
assert (len > 0_L);
if typ &^ 1_L = 0_L then
allocated := !allocated +^ len;
fetch_offset := !fetch_offset +^ len
done;
0
)
done;
Some !allocated
) else None
)
If our (nameless) function is entered with a nonzero "err", then we
cannot trust the "entries" array. Which means not only "allocated"
may
be miscalculated, but even "fetch_offset" -- and the latter can as well
break the chunking (= outer) loop. This seems to imply we *must not*
continue the outer loop.
So how do we handle this in OCaml? If we return -1 from the (nameless)
extents callback, will NBD.block_status raise an exception?
Hmmmm: maybe. In the generated nbd_internal_ocaml_nbd_block_status()
function, in file "ocaml/nbd-c.c", we have:
caml_enter_blocking_section ();
r = nbd_block_status (h, count, offset, extent_callback, flags);
caml_leave_blocking_section ();
if (r == -1)
nbd_internal_ocaml_raise_error ();
However, in extent_wrapper() -> extent_wrapper_locked(), there is more
confusion for me:
metacontextv = caml_copy_string (metacontext);
offsetv = caml_copy_int64 (offset);
entriesv = nbd_internal_ocaml_alloc_int64_from_uint32_array (entries, nr_entries);
errorv = caml_alloc_tuple (1);
Store_field (errorv, 0, Val_int (*error));
args[0] = metacontextv;
args[1] = offsetv;
args[2] = entriesv;
args[3] = errorv;
rv = caml_callbackN_exn (data->fnv, 4, args);
*error = Int_val (Field (errorv, 0));
if (Is_exception_result (rv)) {
nbd_internal_ocaml_exception_in_wrapper ("extent", rv);
CAMLreturnT (int, -1);
}
r = Int_val (rv);
assert (r >= 0);
CAMLreturnT (int, r);
If I understand correctly, if we returned (-1) from the (nameless)
extents callback, that would trip the assert(); would it not? Thus, we
would not reach the nbd_internal_ocaml_raise_error() in the above-quoted
nbd_internal_ocaml_nbd_block_status() function.
Thanks,
Laszlo
And if we do so, what is that going to mean for
"NBD.block_status"? Will it raise an exception?
Thanks,
Laszlo
> nbdcopy still has a bigger bug in that it is NOT properly checking for
> *error in the pread/pwrite completion handlers, but fixing that
> gracefully (rather than just quitting the process immediately, as we
> can do in the examples), requires more invasive patches which I will
> be posting separately for review, while the ones posted here are
> trivial enough that I'm pushing them now (commits 23bcea4a..1373d423).
>
> I also audited other uses of completion callbacks:
> - in tests, closure-lifetimes, errors, server-death, and
> shutdown-flags are all safe (either the callback returns 0 and the
> code later calls nbd_aio_command_completed, or we are explicitly
> testing aspects of completion callbacks)
> - in fuse/operations.c, the completion callback returns 0 and calls
> nbd_aio_command_completed later
> - in examples/strict-structured-reads.c, the completion callback
> properly checks *error
>
> I also checked that nbdkit's nbd plugin properly checks *error.
>
> Eric Blake (4):
> examples: aio-connect-read.c: Fix error handling
> examples: glib-main-loop: don't strand commands when gsource
> disappears
> examples: glib-main-loop: Fix error handling
> copy: Ignore extents if we encountered earlier error
>
> examples/aio-connect-read.c | 7 +++++++
> examples/glib-main-loop.c | 14 ++++++++++++--
> copy/nbd-ops.c | 4 ++--
> 3 files changed, 21 insertions(+), 4 deletions(-)
>