On 5/31/23 20:29, Jürgen Hötzel wrote:
 
 "Richard W.M. Jones" <rjones(a)redhat.com> writes:
 
> On Sat, May 27, 2023 at 03:35:38PM +0200, Jürgen Hötzel wrote:
>> Fixes deadlocks on OCaml5 when trying to get the lock that is already
>> held:
>>
>> Fatal error during lock: Resource deadlock avoided
>> ---
>>  ocaml/guestfs-c.c | 8 ++++++--
>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/ocaml/guestfs-c.c b/ocaml/guestfs-c.c
>> index 3888c9456..bcf8e6ab3 100644
>> --- a/ocaml/guestfs-c.c
>> +++ b/ocaml/guestfs-c.c
>> @@ -395,12 +395,16 @@ event_callback_wrapper (guestfs_h *g,
>>    /* Ensure we are holding the GC lock before any GC operations are
>>     * possible. (RHBZ#725824)
>>     */
>> -  caml_leave_blocking_section ();
>> +  bool in_blocking_section = (caml_state == NULL);
>> +
>> +  if (in_blocking_section)
>> +    caml_leave_blocking_section ();
>>  
>>    event_callback_wrapper_locked (g, data, event, event_handle, flags,
>>                                   buf, buf_len, array, array_len);
>>  
>> -  caml_enter_blocking_section ();
>> +  if (in_blocking_section)
>> +    caml_enter_blocking_section ();
>>  }
>
> I don't understand the reason why this patch is needed.
 
 the event_callback_wrapper is ALSO called from libguestfs code that already
 holds the OCaml domain lock. (when blocking = false). 
Where does that happen? I don't see it.
Either way, wherever it happens: that particular call site should be
modified to call event_callback_wrapper_locked() directly, IMO.
event_callback_wrapper() does nothing other than bracket
event_callback_wrapper_locked() with leave+enter, so if another call
site needs everything that event_callback_wrapper() does *except*
leave+enter (because it already holds the OCaml domain lock), then that
site should just directly call event_callback_wrapper_locked(). For this
purpose, event_callback_wrapper_locked() may have to be made a public
function (but again I don't know where that new call is supposed to come
from).
Laszlo
 
 My Patch is just a (confusing) workaround to fix this issue at runtime.
 
 An better solution might be using different callbacks at compile time:
 
 event_callback_wrapper_locked (when called from a non-blocking function)
 event_callback_wrapper (when called from a blocking function).
 
 This problem also occurs only with OCaml 5
 
 Minimal invalid OCAML/C Code:
 
 ============================== stubs.c ===============================
 #include <caml/alloc.h>
 #include <caml/custom.h>
 #include <caml/memory.h>
 #include <caml/mlvalues.h>
 
 void dummy_leave_blocking() {
   CAMLparam0();
   /* we did not call  caml_enter_blocking_section_before in C */
   caml_leave_blocking_section();
   caml_enter_blocking_section();
   CAMLreturn0;
 }
 ============================== main.ml ===============================
 external dummy_leave_blocking : unit -> unit = "dummy_leave_blocking"
 let () =
   Printf.printf "Hello from OCaml %s\n%!" Sys.ocaml_version;
   dummy_leave_blocking();
   Printf.printf "Bye from OCaml %s\n%!" Sys.ocaml_version;
 ======================================================================
 
 
 Using Ocaml 5 results in crash:
 
 # dune exec --  bin/main.exe
 Hello from OCaml 5.0.0
 Fatal error: Fatal error during lock: Resource deadlock avoided
 
 Aborted (core dumped)
 
 VS OCaml 4:
 # dune exec --  bin/main.exe
 Hello from OCaml 4.11.1
 Bye from OCaml 4.11.1
 
 Jürgen
 
 
 
 
 _______________________________________________
 Libguestfs mailing list
 Libguestfs(a)redhat.com
 
https://listman.redhat.com/mailman/listinfo/libguestfs