[PATCH common 0/4] mltools: Introduce priority for ordering actions

Richard W.M. Jones

Thursday, 14 July 2022

7:36 a.m.

https://bugzilla.redhat.com/show_bug.cgi?id=1953286#c26

This set of changes enhances the common/mltools On_exit module to:

 - allow actions to be ordered using a priority field

 - allow waiting for killed subprocesses to exit (with some caveats)

A related set of changes to virt-v2v will follow.

This will require some changes to the libguestfs and guestfs-tools
projects, which I have not done yet because I want to get feedback on
the approach first.

Rich.

Richard W.M. Jones

Thursday, 14 July

7:36 a.m.

New subject: [PATCH common 1/4] mltools: Rename On_exit.rmdir to On_exit.rm_rf

Make it clearer what this function does and that it's potentially
dangerous. The functionality itself is unchanged.
---
 mltools/on_exit.ml  | 2 +-
 mltools/on_exit.mli | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/mltools/on_exit.ml b/mltools/on_exit.ml
index 53ccb68..9cdc496 100644
--- a/mltools/on_exit.ml
+++ b/mltools/on_exit.ml
@@ -102,7 +102,7 @@ let unlink filename =
   register ();
   List.push_front filename files
 
-let rmdir dir =
+let rm_rf dir =
   register ();
   List.push_front dir rmdirs
 
diff --git a/mltools/on_exit.mli b/mltools/on_exit.mli
index a02e3db..9bcf104 100644
--- a/mltools/on_exit.mli
+++ b/mltools/on_exit.mli
@@ -47,7 +47,7 @@ val f : (unit -> unit) -> unit
 val unlink : string -> unit
 (** Unlink a single temporary file on exit. *)
 
-val rmdir : string -> unit
+val rm_rf : string -> unit
 (** Recursively remove a temporary directory on exit (using [rm -rf]). *)
 
 val kill : ?signal:int -> int -> unit
-- 
2.37.0.rc2

Laszlo Ersek

Friday, 15 July

1:36 a.m.

New subject: [PATCH common 1/4] mltools: Rename On_exit.rmdir to On_exit.rm_rf

On 07/14/22 14:36, Richard W.M. Jones wrote:

...

Reviewed-by: Laszlo Ersek <lersek(a)redhat.com>

[

Potential small improvement (separate, independent patch; feel free to
push together with the rest, if you think it makes sense -- I didn't
test it though, of course):

diff --git a/mltools/on_exit.ml b/mltools/on_exit.ml
index 53ccb68ab0ce..a9e7f3201eba 100644
--- a/mltools/on_exit.ml
+++ b/mltools/on_exit.ml
@@ -52,7 +52,7 @@ let do_actions () =
     List.iter (do_action (fun file -> Unix.unlink file)) !files;
     List.iter (do_action (
       fun dir ->
-        let cmd = sprintf "rm -rf %s" (Filename.quote dir) in
+        let cmd = sprintf "rm -rf -- %s" (Filename.quote dir) in
         ignore (Tools_utils.shell_command cmd)
       )
     ) !rmdirs;

This would be to pacify my inner pedant. :) I don't expect "dir" to
start with a hyphen, but this is supposed to be a generic function.

]

Thanks!
Laszlo
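Why the "--" matters even though the name is already quoted: Filename.quote protects the argument from the shell, not from rm's own option parsing. A minimal sketch, assuming a hypothetical helper rm_rf_command that only builds the string (the real do_action hands it to Tools_utils.shell_command) and assuming a Unix host, where Filename.quote uses single quotes:

```ocaml
open Printf

(* Hypothetical helper: build the shell command On_exit.rm_rf would run.
   Filename.quote wraps the name in single quotes (on Unix), so the shell
   passes it to rm unchanged; rm still parses a leading '-' as an option
   unless "--" has ended the option list first. *)
let rm_rf_command ?(end_of_options = true) dir =
  if end_of_options then sprintf "rm -rf -- %s" (Filename.quote dir)
  else sprintf "rm -rf %s" (Filename.quote dir)

let () =
  (* Without "--": a directory named "-v" reaches rm as an option. *)
  print_endline (rm_rf_command ~end_of_options:false "-v");
  (* With "--": the same name is treated as an ordinary operand. *)
  print_endline (rm_rf_command "-v")
```

The quoting and the "--" solve different problems, so both are needed for a fully generic function.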

Richard W.M. Jones

2:48 a.m.

New subject: [PATCH common 1/4] mltools: Rename On_exit.rmdir to On_exit.rm_rf

On Fri, Jul 15, 2022 at 08:36:19AM +0200, Laszlo Ersek wrote:

...

On 07/14/22 14:36, Richard W.M. Jones wrote:
[...]
Reviewed-by: Laszlo Ersek <lersek(a)redhat.com>

[

Potential small improvement (separate, independent patch; feel free to
push together with the rest, if you think it makes sense -- I didn't
test it though, of course):

[...]
-        let cmd = sprintf "rm -rf %s" (Filename.quote dir) in
+        let cmd = sprintf "rm -rf -- %s" (Filename.quote dir) in
[...]

This would be to pacify my inner pedant. :) I don't expect "dir" to
start with a hyphen, but this is supposed to be a generic function.

It's sensible just in case, thanks.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW

Richard W.M. Jones

3:09 a.m.

New subject: [PATCH common 1/4] mltools: Rename On_exit.rmdir to On_exit.rm_rf

On Fri, Jul 15, 2022 at 08:36:19AM +0200, Laszlo Ersek wrote:

...

This patch and your suggested enhancement are:

https://github.com/libguestfs/libguestfs-common/commit/f92b8b2b65dfdb930b...
https://github.com/libguestfs/libguestfs-common/commit/fd964c1ba94d4d72b2...

I made this change to guestfs-tools which is a simple renaming:

https://github.com/rwmjones/guestfs-tools/commit/f5baf83e464c276d3dae6f8e...

And this is the virt-v2v commit:

https://github.com/libguestfs/virt-v2v/commit/2eb6441264deb0411d36dabaf8f...

No change seems to be necessary for libguestfs.

Rich.

Richard W.M. Jones

Thursday, 14 July

7:36 a.m.

New subject: [PATCH common 2/4] mltools: Reimplement On_exit to use a list of actions

Previously we used separate lists of files, dirs, pids, etc. This
makes it harder to introduce new features to reorder actions.

Reimplement the module so we use a simple list of actions, where each
action can have type Unlink, Rm_rf, Kill, etc. Iterate through this
list on exit to execute the actions.

The actions will run in a different order from before, but we didn't
guarantee the ordering before. Apart from that the functionality is
unchanged.
---
 mltools/on_exit.ml | 48 ++++++++++++++++++++++------------------------
 1 file changed, 23 insertions(+), 25 deletions(-)

diff --git a/mltools/on_exit.ml b/mltools/on_exit.ml
index 9cdc496..4fa2c3b 100644
--- a/mltools/on_exit.ml
+++ b/mltools/on_exit.ml
@@ -23,23 +23,29 @@ open Common_gettext.Gettext
 open Unix
 open Printf
 
-(* List of files to unlink. *)
-let files = ref []
+type action =
+  | Unlink of string     (* filename *)
+  | Rm_rf of string      (* directory *)
+  | Kill of int * int    (* signal, pid *)
+  | Fn of (unit -> unit) (* generic function *)
 
-(* List of directories to remove. *)
-let rmdirs = ref []
-
-(* List of PIDs to kill. *)
-let kills = ref []
-
-(* List of functions to call. *)
-let fns = ref []
+(* List of actions. *)
+let actions = ref []
 
 (* Perform a single exit action, printing any exception but
  * otherwise ignoring failures.
  *)
-let do_action f arg =
-  try f arg with exn -> debug "%s" (Printexc.to_string exn)
+let do_action action =
+  try
+    match action with
+    | Unlink file -> Unix.unlink file
+    | Rm_rf dir ->
+      let cmd = sprintf "rm -rf %s" (Filename.quote dir) in
+      ignore (Tools_utils.shell_command cmd)
+    | Kill (signal, pid) ->
+      kill pid signal
+    | Fn f -> f ()
+  with exn -> debug "%s" (Printexc.to_string exn)
 
 (* Make sure the actions are performed only once. *)
 let done_actions = ref false
@@ -47,15 +53,7 @@ let done_actions = ref false
 (* Perform the exit actions. *)
 let do_actions () =
   if not !done_actions then (
-    List.iter (do_action (fun f -> f ())) !fns;
-    List.iter (do_action (fun (signal, pid) -> kill pid signal)) !kills;
-    List.iter (do_action (fun file -> Unix.unlink file)) !files;
-    List.iter (do_action (
-      fun dir ->
-        let cmd = sprintf "rm -rf %s" (Filename.quote dir) in
-        ignore (Tools_utils.shell_command cmd)
-      )
-    ) !rmdirs;
+    List.iter do_action !actions
   );
   done_actions := true
 
@@ -96,16 +94,16 @@ let register () =
 
 let f fn =
   register ();
-  List.push_front fn fns
+  List.push_front (Fn fn) actions
 
 let unlink filename =
   register ();
-  List.push_front filename files
+  List.push_front (Unlink filename) actions
 
 let rm_rf dir =
   register ();
-  List.push_front dir rmdirs
+  List.push_front (Rm_rf dir) actions
 
 let kill ?(signal = Sys.sigterm) pid =
   register ();
-  List.push_front (signal, pid) kills
+  List.push_front (Kill (signal, pid)) actions
-- 
2.37.0.rc2

Laszlo Ersek

Friday, 15 July

2:12 a.m.

New subject: [PATCH common 2/4] mltools: Reimplement On_exit to use a list of actions

On 07/14/22 14:36, Richard W.M. Jones wrote:

...

(1) feel free to sneak in the "--" option/operand separator here,
rather than in a separate patch :)

(2) Shouldn't we use two spaces for indentation here, relative to "R"?

...

For some reason I feel like this patch is a good demonstration of OCaml
features :)

Reviewed-by: Laszlo Ersek <lersek(a)redhat.com>

Richard W.M. Jones

3:12 a.m.

New subject: [PATCH common 2/4] mltools: Reimplement On_exit to use a list of actions

On Fri, Jul 15, 2022 at 09:12:24AM +0200, Laszlo Ersek wrote:

...

> +let do_action action =
> +  try
> +    match action with
> +    | Unlink file -> Unix.unlink file
> +    | Rm_rf dir ->
> +      let cmd = sprintf "rm -rf %s" (Filename.quote dir) in

...

(2) Shouldn't we use two spaces for indentation here, relative to "R"?

Possibly, but this is what emacs tuareg-mode gives me. I guess we need
to get more serious about OCaml formatting at some point, which may
require fixing tuareg-mode too since it's quite erratic sometimes.

Rich.

Laszlo Ersek

3:25 a.m.

New subject: [PATCH common 2/4] mltools: Reimplement On_exit to use a list of actions

On 07/15/22 10:12, Richard W.M. Jones wrote:

...

On Fri, Jul 15, 2022 at 09:12:24AM +0200, Laszlo Ersek wrote:
>> +    | Rm_rf dir ->
>> +      let cmd = sprintf "rm -rf %s" (Filename.quote dir) in
[...]
> (2) Shouldn't we use two spaces for indentation here, relative to "R"?

Possibly, but this is what emacs tuareg-mode gives me. I guess we need
to get more serious about OCaml formatting at some point which may
require fixing tuareg-mode too since it's quite erratic sometimes.

Right, I'm happy to adopt any particular guidelines; currently I'm
sometimes just disoriented and looking for ideas on how to format a
particular construct I want to write. Especially because functions are
first-class citizens and can be passed around as arguments, formatting
can become tricky. And "turn it into a named function" is not the right
answer; it takes away quite a bit of OCaml's power -- we *want* to be
able to open-code function arguments to List.iter, List.map,
List.filter, and so on!

Laszlo

Richard W.M. Jones

Thursday, 14 July

7:36 a.m.

New subject: [PATCH common 3/4] mltools: Introduce priority for ordering actions in On_exit

Introduce a new, optional ?prio parameter which can be used to control
the order that actions run on exit. By default actions have priority
1000. Higher numbered actions run first. Lower numbered actions run
last. So to have an action run at the very end before exit you might
use ~prio:0

Note that even with this change, some actions (eg kill) are still
asynchronous.
---
 mltools/on_exit.ml  | 24 +++++++++++++-----------
 mltools/on_exit.mli | 18 +++++++++++++-----
 2 files changed, 26 insertions(+), 16 deletions(-)

diff --git a/mltools/on_exit.ml b/mltools/on_exit.ml
index 4fa2c3b..e8353df 100644
--- a/mltools/on_exit.ml
+++ b/mltools/on_exit.ml
@@ -29,7 +29,7 @@ type action =
   | Kill of int * int    (* signal, pid *)
   | Fn of (unit -> unit) (* generic function *)
 
-(* List of actions. *)
+(* List of (priority, action). *)
 let actions = ref []
 
 (* Perform a single exit action, printing any exception but
@@ -50,10 +50,12 @@ let do_action action =
 (* Make sure the actions are performed only once. *)
 let done_actions = ref false
 
-(* Perform the exit actions. *)
+(* Perform the exit actions in reverse priority order (highest first). *)
 let do_actions () =
   if not !done_actions then (
-    List.iter do_action !actions
+    let actions = List.sort (fun (a, _) (b, _) -> compare b a) !actions in
+    let actions = List.map snd actions in
+    List.iter do_action actions
   );
   done_actions := true
 
@@ -92,18 +94,18 @@ let register () =
   );
   registered := true
 
-let f fn =
+let f ?(prio = 1000) fn =
   register ();
-  List.push_front (Fn fn) actions
+  List.push_front (prio, Fn fn) actions
 
-let unlink filename =
+let unlink ?(prio = 1000) filename =
   register ();
-  List.push_front (Unlink filename) actions
+  List.push_front (prio, Unlink filename) actions
 
-let rm_rf dir =
+let rm_rf ?(prio = 1000) dir =
   register ();
-  List.push_front (Rm_rf dir) actions
+  List.push_front (prio, Rm_rf dir) actions
 
-let kill ?(signal = Sys.sigterm) pid =
+let kill ?(prio = 1000) ?(signal = Sys.sigterm) pid =
   register ();
-  List.push_front (Kill (signal, pid)) actions
+  List.push_front (prio, Kill (signal, pid)) actions
diff --git a/mltools/on_exit.mli b/mltools/on_exit.mli
index 9bcf104..910783e 100644
--- a/mltools/on_exit.mli
+++ b/mltools/on_exit.mli
@@ -28,6 +28,12 @@
    killing another process, so we provide simple
    wrappers for those common actions here.
 
+   Actions can be ordered by setting the optional [?prio]
+   parameter. By default all actions have priority 1000.
+   Higher numbered actions run first. Lower numbered
+   actions run last. So to have an action run at the
+   very end before exit you might use [~prio:0]
+
    Note this module registers signal handlers for
    SIGINT, SIGQUIT, SIGTERM and SIGHUP. This means
    that any program that links with mltools.cmxa
@@ -39,18 +45,20 @@
    Your cleanup action might no longer run unless the
    program calls {!Stdlib.exit}. *)
 
-val f : (unit -> unit) -> unit
+val f : ?prio:int -> (unit -> unit) -> unit
 (** Register a function [f] which runs when the program exits.
     Similar to [Stdlib.at_exit] but also runs if the program is
-    killed with a signal that we can catch. *)
+    killed with a signal that we can catch.
 
-val unlink : string -> unit
+    [?prio] is the priority, default 1000. See the description above. *)
+
+val unlink : ?prio:int -> string -> unit
 (** Unlink a single temporary file on exit. *)
 
-val rm_rf : string -> unit
+val rm_rf : ?prio:int -> string -> unit
 (** Recursively remove a temporary directory on exit (using [rm -rf]). *)
 
-val kill : ?signal:int -> int -> unit
+val kill : ?prio:int -> ?signal:int -> int -> unit
 (** Kill [PID] on exit. The signal sent defaults to [Sys.sigterm].
 
     Use this with care since you can end up unintentionally killing
-- 
2.37.0.rc2
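The effect of `compare b a` in this version can be checked with a cut-down registry. This is a sketch, not the mltools code: actions are plain closures instead of the `action` variant, `::` stands in for `List.push_front`, and the names `register`, `run_all` and `demo` are hypothetical. It also uses `List.stable_sort` where the patch uses `List.sort` (whose stability the standard library does not promise), so that the order of equal-priority actions is explicit.

```ocaml
(* Simplified stand-in for On_exit's list of (priority, action). *)
let actions : (int * (unit -> unit)) list ref = ref []

(* Default priority 1000, as in this version of the patch. *)
let register ?(prio = 1000) fn = actions := (prio, fn) :: !actions

(* Highest priority first.  With a stable sort, actions of equal
   priority keep their list order, i.e. most recently registered first. *)
let run_all () =
  !actions
  |> List.stable_sort (fun (a, _) (b, _) -> compare b a)
  |> List.iter (fun (_, fn) -> fn ())

(* Register four actions and return the order in which they ran. *)
let demo () =
  actions := [];
  let trace = ref [] in
  let note name () = trace := name :: !trace in
  register ~prio:0 (note "prio 0");
  register (note "default A");
  register (note "default B");
  register ~prio:2000 (note "prio 2000");
  run_all ();
  List.rev !trace

let () = List.iter print_endline (demo ())
```

This prints "prio 2000", "default B", "default A", "prio 0": under this scheme `~prio:0` runs last, and ties run in reverse registration order.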

Laszlo Ersek

Friday, 15 July

2:17 a.m.

New subject: [PATCH common 3/4] mltools: Introduce priority for ordering actions in On_exit

On 07/14/22 14:36, Richard W.M. Jones wrote:

...

I think this patch is good, but I'm surprised that ~prio:0 is for
placing stuff at the *end* of the list, where 1000 is the default. This
ordering differs from both the firstboot script priorities we did
recently (lower prio number -> runs earlier) and also from the operation
ordering in virt-sysprep (default is 0, 99 puts stuff at the end).

Swapping around the arguments to "compare" wouldn't be difficult, the
more laborious update would be for the documentation (commit message,
comments). Up to you...

Reviewed-by: Laszlo Ersek <lersek(a)redhat.com>

Richard W.M. Jones

3:13 a.m.

New subject: [PATCH common 3/4] mltools: Introduce priority for ordering actions in On_exit

On Fri, Jul 15, 2022 at 09:17:26AM +0200, Laszlo Ersek wrote:

...

On 07/14/22 14:36, Richard W.M. Jones wrote:
[...]
I think this patch is good, but I'm surprised that ~prio:0 is for
placing stuff at the *end* of the list, where 1000 is the default. This
ordering differs from both the firstboot script priorities we did
recently (lower prio number -> runs earlier) and also from the operation
ordering in virt-sysprep (default is 0, 99 puts stuff at the end).

Swapping around the arguments to "compare" wouldn't be difficult, the
more laborious update would be for the documentation (commit message,
comments). Up to you...

I thought (without checking, of course) that this way was consistent
with firstboot. I'll change it to work the same way as firstboot.

...

Reviewed-by: Laszlo Ersek <lersek(a)redhat.com>

Thanks,
Rich.

Richard W.M. Jones

4:26 a.m.

New subject: [PATCH common 3/4] mltools: Introduce priority for ordering actions in On_exit

On Fri, Jul 15, 2022 at 09:17:26AM +0200, Laszlo Ersek wrote:

...

So, here's the change to libguestfs-common:

https://github.com/libguestfs/libguestfs-common/commit/290a2cecbe679660de...
https://github.com/libguestfs/libguestfs-common/commit/1000604ff391e49f0b...

Priorities are 0..9999 (limit is not actually enforced) with higher
numbers running later and 5000 being the default.

The change to virt-v2v:

https://github.com/libguestfs/virt-v2v/commit/e96357fc3b26aaf96eaa21afa36...

I don't think there are other places where we umount host devices, but
I'm sure there are other places that could use On_exit priorities (in
guestfs-tools as well). Let's keep an eye out for those.

Thanks,
Rich.
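Under the revised rule described here (lower numbers run earlier, 5000 by default, 0..9999 by convention only), the comparator simply loses its argument swap. A minimal sketch under those stated assumptions; the committed code behind the links is not reproduced, and `order_actions` and the sample action names are hypothetical:

```ocaml
(* Revised convention: ascending priority, default 5000. *)
let default_prio = 5000

(* Hypothetical helper: return action names in the order they would run.
   Ties keep their list order (stable sort).  The 0..9999 range is a
   convention only and is not checked here either. *)
let order_actions (actions : (int * string) list) =
  actions
  |> List.stable_sort (fun (a, _) (b, _) -> compare a b)
  |> List.map snd

let () =
  [ (default_prio, "remove temp dir");
    (9999, "unmount host device");  (* hypothetical: runs very late *)
    (0, "very early action") ]
  |> order_actions
  |> List.iter print_endline
```

This prints "very early action", "remove temp dir", "unmount host device", which matches the firstboot-style convention of lower numbers running earlier.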

Laszlo Ersek

4:54 a.m.

New subject: [PATCH common 3/4] mltools: Introduce priority for ordering actions in On_exit

On 07/15/22 11:26, Richard W.M. Jones wrote:

...

On Fri, Jul 15, 2022 at 09:17:26AM +0200, Laszlo Ersek wrote:
> I think this patch is good, but I'm surprised that ~prio:0 is for
> placing stuff at the *end* of the list, where 1000 is the default. [...]

So, here's the change to libguestfs-common: [...]

Priorities are 0..9999 (limit is not actually enforced) with higher
numbers running later and 5000 being the default. [...]

Thanks for these! Laszlo

Richard W.M. Jones

Thursday, 14 July

7:36 a.m.

New subject: [PATCH common 4/4] mltools: Allow waiting for killed PIDs

Add a new, optional [?wait] parameter to On_exit.kill, allowing
programs to wait for a number of seconds for the subprocess to exit.
---
 mltools/on_exit.ml  | 30 +++++++++++++++++++++---------
 mltools/on_exit.mli | 14 ++++++++++++--
 2 files changed, 33 insertions(+), 11 deletions(-)

diff --git a/mltools/on_exit.ml b/mltools/on_exit.ml
index e8353df..cdaa83c 100644
--- a/mltools/on_exit.ml
+++ b/mltools/on_exit.ml
@@ -24,10 +24,10 @@ open Unix
 open Printf
 
 type action =
-  | Unlink of string     (* filename *)
-  | Rm_rf of string      (* directory *)
-  | Kill of int * int    (* signal, pid *)
-  | Fn of (unit -> unit) (* generic function *)
+  | Unlink of string        (* filename *)
+  | Rm_rf of string         (* directory *)
+  | Kill of int * int * int (* wait, signal, pid *)
+  | Fn of (unit -> unit)    (* generic function *)
 
 (* List of (priority, action). *)
 let actions = ref []
@@ -35,18 +35,30 @@ let actions = ref []
 (* Perform a single exit action, printing any exception but
  * otherwise ignoring failures.
  *)
-let do_action action =
+let rec do_action action =
   try
     match action with
     | Unlink file -> Unix.unlink file
     | Rm_rf dir ->
       let cmd = sprintf "rm -rf %s" (Filename.quote dir) in
       ignore (Tools_utils.shell_command cmd)
-    | Kill (signal, pid) ->
-      kill pid signal
+    | Kill (wait, signal, pid) ->
+      do_kill ~wait ~signal ~pid
     | Fn f -> f ()
   with exn -> debug "%s" (Printexc.to_string exn)
 
+and do_kill ~wait ~signal ~pid =
+  kill pid signal;
+
+  let rec loop i =
+    if i > 0 then (
+      let pid', _ = waitpid [ WNOHANG ] pid in
+      if pid' = 0 then
+        loop (i-1)
+    )
+  in
+  loop wait
+
 (* Make sure the actions are performed only once. *)
 let done_actions = ref false
 
@@ -106,6 +118,6 @@ let rm_rf ?(prio = 1000) dir =
   register ();
   List.push_front (prio, Rm_rf dir) actions
 
-let kill ?(prio = 1000) ?(signal = Sys.sigterm) pid =
+let kill ?(prio = 1000) ?(wait = 0) ?(signal = Sys.sigterm) pid =
   register ();
-  List.push_front (prio, Kill (signal, pid)) actions
+  List.push_front (prio, Kill (wait, signal, pid)) actions
diff --git a/mltools/on_exit.mli b/mltools/on_exit.mli
index 910783e..dd35101 100644
--- a/mltools/on_exit.mli
+++ b/mltools/on_exit.mli
@@ -58,12 +58,22 @@ val unlink : ?prio:int -> string -> unit
 val rm_rf : ?prio:int -> string -> unit
 (** Recursively remove a temporary directory on exit (using [rm -rf]). *)
 
-val kill : ?prio:int -> ?signal:int -> int -> unit
+val kill : ?prio:int -> ?wait:int -> ?signal:int -> int -> unit
 (** Kill [PID] on exit. The signal sent defaults to [Sys.sigterm].
 
     Use this with care since you can end up unintentionally killing
     another process if [PID] goes away or doesn't exist before the
-    program exits. *)
+    program exits.
+
+    The optional [?wait] flag attempts to wait for a specified
+    number of seconds for the subprocess to go away. For example
+    using [~wait:5] will wait for up to 5 seconds. Since this
+    runs when virt-v2v is exiting, it is best to keep waiting times
+    as short as possible. Also there is no way to report errors
+    in the subprocess. If reliable cleanup of a subprocess is
+    required then this is not the correct place to do it.
+
+    [?wait] defaults to [0] which means we do not try to wait. *)
 
 val register : unit -> unit
 (** Force this module to register its at_exit function and signal
-- 
2.37.0.rc2

Laszlo Ersek

Friday, 15 July

2:30 a.m.

New subject: [PATCH common 4/4] mltools: Allow waiting for killed PIDs

On 07/14/22 14:36, Richard W.M. Jones wrote:

...

I'd slightly prefer either (a) introducing do_kill before do_action, or
(b) defining do_kill inside do_action, over using "let rec" here. I
think I understand what "rec" does to scoping, but still, we don't have
actual recursion here. (I've noticed this pattern in many places in the
v2v projects, and it always confuses me -- it doesn't bring much
convenience IMO, so I'd rather restrict it to actual recursion.)
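For comparison, the two layouts suggested here look like this on a cut-down version of the code. Everything below is a simplified stand-in: the variant has only two cases, and the helper `record` (hypothetical) appends to a buffer instead of calling `Unix.kill` and `Unix.waitpid`, so nothing is actually signalled.

```ocaml
type action =
  | Fn of (unit -> unit)
  | Kill of int * int * int  (* wait, signal, pid *)

(* Stand-in for the real side effects: just log what would happen. *)
let log = Buffer.create 64
let record ~wait ~signal ~pid =
  Buffer.add_string log (Printf.sprintf "kill %d %d wait %d;" pid signal wait)

(* (a) Bottom-up: define the helper first, so a plain "let" suffices. *)
let do_kill ~wait ~signal ~pid = record ~wait ~signal ~pid

let do_action = function
  | Fn f -> f ()
  | Kill (wait, signal, pid) -> do_kill ~wait ~signal ~pid

(* (b) Local helper, visible only inside the function that uses it. *)
let do_action_b action =
  let do_kill ~wait ~signal ~pid = record ~wait ~signal ~pid in
  match action with
  | Fn f -> f ()
  | Kill (wait, signal, pid) -> do_kill ~wait ~signal ~pid
```

The posted `let rec do_action ... and do_kill ...` form compiles too: `rec` makes `do_kill` visible in `do_action`'s body, which is what allows the helper to be written after its caller (top-down), even though neither function calls itself.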

...

+and do_kill ~wait ~signal ~pid =
+  kill pid signal;
+
+  let rec loop i =
+    if i > 0 then (
+      let pid', _ = waitpid [ WNOHANG ] pid in
+      if pid' = 0 then
+        loop (i-1)

Missing: "sleep 1;" before the tail-recursive call, I think.
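A minimal sketch of the polling loop with the suggested `sleep 1` added. To keep it testable without spawning processes, the system calls are passed in as parameters; `wait_for_exit`, `poll` and `sleep` are hypothetical names, not part of On_exit.

```ocaml
(* Poll up to [wait] times, sleeping 1 second after each poll that
   finds the process still running.  [poll] should return true once
   the child has been reaped.  With [wait = 0] nothing is polled,
   matching the "do not try to wait" default. *)
let wait_for_exit ~(poll : unit -> bool) ~(sleep : int -> unit) wait =
  let rec loop i =
    if i > 0 && not (poll ()) then (
      sleep 1;                  (* the call missing from the patch *)
      loop (i - 1)
    )
  in
  loop wait
```

In On_exit itself, `poll` would presumably be `fun () -> fst (Unix.waitpid [Unix.WNOHANG] pid) <> 0` (with `WNOHANG`, `waitpid` returns pid 0 while the child is still running) and `sleep` would be `Unix.sleep`. Following the suggestion literally, the final sleep is not followed by another poll, so `~wait:5` spends up to 5 seconds but checks the child only 5 times.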

...

diff --git a/mltools/on_exit.mli b/mltools/on_exit.mli
[...]
+    The optional [?wait] flag attempts to wait for a specified
+    number of seconds for the subprocess to go away. For example
+    using [~wait:5] will wait for up to 5 seconds. Since this
+    runs when virt-v2v is exiting, it is best to keep waiting times
+    as short as possible. Also there is no way to report errors
+    in the subprocess. If reliable cleanup of a subprocess is
+    required then this is not the correct place to do it.
+
+    [?wait] defaults to [0] which means we do not try to wait. *)

(please consider formatting *.mli before *.ml)

I believe I take the opposite position on this; I'd rather wait
forever. No subprocess is expected to hang, and we should leave no
subprocess behind. If a subprocess hangs, the parent process should
(apparently) hang forever too, and let users report bugs. That would
also eliminate the question of *how* to wait for N seconds; we'd just
drop WNOHANG from waitpid.

But this is just my opinion; food for thought. :)

Thanks,
Laszlo

Richard W.M. Jones

4:31 a.m.

New subject: [PATCH common 4/4] mltools: Allow waiting for killed PIDs

On Fri, Jul 15, 2022 at 09:30:38AM +0200, Laszlo Ersek wrote:

...

On 07/14/22 14:36, Richard W.M. Jones wrote:
[...]
> -let do_action action =
> +let rec do_action action =

I'd slightly prefer (a) do_kill to be introduced either before
do_action, or (b) for do_kill to be defined inside do_action, to using
"let rec" here. I think I understand what "rec" does to scoping, but
still, we don't have actual recursion here. (I've noticed this pattern
in many places in the v2v projects, and it always confuses me -- it
doesn't bring too much convenience IMO, so I'd rather restrict it to
actual recursion.)

It just allows you to write code top-down instead of bottom-up. As you can see I mix both styles when I fancy it :-/
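To make the two styles concrete, here is a small self-contained sketch using a simplified stand-in for the action type (not the real On_exit definitions; the string results exist only so the example is checkable). It places the top-down `let rec ... and` form used in the patch next to Laszlo's options (a) and (b).

```ocaml
(* Simplified stand-in for On_exit's action type (illustration only). *)
type action = Kill of int * int (* signal, pid *) | Fn of (unit -> string)

(* Top-down, as in the patch: "let rec ... and" makes do_kill_td
   visible inside do_action_td although it is defined afterwards.
   Nothing actually recurses; "rec" is only there for scoping. *)
let rec do_action_td = function
  | Kill (signal, pid) -> do_kill_td ~signal ~pid
  | Fn f -> f ()
and do_kill_td ~signal ~pid = Printf.sprintf "kill -%d %d" signal pid

(* Option (a), bottom-up: define the helper first, no "rec" needed. *)
let do_kill_bu ~signal ~pid = Printf.sprintf "kill -%d %d" signal pid

let do_action_bu = function
  | Kill (signal, pid) -> do_kill_bu ~signal ~pid
  | Fn f -> f ()

(* Option (b), nested: the helper is local to do_action. *)
let do_action_nested action =
  let do_kill ~signal ~pid = Printf.sprintf "kill -%d %d" signal pid in
  match action with
  | Kill (signal, pid) -> do_kill ~signal ~pid
  | Fn f -> f ()
```

All three behave identically; they differ only in where the helper's name is in scope.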

...

>    try
>      match action with
>      | Unlink file -> Unix.unlink file
>      | Rm_rf dir ->
>         let cmd = sprintf "rm -rf %s" (Filename.quote dir) in
>         ignore (Tools_utils.shell_command cmd)
> -    | Kill (signal, pid) ->
> -       kill pid signal
> +    | Kill (wait, signal, pid) ->
> +       do_kill ~wait ~signal ~pid
>      | Fn f -> f ()
>    with exn -> debug "%s" (Printexc.to_string exn)
>
> +and do_kill ~wait ~signal ~pid =
> +  kill pid signal;
> +
> +  let rec loop i =
> +    if i > 0 then (
> +      let pid', _ = waitpid [ WNOHANG ] pid in
> +      if pid' = 0 then
> +        loop (i-1)

Missing: "sleep 1;" before the tail-recursive call, I think.

Ugh, indeed.
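For reference, the polling loop with the missing `sleep 1` added might look like the following. It is a standalone sketch of the quoted do_kill, not the committed code: names are qualified with `Unix.` instead of relying on `open Unix`, and otherwise the patch's structure is kept, so with ~wait:N it polls at most N times, sleeping one second after each poll that finds the child still running.

```ocaml
(* Standalone sketch of the quoted do_kill with the missing
   "sleep 1" added.  Requires linking against the unix library. *)
let do_kill ~wait ~signal ~pid =
  Unix.kill pid signal;
  let rec loop i =
    if i > 0 then (
      (* WNOHANG: returns pid 0 immediately if the child is still alive. *)
      let pid', _ = Unix.waitpid [ Unix.WNOHANG ] pid in
      if pid' = 0 then (
        Unix.sleep 1;            (* the missing line *)
        loop (i - 1)
      )
    )
  in
  loop wait
```

Without the sleep, all N polls happen back to back within microseconds, so in practice the function never waits at all.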

...

> +    )
> +  in
> +  loop wait
> +
>  (* Make sure the actions are performed only once. *)
>  let done_actions = ref false
>
> @@ -106,6 +118,6 @@ let rm_rf ?(prio = 1000) dir =
>    register ();
>    List.push_front (prio, Rm_rf dir) actions
>
> -let kill ?(prio = 1000) ?(signal = Sys.sigterm) pid =
> +let kill ?(prio = 1000) ?(wait = 0) ?(signal = Sys.sigterm) pid =
>    register ();
> -  List.push_front (prio, Kill (signal, pid)) actions
> +  List.push_front (prio, Kill (wait, signal, pid)) actions
> diff --git a/mltools/on_exit.mli b/mltools/on_exit.mli
> index 910783e..dd35101 100644
> --- a/mltools/on_exit.mli
> +++ b/mltools/on_exit.mli
> @@ -58,12 +58,22 @@ val unlink : ?prio:int -> string -> unit
>  val rm_rf : ?prio:int -> string -> unit
>  (** Recursively remove a temporary directory on exit (using [rm -rf]). *)
>
> -val kill : ?prio:int -> ?signal:int -> int -> unit
> +val kill : ?prio:int -> ?wait:int -> ?signal:int -> int -> unit
>  (** Kill [PID] on exit. The signal sent defaults to [Sys.sigterm].
>
>      Use this with care since you can end up unintentionally killing
>      another process if [PID] goes away or doesn't exist before the
> -    program exits. *)
> +    program exits.
> +
> +    The optional [?wait] flag attempts to wait for a specified
> +    number of seconds for the subprocess to go away. For example
> +    using [~wait:5] will wait for up to 5 seconds. Since this
> +    runs when virt-v2v is exiting, it is best to keep waiting times
> +    as short as possible. Also there is no way to report errors
> +    in the subprocess. If reliable cleanup of a subprocess is
> +    required then this is not the correct place to do it.
> +
> +    [?wait] defaults to [0] which means we do not try to wait. *)
>
>  val register : unit -> unit
>  (** Force this module to register its at_exit function and signal

(please consider formatting *.mli before *.ml)

I believe I take the opposite position on this; I'd rather wait forever. No subprocess is expected to hang, and we should leave no subprocess behind. If a subprocess hangs, the parent process should (apparently) hang forever too, and let users report bugs.

That would also eliminate the question of *how* to wait for N seconds; we'd just drop WNOHANG from waitpid.

But this is just my opinion; food for thought. :)

So another way to think about this is that if the process hasn't gone away after N seconds, something has gone wrong, so we should report an error (but also exit). How about that?

I'm also not convinced that lazy umount is wrong in this situation. In general, and particularly in the livecd-creator case, it was wrong. But in this case we know that nbdcopy flushed the data successfully, so the data and metadata of the file must have been written to the NFS server. If the umount fails here it's caused by some other problem (eg. stuck nbdkit) but not one that should ever lead to data loss.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines. Boot with a live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v

Laszlo Ersek

4:54 a.m.

New subject: [PATCH common 4/4] mltools: Allow waiting for killed PIDs

On 07/15/22 11:31, Richard W.M. Jones wrote:

...

So another way to think about this is that if the process hasn't gone away after N seconds, something has gone wrong, so we should report an error (but also exit). How about that?

Good point! That's actually the best for the user. Don't let them wonder forever, but also don't pretend everything is fine and silently give up when the child process doesn't exit within 5 seconds.

...

I'm also not convinced that lazy umount is wrong in this situation. In general, and particularly in the livecd-creator case, it was wrong. But in this case we know that nbdcopy flushed the data successfully, so the data and metadata of the file must have been written to the NFS server. If the umount fails here it's caused by some other problem (eg. stuck nbdkit) but not one that should ever lead to data loss.

I'm nervous about lazy unmount; it's too indiscriminate. We just stop waiting and assume nothing else is currently keeping a reference to the mount point.

I think your idea about preserving the timeout, but failing hard when it actually elapses, is the most user-friendly approach. I guess that could mean an "else" branch in the "loop" function, where we should abort hard, or something like that (we can't exit cleanly, as we're already exiting and running cleanup handlers... but maybe we have a counter-measure for that already. I seem to recall "run these actions only once".)

Really, anything but lazy unmount please. :)

Thanks!
Laszlo
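One way the "else" branch could look, following Rich's suggestion to report an error rather than give up silently. This is a hypothetical sketch: do_kill_or_report, the message wording and the boolean result are assumptions, not code from the patch. It polls once per second for up to wait seconds (it assumes the caller only invokes it with wait > 0, since the patch treats 0 as "do not wait") and reports on stderr when time runs out.

```ocaml
(* Hypothetical variant of do_kill: poll once per second for up to
   [wait] seconds; if the child is still alive after that, report an
   error on stderr instead of giving up silently.  Returns [true] if
   the child was reaped, [false] on timeout.  Requires linking
   against the unix library. *)
let do_kill_or_report ~wait ~signal ~pid =
  Unix.kill pid signal;
  let rec loop remaining =
    let pid', _ = Unix.waitpid [ Unix.WNOHANG ] pid in
    if pid' <> 0 then true                (* child exited and was reaped *)
    else if remaining > 0 then (
      Unix.sleep 1;
      loop (remaining - 1)
    )
    else (
      (* The "else" branch: time is up, so say so loudly. *)
      Printf.eprintf "error: process %d did not exit within %d seconds\n%!"
        pid wait;
      false
    )
  in
  loop wait
```

A caller that wants to abort hard on timeout could then call Unix._exit (available since OCaml 4.12), which terminates immediately without running at_exit handlers or flushing channels; whether that beats recording the failure and letting the remaining cleanup actions run is the judgment call discussed above.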


guestfs@lists.libguestfs.org

