On Monday, 27 June 2016 15:42:46 CEST Richard W.M. Jones wrote:
On Fri, Jun 24, 2016 at 05:42:37PM +0200, Pino Toscano wrote:
> Add a new Getopt module to mllib, to parse command line arguments with
> handlers close to the ones used with Arg, but using getopt(3) (actually
> getopt_long_only) to do the real parsing. This allow us to provide
> options for OCaml tools with a syntax similar to the C tools, and use
> the additional features getopt offers and Arg does not.
>
> Do a single-step conversion of Common_utils and all the OCaml tools to
> the syntax of Getopt.
>
> As side-change due to the conversion, extra arguments for sysprep
> operation can have more keys for the same argument.
In general terms, it's a very good change which really improves the
tools.
I have a few fairly minor issues below. ACK if you can clean all of
those up.
[...]
> diff --git a/mllib/getopt-c.c b/mllib/getopt-c.c
> new file mode 100644
> index 0000000..d44448f
> --- /dev/null
> +++ b/mllib/getopt-c.c
> @@ -0,0 +1,398 @@
> +value
> +guestfs_int_mllib_getopt_parse (value argsv, value specsv, value anon_funv, value
usage_msgv)
I'm not convinced that this function is safe against OCaml GC compaction.
In particular there are problems such as:
...
> + for (j = 0; j < len; ++j) {
> + const char *key = String_val (Field (keysv, j));
key now points to a string on the OCaml heap, and then ...
> + if (newopts == NULL)
> + caml_raise_out_of_memory ();
> + longopts = newopts;
> + longopts[longopts_len].name = key;
the same pointer is copied to longopts, but ...
> + case 0: /* Unit of (unit -> unit) */
> + do_call1 (Field (actionv, 0), Val_unit);
At this point you're calling an OCaml function which is likely to
allocate, and could therefore call the GC, and could therefore compact
the heap, which would move that string around, and make your pointer
invalid. (You could try adding `Gc.compact ()' to one of these
callback functions -- I'm fairly sure at least some of the time you
could get a segfault, and if not, valgrind wouldn't be happy).
I see, it fails that way indeed. Fixed it by:
- copying long option strings on the heap (with the ugly cast in
cleanup_option_list, but almost unavoidable)
- making sure that operations that trigger allocations are executed
one-by-one (i.e. caml_copy_string out of direct function parameters,
etc)
> +external getopt_parse : string array -> (c_keys * spec *
doc) array -> ?anon_fun:anon_fun -> usage_msg -> unit =
"guestfs_int_mllib_getopt_parse"
> +
> +let parse_argv argv specs ?anon_fun usage_msg =
I feel this function could do some sanity checking on the inputs,
especially the keys (but see my comment below). If the sanity check
fails, it should either assert or failwith, but definitely not
continue.
Moved the sanity checks in the OCaml part.
> + let specs = specs @ [
> + (* Handled internally by getopt_parse. *)
> + [ "-h"; "-help"; "--help" ], Unit (fun () ->
()), s_"Display brief help";
> + ] in
> + let specs = List.map (
> + fun (keys, spec, doc) ->
> + (Array.of_list keys), spec, doc
You don't need parens here, since function application always binds tightest.
Removed, thanks (leftover of other code previously there).
> +type spec =
> + | Unit of (unit -> unit)
> + (* Simple option with no argument; call the function. *)
> + | Set of bool ref
> + (* Simple option with no argument; set the reference to true. *)
> + | Clear of bool ref
> + (* Simple option with no argument; set the reference to false. *)
Does getopt_long create the '--no-X' options automatically?
No, it doesn't.
> + | String of string * (string -> unit)
> + (* Option requiring an argument; the first element in the tuple
> + is the documentation string of the argument, and the second
> + is the function to call. *)
> + | Set_string of string * string ref
> + (* Option requiring an argument; the first element in the tuple
> + is the documentation string of the argument, and the second
> + is the reference to be set. *)
> + | Int of string * (int -> unit)
> + (* Option requiring an integer value as argument; the first
> + element in the tuple is the documentation string of the
> + argument, and the second is the function to call. *)
> + | Set_int of string * int ref
> + (* Option requiring an integer value as argument; the first
> + element in the tuple is the documentation string of the
> + argument, and the second is the reference to be set. *)
> +
> +type keys = string list
I had a vague idea that you might make this more type safe by
changing this type to:
type optstring = S of char (** short option *) | L of string (** --long *)
and keys = optstring list
It requires a bunch of extra changes through the code, but also avoids
needing to write horrible unsafe code like:
if String.is_prefix arg "-" && not (String.is_prefix arg
"--") then
You could also put optstring into a submodule so that just the L and S
definitions can be imported into client modules without needing to
import the whole of Getopt, so that makes the syntax quite brief:
open Getopt.Optstring
...
[ S'l'; L"list" ], Getopt.Unit list_mode, s_"List available
templates";
or if you prefer:
module O = Getopt.Optstring
...
[ O.S'l'; O.L"list" ], Getopt.Unit list_mode, s_"List
available templates";
I thought about this, and at the moment it feels to me a bit too
complicated. I might revise that in the future though.
> --- a/v2v/cmdline.ml
> +++ b/v2v/cmdline.ml
> @@ -164,55 +164,47 @@ let parse_cmdline () =
> and o_options =
> String.concat "|" (Modules_list.output_modules ()) in
>
> - let ditto = " -\"-" in
> let argspec = [
> - "-b", Arg.String add_bridge, "in:out " ^
s_"Map bridge 'in' to 'out'";
> - "--bridge", Arg.String add_bridge, "in:out " ^ ditto;
> - "--compressed", Arg.Set compressed, " " ^
s_"Compress output file";
> - "--dcpath", Arg.String (set_string_option_once "--dcpath"
dcpath),
> - "path " ^ s_"Override
dcPath (for vCenter)";
> - "--dcPath", Arg.String (set_string_option_once "--dcPath"
dcpath),
> - "path " ^ ditto;
> - "--debug-overlay",Arg.Set debug_overlays,
> - " " ^ s_"Save overlay files";
> - "--debug-overlays",Arg.Set debug_overlays,
> - ditto;
> - "-i", Arg.String set_input_mode, i_options ^ " " ^
s_"Set input mode (default: libvirt)";
> - "-ic", Arg.String (set_string_option_once "-ic"
input_conn),
> - "uri " ^ s_"Libvirt
URI";
> - "-if", Arg.String (set_string_option_once "-if"
input_format),
> - "format " ^ s_"Input
format (for -i disk)";
> - "--in-place", Arg.Set in_place, " " ^ s_"Only
tune the guest in the input VM";
> - "--machine-readable", Arg.Set machine_readable, " " ^
s_"Make output machine readable";
> - "-n", Arg.String add_network, "in:out " ^
s_"Map network 'in' to 'out'";
> - "--network", Arg.String add_network, "in:out " ^ ditto;
> - "--no-copy", Arg.Clear do_copy, " " ^ s_"Just
write the metadata";
> - "--no-trim", Arg.String no_trim_warning,
> - "-" ^ " " ^
s_"Ignored for backwards compatibility";
> - "-o", Arg.String set_output_mode, o_options ^ " " ^
s_"Set output mode (default: libvirt)";
> - "-oa", Arg.String set_output_alloc,
> - "sparse|preallocated " ^
s_"Set output allocation mode";
> - "-oc", Arg.String (set_string_option_once "-oc"
output_conn),
> - "uri " ^ s_"Libvirt
URI";
> - "-of", Arg.String (set_string_option_once "-of"
output_format),
> - "raw|qcow2 " ^ s_"Set
output format";
> - "-on", Arg.String (set_string_option_once "-on"
output_name),
> - "name " ^ s_"Rename
guest when converting";
> - "-os", Arg.String (set_string_option_once "-os"
output_storage),
> - "storage " ^ s_"Set
output storage location";
> - "--password-file", Arg.String (set_string_option_once
"--password-file" password_file),
> - "file " ^ s_"Use
password from file";
> - "--print-source", Arg.Set print_source, " " ^ s_"Print
source and stop";
> - "--qemu-boot", Arg.Set qemu_boot, " " ^ s_"Boot
in qemu (-o qemu only)";
> - "--root", Arg.String set_root_choice,"ask|... " ^
s_"How to choose root filesystem";
> - "--vdsm-image-uuid", Arg.String add_vdsm_image_uuid, "uuid
" ^ s_"Output image UUID(s)";
> - "--vdsm-vol-uuid", Arg.String add_vdsm_vol_uuid, "uuid " ^
s_"Output vol UUID(s)";
> - "--vdsm-vm-uuid", Arg.String (set_string_option_once
"--vdsm-vm-uuid" vdsm_vm_uuid),
> - "uuid " ^ s_"Output VM
UUID";
> - "--vdsm-ovf-output", Arg.String (set_string_option_once
"--vdsm-ovf-output" vdsm_ovf_output),
> - " " ^ s_"Output OVF
file";
> - "--vmtype", Arg.String vmtype_warning,
> - "- " ^ s_"Ignored for
backwards compatibility";
> + [ "-b"; "--bridge" ], Getopt.String
("in:out", add_bridge), s_"Map bridge 'in' to
'out'";
> + [ "--compressed" ], Getopt.Set compressed, s_"Compress
output file";
> + [ "--dcpath"; "--dcPath" ], Getopt.String
("path", set_string_option_once "--dcpath" dcpath),
> + s_"Override dcPath (for
vCenter)";
> + [ "--debug-overlay"; "--debug-overlays" ], Getopt.Set
debug_overlays, s_"Save overlay files";
> + [ "-i" ], Getopt.String (i_options, set_input_mode),
s_"Set input mode (default: libvirt)";
> + [ "-ic" ], Getopt.String ("uri",
set_string_option_once "-ic" input_conn),
> + s_"Libvirt URI";
> + [ "-if" ], Getopt.String ("format",
set_string_option_once "-if" input_format),
> + s_"Input format (for -i
disk)";
I'm interested to know if these awkward "single dash long options"
actually work now? And the -o* ones below.
Thanks to getopt_long_only, they work (and thus the v2v test suite
passes).
Thanks for the thorough review, v2 coming in a moment.
--
Pino Toscano