On Fri, Oct 01, 2021 at 12:35:50PM +0200, Laszlo Ersek wrote:
Hi Rich,
(dropping libvirt-devel)
On 09/30/21 13:53, Richard W.M. Jones wrote:
> Also we don't currently try to find or rewrite /dev/disk/ paths in
> guest configuration files. The only rewriting that happens is for
> /dev/[hs]d* block device filenames and a few others. The actual code
> that does this is convert/convert_linux.ml:remap_block_devices
>
> So I wouldn't over-think this. It's likely fine to identify such
> devices and rewrite them as "/dev/cdrom", assuming that (I didn't
> check) udev creates that symlink for any reasonably modern Linux. And
> if there's more than one attached CD to the source, only convert the
> first one and warn about but drop the others.
So I've started reading "convert/convert_linux.ml" in parallel with the
OCaml manual. (I dabbled for a few weeks in Haskell when everyone else
did a few years ago, so it's not 100% unfamiliar.)
Question:
let family =
match inspect.i_distro with
| "fedora"
| "rhel" | "centos" | "scientificlinux" |
"redhat-based"
| "oraclelinux" -> `RHEL_family
| "altlinux" -> `ALT_family
| "sles" | "suse-based" | "opensuse" ->
`SUSE_family
| "debian" | "ubuntu" | "linuxmint" |
"kalilinux" -> `Debian_family
Here, RHEL_family, ALT_family, SUSE_family, Debian_family are not plain
constructors ("fixed" variants) [1] but polymorphic ones [2].
Why?
Was this done *only* in order so that an explicit type definition such
as
type os_family = RHEL_family | ALT_family | SUSE_family | Debian_family
could be avoided?
There are two types of variants in OCaml. The first is the normal one
and it requires a type definition. eg:
type t = A | B | C of int | D of string * int
with values like “A” or “C 3” or “D ("hello", 1)”.
These are ones we'd normally use and have the (arguable?) advantage
that you have to write (and therefore, document) the full type somewhere.
The second type are polymorphic variants. These are actually way more
complicated than I'm going to describe below because they have a load
of set theory behind them (if you really want to know see the docs
that you've already linked to). They are written with backtick before
the name of the variant. The brief idea is that you let the OCaml
compiler work out the type for you.
For example:
let f = function 1 -> `A | 2 -> `B | i -> `C i
creates a function f which returns the type:
type pv = [ `A | `B | `C of int ]
Actually if you type this into the toplevel it says:
$ rlwrap ocaml
OCaml version 4.12.0
# let f = function 1 -> `A | 2 -> `B | i -> `C i ;;
val f : int -> [> `A | `B | `C of int ] = <fun>
As you observe below there's an extra ">" character. That means that
the type is "at least" that one, and there might be more cases, but
the function that I wrote only contains the 3 cases.
Anyway the short of this is that these open-ended, automatically
resolved types don't require a type definition so they're less
documented but more flexible.
(And another thing: It's possible to reuse the constructors in
unrelated types. eg. `A might be part of another type. But don't do
this unless it's for a very good reason).
In all cases the OCaml compiler enforces type safety, so although
these types are less explicit, they're still safe. If I extend my
example above and forget about one of the cases, the compiler will
warn (in the v2v code, we've set up the flags to give a hard error):
# let g j = match f j with `A -> 1 | `B -> 2 ;;
Warning 8 [partial-match]: this pattern-matching is not exhaustive.
Here is an example of a case that is not matched:
`C _
val g : int -> int = <fun>
Based on my (incomplete understanding) of the ocaml docs, this looks
like a bad idea. We need no polymorphism here, and using a static type
(a fixed variant) is generally beneficial [3].
The following (minimally modified) definition at the OCaml REPL:
let family1 distro =
match distro with
| "fedora"
| "rhel" | "centos" | "scientificlinux" |
"redhat-based"
| "oraclelinux" -> `RHEL_family
| "altlinux" -> `ALT_family
| "sles" | "suse-based" | "opensuse" ->
`SUSE_family
| "debian" | "ubuntu" | "linuxmint" |
"kalilinux" -> `Debian_family
| _ -> assert false;;
deduces the following type:
string -> [> `ALT_family | `Debian_family | `RHEL_family | `SUSE_family ] =
<fun>
with the nasty "[>" mark at the start (allowing for further type
refinement [2], which I think we don't need here). Conversely.
type os_family = RHEL_family | ALT_family | SUSE_family | Debian_family;;
let family2 distro =
match distro with
| "fedora"
| "rhel" | "centos" | "scientificlinux" |
"redhat-based"
| "oraclelinux" -> RHEL_family
| "altlinux" -> ALT_family
| "sles" | "suse-based" | "opensuse" -> SUSE_family
| "debian" | "ubuntu" | "linuxmint" |
"kalilinux" -> Debian_family
| _ -> assert false;;
(note the explicit os_family type definition, and the removal of the
backticks from the constructor names) comes back with the type:
val family2 : string -> os_family = <fun>
Yes, this is all fair comment.
Basically converting a string to an enum constant.
So, what's the reason for the polymorphic variant?
I guess you could look at git history, but most likely it was "just
because". It's a bit quicker and adding a new cases involves less
code churn.
I'm not opposed if you want to make the change above if you think it
improves readability. In general I try to avoid tricksy features of
OCaml, like functors, and this could (arguably) be one of those cases.
BTW as an aside the implementation of the two types of constructors is
completely different, and polymorphic variants have a slower
implementation. For ordinary constructors they generally map to
integers, which makes them very efficient (eg. they can be stored in
registers etc). Polymorphic variants use a hash of the name instead
making them slower overall.
(see table in
https://rwmj.wordpress.com/2009/08/05/ocaml-internals-part-2-strings-and-...)
But a fun thing is you can cause hash collisions (although it is
picked up by the compiler - it's still type safe!):
# type collision = [`Eric_Cooper | `azdwbie];;
Error: Variant tags `azdwbie and `Eric_Cooper have the same hash value.
Change one of them.
(
https://groups.google.com/g/fa.caml/c/68oTntkRLyU)
There's even a link-time pass to find hash collisions across the whole
program.
Rich.
--
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones
Read my programming and virtualization blog:
http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine. Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/