On Wed, Nov 25, 2020 at 10:29:45AM +0000, Richard W.M. Jones wrote:
For a long time I've wanted to split up virt-v2v into smaller
components to make it easier to consume. It's never been clear how to
do this, but I think I have a workable plan now, described in this email.
In the same spirit, I have been meaning to reply here for a long time. I know
we already talked about all this on IRC, but I then planned to post the summary
here and never did. Unfortunately I cannot find the conversation in my IRC logs
any more, so I will try to remember as much as possible.
----------------------------------------------------------------------
First, the AIMS, which are:
(a) Preserve current functionality, including copying conversion,
in-place conversion, and the virt-v2v command line.
(b) Allow warm migration to use virt-v2v without requiring the
"--debug-overlays hack".
(c) Allow threads, multi-conn, and parallel copying of guest disks, all
for better copying performance.
(d) Allow an alternate supervisor to convert and copy many guests in
parallel, given that the supervisor has a global view of the
system/network (I'm not intending to implement this, only to make
it possible).
(e) Better progress bars.
(f) Better logging.
(g) Reuse as much existing code as possible. This is NOT a rewrite!
So my idea was that this could be split into phases similar to what is shown in
v2v/types.mli: one helper per input object, one per output object, and then
some core functionality. Any internal state could be kept in a specific file so
that it is accessible to all these helpers. After that virt-v2v _could_ be
implemented by a shell script if someone wanted. That is only an illustration
of how usable the result would be; I am not saying a shell script would be a
viable implementation.
There are already some command line options that limit virt-v2v to a single
phase or skip a phase (--print-source, --no-copy, -o null), and the split would
make that more approachable. The internal state serialised into a file does not
have to be parsed by anyone else, so its format can be pretty relaxed when it
comes to backward compatibility (two input helpers will probably not need to be
run from two different versions of virt-v2v).
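To make the state-file idea a bit more concrete, here is a minimal sketch in
Python (only for illustration, virt-v2v itself is OCaml). The file name
"state.json", the version field, and the function names are all assumptions of
mine, not anything that exists in virt-v2v. The version check reflects the
relaxed compatibility described above: a mismatch is simply an error.

```python
import json
import os

STATE_VERSION = 1  # hypothetical: bumped whenever the internal layout changes


def save_state(workdir, state):
    """Write the shared internal state into the working directory."""
    path = os.path.join(workdir, "state.json")  # hypothetical file name
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"version": STATE_VERSION, "state": state}, f)
    # Rename into place so another helper never reads a half-written file.
    os.replace(tmp, path)
    return path


def load_state(workdir):
    """Read the state back; refuse files written by a different version."""
    with open(os.path.join(workdir, "state.json")) as f:
        doc = json.load(f)
    if doc.get("version") != STATE_VERSION:
        raise ValueError("state file was written by a different version")
    return doc["state"]
```

Since no one else is meant to parse this file, there is no migration code at
all; every helper just calls load_state() and fails early on a mismatch.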
What would be nice to have exposed is the internal representation of all the
information needed to construct an output guest description. It would have to
be in an extensible format, but it would not be prone to getting stale as
often. Think of it as `-o json` plus a few helpers that convert this type of
information into the output format. What would be nice about this is that
supporting a new format would not require a new output type for each tiny
change. Keeping an existing format up to date could also be easier, because
the clear split could lower the barrier for developers to support their cloud
solution format. Of course I do not know how much of an issue this currently is
(or is not), but it seemed like a good idea to me.
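A rough sketch of what I mean, again in Python and again purely illustrative.
The field names in the guest description (name, memory_kib, vcpus, disks) and
the shape of the XML are made up for this example; they are not the real
virt-v2v metadata and not valid libvirt XML. The point is only that each
output format becomes one small function over the same neutral description.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical neutral guest description, as a conversion step might emit it.
guest = {
    "name": "example-guest",
    "memory_kib": 2097152,
    "vcpus": 2,
    "disks": ["out1", "out2"],
}


def to_json(desc):
    """The `-o json` case: dump the neutral description as-is."""
    return json.dumps(desc, sort_keys=True)


def to_xml(desc):
    """A made-up XML rendering built from the same description."""
    root = ET.Element("domain")
    ET.SubElement(root, "name").text = desc["name"]
    ET.SubElement(root, "memory", unit="KiB").text = str(desc["memory_kib"])
    ET.SubElement(root, "vcpu").text = str(desc["vcpus"])
    for disk in desc["disks"]:
        ET.SubElement(root, "disk", source=disk)
    return ET.tostring(root, encoding="unicode")


# Supporting a new output format means adding one entry here.
WRITERS = {"json": to_json, "xml": to_xml}
```

With something like this, WRITERS["xml"](guest) and WRITERS["json"](guest)
are produced from the same input, and a third-party format would not need to
touch the conversion code.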
----------------------------------------------------------------------
Here's my PLAN:
/usr/bin/virt-v2v still exists, but it's now a supervisor program
(possibly even a shell script) that runs the steps below:
(1) Set up the input side by running "helper-v2v-input-<type>". For
all input types this creates a temporary directory containing:
/tmp/XXXXXX/in1 NBD endpoints overlaying the source disk(s)
/tmp/XXXXXX/in2 (these are actually Unix domain sockets)
/tmp/XXXXXX/in3
/tmp/XXXXXX/metadata.in Metadata parsed from the source.
Currently for most inputs we have a running nbdkit process for
each source disk, and we'd do the same here, except we add
nbdkit-cow-filter on top so that the source disk is protected from
being modified. Another small difference is that for -i disk
(local input) we would need an active nbdkit process on top of the
disk, whereas currently we set the disk as a qcow2 backing file.
(2) Perform the conversion by running "helper-v2v-convert". This does
the conversion and sparsification. It writes directly to the NBD
endpoints (in*) above. The writes are stored in the COW overlay
so the source disk is not modified.
This would make it easy for someone to copy the disks themselves and then
provide them as nbd sockets if they want to modify the copies in place, locally,
*after* copying and without any extra cow layer on top. That is good.
Conversion will also create an output metadata file:
/tmp/XXXXXX/metadata.out Target metadata
Exact format of the metadata files is to be decided, but some kind
of not-quite-libvirt-XML may be suitable. It's also not clear if
the metadata format is an internal detail of virt-v2v, or if we
document it as a stable API.
(3) Set up the output side by running "helper-v2v-output-<type>
setup". This will read the output metadata and do whatever is
needed to set up the empty output disks (perhaps by creating a
guest on the target, but also this could be done in step (5)
below).
This will create:
/tmp/XXXXXX/out1 NBD endpoints overlaying the target disk(s)
/tmp/XXXXXX/out2 (these are actually Unix domain sockets)
/tmp/XXXXXX/out3
(4) Do the copy. By default this will run either nbdcopy or qemu-img
convert from in* -> out*.
Copying could be done in parallel, currently it is done serially.
(5) Finalize the output by running "helper-v2v-output-<type> final".
This might create the target guest and whatever else is needed.
(6) Kill the NBD servers and clean up the temporary directory.
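The six steps above can be sketched as a small plan builder. This is Python
for illustration only. The helper names come from the plan; passing the
temporary directory as an argument to each helper is my assumption, since the
plan does not say how helpers learn about it. The copy step uses nbdcopy with
NBD URIs pointing at the Unix domain sockets. Nothing is executed here, the
function only returns the commands a supervisor would run in order.

```python
def build_plan(workdir, input_type, output_type, ndisks):
    """Return an ordered list of (step name, command) for steps (1) to (5)."""
    steps = [
        # (1) input side: creates in1..inN and metadata.in under workdir
        ("input", [f"helper-v2v-input-{input_type}", workdir]),
        # (2) conversion: writes through the COW overlay, emits metadata.out
        ("convert", ["helper-v2v-convert", workdir]),
        # (3) output side: creates out1..outN under workdir
        ("output-setup", [f"helper-v2v-output-{output_type}", "setup", workdir]),
    ]
    # (4) one copy per disk, from in<i> to out<i>
    for i in range(1, ndisks + 1):
        src = f"nbd+unix:///?socket={workdir}/in{i}"
        dst = f"nbd+unix:///?socket={workdir}/out{i}"
        steps.append((f"copy-disk{i}", ["nbdcopy", src, dst]))
    # (5) finalize the output
    steps.append(
        ("output-final", [f"helper-v2v-output-{output_type}", "final", workdir])
    )
    # (6) killing the NBD servers and removing workdir is left to the caller.
    return steps
```

For example, build_plan("/tmp/XXXXXX", "disk", "local", 2) yields six entries:
input, convert, output-setup, two copies, and output-final.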
Of course the split I suggested is not required; what you describe here would
work just as well. I just wanted to share the idea I had because I thought it
could actually be easier to do, maintain, and future-proof.
----------------------------------------------------------------------
Let's see how this plan matches the aims.
Aim (a):
Copying conversion works as outlined above. In-place conversion
works by placing an NBD server on top of the files you want to
convert and running helper-v2v-convert (virt-v2v --in-place would
also still work for backwards compat).
I remember --in-place doing some input-related shenanigans that made it
different from "just convert this". But I think keeping the original
--in-place will not cause any issues.
Aim (b):
Warm migration: Should be fairly clear this can work in the same way
as in-place conversion, but I'll discuss this further with Martin K
and Tomas to make sure I'm not missing anything.
The separation of steps works a bit better. I think keeping the pre-checks and
everything is good; it's just that when one is messing with the workflow, it
is easier to plug the various phases together at different times when they are
more split apart. I would imagine high-level debugging by non-experts is easier
as well.
Aims (c), (d):
Threads etc for performance: Although I don't plan to implement
this, it's clear that an alternate supervisor program could improve
performance here by either doing copies of a single guest / multiple
disks in parallel, but even better by having a global view of the
system and doing copies of multiple guests' disks in parallel.
This is outside the scope of the virt-v2v project, but in scope for
something like MTV.
And easy to do with the split ;-)
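Just to show how little an alternate supervisor would need, here is a minimal
Python sketch that copies several disks in parallel once the in*/out* sockets
exist. The function names and the max_workers default are mine. The run
parameter defaults to subprocess.run and is only there so the logic can be
exercised without nbdcopy installed.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def nbdcopy_cmd(src_sock, dst_sock):
    """Build an nbdcopy command between two Unix domain socket endpoints."""
    return [
        "nbdcopy",
        f"nbd+unix:///?socket={src_sock}",
        f"nbd+unix:///?socket={dst_sock}",
    ]


def copy_all(pairs, run=subprocess.run, max_workers=4):
    """Copy every (source socket, target socket) pair, several at a time."""

    def copy_one(pair):
        src, dst = pair
        # check=True makes a failed copy raise instead of passing silently.
        return run(nbdcopy_cmd(src, dst), check=True)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() forces all results, so any exception surfaces here.
        return list(pool.map(copy_one, pairs))
```

Threads are enough here because each worker only waits for an external
process. A supervisor with a global view could pass the disks of many guests
in one call and tune max_workers to what the network can take.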
Aim (e):
Better progress bars: nbdcopy should have support for
machine-readable progress bars, once I push the changes. It will
mean no more need to parse debug logs.
Aim (f):
Better logging: I hope we can log each step separately.
A custom supervisor program would also be able to tell which
particular step failed (eg. did it fail in conversion? did it fail
copying a disk and which one?)
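A possible shape for the per-step logging, as a minimal Python sketch. The
function name and the "<step>.log" naming are assumptions for illustration.
Each step gets its own log file, and the supervisor stops at the first failing
step and returns its name.

```python
import os
import subprocess


def run_steps(steps, logdir):
    """Run (name, command) steps in order, one log file per step.

    Returns the name of the first step that failed, or None if all passed.
    """
    for name, cmd in steps:
        log_path = os.path.join(logdir, f"{name}.log")  # hypothetical naming
        with open(log_path, "wb") as log:
            # stdout and stderr of the step both go into its own log file.
            result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
        if result.returncode != 0:
            return name  # later steps are not run
    return None
```

If the copy steps are named per disk, the returned name also answers the
"which one?" question directly.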
Aim (g):
This works by splitting up the existing v2v code base into separate
binaries. It is already broadly structured (internally) like this.
So it's not a rewrite, it's a big refactoring.
However I'd probably write a new virt-v2v supervisor binary, because
the existing command line parsing code is extremely complex.
Sounds similar to what I thought. And if it is simplified, then virt-v2v can
just forward arguments to appropriate places. Sounds good.
I do not know if I mentioned everything and I am not sure how deep we went into
some of the details, but I guess the best way will show up later on no matter
what.
Once again, sorry for such a late public reply. Hopefully this will at least
keep part of the conversation archived =)
Have a great day,
Martin