On Wed, Sep 26, 2018 at 4:25 PM Richard W.M. Jones <rjones@redhat.com> wrote:
On Wed, Sep 26, 2018 at 02:40:54PM +0200, Fabien Dupont wrote:
> [Adding Tomas Golembiovsky]
>
> Well, those are mainly IMS-related challenges. We're working on
> OpenStack output support and migration throttling and this implies
> changes to virt-v2v-wrapper.  This is then the opportunity to think
> about virt-v2v-wrapper maintenance and feature set. It has been
> created in the first place to simplify interaction with virt-v2v
> from ManageIQ.

Stepping back here, the upstream community on this mailing list have
no idea what you mean by the terms "IMS", "virt-v2v-wrapper" and
"ManageIQ".  I'll try to explain briefly:

* ManageIQ (http://manageiq.org/) = a kind of scriptable, universal
  management tool for cloudy things, using Ansible.

* Ansible = automates remote management of machines using ssh.

* IMS = an internal Red Hat project to add virt-v2v support to
  ManageIQ.

* virt-v2v-wrapper = a wrapper around virt-v2v which allows it to be
  called from Ansible.  The reason we need this is because Ansible
  doesn't support managing long-running processes like virt-v2v, so we
  need to have another component which provides an API which can be
  queried remotely, while tending to the long-running virt-v2v behind
  the scenes.  [Tomáš: Got a link to the code?  I can't find it right now]

Thanks for adding the explanation. FWIW, Ansible is not involved in IMS.
There is some rogue code out there
(https://github.com/fdupont-redhat/ims-v2v-engine_ansible)
doing things with Ansible, because I like experimenting.

Virt-v2v-wrapper code is available here:
https://github.com/oVirt/ovirt-ansible-v2v-conversion-host/blob/master/files/virt-v2v-wrapper.py
 
> The first challenge we faced is the interaction with virt-v2v. It's
> highly versatile and proposes a lot of options for input and
> output. The downside of it is that over time it becomes more and
> more difficult to know them all.

The options are all documented in the manual.  I have thought for a
while that we need to tackle the virt-v2v manual: It's too big, and
unapproachable.  Really I think we need to split it into topic
sections and rewrite parts of it.  Unfortunately I've not had time to
do that so far.

> And all the error messages are made for human beings, not machines,
> so providing feedback through a launcher, such as virt-v2v-wrapper,
> is difficult.

This is indeed an issue.  Pino recently added enhanced support for the
‘--machine-readable’ option which should address some problems:

  https://github.com/libguestfs/libguestfs/commit/afa8111b751ed33e1989e6d9bb03928cefa17917

If this change still doesn't fully address the issues with automating
virt-v2v then please let us know what specifically can be improved
here.

Thanks for the link. virt-v2v-wrapper is using the standalone call with
--machine-readable to get the capabilities of virt-v2v. It's used to
check for --mac option support.
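
As a rough illustration of that capability check, here is a minimal
Python sketch. It assumes that `virt-v2v --machine-readable` prints one
capability name per line. The helper names are made up for this
example, and the exact spelling of the capability that signals --mac
support (for instance "mac-option") is an assumption to verify against
the installed virt-v2v.

```python
import subprocess


def parse_capabilities(output):
    """Return the set of non-empty, stripped lines from the given text."""
    return {line.strip() for line in output.splitlines() if line.strip()}


def v2v_supports(capability, binary="virt-v2v"):
    """Run `virt-v2v --machine-readable` and look for one capability.

    The capability spelling (e.g. "mac-option") is an assumption here,
    not something confirmed in this thread.
    """
    result = subprocess.run(
        [binary, "--machine-readable"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return capability in parse_capabilities(result.stdout)
```

Keeping the parsing separate from the subprocess call means the parser
can be exercised with canned text on a machine without virt-v2v.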

[...]
> For progress, the only way to know what happens is to run virt-v2v
> in debug mode (-v -x) and parse the (very extensive)
> output. Virt-v2v-wrapper does it for us in IMS, but it is merely a
> workaround.

Right, this is indeed another problem which we should address.  I
thought we had an RFE filed for this, but I cannot find it.  At the
moment the workaround you mention is very ugly and clunky, but AFAIK
it does work.

You're right. It works. We'll test whether --machine-readable improves the
machine experience, and file RFEs if needed.
 
> I'd expect a conversion tool to provide a comprehensive progress,
> such as "I'm converting VM 'my_vm' and more specifically disk X/Y
> (XX%). Total conversion progress is XX%". Of course, I'd also expect
> a machine readable output (JSON, CSV, YAML…). Debug mode ensures we
> have all the data in case of failure, so I don't say remove it, but
> simply add specialized outputs.

We can discuss debug output vs progress output and formats to use
separately when fixing the above, but yes, point taken.
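
To make the request concrete, a machine-readable progress record could
look something like the Python sketch below. Everything in it is
hypothetical: the field names, the function name, and the choice to
weigh all disks equally when computing the total. Nothing like this
exists in virt-v2v as of this thread.

```python
import json


def progress_record(vm_name, disk_percents, current_disk):
    """Build one JSON progress line from per-disk completion percentages.

    disk_percents: non-empty list of floats in [0, 100], one per disk.
    current_disk:  1-based index of the disk currently being copied.
    The total is the plain average, so every disk weighs the same; a
    real implementation would probably weigh disks by size.
    """
    total = sum(disk_percents) / len(disk_percents)
    return json.dumps(
        {
            "vm": vm_name,
            "disk": current_disk,
            "disks": len(disk_percents),
            "disk_percent": disk_percents[current_disk - 1],
            "total_percent": round(total, 1),
        },
        sort_keys=True,
    )
```

One record per line keeps the stream easy to follow with a line-based
reader while the debug log stays untouched.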

> The third challenge was to clean up in case of virt-v2v failure. For
> example, when it fails converting a disk to RHV, it doesn't clean
> the finished and unfinished disks.

This is a bug (https://bugzilla.redhat.com/show_bug.cgi?id=1616226).
It's been on my to-do list for quite a while, but I haven't got to
it, so patches welcome ...

Thanks for the BZ reference. IIUC, this will clean up the disk being
converted at the time of the interruption. What about the already
converted disks of a multi-disk VM? IMO, they should also be removed.
 
> Virt-v2v-wrapper was initially written by RHV team (Tomas) for RHV
> migrations, so it sounded fair(ish). But, extending the outputs to
> OpenStack, we'll have to deal with leftovers in OpenStack too. Maybe
> a cleanup on failure option would be a good idea, with a default to
> false to not break existing behaviour.

The issue of cleaning up disks in general is a hard one to solve.

With the OpenStack backend we try our best as long as virt-v2v
exits on a normal failure path:

  https://github.com/libguestfs/libguestfs/blob/e2bafffce24cd8c0436bf887ee166a3ae2257bbb/v2v/output_openstack.ml#L370-L384

However there are always going to be cases where that is not possible
(eg. virt-v2v segfaults or is kill -9'd or whatever), and in that case
I envisaged for OpenStack some sort of external garbage collector.  To
this end, disks which have not been finalized are given a special
description so it should be possible to find them after a full
migration has completed:

  https://github.com/libguestfs/libguestfs/blob/e2bafffce24cd8c0436bf887ee166a3ae2257bbb/v2v/output_openstack.ml#L386-L392
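
The selection step of such a garbage collector could be sketched in
Python as below. This is only an illustration: the volume dicts stand
in for whatever an OpenStack client returns, and the marker text is a
parameter on purpose, since the real description string has to be
taken from output_openstack.ml rather than guessed here.

```python
def find_unfinalized(volumes, marker):
    """Return the volumes whose description contains the marker text.

    volumes: iterable of dicts with "id" and "description" keys
             (a hypothetical stand-in for real OpenStack volume objects).
    marker:  the text virt-v2v writes into the description of a volume
             that has not been finalized; deliberately not hard-coded.
    """
    return [
        v for v in volumes
        if marker in (v.get("description") or "")
    ]
```

Deleting the selected volumes is left out; that part should only run
once no migration is still in flight.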

IIRC virt-v2v-wrapper is sending kill -9 to virt-v2v, which it should
not do.

It's not virt-v2v-wrapper that kills virt-v2v, it's ManageIQ. We have the
PID from the virt-v2v-wrapper state file. What would be the preferred way
to interrupt it?
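
For reference, the conventional alternative to kill -9 is SIGTERM,
which a process can catch in order to clean up. A minimal Python sketch
follows. The function name is made up, and whether virt-v2v actually
cleans up its disks on SIGTERM is an assumption not confirmed in this
thread (see the BZ above).

```python
import os
import signal


def request_stop(pid):
    """Ask a process to exit by sending SIGTERM, never SIGKILL.

    Returns True if the signal was delivered, False if no such process
    exists. Waiting for the exit and any escalation policy are left to
    the caller.
    """
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return False
    return True
```

The PID would come from the wrapper state file, as it does today.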
 
> The fourth challenge is to limit the resources allocated to virt-v2v
> during conversion, because concurrent conversions may have a huge
> impact on conversion host performance. In the case of an oVirt host,
> this can impact the virtual machines that run on it. This is not
> covered yet by the wrapper, but implementation will likely be based
> on Linux cgroups and tc.

Right, sounds sensible.

> The wrapper also adds an interesting feature: both virt-v2v and
> virt-v2v-wrapper run daemonized and we can asynchronously poll the
> progress. This is really key for IMS (and maybe for others): this
> allows us to start as many conversions in parallel as needed and
> monitor them. Currently, the Python code forks and detaches itself,
> after providing the paths to the state file. In the discussion about
> cgroups, it was mentioned that systemd units could be used, which
> fits well with the daemonization, as systemd-run allows running
> processes under systemd and in their own slice, on which cgroup
> limits can be set.

"the Python code" meaning the code in virt-v2v-wrapper?

Yes, it is.

I also agree that using a systemd temporary unit is the way to go
here.  As well as providing a natural way to limit the resources used
by virt-v2v (since systemd unit implies a cgroup), it also solves the
problems around logging and collecting debug logs.

> About the evolution of virt-v2v-wrapper that I'm going to describe,
> let me state that this is my personal view and it commits no one
> but me.
>
> I would like to see the machine-to-machine interaction, logging and
> cleanup in virt-v2v itself because it is valuable to everyone, not
> only IMS.

Here's where I think we are going to disagree.  virt-v2v is a command
line tool, with lots of users outside of the internal IMS Red Hat
project.  Here in the upstream community I'm afraid we make decisions
which are best for all our users, not for one particular user!

I agree with you. That's why I propose that M2M interaction (e.g.
--machine-readable), logging (format and output options, not only
journald capture) and cleanup (handling interruption) are done at
the virt-v2v level. IMO, these would be beneficial to everybody.

As you say, invocation via systemd is useful for IMS, and
documenting it would make it valuable to others. The other three
topics are not linked to systemd and should be addressed on their
own.
 
But in any case I don't see how adding systemd unit support to
virt-v2v itself helps very much.  It's really easy to run virt-v2v in
a systemd unit -- see the attached email for full details.

This gains all the benefits I mention above and is hardly any effort
at all.  You can even adjust the properties on the fly.

(I will admit this is really obscure and undocumented, it took me
quite a lot of time last month to work it out.  We should add this to
the virt-v2v documentation, but at least the docs are available on the
mailing list now.)

> I would also like to convert virt-v2v-wrapper to a conversion API
> and Scheduler service. The idea is that it would provide an
> as-a-Service endpoint for conversions, that would allow creation of
> conversion jobs (POST), fetching of the status (GET), cancelation of
> a conversion (DELETE) and changing of the limits (PATCH). In the
> background, a basic scheduler would simply ensure that all the jobs
> are running. Each virt-v2v process would be run as a systemd unit
> (journald could capture the debug output), so that it is independent
> from the API and Scheduler processes.

This sounds like an interesting and useful evolution of the wrapper,
and we should try to add pieces to virt-v2v to make it easier to run
under the wrapper, but at the end of the day virt-v2v is a command
line tool used by many different projects and purposes so actually
adding all this to virt-v2v itself is a non-starter.

Again, agreed. That's why I talk about an evolution of
virt-v2v-wrapper. Some of the clunky workarounds virt-v2v-wrapper
implements should be handled by virt-v2v. The launch and
monitoring of virt-v2v is out of scope of the virt-v2v command.
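
The job lifecycle proposed above (POST, GET, DELETE, PATCH) can be
sketched as a small in-memory registry in Python. This is a toy under
stated assumptions: the class name, method names, state strings and
limit keys are all hypothetical, there is no HTTP layer, and nothing
here launches virt-v2v or a systemd unit.

```python
import itertools


class ConversionJobs:
    """In-memory stand-in for the proposed conversion job API."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._jobs = {}

    def create(self, vm_name, limits=None):
        """POST: register a new conversion job and return its id."""
        job_id = next(self._ids)
        self._jobs[job_id] = {
            "vm": vm_name,
            "state": "pending",
            "limits": dict(limits or {}),
        }
        return job_id

    def status(self, job_id):
        """GET: return a copy of the job record."""
        job = self._jobs[job_id]
        return {**job, "limits": dict(job["limits"])}

    def cancel(self, job_id):
        """DELETE: mark the job as cancelled."""
        self._jobs[job_id]["state"] = "cancelled"

    def update_limits(self, job_id, **limits):
        """PATCH: merge new resource limits into the job."""
        self._jobs[job_id]["limits"].update(limits)
```

A scheduler would sit behind this registry and map each job to one
systemd unit, so that the API process can restart without touching
running conversions.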
 
> I know that I can propose patches for changes to virt-v2v, or at
> least file RFEs in Bugzilla (my developer skills and breadth of
> programming languages are limited). For the evolved wrapper, my main
> concern is its housing and maintenance. It doesn't work only for
> oVirt, so having its lifecycle tied to oVirt doesn't seem relevant
> in the long term. In fact, it can be for any virt-v2v output, so my
> personal opinion is that it should live in the virt-v2v ecosystem
> and follow its lifecycle. As for its maintenance, we still have to
> figure out who will be responsible for it, i.e. who will be able to
> dedicate time to it.

There's certainly a case for making the wrapper into a standalone
project, with a proper upstream etc.  It could even be shipped under
the libguestfs umbrella.  But that's Tomáš's domain so I leave it up
to him to decide what to do.

That's why I added him to this thread :) 
 
Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW



---------- Forwarded message ----------
From: "Richard W.M. Jones" <rjones@redhat.com>
To: v2v-devel@redhat.com
Date: Wed, 22 Aug 2018 14:42:51 +0100
Subject: Limiting virt-v2v block I/O using cgroups - network I/O still unknown
Turns out this is fairly easy, although quite obscure.

Just use ‘systemd-run --pipe’ to run the virt-v2v command in a cgroup.
The ‘--pipe’ option ensures it is still connected to stdin/stdout/
stderr (but see below).

  $ systemd-run --user --pipe \
      -p BlockIOWriteBandwidth="/dev/sda2 1K" \
      virt-v2v -i disk /var/tmp/fedora-27.img -o local -os /var/tmp

  Running as unit: run-u4429.service
  [   0.0] Opening the source -i disk /var/tmp/fedora-27.img
  [   0.0] Creating an overlay to protect the source from being modified
  etc.

See systemd.resource-control(5) for a list of controls.  If you are
experimenting with this then it is easier to start with a command like
‘sleep 1000000’ rather than using virt-v2v.

Some notes:

(1) systemd-run changes the directory to ‘/’ so all path parameters to
virt-v2v must be absolute.

(2) If using something called "cgroup v2" you have to use
IOWriteBandwidthMax.  However even though I'm using a very recent
Fedora & Linux kernel, I'm apparently not using cgroup v2.

(3) You can modify the settings on the fly using:

  $ systemctl [--user] set-property --runtime run-uXXX.service \
      BlockIOWriteBandwidth="/dev/sda2 NEW_SETTING"

where run-uXXX.service is the service name printed by systemd-run
before it starts virt-v2v.

Also:

  $ systemctl --user show run-u4466.service | grep BlockIO
  BlockIOAccounting=no
  BlockIOWeight=[not set]
  StartupBlockIOWeight=[not set]
  BlockIOWriteBandwidth=/dev/sda2 10000

Note you have to enable BlockIOAccounting to collect stats.

Also:

  systemctl [--user] status run-uXXX.service

to read information about the status of the service.

Also there are other systemd/cgroup tools like ‘systemd-cgtop’ and
‘systemd-cgls’ which may be useful to administrators.

(4) Specifying the block device is a PITA.  My reading of the
documentation makes me think you can use filesystem paths instead of
block device names, but it didn't work for me
(https://github.com/systemd/systemd/issues/9908).
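
For a wrapper written in Python, the invocation and notes (1) and (3)
above can be sketched as two helpers that only build argument lists.
The function names are made up for this example; the commands and
properties are the ones shown above.

```python
import os


def systemd_run_argv(input_disk, output_dir, device, write_bandwidth):
    """Build the `systemd-run` argv from the example above, as a list.

    Raises ValueError for relative paths, because systemd-run changes
    the working directory to "/" (note 1).
    """
    for path in (input_disk, output_dir):
        if not os.path.isabs(path):
            raise ValueError("path must be absolute: %r" % path)
    return [
        "systemd-run", "--user", "--pipe",
        "-p", "BlockIOWriteBandwidth=%s %s" % (device, write_bandwidth),
        "virt-v2v", "-i", "disk", input_disk, "-o", "local", "-os", output_dir,
    ]


def set_bandwidth_argv(unit, device, write_bandwidth):
    """Build the `systemctl set-property` argv from note 3."""
    return [
        "systemctl", "--user", "set-property", "--runtime", unit,
        "BlockIOWriteBandwidth=%s %s" % (device, write_bandwidth),
    ]
```

Passing the result to subprocess as a list avoids shell quoting, so the
space inside the property value stays within a single argument.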

                - * - * - * -

Network prioritization (what we actually care about) is quite a lot
more complex.  First of all, documentation everywhere refers to
net_cls (NetClass), but that is apparently obsolete.  The new thing is
net_prio, but virtually nothing talks about how to use that.  In any
case we'd use it in conjunction with ‘tc’.

                - * - * - * -

While looking into this I wondered if it wouldn't be better to run
virt-v2v in a proper systemd unit.  We'd use systemd to set up
logging.  The wrapper would simply become a small script that controls
the unit through systemctl.
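
One piece of such a script would be reading unit properties back. A
minimal Python sketch, assuming the KEY=VALUE format printed by
`systemctl show` in note (3); the function name is made up:

```python
def parse_show_output(text):
    """Parse `systemctl show` output (KEY=VALUE lines) into a dict.

    Lines without "=" are skipped. Only the first "=" splits the line,
    so values that themselves contain "=" are kept whole.
    """
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value
    return props
```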

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top


--

Fabien Dupont

PRINCIPAL SOFTWARE ENGINEER

Red Hat - Solutions Engineering

fabien@redhat.com     M: +33 (0) 662 784 971