Background
----------
A complaint I'm hearing is that some tools which take a long time to
run (virt-resize in particular) need to have progress bars to indicate
how long they are expected to run. This is also a basic usability
principle, see for example this paper:
http://www.chrisharrison.net/projects/progressbars/ProgBarHarrison.pdf
If you look at how virt-resize is implemented, the bulk of the time is
spent copying partitions in a single long-running function, copy_size:
http://libguestfs.org/guestfs.3.html#guestfs_copy_size
The current protocol is entirely synchronous so that while the daemon
is doing the copy_size, the library is blocked waiting for a reply
message and does nothing else in that thread. To implement a progress
bar for this call we'd have to have a way to query into the call while
it was running to find out how far it has got, or to have it send
regular status messages out. The plan below outlines a way to do
this.
Protocol changes
----------------
This plan does require changing the protocol. We're allowed to change
the library/daemon protocol (it's not ABI) but we tend to avoid doing
it because it means people can't repackage our appliance and use it in
other distributions (see
http://libguestfs.org/FAQ.html#distros). So
if we do change it now, we should:
(a) Only do it at the start of the 1.5 development cycle, indicating
that now would be a good time to release 1.4. See:
https://www.redhat.com/archives/libguestfs/2010-June/msg00069.html
(b) We should make the other protocol changes we've been wanting to
do, see:
http://libguestfs.org/guestfs.3.html#libguestfs_gotchas
(c) We need to make sure everyone understands the change to the
appliance and protocol.
The current protocol (ignoring file transfers) implements a very
simple RPC mechanism which is entirely synchronous:
  library                                daemon
  request message --------------->
                                         daemon processes message
         <----------------- reply message
I have discarded the idea that we should change to using an
asynchronous system, eg. allowing the library to issue more than one
request message simultaneously, because at this stage it adds great
complexity to both ends, and is I believe not necessary in order to
implement progress messages. This means that we can't make additional
"get status" requests during a call.
Instead I'd like to change the protocol like this:
  library                                daemon
  request message --------------->
                                         daemon processes message
         <---------------- status message
                                         daemon processes message
         <---------------- status message
                                         daemon processes message
         <----------------- reply message
In a long running call such as copy_size, the daemon would send
periodic status messages to the library.
Points to note:
(i) The daemon should self-limit these messages, eg. to once per
second, starting at least one second after the request.
(ii) An initial implementation of the library would simply discard
these status messages. This would allow us to test the daemon side,
only making trivial library changes, and be reasonably sure that the
daemon side is correct.
(iii) The status message contains just two (64 bit) numbers:
  - Total size of the current operation
  - Current progress (0 <= current <= total size)
The meaning of these two numbers is defined by the context of the
call, but would usually indicate, eg. total size in bytes and number
of bytes processed/copied so far. Callers are only interested in
the ratio of these two numbers when displaying a progress meter.
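
As a rough illustration of points (ii) and (iii), here is a minimal sketch of how a library might skip over status messages while waiting for the reply, and how a caller might turn the two numbers into a ratio. The names msg_kind, struct message, find_reply and progress_fraction are made up for this example. They are not part of libguestfs, and the struct is not the real wire format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message kinds, for illustration only. */
enum msg_kind { MSG_STATUS, MSG_REPLY };

/* Hypothetical in-memory form of a message from the daemon. */
struct message {
  enum msg_kind kind;
  int64_t total;    /* MSG_STATUS only: total size of the operation */
  int64_t current;  /* MSG_STATUS only: 0 <= current <= total */
};

/* Walk the messages received for one request.  Status messages are
 * simply discarded (point (ii)).  Returns the index of the reply
 * message, or -1 if no reply is present.
 */
static int
find_reply (const struct message *msgs, size_t n, size_t *discarded)
{
  size_t i;

  assert (msgs != NULL && discarded != NULL);
  *discarded = 0;
  for (i = 0; i < n; i++) {
    if (msgs[i].kind == MSG_REPLY)
      return (int) i;
    (*discarded)++;
  }
  return -1;
}

/* Return progress as a fraction in [0,1], or -1.0 if the message
 * does not carry meaningful numbers.
 */
static double
progress_fraction (const struct message *m)
{
  assert (m != NULL);
  if (m->kind != MSG_STATUS || m->total <= 0
      || m->current < 0 || m->current > m->total)
    return -1.0;
  return (double) m->current / (double) m->total;
}
```

A progress meter only needs progress_fraction, which is why the units of the two numbers can be left to the context of each call.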
Library changes
---------------
On receiving a status message, the library can ignore it, or can:
(1) update the total and current fields in the guestfs_h handle, and/or
(2) call a prearranged callback function.
We would add a way to query these numbers for an existing handle:
  void guestfs_get_progress (guestfs_h *g, int64_t *total, int64_t *current);
Callers can poll this function on any open handle from any thread to
retrieve the progress of the currently running call.
[Side note: In general you cannot call libguestfs APIs from multiple
threads:
http://libguestfs.org/guestfs.3.html#multiple_handles_and_multiple_threads]
Also callers may register a callback function using:
  typedef void (*guestfs_progress_cb) (guestfs_h *g, void *data,
                                       int64_t total, int64_t current);
  extern void guestfs_set_progress_callback (guestfs_h *g,
                                             guestfs_progress_cb cb,
                                             void *data);
which is called on receipt of a status message.
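
To make the callback path concrete, here is a small sketch of how registration and dispatch might fit together. It uses a stand-in struct cb_handle rather than the real (opaque) guestfs_h, and every demo_* name is hypothetical. Only the shape of the callback follows the proposed typedef above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for guestfs_h, which is really opaque. */
struct cb_handle;

/* Same shape as the proposed guestfs_progress_cb. */
typedef void (*demo_progress_cb) (struct cb_handle *g, void *data,
                                  int64_t total, int64_t current);

struct cb_handle {
  demo_progress_cb cb;  /* NULL if no callback is registered */
  void *data;           /* opaque pointer passed back to cb */
};

/* Mirrors the proposed guestfs_set_progress_callback. */
static void
demo_set_progress_callback (struct cb_handle *g,
                            demo_progress_cb cb, void *data)
{
  assert (g != NULL);
  g->cb = cb;
  g->data = data;
}

/* What the library might do on receipt of a status message:
 * call the callback if there is one, otherwise ignore the message.
 */
static void
demo_handle_status (struct cb_handle *g, int64_t total, int64_t current)
{
  assert (g != NULL);
  if (g->cb != NULL)
    g->cb (g, g->data, total, current);
}

/* Example callback: store the latest percentage in a caller's int. */
static void
record_percent (struct cb_handle *g, void *data,
                int64_t total, int64_t current)
{
  int *percent = data;

  (void) g;  /* unused in this example */
  if (total > 0)
    *percent = (int) (100 * current / total);
}
```

A tool would register something like record_percent once, before making the long-running call, and redraw its progress meter from inside the callback.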
The numbers are reset to (-1, -1) when the final reply is received for
a call. The numbers are only meaningful when the handle state is busy
(see
http://libguestfs.org/guestfs.3.html#guestfs_is_busy).
All access to the numbers should be thread safe. Access to the busy
state should be made thread safe.
Note that we already implement thread safety using weak pthread
function calls in libguestfs (via gnulib), so this doesn't add any new
dependencies.
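
One possible way to get thread-safe access to the numbers is to guard them with a mutex in the handle. This is only a sketch. struct locked_handle and the demo_* functions are invented for the example, and it calls pthread functions directly rather than through the weak symbols mentioned above.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the progress fields of guestfs_h. */
struct locked_handle {
  pthread_mutex_t progress_lock;  /* protects total and current */
  int64_t total;
  int64_t current;
};

/* No call in progress: the numbers start out as (-1, -1). */
#define LOCKED_HANDLE_INIT { PTHREAD_MUTEX_INITIALIZER, -1, -1 }

/* Called by the library when a status message is received. */
static void
demo_store_progress (struct locked_handle *g,
                     int64_t total, int64_t current)
{
  assert (g != NULL);
  pthread_mutex_lock (&g->progress_lock);
  g->total = total;
  g->current = current;
  pthread_mutex_unlock (&g->progress_lock);
}

/* Called when the final reply is received for a call. */
static void
demo_reset_progress (struct locked_handle *g)
{
  demo_store_progress (g, -1, -1);
}

/* Same shape as the proposed guestfs_get_progress.  Both numbers are
 * read under the lock so a poller never sees a torn pair.
 */
static void
demo_get_progress (struct locked_handle *g,
                   int64_t *total, int64_t *current)
{
  assert (g != NULL && total != NULL && current != NULL);
  pthread_mutex_lock (&g->progress_lock);
  *total = g->total;
  *current = g->current;
  pthread_mutex_unlock (&g->progress_lock);
}
```

Reading both fields under one lock matters because callers care about the ratio, and a total from one status message paired with a current from another could give a nonsensical value.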
Daemon changes
--------------
Long running calls tend to be of two forms:
  while (n < size) {
    copy_a_buffer ();
    n += size_copied;
  }
or:
  command ("long_running_external_command");
The first form can be changed easily:
  while (n < size) {
    copy_a_buffer ();
    n += size_copied;
    notify_progress (n, size);
  }
The 'notify_progress' function can be called as often as needed, and
it would have to contain its own rate limiting functionality so that
it doesn't actually send progress messages back more often than
desired (eg. once per second). This allows for a very simple change
to all the potentially long-running daemon functions of the first
form.
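
The rate limiting inside notify_progress could look roughly like the sketch below. Note the assumptions: the limiter state is passed in explicitly (the two-argument call above would keep it in a static or per-request variable instead), send_status_message is a stand-in that just prints to stderr, and time() is used for simplicity although a monotonic clock would be a better choice in real code.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical rate limiter state, one per request. */
struct progress_limiter {
  time_t last_sent;  /* start of request, then time of last message */
  double interval;   /* minimum seconds between messages, eg. 1.0 */
};

/* Decide whether a status message is due at time 'now'.  Because
 * last_sent starts out as the request start time, the first message
 * goes out no sooner than one interval after the request.
 */
static int
progress_due (struct progress_limiter *lim, time_t now)
{
  assert (lim != NULL && lim->interval > 0);
  if (difftime (now, lim->last_sent) < lim->interval)
    return 0;
  lim->last_sent = now;
  return 1;
}

/* Stand-in for sending a status message.  Real daemon code would
 * write the message to the socket instead of printing it.
 */
static void
send_status_message (uint64_t n, uint64_t size)
{
  fprintf (stderr, "status: %" PRIu64 "/%" PRIu64 "\n", n, size);
}

/* Cheap to call on every loop iteration: most calls return without
 * sending anything.
 */
static void
notify_progress (struct progress_limiter *lim, uint64_t n, uint64_t size)
{
  if (progress_due (lim, time (NULL)))
    send_status_message (n, size);
}
```

Keeping the decision in progress_due, separate from the clock and the socket, also makes the limiter easy to test with made-up timestamps.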
Calls of the second form are more difficult to change. For each one
we would have to consider the nature of the external command: whether
it provides some sort of progress feature, or whether we need to poll
it from the daemon (eg. by polling the size of the input and output
files).
Some functions of the second form could be converted. For example,
the implementation of guestfs_dd runs the external "dd" command, but
could be modified to use an internal copy function of the first form,
and this might have other benefits too.
Other functions of the second form would be changed, over time, to
poll for status. There is nothing in the library API which requires
any function to provide status messages.
Tools
-----
Tools, such as guestfish and virt-resize, must also be changed to poll
for status or set a callback for some long-running operations.
Obviously setting a callback would be preferred.
We would need to change the Perl and OCaml bindings at least to
support the progress callback.
Generator
---------
After making these changes we'd have a few API calls which generate
status messages, and many calls which don't. You can call
guestfs_get_progress during any call, but it wouldn't return any
useful information for the majority of calls.
It would be useful to mark those calls which generate status messages
in the generator. This would allow us to:
(a) Generate documentation.
(b) Modify guestfish to print progress bars for long-running calls.
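
For (b), the drawing side is simple once the two numbers are available. The sketch below formats a fixed-width text bar from (total, current). format_progress_bar and BAR_WIDTH are hypothetical names, and this is not how guestfish is actually implemented.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BAR_WIDTH 20  /* number of characters between the brackets */

/* Format a text progress bar such as "[#####     ...]  25%" into buf.
 * If the numbers are not meaningful (eg. the reset value (-1, -1)),
 * an empty bar with "--%" is produced instead.
 */
static void
format_progress_bar (char *buf, size_t len, int64_t total, int64_t current)
{
  char bar[BAR_WIDTH + 1];
  double fraction;
  int filled;

  assert (buf != NULL && len > 0);

  if (total <= 0 || current < 0 || current > total) {
    snprintf (buf, len, "[%*s] --%%", BAR_WIDTH, "");
    return;
  }

  /* Use floating point so very large byte counts cannot overflow. */
  fraction = (double) current / (double) total;
  filled = (int) (fraction * BAR_WIDTH);

  memset (bar, '#', (size_t) filled);
  memset (bar + filled, ' ', (size_t) (BAR_WIDTH - filled));
  bar[BAR_WIDTH] = '\0';

  snprintf (buf, len, "[%s] %3d%%", bar, (int) (fraction * 100));
}
```

A tool would call this from its progress callback (or after polling) and print the result followed by a carriage return so that the bar is redrawn in place.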
Summary
-------
The above plan allows progress bars to be added to long-running
operations in our tools. The changes are relatively non-invasive to
all parts of libguestfs. However we do need to change the protocol,
and we should only consider doing this at the start of an unstable
development cycle.
Known deficiencies
------------------
guestfs_launch() cannot be metered this way. In fact there's no way
to know how long the launch operation will take, and the daemon [by
definition] is not running during this time anyway.
It sends extra messages (short ones) over the socket, and these
messages may be ignored by the library.
Rich.
--
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones