On Fri, Jul 02, 2010 at 11:27:26PM +0100, Richard W.M. Jones wrote:
Background
----------
A complaint I'm hearing is that some tools which take a long time to
run (virt-resize in particular) need to have progress bars to indicate
how long they are expected to run. This is also a basic usability
principle, see for example this paper:
http://www.chrisharrison.net/projects/progressbars/ProgBarHarrison.pdf
If you look at how virt-resize is implemented, the bulk of the time is
spent copying partitions in a single long-running function, copy_size:
http://libguestfs.org/guestfs.3.html#guestfs_copy_size
The current protocol is entirely synchronous so that while the daemon
is doing the copy_size, the library is blocked waiting for a reply
message and does nothing else in that thread. To implement a progress
bar for this call we'd have to have a way to query into the call while
it was running to find out how far it has got, or to have it send
regular status messages out. The plan below outlines a way to do
this.
Protocol changes
----------------
This plan does require changing the protocol. We're allowed to change
the library/daemon protocol (it's not ABI) but we tend to avoid doing
it because it means people can't repackage our appliance and use it in
other distributions (see
http://libguestfs.org/FAQ.html#distros). So
if we do change it now, we should:
(a) Only do it at the start of the 1.5 development cycle, indicating
that now would be a good time to release 1.4. See:
https://www.redhat.com/archives/libguestfs/2010-June/msg00069.html
(b) We should make the other protocol changes we've been wanting to
do, see:
http://libguestfs.org/guestfs.3.html#libguestfs_gotchas
(c) We need to make sure everyone understands the change to the
appliance and protocol.
The current protocol (ignoring file transfers) implements a very
simple RPC mechanism which is entirely synchronous:
library daemon
request message --------------->
daemon processes message
<----------------- reply message
I have discarded the idea that we should change to using an
asynchronous system, eg. allowing the library to issue more than one
request message simultaneously, because at this stage it adds great
complexity to both ends, and is I believe not necessary in order to
implement progress messages. This means that we can't make additional
"get status" requests during a call.
Allowing overlapping parallel requests isn't really an asychronous
system, at least from the protocol POV. It could be from the daemon
implementation POV if the daemon didn't want to use threads for RPC
dispatch.
In libvirt we didn't make the individual API calls asynchronous,
but we did allow for multiple threads to use a single connection
concurrently, so one thread doesn't block another at a protocol
level. It certainly added alot of complexity to the code, but
becoming fully multithread capable was a big win
Instead I'd like to change the protocol like this:
library daemon
request message --------------->
daemon processes message
<---------------- status message
daemon processes message
<---------------- status message
daemon processes message
<----------------- reply message
In a long running call such as copy_size, the daemon would send
periodic status messages to the library.
Points to note:
(i) The daemon should self-limit these messages, eg. to once per
second, starting at least one second after the request.
(ii) An initial implementation of the library would simply discard
these status messages. This would allow us to test the daemon side,
only making trivial library changes, and be reasonably sure that the
daemon side is correct.
(iii) The status message contains just two (64 bit) numbers:
- Total size of the current operation
- Current progress (0 <= current <= total size)
The meaning of these two numbers is defined by the context of the
call, but would usually indicate, eg. total size in bytes and number
of bytes processed/copied so far. Callers are only interested in
the ratio of these two numbers when displaying a progress meter.
This need not be incompatible. The async status message packets can
use a new value for 'guest_message_direction'. If existing clients
don't already ignore packets with unknown direction field, then you
could include a post-connect RPC call to turn on the status messages
for the connection. So a old client talking to new daemon would never
see the async messages, unless they enabled this feature explicitly.
Library changes
---------------
On receiving a status message, the library can ignore it, or can:
(1) update the total and current fields in the guestfs_h handle, and/or
(2) call a prearranged callback function.
We would add a way to query these numbers for an existing handle:
void guestfs_get_progress (guestfs_h *g, int64_t *total, int64_t *current);
Callers can poll this function on any open handle from any thread to
retrieve the progress of the currently running call.
[Side note: In general you cannot call libguestfs APIs from multiple
threads:
http://libguestfs.org/guestfs.3.html#multiple_handles_and_multiple_threads]
In the wire protocol, when sending back the status messages, it is
probably wise to make sure the 'serial' field of the status message
always matches the 'serial' field of the initial RPC request. This
means if you do make the library multi-threaded in the future, you
have the ability to correlate async status messages to the correct
thread's RPC call.
Daemon changes
--------------
Long running calls tend to be of two forms:
while (n < size) {
copy_a_buffer ();
n += size_copied;
}
or:
command ("long_running_external_command");
The first form can be changed easily:
while (n < size) {
copy_a_buffer ();
n += size_copied;
notify_progress (n, size);
}
The 'notify_progress' function can be called as often as needed, and
it would have to contain its own rate limiting functionality so that
it doesn't actually send progress messages back more often than
desired (eg. once per second). This allows for a very simple change
to all the potentially long-running daemon functions of the first
form.
The second form are more difficult to change. With each one we would
have to consider the nature of the external command, if it provides
some sort of progress feature or if we need to poll it from the daemon
(eg. polling the size of the input and output files).
Some functions of the second form could be changed. eg. The
implementation of guestfs_dd runs the external "dd" command, but could
be modified to an internal copy function of the first form, and this
might have other benefits too.
Other functions of the second form would be changed to poll status
over time. There is nothing in the library API which requires any
function to provide status messages.
Tools
-----
Tools, such as guestfish and virt-resize, must also be changed to poll
for status or set a callback for some long-running operations.
Obviously setting a callback would be preferred.
We would need to change the Perl and OCaml bindings at least to
support the progress callback.
Generator
---------
After making these changes we'd have a few API calls which generate
status messages, and many calls which don't. You can call
guestfs_get_progress on any call, but it wouldn't return any useful
information for the majority of calls.
If you're not very careful adding a 'guestfs_get_progress' API
could make it very hard to introduce multithread support. Since
the main thread is blocked on the original API, I assume this
get_progress will be called from a separate thread, so you can't
just use thread locals to associate this info.
I can see two options to make this extensible to allow potential
for future concurrent usage
- Associate a 'status' object with a thread.
guestfs_status_h s = guestfs_status_new();
guestfs_set_status_object(g, s);
guestfs_do_some_api_call(g);
This would store 's' in a thread-local. Any API calls
made in this thread would use that object.
Another thread can query that same object. Thus many
threads can setup their own status objects and each
can be monitored independantly.
- Add variants of each API which take a status object
guestfs_status_h s = guestfs_status_new();
guestfs_do_some_api_call_with_status(g, s);
The status object is associated explicitly with an API call, so again
concurrently API calls from different threads can be monitored
independantly.
Regards,
Daniel
--
|: Red Hat, Engineering, London -o-
http://people.redhat.com/berrange/ :|
|:
http://libvirt.org -o-
http://virt-manager.org -o-
http://deltacloud.org :|
|:
http://autobuild.org -o-
http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|