On Mon, Feb 14, 2022 at 04:08:21PM +0000, Richard W.M. Jones wrote:
On Mon, Feb 14, 2022 at 04:52:17PM +0100, Laszlo Ersek wrote:
> On 02/14/22 14:01, Richard W.M. Jones wrote:
> > But nbdcopy needs to be reworked to make the input and output requests
> > separate, so that nbdcopy will coalesce and split blocks as it copies.
> > This is difficult.
> >
> > Another problem I'm finding (eg
> > https://bugzilla.redhat.com/show_bug.cgi?id=2039255#c9) is that
> > performance of new virt-v2v is extremely specific to input and output
> > mode, and hardware and network configurations. For reasons that I
> > don't fully understand.
>
> How are the nbdcopy source and destination coupled with each other? From
> work I'd done a decade ago, I remember that connecting two
> network-oriented (UDP) processes with a small-buffer pipe between them
> caused very bad effects. Whenever either process was blocked on the
> network (or on a timer, for example), the pipe went immediately full or
> empty (dependent on the particular blocked process), which in turn
> blocked the other process almost immediately. So the mitigation for that
> was to create a simple local app, to be inserted between the two
> network-oriented processes in the pipeline, just to de-couple them from
> each other, and make sure that a write to the pipe, or a read from it,
> would effectively never block. (The app-in-the-middle did have a maximum
> buffer size, but it was configurable, so not a practical limitation; it
> could be multiple tens of MB if needed.)
>
> If nbdcopy does some internal queueing (perhaps implicitly, i.e. by
> allowing multiple requests to be in flight at the same time), then
> seeing some stats on those "in real time" could be enlightening.

So the way it works at the moment is that it's event driven. Ignoring
extents to keep the description simple, we issue asynchronous read
requests (i.e. nbd_aio_pread), and in the completion callbacks of those
requests, asynchronous write requests are started (i.e. nbd_aio_pwrite).

https://gitlab.com/nbdkit/libnbd/-/blob/6725fa0e129f9a60d7b89707ef8604e0a...
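
To make the control flow concrete, here is a minimal sketch in Python
with asyncio. It is not nbdcopy's actual code (which is C on top of
libnbd); the function name event_driven_copy, the in-memory source and
the sleep(0) stand-ins for network latency are all assumptions made for
illustration. Each request reads a block and starts the write only once
the read has completed, and a semaphore caps the number of requests in
flight.

```python
import asyncio


async def event_driven_copy(src: bytes, request_size: int, max_requests: int):
    """Copy src block by block; return (copy, peak number of requests in flight)."""
    dst = bytearray(len(src))
    slots = asyncio.Semaphore(max_requests)  # hypothetical stand-in for a requests limit
    in_flight = 0
    peak = 0

    async def read(offset: int) -> bytes:
        await asyncio.sleep(0)  # stand-in for waiting on the source
        return src[offset:offset + request_size]

    async def write(offset: int, data: bytes) -> None:
        await asyncio.sleep(0)  # stand-in for waiting on the destination
        dst[offset:offset + len(data)] = data

    async def one_request(offset: int) -> None:
        nonlocal in_flight
        try:
            data = await read(offset)
            # The write is started only after the read has completed.
            await write(offset, data)
        finally:
            in_flight -= 1
            slots.release()

    tasks = []
    for offset in range(0, len(src), request_size):
        await slots.acquire()  # blocks once max_requests are in flight
        in_flight += 1
        peak = max(peak, in_flight)
        tasks.append(asyncio.create_task(one_request(offset)))
    await asyncio.gather(*tasks)
    return bytes(dst), peak
```

Calling asyncio.run(event_driven_copy(data, 16, 8)) returns the copied
bytes together with the peak in-flight count, which never exceeds the
limit in this simplified model because it does no request splitting.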

There is a limit on the number of parallel requests in flight
(nbdcopy --requests, default 64). This limits the implicit buffer to
max_requests * request_size. That's 16MB in the default
configuration. Quite small actually ...

https://gitlab.com/nbdkit/libnbd/-/blob/6725fa0e129f9a60d7b89707ef8604e0a...
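
The arithmetic behind that figure can be written out as a short Python
sketch. The 256 KiB request size is not stated above; it is inferred
from 16MB / 64 requests, so treat it as an assumption. The helper name
implicit_buffer_bytes is made up for this example.

```python
KIB = 1024
MIB = 1024 * KIB

# Assumption: request size inferred from "16MB with 64 requests".
REQUEST_SIZE = 256 * KIB


def implicit_buffer_bytes(max_requests: int, request_size: int) -> int:
    """Upper bound on data held in flight: max_requests * request_size."""
    return max_requests * request_size


for requests in (64, 1024):
    size = implicit_buffer_bytes(requests, REQUEST_SIZE)
    print(f"--requests={requests}: {size // MIB} MiB")
```

With the same request size, raising the limit from 64 to 1024 requests
grows the bound from 16 MiB to 256 MiB.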

You might be on to something!

I asked Ming Xie to run a special build of virt-v2v with all datapath
debugging enabled and this allows me to calculate the size of the
nbdcopy implicit buffer, i.e. the value returned by the in_flight
function (see second link above).

The results (attached) show that the internal buffer is full (~ 64
requests) just about the whole time. (Note that because of request
splitting, it's possible for the buffer to grow larger than 64
requests, which explains occasional bursts above this "limit".)

Anyway I've done another build of virt-v2v which calls nbdcopy with
--requests=1024, so we'll see if that improves performance.
It may not do if the problem is really that one side is just slow.

The above problem might combine with the small HTTP request size +
synchronous request issue that Nir pointed out in his patch, if there
are longer round trips on the QE machines than in my local testing.

If --requests=1024 alone doesn't make any difference I'll try another
test build that combines this with larger request size.

Rich.

--
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones
Read my programming and virtualization blog:
http://rwmj.wordpress.com
libguestfs lets you edit virtual machines. Supports shell scripting,
bindings from many languages.
http://libguestfs.org