On 02/15/22 11:43, Richard W.M. Jones wrote:
> On Mon, Feb 14, 2022 at 04:08:21PM +0000, Richard W.M. Jones wrote:
>> On Mon, Feb 14, 2022 at 04:52:17PM +0100, Laszlo Ersek wrote:
>>> On 02/14/22 14:01, Richard W.M. Jones wrote:
>>>> But nbdcopy needs to be reworked to make the input and output requests
>>>> separate, so that nbdcopy will coalesce and split blocks as it copies.
>>>> This is difficult.
>>>>
>>>> Another problem I'm finding (eg
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=2039255#c9) is that
>>>> performance of new virt-v2v is extremely specific to input and output
>>>> mode, and hardware and network configurations. For reasons that I
>>>> don't fully understand.
>>>
>>> How are the nbdcopy source and destination coupled with each other? From
>>> work I'd done a decade ago, I remember that connecting two
>>> network-oriented (UDP) processes with a small-buffer pipe between them
>>> caused very bad effects. Whenever either process was blocked on the
>>> network (or on a timer, for example), the pipe went immediately full or
>>> empty (dependent on the particular blocked process), which in turn
>>> blocked the other process almost immediately. So the mitigation for that
>>> was to create a simple local app, to be inserted between the two
>>> network-oriented processes in the pipeline, just to de-couple them from
>>> each other, and make sure that a write to the pipe, or a read from it,
>>> would effectively never block. (The app-in-the-middle did have a maximum
>>> buffer size, but it was configurable, so not a practical limitation; it
>>> could be multiple tens of MB if needed.)
>>>
>>> If nbdcopy does some internal queueing (perhaps implicitly, i.e. by
>>> allowing multiple requests to be in flight at the same time), then
>>> seeing some stats on those "in real time" could be enlightening.
>>
>> So the way it works at the moment is it's event driven. Ignoring
>> extents to keep the description simple, we issue asynch read requests
>> (ie. nbd_aio_pread) and in the completion callbacks of those requests,
>> asynchronous write requests are started (ie. nbd_aio_pwrite).
>>
>> https://gitlab.com/nbdkit/libnbd/-/blob/6725fa0e129f9a60d7b89707ef8604e0a...
>>
>> There is a limit on the number of parallel requests in flight
>> (nbdcopy --requests, default 64). This limits the implicit buffer to
>> max_requests * request_size. That's 16MB in the default
>> configuration. Quite small actually ...
>>
>> https://gitlab.com/nbdkit/libnbd/-/blob/6725fa0e129f9a60d7b89707ef8604e0a...
>
> You might be on to something!
>
> I asked Ming Xie to run a special build of virt-v2v with all datapath
> debugging enabled and this allows me to calculate the size of the
> nbdcopy implicit buffer, ie. the value returned by the in_flight
> function (see second link above).
>
> The results (attached) show that the internal buffer is full (~ 64
> requests) just about the whole time. (Note that because of request
> splitting, it's possible for the buffer to grow larger than 64
> requests, which explains occasional bursts above this "limit".)
>
> Anyway I've done another build of virt-v2v which calls nbdcopy with
> --requests=1024, so we'll see if that improves performance.
>
> It may not do if the problem is really that one side is just slow.
> The above problem might combine with the small HTTP request size +
> synchronous request issue that Nir pointed out in his patch, if there
> are longer round trips on the QE machines than in my local testing.
>
> If --requests=1024 alone doesn't make any difference I'll try another
> test build that combines this with larger request size.
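
For reference, the buffer sizes under discussion can be worked out with
a few lines of Python. This is only a sketch of the arithmetic quoted
above (max_requests * request_size). The 256 KiB request size is an
assumption, inferred from the quoted "16MB" figure at 64 requests; the
function name is made up for illustration.

```python
KIB = 1024
MIB = 1024 * KIB

# Assumption: request size inferred from "16MB" / 64 requests.
REQUEST_SIZE = 256 * KIB


def implicit_buffer_bytes(max_requests, request_size):
    """Upper bound on data held in flight: max_requests * request_size."""
    return max_requests * request_size


for requests in (64, 1024):
    size = implicit_buffer_bytes(requests, REQUEST_SIZE)
    print(f"--requests={requests}: {size // MIB} MiB")
```

Under that assumption, --requests=1024 raises the bound from 16 MiB to
256 MiB without touching the request size.
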
It could be interesting to see src in_flight vs. dst in_flight.

Also, even if one side is significantly slower than the other, that
slower speed may still not be reached end-to-end if either end is
"bursty" (regardless of speed). A big buffer in the middle can smooth
over bursts, so the experienced speed can approach the expected
average speed.

The current log is not necessarily a sign of bad things happening;
after all, we expect nbdcopy to be busy, so requests should be in
flight. What I had in mind was this: one of the ends attempts to
produce or consume many requests in a burst, but there's no room (or
no data) for that burst in the buffer, so even though the wire speed
would allow for more, we're blocked elsewhere. Eventually the queue
should be almost always full, in which case a bursty "get" should
never block; or else it should be almost always empty, in which case
a bursty "put" should never block. With a small buffer, however,
those "should not" cases can occur easily.

Sorry that I can't express myself better... and this "burst" stuff may
not apply to nbdcopy in the first place.
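
The burst argument above can be illustrated with a toy model. This is
not nbdcopy code and does not model its event loop; it is a minimal
discrete-time sketch under invented assumptions: a producer and a
consumer that each run at 2 items/tick for 10 ticks and then idle for
10 ticks, exactly out of phase, with a bounded queue between them. All
names and numbers are hypothetical.

```python
def simulate(capacity, ticks=1000, burst_len=10, rate=2):
    """Return the number of items transferred through a bounded queue.

    The producer is active during even-numbered bursts and the consumer
    during odd-numbered bursts, so neither side ever works while the
    other does. Each side averages rate / 2 items per tick.
    """
    queue = 0
    transferred = 0
    for t in range(ticks):
        producer_on = (t // burst_len) % 2 == 0
        if producer_on:
            # A "put" is limited by the free space in the queue.
            queue += min(rate, capacity - queue)
        else:
            # A "get" is limited by the data available in the queue.
            taken = min(rate, queue)
            queue -= taken
            transferred += taken
    return transferred


for capacity in (1, 10, 20, 64):
    print(f"capacity={capacity}: {simulate(capacity)} items in 1000 ticks")
```

In this model both ends average 1 item/tick, but the end-to-end rate
only reaches that once the queue can hold a whole burst (20 items);
below that, throughput is capped by the buffer, not by either end.
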
VixDiskLib_ReadAsynch command (from submission through to completion),
and it's very small, approx. 0.003s.
This isn't a surprise - VDDK reads and writes are fast and can be
overlapped. (It's VDDK extent querying which is slow and serialized.)
Unfortunately there's not enough information in the log to find out
precisely how long it takes the Python plugin on the RHV side to
service writes. Although I guess it's not fast given the findings below.
To address your point above, I can find src vs dst in flight very
easily:
  $ grep "nbd_aio_in_flight: leave:" \
      virt-v2v-1.45.98-1-bz2039255-esx7.0-vddk7.0.2-rhv-direct_true.log | less
I won't reproduce the full log, but the gist of it is:
Feb 14 23:03:43 libnbd: debug: nbd1: nbd_aio_in_flight: leave: ret=0
Feb 14 23:03:43 libnbd: debug: nbd2: nbd_aio_in_flight: leave: ret=64
Feb 14 23:03:43 libnbd: debug: nbd1: nbd_aio_in_flight: leave: ret=0
Feb 14 23:03:43 libnbd: debug: nbd2: nbd_aio_in_flight: leave: ret=63
Feb 14 23:03:43 libnbd: debug: nbd1: nbd_aio_in_flight: leave: ret=1
Feb 14 23:03:43 libnbd: debug: nbd2: nbd_aio_in_flight: leave: ret=63
Feb 14 23:03:43 libnbd: debug: nbd1: nbd_aio_in_flight: leave: ret=1
Feb 14 23:03:43 libnbd: debug: nbd2: nbd_aio_in_flight: leave: ret=63
Feb 14 23:03:43 libnbd: debug: nbd1: nbd_aio_in_flight: leave: ret=1
Feb 14 23:03:43 libnbd: debug: nbd2: nbd_aio_in_flight: leave: ret=63
where nbd1 = source handle, nbd2 = destination handle. In the vast
majority of lines examined at random, most requests are waiting on the
destination (write to RHV) side.
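
Rather than sampling lines at random, the same debug lines could be
summarized per handle. The following is a rough sketch, not something
that was run on the full log: it assumes the line format shown in the
excerpt above, and the function name and the regular expression are
made up for illustration.

```python
import re
from collections import defaultdict

# Assumption: matches debug lines of the form shown in the excerpt, e.g.
# "... libnbd: debug: nbd2: nbd_aio_in_flight: leave: ret=63"
LINE_RE = re.compile(
    r"libnbd: debug: (\w+): nbd_aio_in_flight: leave: ret=(\d+)"
)


def summarize_in_flight(lines):
    """Return {handle: (min, max, mean)} of the in-flight counts."""
    values = defaultdict(list)
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            values[match.group(1)].append(int(match.group(2)))
    return {
        handle: (min(v), max(v), sum(v) / len(v))
        for handle, v in values.items()
    }


# The ten lines quoted in the excerpt above.
SAMPLE = """\
Feb 14 23:03:43 libnbd: debug: nbd1: nbd_aio_in_flight: leave: ret=0
Feb 14 23:03:43 libnbd: debug: nbd2: nbd_aio_in_flight: leave: ret=64
Feb 14 23:03:43 libnbd: debug: nbd1: nbd_aio_in_flight: leave: ret=0
Feb 14 23:03:43 libnbd: debug: nbd2: nbd_aio_in_flight: leave: ret=63
Feb 14 23:03:43 libnbd: debug: nbd1: nbd_aio_in_flight: leave: ret=1
Feb 14 23:03:43 libnbd: debug: nbd2: nbd_aio_in_flight: leave: ret=63
Feb 14 23:03:43 libnbd: debug: nbd1: nbd_aio_in_flight: leave: ret=1
Feb 14 23:03:43 libnbd: debug: nbd2: nbd_aio_in_flight: leave: ret=63
Feb 14 23:03:43 libnbd: debug: nbd1: nbd_aio_in_flight: leave: ret=1
Feb 14 23:03:43 libnbd: debug: nbd2: nbd_aio_in_flight: leave: ret=63
""".splitlines()

for handle, (lo, hi, mean) in sorted(summarize_in_flight(SAMPLE).items()):
    print(f"{handle}: min={lo} max={hi} mean={mean:.1f}")
```

To run it over the real log, pass an open file object instead of
SAMPLE; nbd1 is the source handle and nbd2 the destination handle.
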
Increasing request size should help here, but we'll see.
To truly increase request size (eg. to 8+ MB) we'll need changes in
the VDDK plugin so that it doesn't forward these huge reads to the
VMware server, because hostd will run out of memory. (Or split
read/write request sizes in nbdcopy as discussed).
Let's wait and see what the actual results are before making any
changes.
Rich.
--
Richard Jones, Virtualization Group, Red Hat
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine. Supports Linux and Windows.