On 6/25/19 4:03 AM, Richard W.M. Jones wrote:
> On Wed, Jun 19, 2019 at 01:18:01PM -0500, Eric Blake wrote:
>> Oddly enough, I am not getting any measurable performance difference
>> with this patch applied and using examples/threaded-reads-and-writes
>> coupled with nbdkit. My explanation is that in the common case, once
>> a server has something to send, it is going to send the entire reply
>> as fast as it can, rather than sending a partial reply and waiting;
>> and our attempts to keep up to 64 commands in flight mean that our
>> real bottleneck is not in sending our next command (even if we have
>> to wait for the server's reply to finish), but at the server (if we
>> saturate the number of requests the server allows in flight, the
>> server won't read our next request until it has finished its current
>> reply). Still, I'm sure that it is possible to construct corner
>> cases where this shows more of an effect.
> What I'm going to say is, I think, stating the obvious, but my
> intuition is that there are going to be two types of load. In the
> first and easiest case you want to read or write sequentially over
> the whole image or a large section of it (think: qemu-img convert).
> In this case you know well in advance which parts of the image you
> want to read/write and can keep the in-flight queue full at all
> times. This is what threaded-reads-and-writes actually tests.
>
> The harder case is random access (think: qemu with a database
> guest). There the queue is short, for example because of data
> dependencies between the requests, so the buffer of issued commands
> is always short. There may also be server latency because of the
> random access pattern. This kind of workload should show the benefit
> of this commit, but we don't really have a test for it.
Yes, that's a fair assessment.
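
A contrived illustration of that second case: a chain of reads where
each offset comes out of the previous reply can never have more than
one command in flight, no matter how many the library would allow.
Sketch only; read_block() below is a hypothetical stand-in for
nbd_aio_pread plus a poll loop waiting for that single command, since
the exact aio calls are beside the point here:

  #include <stddef.h>
  #include <stdint.h>

  /* Hypothetical: issue one NBD read and wait for its reply.
   * Returns 0 on success, -1 on error.  Stands in for nbd_aio_pread
   * followed by polling until that one command completes.
   */
  extern int read_block (uint64_t offset, void *buf, size_t len);

  /* Each block names the offset of the next one, so the next read
   * cannot be issued until the previous reply has fully arrived:
   * the effective queue depth is pinned at 1.
   */
  static void
  follow_chain (uint64_t offset)
  {
    struct {
      uint64_t next;            /* offset of the next block, 0 = end */
      char payload[504];
    } blk;

    while (offset != 0) {
      if (read_block (offset, &blk, sizeof blk) == -1)
        return;
      offset = blk.next;        /* data dependency between requests */
    }
  }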
>>  /* STATE MACHINE */ {
>>   REPLY.START:
>> +  /* If rlen is non-zero, we are resuming an earlier reply cycle. */
>> +  if (h->rlen > 0) {
>> +    if (h->reply_cmd) {
>> +      assert (nbd_internal_is_state_processing (h->reply_cmd->state));
>> +      SET_NEXT_STATE (h->reply_cmd->state);
> This is essentially the "stack of states" idea that I had early on,
> but with a maximum stack depth of 1.
>
> I originally rejected the idea then because it means that the
> generator would no longer have a complete view of all state
> transitions in the state machine, and thus wouldn't be able to
> enforce invariants such as not jumping to a state in the middle of a
> state group. (In fact with this patch that is now the case -- I
> thought the generator would give an error about this, but maybe my
> test is wrong.)
>
> Nevertheless, given that we need to do this, maybe it's better to
> drop the idea that the generator needs to have a complete view.
>
> In my original plan we would have had "push state" and "pop state"
> operations (pop state is a general jump to "any state", which is why
> it breaks the assumption in the generator).
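
For reference, that plan boils down to something like the following
sketch (hypothetical names only; none of this exists in libnbd). The
pop is a jump to whatever state happened to be saved, which is exactly
the transition the generator cannot see or check:

  #include <stddef.h>

  enum state { STATE_READY, STATE_REPLY_START /* ... */ };

  #define MAX_STATE_STACK 4

  struct nbd_handle {
    enum state state;
    enum state state_stack[MAX_STATE_STACK];
    size_t state_stack_depth;
  };

  /* Save the current state, then jump to s.  (No overflow check;
   * sketch only.)
   */
  #define PUSH_STATE(h, s)                                        \
    ((h)->state_stack[(h)->state_stack_depth++] = (h)->state,     \
     (h)->state = (s))

  /* Return to whatever state was pushed: a jump to "any state". */
  #define POP_STATE(h)                                            \
    ((h)->state = (h)->state_stack[--(h)->state_stack_depth])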
In addition to stacking at most one state, the patch also limits the
stacking to occur during the REPLY sub-state. But I was also
surprised that the generator did not balk at my setting the state to
the contents of a variable rather than to the usual all-caps %FOO
magic identifier.
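
At the C level the reason is presumably mundane: if SET_NEXT_STATE
ultimately expands to an assignment of an 'enum state' value, then the
compiler is equally happy with either form. A simplified sketch (not
the real generated code; the names here are illustrative only):

  /* Only the %FOO spelling is visible to the generator's checks; the
   * compiler just sees an assignment of an 'enum state' value, so a
   * variable such as h->reply_cmd->state works like a constant.
   */
  enum state { STATE_REPLY_START, STATE_REPLY_RECV_REPLY /* ... */ };

  struct command_in_flight { enum state state; /* ... */ };

  struct nbd_handle {
    enum state next_state;
    struct command_in_flight *reply_cmd;
  };

  #define SET_NEXT_STATE(s) (h->next_state = (s))

  static void
  reply_start_sketch (struct nbd_handle *h)
  {
    if (h->reply_cmd)
      SET_NEXT_STATE (h->reply_cmd->state);    /* resume from a variable */
    else
      SET_NEXT_STATE (STATE_REPLY_RECV_REPLY); /* usual fixed constant */
  }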
>> +++ b/lib/internal.h
>> @@ -253,6 +253,7 @@ struct command_in_flight {
>>    uint32_t count;
>>    void *data;           /* Buffer for read/write */
>>    struct command_cb cb;
>> +  enum state state;     /* State to resume with on next POLLIN */
>>    bool data_seen;       /* For read, true if at least one data chunk seen */
>>    uint32_t error;       /* Local errno value */
>>  };
> The patch seems reasonable. Does this obviate any need to split the
> state machine?
Yes, as far as I can see, this is all we ever have to worry about.
It is possible to determine the entirety of wstate (either we're
blocked under the ISSUE_COMMAND sub-state, or h->request and
h->in_write_payload are sufficient to see what remains of the current
request), and the entirety of rstate (either we're under the REPLY
sub-state, or h->reply_cmd coupled with h->rlen and
h->reply_cmd->state determines what remains of the current reply).
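
Put differently (sketch only, with simplified stand-ins for the real
declarations in lib/internal.h), the outstanding reply reduces to a
small value that can be recomputed from those fields at any time:

  #include <stdbool.h>
  #include <stddef.h>
  #include <stdint.h>

  /* Simplified stand-ins for the declarations in lib/internal.h. */
  enum state { STATE_START /* ... */ };
  struct command_in_flight { enum state state; /* ... */ };
  struct nbd_handle {
    uint64_t rlen;                       /* reply bytes still unread */
    struct command_in_flight *reply_cmd; /* command the reply belongs to */
    /* ... */
  };

  /* Everything needed to resume a partially read reply on the next
   * POLLIN, derived purely from the fields above.
   */
  struct pending_reply {
    struct command_in_flight *cmd;
    enum state resume;
    uint64_t bytes_left;
  };

  static bool
  remaining_reply (struct nbd_handle *h, struct pending_reply *out)
  {
    if (h->rlen == 0 || h->reply_cmd == NULL)
      return false;                      /* no partial reply outstanding */
    out->cmd = h->reply_cmd;
    out->resume = h->reply_cmd->state;
    out->bytes_left = h->rlen;
    return true;
  }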
> ACK
Okay, I'll go ahead and push this one today.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization:  qemu.org | libvirt.org