On Thu, Mar 22, 2018 at 5:06 PM Richard W.M. Jones <rjones@redhat.com> wrote:
> > +
> > +    r = http.getresponse()
> > +    if r.status != 200:
> > +        transfer_service.pause()
> > +        h['failed'] = True
> > +        raise RuntimeError("could not write sector (%d, %d): %d: %s" %
> > +                           (offset, count, r.status, r.reason))
>
> +
> > +def zero(h, count, offset, may_trim):
> >
>
> What is may_trim? trim can never replace zero.

As Eric said.  We're ignoring this parameter which seems like the
right thing to do unless we can guarantee that trimmed data is read
back as zeroes.

I don't think we will be able to provide this guarantee for trim, only best
effort trim that may also do nothing on some images.

> > +# qemu-img convert starts by trying to zero/trim the whole device.
> > +# Since we've just created a new disk it's safe to ignore these
> > +# requests as long as they are smaller than the highest write seen.
> > +# After that we must emulate them with writes.
> >
>
> Isn't this needed also for the real zero? In the first version it will not
> be very efficient. We don't want to zero the entire image.

Can't the server use ioctl(BLKZEROOUT) to implement fast zeroing?

It will but I cannot promise it for 4.2.3. My goal is complete API with 
minimal implementation, and improving later as needed.

> If qemu-img wants to trim, better call trim - it will be very fast because
> it will
> do nothing in the first version :-)

We don't control what qemu-img does now or in the future.  Also
different output drivers and formats could behave differently.

Sure, but if qemu wants to trim, why use zero?

Since BLKDISCARD may or may not cause future reads to return zeros,
I assume that qemu-img is not using trim as a replacement for zeroing
data.
 
> > +def flush(h):
> > +    http = h['http']
> > +    transfer=h['transfer']
> > +    transfer_service=h['transfer_service']
> > +
> > +    # Construct the JSON request for flushing.  Note the offset
> > +    # and count must be present but are ignored by imageio.
> > +    buf = json.dumps({'op': "flush",
> > +                      'offset': 0,
> > +                      'size': 0,
> > +                      'flush': True})
> >
>
> flush does not accept any arguments - it will not fail, but better not have
> code that has no effect.

I didn't understand this.  Do you mean ‘'flush'’ key doesn't have a
parameter?  The documentation seems to indicate that what I'm doing
above is correct so maybe the docs are wrong or unclear.

I mean the flush operation expects {"op": "flush"}. The docs are not clear
enough.
 
> > +def close(h):
> > +    http = h['http']
> > +    connection = h['connection']
> > +
> > +    http.close()
> >
>
> If qemu-img does not flush at the end, we can ensure flush here.
> Maybe keep h["dirty"] flag, setting to True after very write/zero/trim,
> and setting the False after flush?

This is more of a philosophical question.  The nbdkit plugin just
follows flush requests issued by the client (ie. qemu-img).  qemu-img
probably won't issue its own flush request.  Should it issue a flush
or is that the job of oVirt?

oVirt cannot flush since it does not have open/close semantics.

The client also need to know if flush was successful, so we cannot flush
automatically when deleting a ticket.  If you choose to optimize by
avoid flushing in PUT/PATCH, you must flush explicitly.

Eric wrote this in
https://www.redhat.com/archives/libguestfs/2018-March/msg00151.html:
+Remember that most of the feature check functions return merely a
+boolean success value, while C<.can_fua> has three success values.
+The difference between values may affect choices made in the filter:
+when splitting a write request that requested FUA from the client, if
+C<next_ops-E<gt>can_fua> returns C<NBDKIT_FUA_NATIVE>, then the filter
+should pass the FUA flag on to each sub-request; while if it is known
+that FUA is emulated by a flush because of a return of
+C<NBDKIT_FUA_EMULATE>, it is more efficient to only flush once after
+all sub-requests have completed (often by passing C<NBDKIT_FLAG_FUA>
+on to only the final sub-request, or by dropping the flag and ending
+with a direct call to C<next_ops-E<gt>flush>).
If qemu-img does not flush on the last request you must flush, and close()
seems to be the best place, assuming that qemu-img checks close() return value
and will fail the convert it flush fails.

Thanks for the issues you raised here, I'll update the docs to make them
clear.

Nir