On Mon, Mar 12, 2018 at 12:32 PM Richard W.M. Jones <rjones@redhat.com> wrote:
On Mon, Mar 12, 2018 at 07:13:52AM +0000, Nir Soffer wrote:
> On Fri, Mar 9, 2018 at 4:25 PM Richard W.M. Jones <rjones@redhat.com> wrote:
>
> > It has to be said it would be really convenient to have a 'zero'
> > and/or 'trim' method of some sort.
> >
>
> 'trim' means discard?

Yes.  The 5 functions we could support are:

* pread  - done
* pwrite - done
* flush  - does fdatasync(2) on the block device

Currently we do fsync() on every PUT request, so flush is not very
useful.
 
* zero   - write a range of zeroes without having to send zeroes
* trim   - punch hole, can be emulated using zero if not possile

Makes sense.
 
Also (not implemented in nbdkit today, but coming soon), pwrite, zero
and trim can be extended with a FUA (force unit access) flag, which
would mean that the range should be persisted to disk before
returning.  It can be emulated by calling flush after the operation. 
It wasn't clear if anything in this process flushes the content to
disk.  Is that what transfer.finalize does?

All PUT requests fsync() before returning. We optimize for complete image
trasfer, not for random io.

> Currently we cannot support discard on block storage since ovirt may
> need to wipe lvs when deleting a disk, and discarding may leave
> unwiped user data. This may change in 4.3 if we switch to wipe on
> creation instead of wipe after delete.

I think this depends on BLKDISCARDZEROES[1] for the block device?  Of
course if you're worried about data remanence for someone who has
access to the physical device then that wouldn't be enough.

BLKDISCARDZEROES was never reliable and it was removed from the
kernel recently. Please check https://patchwork.kernel.org/patch/9903757/
We are not relying on it since ovirt 4.2, hopefully also in 4.1.
 
[1] https://rwmj.wordpress.com/2014/03/11/blkdiscard-blkzeroout-blkdiscardzeroes-blksecdiscard/

> POST /images/ticket-id ...
> ...
> {
>     "op": "zero",
>     "offset": X,
>     "size": Y
> }
>
> I would like to support only aligned offset and size - do you think it
> should work
> for qemu-img?

It depends a bit on what you mean by "aligned" and what the alignment
is.  We'd probably have to work around it in the plugin so that it can
round in the request, issues a zero operation for the aligned part,
and writes zeroes at each end.  There's no guarantee that qemu-img
will be well-behaved in the future even if it is now.

Aligned for direct I/O (we use direct I/O for read/write). We can support
non-aligned ranges by doing similar emulation in the server, but I prefer to do
it only if we have such requirement. If you need to do this in the client, we
probably need to do this in the server otherwise all clients may need to 
emulate this.

I think there is no reason that qemu-img will zero unaligned ranges, but
I guess Eric can have a better answer.
 
Anyway this sounds do-able, is it something we can get for 4.2?

I think it is for 4.2.z.

Is zero support in the daemon and proxy enough, or we need the other
options now?
 
How will we detect that the server supports it?

Because we don't support OPTIONS yet, the only way is to send a POST
request and checking for 405 response.

Nir