On Fri, Aug 7, 2020 at 5:07 PM Richard W.M. Jones <rjones(a)redhat.com> wrote:
> On Fri, Aug 07, 2020 at 04:43:12PM +0300, Nir Soffer wrote:
> > On Fri, Aug 7, 2020, 16:16 Richard W.M. Jones <rjones(a)redhat.com> wrote:
> > > I'm not sure if or even how we could ever do a robust O_DIRECT
> > >
> >
> > We can let the plugin and filter deal with that. The simplest solution is to
> > drop it on the user and require aligned requests.
> I mean this is very error prone. It requires the end user to know
> about the basically unknowable restrictions of O_DIRECT, and it isn't even
> possible in one common case: when the size of the file isn't an exact
> multiple of the filesystem block size.
Yes, doing direct I/O is hard; even qemu still has bugs in this area that pop
up from time to time.
It is fine to fail the open if the size of the image is not aligned to the
underlying block size.
However, finding the underlying block size is a can of worms :-)
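As a rough illustration of "fail the open", here is a minimal C sketch that opens a regular file with O_DIRECT and refuses it when the file size is not a multiple of the block size. It assumes Linux/glibc and uses st_blksize from fstat() as a stand-in for the real O_DIRECT alignment; that is exactly the weak spot described above, because st_blksize is only the kernel's preferred I/O size and is not guaranteed to match what the filesystem or device requires. The helper names are hypothetical.

```c
#define _GNU_SOURCE /* O_DIRECT is a Linux extension in glibc's <fcntl.h> */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* True if size is an exact multiple of align (align must be > 0). */
static int
is_aligned (off_t size, long align)
{
  assert (align > 0);
  return size % align == 0;
}

/* Hypothetical helper: open a regular file for O_DIRECT I/O and fail with
 * EINVAL if its size is not a multiple of st_blksize.  ASSUMPTION:
 * st_blksize stands in for the real O_DIRECT alignment, which it may not be. */
static int
open_direct_checked (const char *path, long *align_out)
{
  struct stat st;
  int fd, saved_errno;

  fd = open (path, O_RDWR | O_DIRECT);
  if (fd == -1)
    return -1;   /* e.g. EINVAL on filesystems without O_DIRECT support */

  if (fstat (fd, &st) == -1)
    goto fail;
  if (!S_ISREG (st.st_mode)) {
    errno = EINVAL;          /* block devices need a different query */
    goto fail;
  }
  *align_out = st.st_blksize > 0 ? (long) st.st_blksize : 4096;
  if (!is_aligned (st.st_size, *align_out)) {
    errno = EINVAL;          /* refuse unaligned images up front */
    goto fail;
  }
  return fd;

 fail:
  saved_errno = errno;
  close (fd);
  errno = saved_errno;
  return -1;
}
```

Block devices need a different query (for example the BLKSSZGET ioctl on Linux), so the sketch simply rejects anything that is not a regular file rather than guessing.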
> > Maybe a filter can handle alignment?
> >
> > > implementation, but my idea was that it might be an alternate
> > > implementation of cache=none. But if we thought we might use O_DIRECT
> > > as a separate mode, then maybe we should rename cache=none.
> > > cache=advise? cache=dontneed? I can't think of a good name!
> > >
> >
> > Yes, don't call it none if you use the cache.
> >
> > How about advise=?
> >
> > I would keep cache semantics similar to qemu.
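One way a filter could handle alignment is to widen each unaligned request to the enclosing block boundaries: read whole blocks, and for writes do read-modify-write on the partial head and tail blocks. The C sketch below shows only the range arithmetic. It assumes a power-of-two block size, and the function names are hypothetical rather than part of any existing plugin or filter API.

```c
#include <assert.h>
#include <stdint.h>

/* Round x down to a multiple of align (align must be a power of two). */
static inline uint64_t
align_down (uint64_t x, uint64_t align)
{
  assert (align > 0 && (align & (align - 1)) == 0);
  return x & ~(align - 1);
}

/* Round x up to a multiple of align (align must be a power of two). */
static inline uint64_t
align_up (uint64_t x, uint64_t align)
{
  assert (align > 0 && (align & (align - 1)) == 0);
  return (x + align - 1) & ~(align - 1);
}

/* Widen [offset, offset + count) to the smallest block-aligned range that
 * covers it.  For a write, the filter would then read the head and tail
 * blocks, patch in the caller's data and write the whole range back. */
static void
widen_request (uint64_t offset, uint32_t count, uint64_t align,
               uint64_t *aligned_offset, uint64_t *aligned_count)
{
  uint64_t start = align_down (offset, align);
  uint64_t end = align_up (offset + count, align);

  *aligned_offset = start;
  *aligned_count = end - start;
}
```

For example, a 100-byte request at offset 5000 with 4096-byte blocks becomes a single 4096-byte request at offset 4096, while a 200-byte request at offset 4000 straddles a boundary and becomes 8192 bytes at offset 0.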
> qemu uses cache=none as a synonym for O_DIRECT, but AFAIK it has
> nothing that tries to use posix_fadvise(DONTNEED) with or without
> Linus's double buffering technique.
Yes, this is the right way. posix_fadvise is not a replacement for O_DIRECT.
> qemu does use
> posix_fadvise(DONTNEED) in one place but AFAICT it is only used for
> live migration.
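For comparison with O_DIRECT, the posix_fadvise(DONTNEED) approach combined with what is usually described as Linus's double buffering technique can be sketched like this: after each write, start asynchronous writeback of that window with sync_file_range(), then wait for writeback of the previous window to complete and advise the kernel to drop its now-clean pages. This is a minimal Linux-only sketch under that reading of the technique; write_and_evict and the caller-tracked previous window are hypothetical. It is not a replacement for O_DIRECT: the data still passes through the page cache, it just does not stay there.

```c
#define _GNU_SOURCE /* sync_file_range() is Linux-specific */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes at offset, then push the previous
 * window out of the page cache.  prev_offset/prev_len describe the window
 * written by the previous call (prev_len == 0 means there is none).
 * Returns 0, or -1 with errno set. */
static int
write_and_evict (int fd, const void *buf, size_t len, off_t offset,
                 off_t prev_offset, off_t prev_len)
{
  const char *p = buf;
  size_t done = 0;
  int err;

  assert (prev_len >= 0);

  /* Plain buffered write: the data still goes through the page cache. */
  while (done < len) {
    ssize_t r = pwrite (fd, p + done, len - done, offset + (off_t) done);
    if (r == -1) {
      if (errno == EINTR)
        continue;
      return -1;
    }
    done += (size_t) r;
  }

  /* Kick off asynchronous writeback of the window just written. */
  if (sync_file_range (fd, offset, (off_t) len, SYNC_FILE_RANGE_WRITE) == -1)
    return -1;

  if (prev_len > 0) {
    /* Wait until writeback of the previous window has completed ... */
    if (sync_file_range (fd, prev_offset, prev_len,
                         SYNC_FILE_RANGE_WAIT_BEFORE |
                         SYNC_FILE_RANGE_WRITE |
                         SYNC_FILE_RANGE_WAIT_AFTER) == -1)
      return -1;
    /* ... so its pages are clean and DONTNEED can actually drop them. */
    err = posix_fadvise (fd, prev_offset, prev_len, POSIX_FADV_DONTNEED);
    if (err != 0) {
      errno = err;        /* posix_fadvise returns the error, not -1 */
      return -1;
    }
  }
  return 0;
}
```

The one-window lag is the point of the design: writeback of window N overlaps with filling window N+1, so the writer rarely blocks, yet the amount of dirty data in the page cache stays bounded to about two windows.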
...
> > We already tried this with dd and the results were not good.
> These ones?
> https://www.redhat.com/archives/libguestfs/2020-August/msg00078.html
No, we had a bug where copying an image from Glance caused sanlock timeouts
because of unpredictable page cache flushes.
We tried posix_fadvise, but it did not help. The only way to avoid such issues
is to use O_SYNC or O_DIRECT. O_SYNC is much slower, but it is the path
we took for now in this flow.
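The O_SYNC versus O_DIRECT choice can be contrasted in a short sketch. With O_SYNC every write returns only after the data and its metadata have reached stable storage, which keeps dirty page cache from piling up at the cost of throughput. With O_DIRECT the page cache is bypassed, but the buffer address, file offset and length must all be suitably aligned. The mode enum and helper names are hypothetical, and 4096 bytes is assumed as the alignment in the usage below.

```c
#define _GNU_SOURCE /* O_DIRECT is a Linux extension in glibc's <fcntl.h> */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical destination modes for an image copy. */
enum dest_mode { DEST_SYNC, DEST_DIRECT };

/* DEST_SYNC: each write returns only once data and metadata are on stable
 * storage, so dirty pages never accumulate (slow but predictable).
 * DEST_DIRECT: bypass the page cache; misaligned buffers, offsets or
 * lengths typically make writes fail with EINVAL. */
static int
open_dest (const char *path, enum dest_mode mode)
{
  int flags = O_WRONLY | O_CREAT;

  flags |= (mode == DEST_SYNC) ? O_SYNC : O_DIRECT;
  return open (path, flags, 0644);
}

/* An aligned buffer is required for O_DIRECT and harmless for O_SYNC.
 * align must be a power of two and a multiple of sizeof (void *). */
static void *
alloc_io_buffer (size_t size, size_t align)
{
  void *buf = NULL;

  assert (align >= sizeof (void *) && (align & (align - 1)) == 0);
  if (posix_memalign (&buf, align, size) != 0)
    return NULL;
  return buf;
}
```

A copy loop would allocate one buffer with alloc_io_buffer(), fill it, and pwrite() it at block-aligned offsets; only the DEST_DIRECT path actually depends on that alignment, so the same loop works in either mode.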
> Rich.
> --
> Richard Jones, Virtualization Group, Red Hat
> http://people.redhat.com/~rjones
> Read my programming and virtualization blog:
> http://rwmj.wordpress.com
> virt-builder quickly builds VMs from scratch
> http://libguestfs.org/virt-builder.1.html