On Tue, May 31, 2022 at 11:12:50PM +0300, Nir Soffer wrote:
On Tue, May 31, 2022 at 5:15 PM Eric Blake <eblake(a)redhat.com>
wrote:
>
> The Py_BuildValue "y#" format copies data; this is because Python
> wants to allow our C memory to go out of scope without worrying about
> whether the user's python callback has stashed off a longer-living
> reference to its incoming parameter. But it is inefficient; we can do
> better by utilizing Python's memoryview for a zero-copy exposure to
> the callback's C buffer, as well as a .release method that we can
> utilize just before our C memory goes out of scope. Now, if the user
> stashes away a reference, they will get a clean Python error if they
> try to access the memory after the fact. This IS an API change (code
> previously expecting a stashed copy to be long-lived will break), but
> we never promised Python API stability, and anyone writing a callback
> that saves off data was already risky (neither libnbd nor nbdkit's
> testsuite had such a case). For a demonstration of the new error,
> where the old code succeeded:
>
> $ ./run nbdsh
> nbd> h.connect_command(["nbdkit", "-s", "memory",
"10"])
> nbd> copy=None
> nbd> def f(b,o,s,e):
> ... global copy
> ... copy = b
> ... print(b[0])
> ...
> nbd> print(copy)
> None
> nbd> h.pread_structured(10,0,f)
> 0
> bytearray(b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00')
> nbd> copy
> <released memory at 0x7ff97410de40>
> nbd> copy[0]
> Traceback (most recent call last):
> File "/usr/lib64/python3.10/code.py", line 90, in runcode
> exec(code, self.locals)
> File "<console>", line 1, in <module>
> ValueError: operation forbidden on released memoryview object
>
> To demonstrate the speedup, I tested:
>
> $ export script='
> def f(b,o,s,e):
> pass
> m=1024*1024
> size=h.get_size()
> for i in range(size // m):
> buf = h.pread_structured(m, m*i, f)
> '
> $ time ./run nbdkit -U - memory 10G --run 'nbdsh -u "$uri" -c
"$script"'
>
> On my machine, this took 9.1s pre-patch, and 3.0s post-patch, for an
> approximate 65% speedup.
Looks like ~300% speedup to me. I guess 3 times faster is the most clear way
to describe the change.
I guess it depends whether you compare the delta in execution speed to
the old or the new speed. But either way, it is a nice speedup! (I'll
probably re-write to use the ~3x wording, as that sounds more
impressive)
> The corresponding diff to the generated code is:
>
> | --- python/methods.c.bak 2022-05-31 07:57:25.256293091 -0500
> | +++ python/methods.c 2022-05-31 08:14:09.570567858 -0500
> | @@ -73,8 +73,12 @@ chunk_wrapper (void *user_data, const vo
> |
> | PyGILState_STATE py_save = PyGILState_UNLOCKED;
> | PyObject *py_args, *py_ret;
> | + PyObject *py_subbuf = NULL;
> | PyObject *py_error = NULL;
> |
> | + /* Cast away const */
I think it will be more clear and helpful as:
/* Casting subbuf to char* is safe since we use PyBUF_READ. */
> | + py_subbuf = PyMemoryView_FromMemory ((char *) subbuf, count, PyBUF_READ);
Yes, good idea.
Maybe py_view?
And also change the doscring to mention "view" instead of "subbuf".
That's determined by the public API parameter name. The name is not
an API change, per-se, but changing it (in API.ml) would have knock-on
effects to other language bindings. And 'view' is potentially
misleading, since while it has always been a view in C, and this patch
is getting us closer to it being a view in Python (you need patch 3/2
to make it an actual view of the final buffer for h.pread_structured,
and an as-yet unfinished 4/2 before it is an actual view of the buffer
passed to h.aio_pread_structured), we do not guarantee that all other
language bindings are able to avoid the memcpy.
> | out:
> | + if (py_subbuf) {
> | + PyObject *tmp = PyObject_CallMethod(py_subbuf, "release", NULL);
What it the user does:
def f(b,o,s,e):
use b...
b.release()
Or:
def f(b,o,s,e):
with b:
use b...
We would release a released memoryview here.
I don't think this is likely to happen though.
Even if it does happen, it's harmless.
https://docs.python.org/3/library/stdtypes.html#memoryview.release
says "After this method has been called, any further operation on the
view raises a ValueError (except release() itself which can be called
multiple times):".
I'm not sure about this change - the performance improvement is great,
but the API change is surprising.
Since pread_structured() returns a buffer with the read data, can
we arrange that the memoryview argument to the chunk callback
is pointing into the returned buffer, so the view does not have to be
released by the C extension?
Yes, in patch 3/2 for h.pread_structured; and I'm still working on it
for h.aio_pread_structured.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization:
qemu.org |
libvirt.org