On 8/25/20 2:21 PM, Richard W.M. Jones wrote:
> On Tue, Aug 25, 2020 at 10:46:57AM -0500, Eric Blake wrote:
> > + const char *nbdkit_string_intern (const char *str);
> > +
> > +Returns a copy of str, so that the caller may reclaim str and use the
> > +copy in its place. If the copy is created outside the scope of a
> > +connection (such as during C<.load>), the lifetime of the copy will
> > +last at least through C<.unload>; if it was created within a
> > +connection (such as during C<.preconnect>, C<.default_export>, or
> > +C<.open>), the lifetime will last at least through C<.close>.
> Interesting - this isn't exactly how I imagined this would work.
> Would it make sense to intern strings for the lifetime of the server
> no matter where they are allocated from?
If you allocate 100 bytes for every connection, but no longer reference
the memory after .close, then intern'ing for the lifetime of the server
adds up to a megabyte of usage after 10,000 clients even if most of
those clients are no longer connected, while intern'ing just for the
lifetime of the connection does not. It's actually a pretty decent
heuristic (which is why I didn't give the API a knob) - if you are
allocating a string in response to .load, .config, or another callback
outside of a connection context, you probably want that string to last
until .unload; if you are allocating a string in response to
.list_exports, .default_export, .open, or anything else tied to a
given client connection, you probably don't need that string any longer
than the .close of that same client, except maybe for the first client
(but even then, you could use .get_ready to do that sort of global setup
without waiting for the first connection).
> Another suggestion would be for the function to take a flag indicating
> preferred lifetime, but that presents additional complications for the
> caller.
Yeah, I did think about a flag, and decided it wasn't worth it. When
called outside the context of a connection, the API would fail if you
request in-connection lifetime; and when called inside the context of a
connection, requesting an out-of-connection lifetime gets us right back
to that appearance of a memory leak if it is repeated for every
connection (even though we aren't actually leaking anything, and
valgrind is happy, the constant growth of memory usage as more clients
come and go is annoying); and if it is not repeated for every
connection, it should be just as easy to do the initialization during
.get_ready instead of needing a knob.
> Also I imagined that this function would be a true (LISP-style)
> INTERN, in that it would return a single pointer if called twice with
> two equal strings.  But I see from the implementation later on that
> this isn't the case.
It isn't now, but the contract doesn't prevent it from doing so in the
future. Callers are only promised that the lifetime is at least as long
as .close, but nothing would stop us from making it longer if that made
our implementation use memory more compactly. But right now, I don't
even bother hashing or searching for duplicates, but always append at
the end. Performance-wise, that may mean a bit more memory usage and/or
cleanup time, but so far there are few enough uses that the main
benefit (a lifetime guaranteed by nbdkit, with automatic cleanup and no
bookkeeping in the plugin) seemed to outweigh the need for more
complexity.
> > - keys[optind] = strndup (argv[optind], n);
> > - if (keys[optind] == NULL) {
> > - perror ("strndup");
> > + CLEANUP_FREE char *key = strndup (argv[optind], n);
> > + const char *safekey = nbdkit_string_intern (key);
> > + if (safekey == NULL)
> >   exit (EXIT_FAILURE);
> It seems it might also be nice to have a "strndup" version of the
> intern function.
Or even a memdup. Yeah, those are possibilities as well. Doing all
three (strdup, strndup, memdup) up front isn't hard. If API
proliferation is a problem, you can combine strdup and strndup by
accepting -1 as a length to mean stop at NUL termination; but I'm less
worried about API proliferation and more about ease-of-use.
> I'm not completely sure which of these behaviours is better.  Ideally
> back when I started writing nbdkit I would have used a pool allocator,
> but that's all water under the bridge now.  So none of the above
> indicates I have a preference for a particular implementation, just
> ideas for further discussion.
The nice part of the API as proposed is that a pool allocator for
intern'd strings is still viable! Well, we have to be careful that a
realloc doesn't invalidate prior return values, but nothing is requiring
us to do a fresh malloc() per intern, rather than just carving out a
slice of a larger pool when room remains and only malloc'ing large
chunks (maybe 64k) when room is not available; where cleanup at the end
of a connection or server lifetime only has to free the chunks instead
of individual intern'd strings within those chunks. In contrast, if we
had instead proposed an API where the user transfers ownership of a
malloc()d string into the server, and the server is now responsible for
calling free(), is much more constrained (we can't change that interface
without breaking existing clients). Thus, avoiding a hard dependence on
malloc() in our ABI is a good thing.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org