On 7/21/20 7:46 AM, Richard W.M. Jones wrote:
Hi Eric, Nir.
$SUBJECT - it's complicated! Imagine a scenario where we have
extended the file plugin so you can point it at a directory and it
will use the client-supplied export name to select a file to serve.
Also we have extended the tar filter so that you can map tar file
components into export names. You may be able to see where this is
going ...
Now we point the file plugin at a directory of tar files, and layer
the tar filter on top, and then what happens? The client can only
supply a single export name, so how can it specify component X of tar
file Y?
tar filter ---> file plugin
tar-exportname=on exportname=on
directory=/my-files ---> directory of tar files
It gets worse! At the moment filters and plugins can read the export
name sent by the client by calling the global function
nbdkit_export_name(). I'm not so worried about filters, but plugins
must be able to continue to use this global function. In the scenario
above both the tar filter and the file plugin would see the same
string (if we did nothing), and it would only make sense to one or the
other but never to both.
Listing exports is also complicated: Should we list only the nearest
filter's exports? Or the plugin's exports? Cartesian product of both
somehow?
Definitely something for filters to decide, and not the core engine.
But it does sound like we can come up with quite a bit of flexibility,
and the idea of a Cartesian product (a single export name can tell both
the tar filter which component to extract and the file plugin which tar
file to serve) is indeed tempting.
That's the background, but how do we solve it?
----------------------------------------------
I'd like to say first of all that I believe export names are already a
dumpster fire of potential insecurity, and that we shouldn't have the
core server attempt to parse them, so solutions involving splitting
the export names into components (like tar-component/file-name) should
not be considered because they are too risky, and run into arbitrary
limits (what escape character to use? what is the max length?) Also
changing the NBD protocol is not under consideration.
Not directly, but maybe it is worth adding optional support to the NBD
protocol to make it possible for NBD_OPT_LIST to support a hierarchical
return. That is, pointing the file plugin to serve directory=base
containing the following layout:
base
+ f1
+ dir1
+ f2
+ f3
+ dir2
+ f4
the current semantics of NBD_OPT_LIST only lend themselves to returning
4 NBD_REP_SERVER replies in a row: "f1", "dir1/f2",
"dir1/f3",
"dir2/f4". But what if the protocol added NBD_OPT_LIST_HIER which took
a starting point argument, and new replies NBD_REP_SERVER_DIR and
NBD_REP_ERR_NOTDIR, such that we could then have:
NBD_OPT_LIST_HIER("") => SERVER("f1"),
SERVER_DIR("dir1/"),
SERVER_DIR("dir2/")
NBD_OPT_LIST_HIER("dir1/") => SERVER("dir1/f2"),
SERVER("dir1/f3")
NBD_OPT_LIST_HIER("dir2/") => SERVER("dir2/f4")
NBD_OPT_LIST_HIER("f1") => NBD_REP_ERR_NOTDIR
NBD_OPT_LIST_HIER("nosuch") => NBD_REP_ERR_UNKNOWN
True, this hierarchy is only available to new client/server pairs (if
either side lacks the new code, you're back to flat listings), but it is
food for thought. I'll propose something on the upstream NBD list if we
think it has potential.
I think we need to consider these aspects separately:
(a) How filters and plugins get the client exportname.
(b) How filters and plugins respond when the client issues NBD_OPT_LIST.
(c) How the server, filters and plugins respond to NBD_OPT_INFO.
I think (c) is already handled - they respond the same as they do to
NBD_OPT_GO, but rather than starting to serve traffic, they go back to
waiting for the next NBD_OPT_*.
(a) Client exportname
---------------------
The client sends the export name of the export it wants to access with
NBD_OPT_EXPORT_NAME or the newer NBD_OPT_GO. This is an opaque string
(not a filename, pathname etc). Currently filters and plugins can
fetch this using nbdkit_export_name(). However it cannot be modified
by filters.
My proposal is that we extend the .open() callback with an extra
export_name parameter:
void *filter_open (int readonly, const char *export_name);
For plugins I have already proposed this for inclusion in API V3. We
already do this in nbdkit-sh-plugin. For backwards compatibility with
existing plugins we'd make nbdkit_export_name() return the export name
passed down by the last filter in the chain, and deprecate this
function in V3.
Yeah, for plugins, we can't do anything easy without moving to a v3
protocol; but making the existing nbdkit_export_name() smarter about
grabbing the name from the last filter rather than directly from the
client makes sense for v2 filters.
For filters the next_open field would take the export_name and this
would allow filters to modify the export name.
For filters, we have total liberty to change our .open signature right
now, even without introducing plugin v3.
nbdkit-tar-filter would take an explicit tar-export-name=<new>
parameter to pass a different export name down to the underlying
plugin. If the tar filter was implementing its own export name
processing then this parameter would either be required or would
default to "". (If the tar filter was not implementing export name
functionality it would pass it down unchanged).
Makes sense.
(b) Listing exports with NBD_OPT_LIST
-------------------------------------
I believe we need a new .list_exports() callback for filters and
plugins to handle this case. I'm not completely clear on the API, but
it could be something as simple as:
const char **list_exports (void);
returning a NULL-terminated list of strings.
But what is the lifetime on those strings? Can the list change over
time (that is, if I'm running 'nbdkit file dir=base', does it rerun
opendir()/stat()/closedir() to populate a fresh list for each client, or
does it only populate the list once, regardless of later underlying
changes in the directory hierarchy it is wrapping)? Returning const
char ** means the plugin has to manage the memory (the server will not
touch anything), so whether the plugin computes the list up-front
(compute the list during .load, free it during .unload) or per-client
(compute the list during .preconnect or .list_exports, free during
.close) should be reasonable.
Thus, I'm assuming the sequence would be:
client connects
.preconnect
if client calls NBD_OPT_LIST
.list_exports
if client calls NBD_OPT_INFO or NBD_OPT_GO
.open
.get_size
.can_FOO...
client disconnects
.close
that is, .preconnect always corresponds to the initial client connection
before NBD_OPT_ handling, and .open corresponds to the point when the
client requests a specific export, but .close can gracefully clean up
whatever per-client results it generated for any client that actually
asked for a listing. It also gives us the flexibility that a single
client can both obtain a listing and choose which export to then
request, rather than having to do two separate connections (one for the
listing, the second specifying a specific export).
I was wondering if we should instead have the core server handle memory,
similar to how we handle .exports (that is, the server allocates a
container, passes it to the plugin, and the plugin calls
nbdkit_export_list_add("...") as many times as it wants, where the
server now frees the list instead of throwing the burden on the plugin).
Offhand, I'm thinking either approach could be made to work, so it
becomes a decision on which is easier to maintain (both from the
perspective of the core server code, and for an arbitrary plugin writer).
Is it conceivable we will need additional fields in future?
For now, only one field - a description string. That's all the more
that NBD_REP_SERVER supports, but 'qemu-nbd -D' is an example of setting it.
If my idea of adding NBD_OPT_INFO_HIER and NBD_REP_SERVER_DIR goes
anywhere, that may be an opportunity to pass other information as well.
Plugins which do not implement this would be assumed to return a
single export "" (or perhaps the -e parameter??)
Both sound fine. In turn, what happens if -e is in use when a plugin
does have .list_exports? Do we just ignore -e in that case?
Filters could ignore or replace exports by optionally implementing
this callback. I wouldn't recommend that filters modify export names
however.
(c) Handling NBD_OPT_INFO
-------------------------
I believe our current implementation of this option (which is
definitely not being robustly tested!) is something like this:
- NBD_OPT_INFO received from client (containing the export name)
- open a backend handle
- query backend for flags
- reply to client
- close backend handle
- process the next option
Assuming this is actually correct, probably we can make sure that it
passes the export name to the backend_open, and everything should just
work. In fact this would happen as a side effect of fixing (a) above
so I don't believe there's anything to do here except to have some
test coverage.
Yep, which is why I stated above that I think we already do (c).
----------------------------------------------------------------------
Thoughts welcomed as always,
Rich.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization:
qemu.org |
libvirt.org