On Fri, Feb 24, 2023 at 06:41:49AM +0100, Laszlo Ersek wrote:
On 2/23/23 21:40, Eric Blake wrote:
> libnbd's copy/copy-nbd-error.sh was triggering an assertion failure in
> nbdkit:
>
> $ nbdcopy -- [ nbdkit --exit-with-parent -v --filter=error pattern 5M error-pread-rate=0.5 ] null:
> ...
> nbdkit: pattern.2: debug: error-inject: pread count=262144 offset=4718592
> nbdkit: pattern.2: debug: pattern: pread count=262144 offset=4718592
> nbdkit: pattern.1: debug: error-inject: pread count=262144 offset=4456448
> nbdkit: pattern.1: error: injecting EIO error into pread
> nbdkit: pattern.1: debug: sending error reply: Input/output error
> nbdkit: pattern.3: debug: error-inject: pread count=262144 offset=4980736
> nbdkit: pattern.3: debug: pattern: pread count=262144 offset=4980736
> nbdkit: pattern.4: error: write data: NBD_CMD_READ: Broken pipe
> nbdkit: pattern.4: debug: exiting worker thread pattern.4
> nbdkit: connections.c:402: raw_send_socket: Assertion `sock >= 0' failed.
>
> When there are multiple queued requests, and the client hangs up
> abruptly as soon as the first error response is seen (rather than
> waiting until all other queued responses are flushed then sending
> NBD_CMD_DISC), another thread that has a queued response ready to send
> sees EPIPE (the client is no longer reading) and closes the socket,
> then the third thread triggers the assertion failure because it is not
> expecting to attempt a send to a closed socket. Thus, my claim that
> sock would always be non-negative after collecting data from the
> plugin was wrong. The fix is to gracefully handle an abrupt client
> disconnect, by not attempting to send on an already-closed socket.
>
> For the nbdcopy example, --exit-with-parent means it's just an
> annoyance (nbdcopy has already exited, and no further client will be
> connecting); but for a longer-running nbdkit server accepting parallel
> clients, it means any one client can trigger the SIGABRT by
> intentionally queueing multiple NBD_CMD_READ then disconnecting early,
> and thereby kill nbdkit and starve other clients. Whether it rises to
> the level of CVE depends on whether you consider one client being able
> to starve others a privilege escalation (if you are not using TLS,
> there are other ways for a bad client to starve peers; if you are
> using TLS, then the starved client has the same credentials as the
> client that caused the SIGABRT so there is no privilege boundary
> escalation).
>
> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2173054
> Fixes: daef505e ("server: Give client EOF when we are done writing", v1.32.4)
> ---
> server/connections.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/server/connections.c b/server/connections.c
> index 4d776f2a..1b63eb73 100644
> --- a/server/connections.c
> +++ b/server/connections.c
> @@ -1,5 +1,5 @@
>  /* nbdkit
> - * Copyright (C) 2013-2022 Red Hat Inc.
> + * Copyright (C) 2013-2023 Red Hat Inc.
>   *
>   * Redistribution and use in source and binary forms, with or without
>   * modification, are permitted provided that the following conditions are
> @@ -400,7 +400,10 @@ raw_send_socket (const void *vbuf, size_t len, int flags)
>    ssize_t r;
>    int f = 0;
>
> -  assert (sock >= 0);
> +  if (sock < 0) {
> +    errno = EBADF;
> +    return -1;
> +  }
>  #ifdef MSG_MORE
>    if (flags & SEND_MORE)
>      f |= MSG_MORE;
Is it possible that, after the second thread sees EPIPE and closes the
socket, a new client connects, the original file descriptor is
(effectively) repurposed, and the third thread now attempts to send on a
valid, but *wrong* socket?
Good question.
raw_send_socket() is never invoked directly, but rather through the
function pointer conn->send() (that's how we switch between
raw_send_socket() and crypto_send() when enabling TLS). More
importantly, note how the function starts:
static int
raw_send_socket (const void *vbuf, size_t len, int flags)
{
  GET_CONN;
  int sock = conn->sockout;
and looking at the definition of GET_CONN, conn is a non-NULL pointer
in thread-local storage. You are indeed correct that once we close
our socket, another client can connect in parallel in a different
thread and be handed its own fd with the very number we had previously
been storing in conn->sockout. So now we have to check whether all
assignments and uses of conn->sockout are thread-safe.
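To see why a cached fd number is dangerous, here is a tiny standalone
demo (plain POSIX, nothing nbdkit-specific; the two open() calls merely
stand in for sockets):

/* Standalone illustration of fd-number reuse: the kernel hands out the
 * lowest unused descriptor number, so a number cached before close()
 * can be given to the next, unrelated open()/accept(). */
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

int
main (void)
{
  int cached = open ("/dev/null", O_WRONLY);   /* e.g. returns 3 */
  printf ("cached fd = %d\n", cached);

  close (cached);                              /* like thread 2's closesocket() */

  int other = open ("/dev/zero", O_RDONLY);    /* like a new client connecting */
  printf ("new fd    = %d\n", other);          /* typically the same number */

  /* Writing to 'cached' now would really write to 'other': a valid,
   * but *wrong*, descriptor - exactly the hazard discussed above. */
  close (other);
  return 0;
}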
conn->sockout is directly referenced when
protocol-handshake-newstyle.c calls crypto_negotiate_tls() - but at
that point, we know the socket is still open. All other references to
conn->sockout are in connections.c.
new_connection() assigns it as part of creating the socket - at that
point, conn is still only visible to its own thread, and we haven't
yet used the fd, so it cannot be stale.
raw_close() (reached by conn->close()) assigns -1 to conn->sockout
after closing it on SHUT_WR, but leaves the variable unchanged after
closing it on SHUT_RDWR. So next we check the two callers:
conn->close(SHUT_RDWR) is only called from free_connection(), which is
only reached after handle_single_connection() has joined all worker
threads for conn, so there is no other thread that can attempt to read
the value that we left stale. Meanwhile, conn->close(SHUT_WR) is only
called from connection_set_status(), which is guarded by the mutex
conn->status_lock if more than one thread can be using the connection.
We'll use that knowledge below.
The only other access of conn->sockout is when raw_send_socket() reads
its value to learn where to attempt to send, and here, we have to
widen our search to observe that conn->send() is reached in multiple
places; however, anything related to handshaking is done while conn is
still single-threaded (handle_single_connection() calls
protocol_handshake() before spawning worker threads), so the only time
we can have more than one thread using conn->send() simultaneously is
during protocol_recv_request_send_reply(). All such uses of
conn->send() from that function are guarded by the mutex
conn->write_lock.
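To make that locking shape concrete before walking through the race,
here is a compact sketch of the two scopes (the struct and function
names are made up for illustration; this is not the literal nbdkit
source):

#include <pthread.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Stand-in for the relevant fields of nbdkit's struct connection. */
struct conn_locks_sketch {
  pthread_mutex_t write_lock;    /* held for the whole outgoing reply */
  pthread_mutex_t status_lock;   /* held briefly around status changes */
  int sockout;                   /* -1 once the write side is closed */
};

/* Reply path: protocol_recv_request_send_reply() -> conn->send(). */
static void
send_reply_sketch (struct conn_locks_sketch *conn, const void *buf, size_t len)
{
  pthread_mutex_lock (&conn->write_lock);
  int sock = conn->sockout;      /* raw_send_socket()'s cached read */
  if (sock >= 0 && send (sock, buf, len, 0) == -1)
    perror ("send");             /* may hit a closed, or worse, reused fd */
  pthread_mutex_unlock (&conn->write_lock);
}

/* Failure path: connection_set_status(STATUS_DEAD) -> conn->close(SHUT_WR). */
static void
mark_dead_sketch (struct conn_locks_sketch *conn)
{
  pthread_mutex_lock (&conn->status_lock);
  if (conn->sockout >= 0) {
    close (conn->sockout);       /* the fd number is now free for reuse */
    conn->sockout = -1;
  }
  pthread_mutex_unlock (&conn->status_lock);
}

/* Nothing orders send_reply_sketch()'s read of conn->sockout against
 * mark_dead_sketch(): the two run under *different* mutexes. */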
But the question you asked is whether there can ever be a race where:
thread 1                              thread 2                                    thread 3
protocol_recv_request_send_reply()    protocol_recv_request_send_reply()
- read from socket
- call into plugin
- connection_get_status():
  - grab conn->status_lock
  - observe live socket (aka
    conn->sockout still valid)
  - release status_lock
..                                    - grab conn->read_lock
..                                    - conn->recv() detects failure
..                                    - call connection_set_status(STATUS_DEAD)
..                                      - grab conn->status_lock
..                                      - call conn->close(SHUT_WR)
..                                        - call closesocket()                    > window where another
- call send_*() to send reply         ..                                          > client can connect,
  - grab conn->write_lock             ..                                          > thereby claiming the
  - call conn->send()                 ..                                          > fd still present in
    - sock=conn->sockout              ..                                          > thread 1's conn->sockout
..                                      - assign conn->sockout=-1
- write to sock
Eww. It looks like your hunch was right. I don't see adequate
protection to prevent thread 1 from attempting to use a stale
conn->sockout value, and it IS possible that another client connecting
at the same time can have re-allocated that fd.
I don't know if "multiple clients for a single nbdkit server" is a
thing, but I believe that at least "multiconn" could lead to the same
phenomenon.
multi-conn is precisely multiple clients for a single nbdkit server!
But yes, it is also possible for a single server to serve multiple
unrelated clients even without multi-conn (perhaps even serving
different data to parallel clients, such as the file plugin in dir
mode, where the export name requested by the client determines which
file is being served over that connection).
If this problem exists, then I guess the socket abstraction (?) would
have to be extended with a refcount, under the protection of a mutex.
Whenever sending fails (and note that EPIPE is a persistent error, so
once one thread sees EPIPE, the others will keep seeing that too!), the
refcount is decreased, and the one thread that reaches refcount 0 closes
the socket.
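Something like this, purely as a sketch (the names are made up;
nothing like this exists in nbdkit today):

/* Sketch of the refcount idea: each worker thread using the socket
 * holds one reference; a thread that hits a fatal send/recv error
 * drops its reference, and only the thread whose drop takes the count
 * to zero actually closes the fd. */
#include <pthread.h>
#include <unistd.h>

struct refsock {
  pthread_mutex_t lock;
  int fd;               /* -1 once closed */
  unsigned refs;        /* one per worker thread still using fd */
};

static void
refsock_init (struct refsock *s, int fd, unsigned nworkers)
{
  pthread_mutex_init (&s->lock, NULL);
  s->fd = fd;
  s->refs = nworkers;
}

/* Called by a worker that is done with the socket (e.g. after EPIPE). */
static void
refsock_put (struct refsock *s)
{
  int close_fd = -1;

  pthread_mutex_lock (&s->lock);
  if (--s->refs == 0) {
    close_fd = s->fd;
    s->fd = -1;         /* no later reader can pick up a stale number */
  }
  pthread_mutex_unlock (&s->lock);

  if (close_fd >= 0)
    close (close_fd);   /* done exactly once, by the last thread out */
}

A thread that still holds a reference then knows the fd cannot be
closed (and hence reused) underneath it.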
This is just brainstorming, could be totally disconnected from nbdkit's
current behavior.
No, it is spot on. And it means I need to do a v2 of this patch,
which properly protects conn->sockout from stale reuse by a second
thread. I don't know whether ref-counting is easiest, or whether it is
enough to ensure that the assignment to conn->sockout always has a
happens-before relationship with any other thread's use of
conn->sockout.
conn->read_lock and conn->write_lock are meant to be held for
potentially long periods of time (the time during which one direction
of the socket is in exclusive use by one half of a message exchange,
which may span several network packets). But conn->status_lock is
intended to be held for as short a time as possible. Right now, it
only protects
reads and writes of conn->status, but extending it to also cover reads
and writes of conn->sockin/sockout may be sufficient. (It may be
possible to use atomics instead of a mutex, except that I'm not sure
what code base to copy from - nbdkit's license means I can't copy from
Paolo's nice atomics framework in qemu).
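One rough sketch of that direction (not a tested patch; struct
conn_sketch below merely stands in for the relevant fields of struct
connection, and whether the lock must also span the send() itself is
exactly the open question for v2):

#include <errno.h>
#include <pthread.h>
#include <sys/socket.h>
#include <unistd.h>

struct conn_sketch {
  pthread_mutex_t status_lock;
  int sockout;                      /* -1 once the write side is gone */
};

static int
raw_send_socket_sketch (struct conn_sketch *conn,
                        const void *buf, size_t len, int flags)
{
  int sock;

  pthread_mutex_lock (&conn->status_lock);
  sock = conn->sockout;             /* now ordered against the store of -1 */
  pthread_mutex_unlock (&conn->status_lock);

  if (sock < 0) {
    errno = EBADF;                  /* same graceful failure as this patch */
    return -1;
  }
  /* Open question: the fd could still be closed and reused between the
   * read above and the send below, unless the lock is held across the
   * send (at odds with keeping status_lock short) or the fd is
   * ref-counted as suggested earlier in the thread.  Short writes are
   * ignored here for brevity. */
  return send (sock, buf, len, flags) == -1 ? -1 : 0;
}

static void
raw_close_write_sketch (struct conn_sketch *conn)
{
  pthread_mutex_lock (&conn->status_lock);
  if (conn->sockout >= 0) {
    close (conn->sockout);          /* the conn->close(SHUT_WR) case */
    conn->sockout = -1;
  }
  pthread_mutex_unlock (&conn->status_lock);
}

Either way, the property a v2 needs is the happens-before relationship
described above: no thread should be able to act on a conn->sockout
value whose fd another thread has already closed and let the kernel
reuse.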
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization:  qemu.org | libvirt.org