[libnbd PATCH] API: Add nbd_set_opt_mode to expose NEGOTIATING state
by Eric Blake
This is the bare minimum needed to allow the user to take control over
the rest of option negotiating. This patch adds several new API:
nbd_set_opt_mode() - called during Created to enable the new mode
nbd_get_opt_mode() - query whether opt mode is enabled
nbd_opt_go() - used in Negotiating state to attempt to use export
nbd_opt_abort() - used in Negotiating state to skip Connected state
nbd_aio_is_negotiating() - used to detect Negotiating state
Older clients that do not use nbd_set_opt_mode see no difference: the
new state is never reached, all configuration (export name, requesting
meta contexts) has to be done in Created state, and the state machine
moves to READY on success. But newer clients that opt in gain a new
state Negotiating, reached after TLS and structured replies have been
negotiated, and prior to attempting NBD_OPT_GO. In fact, this state
can be reached multiple times, as a failed NBD_OPT_GO can now be
recovered from (such as attempting to set a different export name).
The list-exports example is updated to use the new API, in order to
demonstrate that it is now possible to do everything with just one nbd
handle.
The idea is that future patches will rework the NBD_OPT_LIST code to
instead be under the control of the user that enables opt mode,
probably with a callback function rather than malloc'ing the list in
the handle. In fact, by making TLS negotiating (when tls=1) optional,
libnbd can be used to test 'nbdkit --tls=on' as both a plaintext and
encrypted client, under user control. Similarly, NBD_OPT_INFO support
should be trivial to add (reuse most of the NBD_OPT_GO state machine).
We may also want aio counterparts rather than always synchronously
waiting for the state machine to get back to negotiating or connected
states.
---
lib/internal.h | 4 +
generator/API.ml | 93 +++++++++++++++++--
generator/API.mli | 1 +
generator/C.ml | 4 +-
generator/state_machine.ml | 26 ++++++
generator/states-magic.c | 6 +-
generator/states-newstyle-opt-go.c | 4 +
.../states-newstyle-opt-structured-reply.c | 7 +-
generator/states-newstyle.c | 47 ++++++++--
lib/connect.c | 5 +-
lib/is-state.c | 14 ++-
lib/Makefile.am | 3 +-
examples/list-exports.c | 64 +++++++++----
13 files changed, 235 insertions(+), 43 deletions(-)
diff --git a/lib/internal.h b/lib/internal.h
index 4d0c4e1..8e1fe79 100644
--- a/lib/internal.h
+++ b/lib/internal.h
@@ -98,6 +98,9 @@ struct nbd_handle {
int uri_allow_tls;
bool uri_allow_local_file;
+ /* Option negotiation mode. */
+ bool opt_mode;
+
/* List exports mode. */
bool list_exports;
size_t nr_exports;
@@ -398,6 +401,7 @@ extern int nbd_internal_set_block_size (struct nbd_handle *h, uint32_t min,
/* is-state.c */
extern bool nbd_internal_is_state_created (enum state state);
extern bool nbd_internal_is_state_connecting (enum state state);
+extern bool nbd_internal_is_state_negotiating (enum state state);
extern bool nbd_internal_is_state_ready (enum state state);
extern bool nbd_internal_is_state_processing (enum state state);
extern bool nbd_internal_is_state_dead (enum state state);
diff --git a/generator/API.ml b/generator/API.ml
index 82fdf75..ff7f4c1 100644
--- a/generator/API.ml
+++ b/generator/API.ml
@@ -90,6 +90,7 @@ and flags = {
and permitted_state =
| Created
| Connecting
+| Negotiating
| Connected
| Closed | Dead
and link =
@@ -268,7 +269,7 @@ C<\"nbd2\">, etc.";
"set_export_name", {
default_call with
args = [ String "export_name" ]; ret = RErr;
- permitted_states = [ Created ];
+ permitted_states = [ Created; Negotiating ];
shortdesc = "set the export name";
longdesc = "\
For servers which require an export name or can serve different
@@ -361,7 +362,7 @@ on a particular connection use L<nbd_get_tls_negotiated(3)> instead.";
"get_tls_negotiated", {
default_call with
args = []; ret = RBool;
- permitted_states = [ Connected; Closed ];
+ permitted_states = [ Negotiating; Connected; Closed ];
shortdesc = "find out if TLS was negotiated on a connection";
longdesc = "\
After connecting you may call this to find out if the
@@ -525,7 +526,7 @@ C<nbd_get_structured_replies_negotiated> instead.";
"get_structured_replies_negotiated", {
default_call with
args = []; ret = RBool;
- permitted_states = [ Connected; Closed ];
+ permitted_states = [ Negotiating; Connected; Closed ];
shortdesc = "see if structured replies are in use";
longdesc = "\
After connecting you may call this to find out if the connection is
@@ -596,6 +597,65 @@ the time of compilation.";
Link "aio_is_created"; Link "aio_is_ready"];
};
+ "set_opt_mode", {
+ default_call with
+ args = [Bool "enable"]; ret = RErr;
+ permitted_states = [ Created ];
+ shortdesc = "set whether to allow user control over option negotiation";
+ longdesc = "\
+Set this flag to true in order to enter option negotiating mode with a
+newstyle server.
+
+In this mode, you have fine-grained control over which options are
+negotiated, compared to the default of the server negotiating everything
+on your behalf using settings made before starting the connection. To
+leave the mode and proceed on to the ready state, you must use
+C<nbd_opt_go> successfully. You may also use C<nbd_opt_abort> to end
+the connection without finishing negotiation.";
+ example = Some "examples/list-exports.c";
+ see_also = [Link "get_opt_mode"; Link "opt_go"; Link "opt_abort";
+ Link "get_list_exports";
+ Link "get_nr_list_exports"; Link "get_list_export_name"];
+ };
+
+ "get_opt_mode", {
+ default_call with
+ args = []; ret = RBool;
+ may_set_error = false;
+ shortdesc = "return whether option mode was enabled";
+ longdesc = "\
+Return true if option negotiation mode was enabled on this handle.";
+ see_also = [Link "set_opt_mode"];
+ };
+
+ "opt_go", {
+ default_call with
+ args = []; ret = RErr;
+ permitted_states = [ Negotiating ];
+ shortdesc = "end negotiation and move on to using an export";
+ longdesc = "\
+Request that the server finish negotiation and move on to serving the
+export previously specified by C<nbd_set_export_name>.
+
+If this fails, the server may still be in negotiation, where it is
+possible to attempt another option such as a different export name;
+although older servers will instead have killed the connection.";
+ example = Some "examples/list-exports.c";
+ see_also = [Link "set_opt_mode"; Link "opt_abort"];
+ };
+
+ "opt_abort", {
+ default_call with
+ args = []; ret = RErr;
+ permitted_states = [ Negotiating ];
+ shortdesc = "end negotiation and close the connection";
+ longdesc = "\
+Request that the server finish negotiation, gracefully if possible, then
+close the connection.";
+ example = Some "examples/list-exports.c";
+ see_also = [Link "set_opt_mode"; Link "opt_go"];
+ };
+
"set_list_exports", {
default_call with
args = [Bool "list"]; ret = RErr;
@@ -649,7 +709,7 @@ Return true if list exports mode was enabled on this handle.";
"get_nr_list_exports", {
default_call with
args = []; ret = RInt;
- permitted_states = [ Connected; Closed; Dead ];
+ permitted_states = [ Negotiating; Connected; Closed; Dead ];
shortdesc = "return the number of exports returned by the server";
longdesc = "\
If list exports mode was enabled on the handle and you connected
@@ -663,7 +723,7 @@ C<nbd_set_list_exports>.";
"get_list_export_name", {
default_call with
args = [ Int "i" ]; ret = RString;
- permitted_states = [ Connected; Closed; Dead ];
+ permitted_states = [ Negotiating; Connected; Closed; Dead ];
shortdesc = "return the i'th export name";
longdesc = "\
If list exports mode was enabled on the handle and you connected
@@ -675,7 +735,7 @@ from the list returned by the server.";
"get_list_export_description", {
default_call with
args = [ Int "i" ]; ret = RString;
- permitted_states = [ Connected; Closed; Dead ];
+ permitted_states = [ Negotiating; Connected; Closed; Dead ];
shortdesc = "return the i'th export description";
longdesc = "\
If list exports mode was enabled on the handle and you connected
@@ -690,7 +750,7 @@ is set with I<-D>.";
"add_meta_context", {
default_call with
args = [ String "name" ]; ret = RErr;
- permitted_states = [ Created ];
+ permitted_states = [ Created; Negotiating ];
shortdesc = "ask server to negotiate metadata context";
longdesc = "\
During connection libnbd can negotiate zero or more metadata
@@ -1240,7 +1300,7 @@ are free to pass in other contexts."
"get_protocol", {
default_call with
args = []; ret = RStaticString;
- permitted_states = [ Connected; Closed ];
+ permitted_states = [ Negotiating; Connected; Closed ];
shortdesc = "return the NBD protocol variant";
longdesc = "\
Return the NBD protocol variant in use on the connection. At
@@ -2051,6 +2111,18 @@ issue commands (see L<nbd_aio_is_ready(3)>).";
see_also = [Link "aio_is_ready"];
};
+ "aio_is_negotiating", {
+ default_call with
+ args = []; ret = RBool; is_locked = false; may_set_error = false;
+ shortdesc = "check if the connection is handshaking";
+ longdesc = "\
+Return true if this connection is ready to start another option
+negotiation command while handshaking with the server, before
+the handle becomes ready to issue commands (see L<nbd_aio_is_ready(3)>).
+This state can only be reached if L<nbd_set_opt_mode> was used.";
+ see_also = [Link "aio_is_ready"; Link "set_opt_mode"];
+ };
+
"aio_is_ready", {
default_call with
args = []; ret = RBool; is_locked = false; may_set_error = false;
@@ -2355,6 +2427,11 @@ let first_version = [
"get_list_export_name", (1, 4);
"get_list_export_description", (1, 4);
"get_block_size", (1, 4);
+ "set_opt_mode", (1, 4);
+ "get_opt_mode", (1, 4);
+ "opt_go", (1, 4);
+ "opt_abort", (1, 4);
+ "aio_is_negotiating", (1, 4);
(* These calls are proposed for a future version of libnbd, but
* have not been added to any released version so far.
diff --git a/generator/API.mli b/generator/API.mli
index 3dbd1ff..712e837 100644
--- a/generator/API.mli
+++ b/generator/API.mli
@@ -102,6 +102,7 @@ and flags = {
and permitted_state =
| Created (** can be called in the START state *)
| Connecting (** can be called when connecting/handshaking *)
+| Negotiating (** can be called when handshaking in opt_mode *)
| Connected (** when connected and READY or processing, but
not including CLOSED or DEAD *)
| Closed | Dead (** can be called when the handle is
diff --git a/generator/C.ml b/generator/C.ml
index afb8142..6b65f6e 100644
--- a/generator/C.ml
+++ b/generator/C.ml
@@ -397,7 +397,8 @@ let permitted_state_text permitted_states =
function
| Created -> "newly created"
| Connecting -> "connecting"
- | Connected -> "connected and finished handshaking with the server"
+ | Negotiating -> "negotiating"
+ | Connected -> "connected with the server"
| Closed -> "shut down"
| Dead -> "dead"
) permitted_states
@@ -421,6 +422,7 @@ let generate_lib_api_c () =
function
| Created -> "nbd_internal_is_state_created (state)"
| Connecting -> "nbd_internal_is_state_connecting (state)"
+ | Negotiating -> "nbd_internal_is_state_negotiating (state)"
| Connected -> "nbd_internal_is_state_ready (state) || nbd_internal_is_state_processing (state)"
| Closed -> "nbd_internal_is_state_closed (state)"
| Dead -> "nbd_internal_is_state_dead (state)"
diff --git a/generator/state_machine.ml b/generator/state_machine.ml
index 144a5f8..617cbf3 100644
--- a/generator/state_machine.ml
+++ b/generator/state_machine.ml
@@ -100,6 +100,13 @@ let rec state_machine = [
Group ("OLDSTYLE", oldstyle_state_machine);
Group ("NEWSTYLE", newstyle_state_machine);
+ State {
+ default_state with
+ name = "NEGOTIATING";
+ comment = "Connection is ready to negotiate an NBD option";
+ external_events = [ CmdIssue, "NEWSTYLE.START" ];
+ };
+
State {
default_state with
name = "READY";
@@ -271,6 +278,8 @@ and newstyle_state_machine = [
(* Options. These state groups are always entered unconditionally,
* in this order. The START state in each group will check if the
* state needs to run and skip to the next state in the list if not.
+ * When opt_mode is set, control is returned to the user in state
+ * NEGOTIATING after OPT_STRUCTURED_REPLY or any failed OPT_GO.
*)
Group ("OPT_STARTTLS", newstyle_opt_starttls_state_machine);
Group ("OPT_LIST", newstyle_opt_list_state_machine);
@@ -279,6 +288,23 @@ and newstyle_state_machine = [
Group ("OPT_GO", newstyle_opt_go_state_machine);
Group ("OPT_EXPORT_NAME", newstyle_opt_export_name_state_machine);
+ (* When opt_mode is enabled, option parsing can be cleanly ended without
+ * moving through the %READY state.
+ *)
+ State {
+ default_state with
+ name = "SEND_OPT_ABORT";
+ comment = "Send NBD_OPT_ABORT to end negotiation";
+ external_events = [ NotifyWrite, "" ];
+ };
+
+ State {
+ default_state with
+ name = "SEND_OPTION_SHUTDOWN";
+ comment = "Sending write shutdown notification to the remote server";
+ external_events = [ NotifyWrite, "" ];
+ };
+
(* When option parsing has successfully finished negotiation
* it will jump to this state for final steps before moving to
* the %READY state.
diff --git a/generator/states-magic.c b/generator/states-magic.c
index 944728d..2ad3a96 100644
--- a/generator/states-magic.c
+++ b/generator/states-magic.c
@@ -1,5 +1,5 @@
/* nbd client library in userspace: state machine
- * Copyright (C) 2013-2019 Red Hat Inc.
+ * Copyright (C) 2013-2020 Red Hat Inc.
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public
@@ -42,8 +42,10 @@ STATE_MACHINE {
}
version = be64toh (h->sbuf.new_handshake.version);
- if (version == NBD_NEW_VERSION)
+ if (version == NBD_NEW_VERSION) {
+ h->sbuf.option.option = 0;
SET_NEXT_STATE (%.NEWSTYLE.START);
+ }
else if (version == NBD_OLD_VERSION)
SET_NEXT_STATE (%.OLDSTYLE.START);
else {
diff --git a/generator/states-newstyle-opt-go.c b/generator/states-newstyle-opt-go.c
index 1e75e0a..daafe91 100644
--- a/generator/states-newstyle-opt-go.c
+++ b/generator/states-newstyle-opt-go.c
@@ -202,6 +202,10 @@ STATE_MACHINE {
set_error (0, "handshake: unknown reply from NBD_OPT_GO: 0x%" PRIx32,
reply);
}
+ if (h->opt_mode) {
+ SET_NEXT_STATE (%.NEGOTIATING);
+ return 0;
+ }
}
SET_NEXT_STATE (%.DEAD);
return 0;
diff --git a/generator/states-newstyle-opt-structured-reply.c b/generator/states-newstyle-opt-structured-reply.c
index cfe6e0d..d0ac4c5 100644
--- a/generator/states-newstyle-opt-structured-reply.c
+++ b/generator/states-newstyle-opt-structured-reply.c
@@ -1,5 +1,5 @@
/* nbd client library in userspace: state machine
- * Copyright (C) 2013-2019 Red Hat Inc.
+ * Copyright (C) 2013-2020 Red Hat Inc.
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public
@@ -83,7 +83,10 @@ STATE_MACHINE {
}
/* Next option. */
- SET_NEXT_STATE (%^OPT_SET_META_CONTEXT.START);
+ if (h->opt_mode)
+ SET_NEXT_STATE (%.NEGOTIATING);
+ else
+ SET_NEXT_STATE (%^OPT_SET_META_CONTEXT.START);
return 0;
} /* END STATE MACHINE */
diff --git a/generator/states-newstyle.c b/generator/states-newstyle.c
index 573f724..e61f373 100644
--- a/generator/states-newstyle.c
+++ b/generator/states-newstyle.c
@@ -1,5 +1,5 @@
/* nbd client library in userspace: state machine
- * Copyright (C) 2013-2019 Red Hat Inc.
+ * Copyright (C) 2013-2020 Red Hat Inc.
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public
@@ -112,6 +112,25 @@ handle_reply_error (struct nbd_handle *h)
STATE_MACHINE {
NEWSTYLE.START:
+ if (h->opt_mode) {
+ switch (h->sbuf.option.option) {
+ case NBD_OPT_GO:
+ SET_NEXT_STATE (%OPT_SET_META_CONTEXT.START);
+ return 0;
+ case NBD_OPT_ABORT:
+ h->sbuf.option.version = htobe64 (NBD_NEW_VERSION);
+ h->sbuf.option.option = htobe32 (NBD_OPT_ABORT);
+ h->sbuf.option.optlen = htobe32 (0);
+ h->wbuf = &h->sbuf;
+ h->wlen = sizeof h->sbuf.option;
+ SET_NEXT_STATE (%SEND_OPT_ABORT);
+ return 0;
+ case 0:
+ break;
+ default:
+ abort ();
+ }
+ }
h->rbuf = &h->sbuf;
h->rlen = sizeof h->sbuf.gflags;
SET_NEXT_STATE (%RECV_GFLAGS);
@@ -136,6 +155,11 @@ STATE_MACHINE {
return 0;
}
+ if ((h->gflags & LIBNBD_HANDSHAKE_FLAG_FIXED_NEWSTYLE) == 0)
+ h->protocol = "newstyle";
+ else
+ h->protocol = "newstyle-fixed";
+
cflags = h->gflags;
h->sbuf.cflags = htobe32 (cflags);
h->wbuf = &h->sbuf;
@@ -155,12 +179,23 @@ STATE_MACHINE {
}
return 0;
+ NEWSTYLE.SEND_OPT_ABORT:
+ assert ((h->gflags & LIBNBD_HANDSHAKE_FLAG_FIXED_NEWSTYLE) != 0);
+ assert (h->opt_mode);
+ switch (send_from_wbuf (h)) {
+ case -1: SET_NEXT_STATE (%.DEAD); return 0;
+ case 0:
+ SET_NEXT_STATE (%SEND_OPTION_SHUTDOWN);
+ }
+ return 0;
+
+ NEWSTYLE.SEND_OPTION_SHUTDOWN:
+ /* We don't care if the server replies to NBD_OPT_ABORT */
+ if (h->sock->ops->shut_writes (h, h->sock))
+ SET_NEXT_STATE (%.CLOSED);
+ return 0;
+
NEWSTYLE.FINISHED:
- if ((h->gflags & LIBNBD_HANDSHAKE_FLAG_FIXED_NEWSTYLE) == 0)
- h->protocol = "newstyle";
- else
- h->protocol = "newstyle-fixed";
-
SET_NEXT_STATE (%.READY);
return 0;
diff --git a/lib/connect.c b/lib/connect.c
index 92ed6b1..7e42b14 100644
--- a/lib/connect.c
+++ b/lib/connect.c
@@ -1,5 +1,5 @@
/* NBD client library in userspace
- * Copyright (C) 2013-2019 Red Hat Inc.
+ * Copyright (C) 2013-2020 Red Hat Inc.
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public
@@ -40,7 +40,8 @@
static int
error_unless_ready (struct nbd_handle *h)
{
- if (nbd_internal_is_state_ready (get_next_state (h)))
+ if (nbd_internal_is_state_ready (get_next_state (h)) ||
+ nbd_internal_is_state_negotiating (get_next_state (h)))
return 0;
/* Why did it fail? */
diff --git a/lib/is-state.c b/lib/is-state.c
index 1a85c7a..e019e53 100644
--- a/lib/is-state.c
+++ b/lib/is-state.c
@@ -1,5 +1,5 @@
/* NBD client library in userspace
- * Copyright (C) 2013-2019 Red Hat Inc.
+ * Copyright (C) 2013-2020 Red Hat Inc.
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public
@@ -58,6 +58,12 @@ nbd_internal_is_state_connecting (enum state state)
return is_connecting_group (group);
}
+bool
+nbd_internal_is_state_negotiating (enum state state)
+{
+ return state == STATE_NEGOTIATING;
+}
+
bool
nbd_internal_is_state_ready (enum state state)
{
@@ -120,6 +126,12 @@ nbd_unlocked_aio_is_connecting (struct nbd_handle *h)
return nbd_internal_is_state_connecting (get_public_state (h));
}
+int
+nbd_unlocked_aio_is_negotiating (struct nbd_handle *h)
+{
+ return nbd_internal_is_state_negotiating (get_public_state (h));
+}
+
int
nbd_unlocked_aio_is_ready (struct nbd_handle *h)
{
diff --git a/lib/Makefile.am b/lib/Makefile.am
index 1c46c54..9fd6331 100644
--- a/lib/Makefile.am
+++ b/lib/Makefile.am
@@ -1,5 +1,5 @@
# nbd client library in userspace
-# Copyright (C) 2013-2019 Red Hat Inc.
+# Copyright (C) 2013-2020 Red Hat Inc.
#
# This library is free software; you can redistribute it and/or
# modify it under the terms of the GNU Lesser General Public
@@ -47,6 +47,7 @@ libnbd_la_SOURCES = \
internal.h \
is-state.c \
nbd-protocol.h \
+ opt.c \
poll.c \
protocol.c \
rw.c \
diff --git a/examples/list-exports.c b/examples/list-exports.c
index 643e611..daea2ab 100644
--- a/examples/list-exports.c
+++ b/examples/list-exports.c
@@ -4,9 +4,27 @@
* $ qemu-nbd -x "hello" -t -k /tmp/sock disk.img
* $ ./run examples/list-exports /tmp/sock
* [0] hello
- * Which export to connect to? 0
+ * Which export to connect to (-1 to quit)? 0
* Connecting to hello ...
* /tmp/sock: hello: size = 2048 bytes
+ *
+ * To test this with nbdkit (requires 1.22):
+ * $ nbdkit -U /tmp/sock sh - <<\EOF
+ * case $1 in
+ * list_exports) echo NAMES; echo foo; echo foobar ;;
+ * open) echo "$3" ;;
+ * get_size) echo "$2" | wc -c ;;
+ * pread) echo "$2" | dd skip=$4 count=$3 \
+ * iflag=skip_bytes,count_bytes ;;
+ * *) exit 2 ;;
+ * esac
+ * EOF
+ * $ ./run examples/list-exports /tmp/sock
+ * [0] foo
+ * [1] foobar
+ * Which export to connect to (-1 to quit)? 1
+ * Connecting to foobar ...
+ * /tmp/sock: foobar: size = 7 bytes
*/
#include <stdio.h>
@@ -20,7 +38,7 @@
int
main (int argc, char *argv[])
{
- struct nbd_handle *nbd, *nbd2;
+ struct nbd_handle *nbd;
int r, i;
char *name, *desc;
int64_t size;
@@ -30,14 +48,15 @@ main (int argc, char *argv[])
exit (EXIT_FAILURE);
}
- /* Create the libnbd handle for querying exports. */
+ /* Create the libnbd handle. */
nbd = nbd_create ();
if (nbd == NULL) {
fprintf (stderr, "%s\n", nbd_get_error ());
exit (EXIT_FAILURE);
}
- /* Set the list exports mode in the handle. */
+ /* Set opt mode and request list exports. */
+ nbd_set_opt_mode (nbd, true);
nbd_set_list_exports (nbd, true);
/* Connect to the NBD server over a
@@ -52,8 +71,8 @@ main (int argc, char *argv[])
fprintf (stderr, "%s\n", nbd_get_error ());
exit (EXIT_FAILURE);
}
-
- if (nbd_get_nr_list_exports (nbd) == 0) {
+ if (!nbd_aio_is_negotiating (nbd) ||
+ nbd_get_nr_list_exports (nbd) == 0) {
fprintf (stderr, "Server does not support "
"listing exports.\n");
exit (EXIT_FAILURE);
@@ -73,29 +92,34 @@ main (int argc, char *argv[])
}
printf ("Which export to connect to? ");
if (scanf ("%d", &i) != 1) exit (EXIT_FAILURE);
+ if (i == -1) {
+ if (nbd_opt_abort (nbd) == -1) {
+ fprintf (stderr, "%s\n", nbd_get_error ());
+ exit (EXIT_FAILURE);
+ }
+ nbd_close (nbd);
+ exit (EXIT_SUCCESS);
+ }
name = nbd_get_list_export_name (nbd, i);
+ if (name == NULL) {
+ fprintf (stderr, "%s\n", nbd_get_error ());
+ exit (EXIT_FAILURE);
+ }
printf ("Connecting to %s ...\n", name);
- nbd_close (nbd);
- /* Connect again to the chosen export. */
- nbd2 = nbd_create ();
- if (nbd2 == NULL) {
+ /* Resume connecting to the chosen export. */
+ if (nbd_set_export_name (nbd, name) == -1 ||
+ nbd_opt_go (nbd) == -1) {
fprintf (stderr, "%s\n", nbd_get_error ());
exit (EXIT_FAILURE);
}
-
- if (nbd_set_export_name (nbd2, name) == -1) {
- fprintf (stderr, "%s\n", nbd_get_error ());
- exit (EXIT_FAILURE);
- }
-
- if (nbd_connect_unix (nbd2, argv[1]) == -1) {
- fprintf (stderr, "%s\n", nbd_get_error ());
+ if (!nbd_aio_is_ready (nbd)) {
+ fprintf (stderr, "server closed early\n");
exit (EXIT_FAILURE);
}
/* Read the size in bytes and print it. */
- size = nbd_get_size (nbd2);
+ size = nbd_get_size (nbd);
if (size == -1) {
fprintf (stderr, "%s\n", nbd_get_error ());
exit (EXIT_FAILURE);
@@ -104,7 +128,7 @@ main (int argc, char *argv[])
argv[1], name, size);
/* Close the libnbd handle. */
- nbd_close (nbd2);
+ nbd_close (nbd);
free (name);
--
2.28.0
4 years, 4 months
[PATCH nbdkit] file: Implement cache=none and fadvise=normal|random|sequential.
by Richard W.M. Jones
You can use these flags as described in the manual page to optimize
access patterns, and to get better behaviour with the page cache in
some scenarios.
For my testing I used the cachedel and cachestats utilities written by
Julius Plenz (https://github.com/Feh/nocache). I started with a 32 GB
file of random data on a machine with about 32 GB of RAM. At the
beginning of the test I evicted the file from the page cache:
$ cachedel /var/tmp/random
$ cachestats /var/tmp/random
pages in cache: 0/8388608 (0.0%) [filesize=33554432.0K, pagesize=4K]
Performing a normal sequential copy of the file to /dev/null shows
that the file is almost entirely pulled into page cache (thus evicting
useful programs and data):
$ free -m; time ./nbdkit file /var/tmp/random --run 'qemu-img convert -n -p -m 16 -W $nbd "json:{\"file.driver\":\"null-co\",\"file.size\":\"1E\"}"' ; free -m ; cachestats /var/tmp/random
total used free shared buff/cache available
Mem: 32083 1193 27816 1 3073 30435
Swap: 16135 16 16119
(100.00/100%)
real 0m12.437s
user 0m2.005s
sys 0m31.339s
total used free shared buff/cache available
Mem: 32083 1190 313 1 30578 30433
Swap: 16135 16 16119
pages in cache: 7053276/8388608 (84.1%) [filesize=33554432.0K, pagesize=4K]
Now we repeat the test using fadvise=sequential cache=none:
$ cachedel /var/tmp/random
$ cachestats /var/tmp/random
pages in cache: 106/8388608 (0.0%) [filesize=33554432.0K, pagesize=4K]
$ free -m; time ./nbdkit file /var/tmp/random fadvise=sequential cache=none --run 'qemu-img convert -n -p -m 16 -W $nbd "json:{\"file.driver\":\"null-co\",\"file.size\":\"1E\"}"' ; free -m ; cachestats /var/tmp/random
total used free shared buff/cache available
Mem: 32083 1188 27928 1 2966 30440
Swap: 16135 16 16119
(100.00/100%)
real 0m13.107s
user 0m2.051s
sys 0m37.556s
total used free shared buff/cache available
Mem: 32083 1196 27861 1 3024 30429
Swap: 16135 16 16119
pages in cache: 14533/8388608 (0.2%) [filesize=33554432.0K, pagesize=4K]
In this case the file largely avoids being pulled into the page cache,
and we do not evict useful stuff.
Notice that the test takes slightly longer to run. This is expected
because page cache eviction happens synchronously. I expect the cost
when doing sequential writes to be higher. Linus outlined a technique
to do this without the overhead, but unfortunately it is considerably
more complex and dangerous than I am comfortable adding to the file
plugin:
http://lkml.iu.edu/hypermail/linux/kernel/1005.2/01845.html
http://lkml.iu.edu/hypermail/linux/kernel/1005.2/01953.html
(See also scary warnings in the sync_file_range man page)
---
plugins/file/nbdkit-file-plugin.pod | 44 ++++++++++++++
plugins/file/file.c | 90 +++++++++++++++++++++++++++++
2 files changed, 134 insertions(+)
diff --git a/plugins/file/nbdkit-file-plugin.pod b/plugins/file/nbdkit-file-plugin.pod
index dac673ae..0d61b312 100644
--- a/plugins/file/nbdkit-file-plugin.pod
+++ b/plugins/file/nbdkit-file-plugin.pod
@@ -5,6 +5,7 @@ nbdkit-file-plugin - nbdkit file plugin
=head1 SYNOPSIS
nbdkit file [file=]FILENAME
+ [cache=default|none] [fadvise=normal|random|sequential]
=head1 DESCRIPTION
@@ -17,6 +18,28 @@ It serves the named C<FILENAME> over NBD. Local block devices
=over 4
+=item B<cache=default>
+
+=item B<cache=none>
+
+Using C<cache=none> tries to prevent the kernel from keeping parts of
+the file that have already been read or written in the page cache.
+
+=item B<fadvise=normal>
+
+=item B<fadvise=random>
+
+=item B<fadvise=sequential>
+
+This optional flag hints to the kernel that you will access the file
+normally, or in a random order, or sequentially. The exact behaviour
+depends on your operating system, but for Linux using C<normal> causes
+the kernel to read-ahead, C<sequential> causes the kernel to
+read-ahead twice as much as C<normal>, and C<random> turns off
+read-ahead.
+
+The default is C<normal>.
+
=item [B<file=>]FILENAME
Serve the file named C<FILENAME>. A local block device name can also
@@ -31,6 +54,27 @@ See L<nbdkit(1)/Magic parameters>.
=head1 NOTES
+=head2 Optimizing for random or sequential access
+
+If you know in advance that the NBD client will access the file
+randomly or only sequentially then you can hint that to the kernel
+using:
+
+ nbdkit file disk.img fadvise=random
+ nbdkit file disk.img fadvise=sequential
+
+As described in the L</PARAMETERS> section above, on Linux this
+disables or increases the amount of read-ahead that the kernel does.
+
+=head2 Reducing evictions from the page cache
+
+If the file is very large and you known the client will only
+read/write the file sequentially one time (eg for making a single copy
+or backup) then this will stop other processes from being evicted from
+the page cache:
+
+ nbdkit file disk.img fadvise=sequential cache=none
+
=head2 Files on tmpfs
If you want to expose a file that resides on a file system known to
diff --git a/plugins/file/file.c b/plugins/file/file.c
index 076e7531..a8e37cd9 100644
--- a/plugins/file/file.c
+++ b/plugins/file/file.c
@@ -66,6 +66,18 @@
static char *filename = NULL;
+/* posix_fadvise mode: -1 = don't set it, or POSIX_FADV_*. */
+static int fadvise_mode =
+#if defined (HAVE_POSIX_FADVISE) && defined (POSIX_FADV_NORMAL)
+ POSIX_FADV_NORMAL
+#else
+ -1
+#endif
+ ;
+
+/* cache mode */
+static enum { cache_default, cache_none } cache_mode = cache_default;
+
/* Any callbacks using lseek must be protected by this lock. */
static pthread_mutex_t lseek_lock = PTHREAD_MUTEX_INITIALIZER;
@@ -97,6 +109,46 @@ file_config (const char *key, const char *value)
if (!filename)
return -1;
}
+ else if (strcmp (key, "fadvise") == 0) {
+ /* As this is a hint, if the kernel doesn't support the feature
+ * ignore the parameter.
+ */
+ if (strcmp (value, "normal") == 0) {
+#if defined (HAVE_POSIX_FADVISE) && defined (POSIX_FADV_NORMAL)
+ fadvise_mode = POSIX_FADV_NORMAL;
+#else
+ fadvise_mode = -1;
+#endif
+ }
+ else if (strcmp (value, "random") == 0) {
+#if defined (HAVE_POSIX_FADVISE) && defined (POSIX_FADV_RANDOM)
+ fadvise_mode = POSIX_FADV_RANDOM;
+#else
+ fadvise_mode = -1;
+#endif
+ }
+ else if (strcmp (value, "sequential") == 0) {
+#if defined (HAVE_POSIX_FADVISE) && defined (POSIX_FADV_SEQUENTIAL)
+ fadvise_mode = POSIX_FADV_SEQUENTIAL;
+#else
+ fadvise_mode = -1;
+#endif
+ }
+ else {
+ nbdkit_error ("unknown fadvise mode: %s", value);
+ return -1;
+ }
+ }
+ else if (strcmp (key, "cache") == 0) {
+ if (strcmp (value, "default") == 0)
+ cache_mode = cache_default;
+ else if (strcmp (value, "none") == 0)
+ cache_mode = cache_none;
+ else {
+ nbdkit_error ("unknown cache mode: %s", value);
+ return -1;
+ }
+ }
else if (strcmp (key, "rdelay") == 0 ||
strcmp (key, "wdelay") == 0) {
nbdkit_error ("add --filter=delay on the command line");
@@ -188,6 +240,17 @@ file_open (int readonly)
return NULL;
}
+ if (fadvise_mode != -1) {
+ /* This is a hint so we ignore failures. */
+#ifdef HAVE_POSIX_FADVISE
+ int r = posix_fadvise (h->fd, 0, 0, fadvise_mode);
+ if (r == -1)
+ nbdkit_debug ("posix_fadvise: %s: %m (ignored)", filename);
+#else
+ nbdkit_debug ("fadvise is not supported");
+#endif
+ }
+
h->is_block_device = S_ISBLK (statbuf.st_mode);
h->sector_size = 4096; /* Start with safe guess */
@@ -329,6 +392,10 @@ file_pread (void *handle, void *buf, uint32_t count, uint64_t offset,
uint32_t flags)
{
struct handle *h = handle;
+#if defined (HAVE_POSIX_FADVISE) && defined (POSIX_FADV_DONTNEED)
+ uint32_t orig_count = count;
+ uint64_t orig_offset = offset;
+#endif
while (count > 0) {
ssize_t r = pread (h->fd, buf, count, offset);
@@ -345,6 +412,12 @@ file_pread (void *handle, void *buf, uint32_t count, uint64_t offset,
offset += r;
}
+#ifdef HAVE_POSIX_FADVISE
+ /* On Linux this will evict the pages we just read from the page cache. */
+ if (cache_mode == cache_none)
+ posix_fadvise (h->fd, orig_offset, orig_count, POSIX_FADV_DONTNEED);
+#endif
+
return 0;
}
@@ -355,6 +428,17 @@ file_pwrite (void *handle, const void *buf, uint32_t count, uint64_t offset,
{
struct handle *h = handle;
+#if defined (HAVE_POSIX_FADVISE) && defined (POSIX_FADV_DONTNEED)
+ uint32_t orig_count = count;
+ uint64_t orig_offset = offset;
+
+ /* If cache=none we want to force pages we have just written to the
+ * file to be flushed to disk so we can immediately evict them from
+ * the page cache.
+ */
+ if (cache_mode == cache_none) flags |= NBDKIT_FLAG_FUA;
+#endif
+
while (count > 0) {
ssize_t r = pwrite (h->fd, buf, count, offset);
if (r == -1) {
@@ -369,6 +453,12 @@ file_pwrite (void *handle, const void *buf, uint32_t count, uint64_t offset,
if ((flags & NBDKIT_FLAG_FUA) && file_flush (handle, 0) == -1)
return -1;
+#ifdef HAVE_POSIX_FADVISE
+ /* On Linux this will evict the pages we just wrote from the page cache. */
+ if (cache_mode == cache_none)
+ posix_fadvise (h->fd, orig_offset, orig_count, POSIX_FADV_DONTNEED);
+#endif
+
return 0;
}
--
2.27.0
4 years, 4 months
[PATCH nbdkit] python: Allow extents to return any iterable (which includes lists).
by Richard W.M. Jones
Thanks: Nir Soffer.
Enhances: commit c12e3cb150259f0058727a50341a2d14bb0015a3
---
plugins/python/nbdkit-python-plugin.pod | 3 +-
plugins/python/python.c | 39 +++++++++++++++----------
2 files changed, 25 insertions(+), 17 deletions(-)
diff --git a/plugins/python/nbdkit-python-plugin.pod b/plugins/python/nbdkit-python-plugin.pod
index d7b6033f..ddae677e 100644
--- a/plugins/python/nbdkit-python-plugin.pod
+++ b/plugins/python/nbdkit-python-plugin.pod
@@ -377,7 +377,8 @@ optionally using C<nbdkit.set_error> first.
(Optional)
def extents(h, count, offset, flags):
- # return a list of (offset, length, type) tuples:
+ # return an iterable object (eg. list) of
+ # (offset, length, type) tuples:
return [ (off1, len1, type1), (off2, len2, type2), ... ]
=back
diff --git a/plugins/python/python.c b/plugins/python/python.c
index 46b912e2..27c5ede2 100644
--- a/plugins/python/python.c
+++ b/plugins/python/python.c
@@ -1035,7 +1035,8 @@ py_extents (void *handle, uint32_t count, uint64_t offset,
struct handle *h = handle;
PyObject *fn;
PyObject *r;
- Py_ssize_t i, size;
+ PyObject *iter, *t;
+ size_t size;
if (callback_defined ("extents", &fn)) {
PyErr_Clear ();
@@ -1045,29 +1046,25 @@ py_extents (void *handle, uint32_t count, uint64_t offset,
if (check_python_failure ("extents") == -1)
return -1;
- /* We expect a list of extents to be returned. Each extent is a
- * tuple (offset, length, type). The list must not be empty.
- */
- if (!PyList_Check (r)) {
- nbdkit_error ("extents method did not return a list");
- Py_DECREF (r);
- return -1;
- }
- size = PyList_Size (r);
- if (size < 1) {
- nbdkit_error ("extents method cannot return an empty list");
+ iter = PyObject_GetIter (r);
+ if (iter == NULL) {
+ nbdkit_error ("extents method did not return "
+ "something which is iterable");
Py_DECREF (r);
return -1;
}
- for (i = 0; i < size; ++i) {
- PyObject *t, *py_offset, *py_length, *py_type;
+ size = 0;
+ while ((t = PyIter_Next (iter)) != NULL) {
+ PyObject *py_offset, *py_length, *py_type;
uint64_t extent_offset, extent_length;
uint32_t extent_type;
- t = PyList_GetItem (r, i);
+ size++;
+
if (!PyTuple_Check (t) || PyTuple_Size (t) != 3) {
- nbdkit_error ("extents method did not return a list of 3-tuples");
+ nbdkit_error ("extents method did not return an iterable of 3-tuples");
+ Py_DECREF (iter);
Py_DECREF (r);
return -1;
}
@@ -1078,16 +1075,26 @@ py_extents (void *handle, uint32_t count, uint64_t offset,
extent_length = PyLong_AsUnsignedLongLong (py_length);
extent_type = PyLong_AsUnsignedLong (py_type);
if (check_python_failure ("PyLong") == -1) {
+ Py_DECREF (iter);
Py_DECREF (r);
return -1;
}
if (nbdkit_add_extent (extents,
extent_offset, extent_length, extent_type) == -1) {
+ Py_DECREF (iter);
Py_DECREF (r);
return -1;
}
}
+ if (size < 1) {
+ nbdkit_error ("extents method cannot return an empty list");
+ Py_DECREF (iter);
+ Py_DECREF (r);
+ return -1;
+ }
+
+ Py_DECREF (iter);
Py_DECREF (r);
}
else {
--
2.27.0
4 years, 4 months
[nbdkit PATCH v2] server: Permit - in plugin names
by Eric Blake
Use of - does not need shell quoting, and aids legibility in
multi-word plugin or filter names. Permitting both - and _ would be
ambiguous (not to mention that things like 'man nbdkit-foo_bar-plugin'
would look ugly), so prefer only the character that is easier for
human use. Permitting a leading - would be ambiguous with options,
but restricting to a letter as the first character would rule out any
existing plugins such as a theoretical '9p' plugin for interaction
with the 9p filesystem.
Signed-off-by: Eric Blake <eblake(a)redhat.com>
---
After a bit more IRC chat, I think this is more what we want.
docs/nbdkit-filter.pod | 3 ++-
docs/nbdkit-plugin.pod | 3 ++-
server/backend.c | 15 +++++++++++----
3 files changed, 15 insertions(+), 6 deletions(-)
diff --git a/docs/nbdkit-filter.pod b/docs/nbdkit-filter.pod
index 12343dbf..93601088 100644
--- a/docs/nbdkit-filter.pod
+++ b/docs/nbdkit-filter.pod
@@ -201,7 +201,8 @@ methods.
const char *name;
This field (a string) is required, and B<must> contain only ASCII
-alphanumeric characters and be unique amongst all filters.
+alphanumeric characters or non-leading dashes, and be unique amongst
+all filters.
=head2 C<.longname>
diff --git a/docs/nbdkit-plugin.pod b/docs/nbdkit-plugin.pod
index 9341f282..a661680c 100644
--- a/docs/nbdkit-plugin.pod
+++ b/docs/nbdkit-plugin.pod
@@ -471,7 +471,8 @@ an indication of failure. It has the following prototype:
const char *name;
This field (a string) is required, and B<must> contain only ASCII
-alphanumeric characters and be unique amongst all plugins.
+alphanumeric characters or non-leading dashes, and be unique amongst
+all filters.
=head2 C<.version>
diff --git a/server/backend.c b/server/backend.c
index 75ca53be..b001a9a9 100644
--- a/server/backend.c
+++ b/server/backend.c
@@ -98,13 +98,20 @@ backend_load (struct backend *b, const char *name, void (*load) (void))
program_name, b->filename, b->type);
exit (EXIT_FAILURE);
}
- for (i = 0; i < len; ++i) {
+ if (! ascii_isalnum (*name)) {
+ fprintf (stderr,
+ "%s: %s: %s.name ('%s') field must begin with an "
+ "ASCII alphanumeric\n",
+ program_name, b->filename, b->type, name);
+ exit (EXIT_FAILURE);
+ }
+ for (i = 1; i < len; ++i) {
unsigned char c = name[i];
- if (! ascii_isalnum (c)) {
+ if (! ascii_isalnum (c) && c != '-') {
fprintf (stderr,
- "%s: %s: %s.name ('%s') field "
- "must contain only ASCII alphanumeric characters\n",
+ "%s: %s: %s.name ('%s') field must contain only "
+ "ASCII alphanumeric or dash characters\n",
program_name, b->filename, b->type, name);
exit (EXIT_FAILURE);
}
--
2.28.0
4 years, 4 months
[PATCH nbdkit] python: Implement can_extents + extents.
by Richard W.M. Jones
---
plugins/python/nbdkit-python-plugin.pod | 19 ++++++-
plugins/python/python.c | 75 +++++++++++++++++++++++++
tests/test-python-plugin.py | 8 ++-
tests/test_python.py | 73 +++++++++++++++++++++++-
4 files changed, 169 insertions(+), 6 deletions(-)
diff --git a/plugins/python/nbdkit-python-plugin.pod b/plugins/python/nbdkit-python-plugin.pod
index b20f51f7..d7b6033f 100644
--- a/plugins/python/nbdkit-python-plugin.pod
+++ b/plugins/python/nbdkit-python-plugin.pod
@@ -271,6 +271,13 @@ contents will be garbage collected.
# return nbdkit.CACHE_NONE or nbdkit.CACHE_EMULATE
# or nbdkit.CACHE_NATIVE
+=item C<can_extents>
+
+(Optional)
+
+ def can_extents(h):
+ # return a boolean
+
=item C<pread>
(Required)
@@ -365,6 +372,14 @@ indicated range.
If the cache operation fails, your function should throw an exception,
optionally using C<nbdkit.set_error> first.
+=item C<extents>
+
+(Optional)
+
+ def extents(h, count, offset, flags):
+ # return a list of (offset, length, type) tuples:
+ return [ (off1, len1, type1), (off2, len2, type2), ... ]
+
=back
=head2 Missing callbacks
@@ -382,9 +397,7 @@ C<version>,
C<longname>,
C<description>,
C<config_help>,
-C<magic_config_key>,
-C<can_extents>,
-C<extents>.
+C<magic_config_key>.
These are not yet supported.
diff --git a/plugins/python/python.c b/plugins/python/python.c
index 398473f5..585cd9e6 100644
--- a/plugins/python/python.c
+++ b/plugins/python/python.c
@@ -1020,6 +1020,79 @@ py_can_cache (void *handle)
return NBDKIT_CACHE_NONE;
}
+static int
+py_can_extents (void *handle)
+{
+ ACQUIRE_PYTHON_GIL_FOR_CURRENT_SCOPE;
+ return boolean_callback (handle, "can_extents", "extents");
+}
+
+static int
+py_extents (void *handle, uint32_t count, uint64_t offset,
+ uint32_t flags, struct nbdkit_extents *extents)
+{
+ ACQUIRE_PYTHON_GIL_FOR_CURRENT_SCOPE;
+ struct handle *h = handle;
+ PyObject *fn;
+ PyObject *r;
+ Py_ssize_t i, size;
+
+ if (callback_defined ("extents", &fn)) {
+ PyErr_Clear ();
+
+ r = PyObject_CallFunction (fn, "OiLI", h->py_h, count, offset, flags);
+ Py_DECREF (fn);
+ if (check_python_failure ("extents") == -1)
+ return -1;
+
+ /* We expect a list of extents to be returned. Each extent is a
+ * tuple (offset, length, type).
+ */
+ if (!PyList_Check (r)) {
+ nbdkit_error ("extents method did not return a list");
+ Py_DECREF (r);
+ return -1;
+ }
+
+ size = PyList_Size (r);
+ for (i = 0; i < size; ++i) {
+ PyObject *t, *py_offset, *py_length, *py_type;
+ uint64_t extent_offset, extent_length;
+ uint32_t extent_type;
+
+ t = PyList_GetItem (r, i);
+ if (!PyTuple_Check (t) || PyTuple_Size (t) != 3) {
+ nbdkit_error ("extents method did not return a list of 3-tuples");
+ Py_DECREF (r);
+ return -1;
+ }
+ py_offset = PyTuple_GetItem (t, 0);
+ py_length = PyTuple_GetItem (t, 1);
+ py_type = PyTuple_GetItem (t, 2);
+ extent_offset = PyLong_AsUnsignedLongLong (py_offset);
+ extent_length = PyLong_AsUnsignedLongLong (py_length);
+ extent_type = PyLong_AsUnsignedLong (py_type);
+ if (check_python_failure ("PyLong") == -1) {
+ Py_DECREF (r);
+ return -1;
+ }
+ if (nbdkit_add_extent (extents,
+ extent_offset, extent_length, extent_type) == -1) {
+ Py_DECREF (r);
+ return -1;
+ }
+ }
+
+ Py_DECREF (r);
+ }
+ else {
+ nbdkit_error ("%s not implemented", "extents");
+ return -1;
+ }
+
+ return 0;
+}
+
#define py_config_help \
"script=<FILENAME> (required) The Python plugin to run.\n" \
"[other arguments may be used by the plugin that you load]"
@@ -1058,6 +1131,7 @@ static struct nbdkit_plugin plugin = {
.can_fast_zero = py_can_fast_zero,
.can_fua = py_can_fua,
.can_cache = py_can_cache,
+ .can_extents = py_can_extents,
.pread = py_pread,
.pwrite = py_pwrite,
@@ -1065,6 +1139,7 @@ static struct nbdkit_plugin plugin = {
.trim = py_trim,
.zero = py_zero,
.cache = py_cache,
+ .extents = py_extents,
};
NBDKIT_REGISTER_PLUGIN (plugin)
diff --git a/tests/test-python-plugin.py b/tests/test-python-plugin.py
index 8e90bc23..d7331df9 100644
--- a/tests/test-python-plugin.py
+++ b/tests/test-python-plugin.py
@@ -1,5 +1,5 @@
# nbdkit test plugin
-# Copyright (C) 2019 Red Hat Inc.
+# Copyright (C) 2019-2020 Red Hat Inc.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
@@ -95,6 +95,9 @@ def can_cache (h):
elif cache == "native":
return nbdkit.CACHE_NATIVE
+def can_extents (h):
+ return cfg.get ('can_extents', False)
+
def pread (h, buf, offset, flags):
assert flags == 0
end = offset + len(buf)
@@ -131,3 +134,6 @@ def zero (h, count, offset, flags):
def cache (h, count, offset, flags):
assert flags == 0
# do nothing
+
+def extents(h, count, offset, flags):
+ return cfg.get ('extents', [])
diff --git a/tests/test_python.py b/tests/test_python.py
index 6b9f2979..a8a1a79f 100755
--- a/tests/test_python.py
+++ b/tests/test_python.py
@@ -1,6 +1,6 @@
#!/usr/bin/env python3
# nbdkit
-# Copyright (C) 2019 Red Hat Inc.
+# Copyright (C) 2019-2020 Red Hat Inc.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
@@ -153,7 +153,19 @@ class Test (unittest.TestCase):
self.connect ({"size": 512, "can_cache": "native"})
assert self.h.can_cache()
- # Not yet implemented: can_extents.
+ # In theory we could use a test like this, but nbdkit can
+ # always synthesize base:allocation block_status responses
+ # even if the plugin doesn't support them.
+ #
+ #def test_can_extents_true (self):
+ # self.h.add_meta_context ("base:allocation")
+ # self.connect ({"size": 512, "can_extents": True})
+ # assert self.h.can_meta_context ("base:allocation")
+ #
+ #def test_can_extents_false (self):
+ # self.h.add_meta_context ("base:allocation")
+ # self.connect ({"size": 512, "can_extents": False})
+ # assert not self.h.can_meta_context ("base:allocation")
def test_pread (self):
"""Test pread."""
@@ -220,3 +232,60 @@ class Test (unittest.TestCase):
"""Test cache."""
self.connect ({"size": 512, "can_cache": "native"})
self.h.cache (512, 0)
+
+ # We don't have access to the magic constants defined in the
+ # nbdkit module, so redefine them here.
+ EXTENT_HOLE = 1
+ EXTENT_ZERO = 2
+
+ def test_extents_1 (self):
+ """Test extents."""
+
+ offset = None
+ entries = []
+
+ def f(meta_context, o, e, err):
+ nonlocal offset, entries
+ if meta_context != "base:allocation": return
+ offset = o
+ entries = e
+
+ self.h.add_meta_context ("base:allocation")
+ self.connect ({"size": 512,
+ "can_extents": True,
+ "extents":
+ [ (0, 512, self.EXTENT_HOLE|self.EXTENT_ZERO) ]})
+
+ self.h.block_status (512, 0, lambda *args : f (*args))
+ assert offset == 0
+ assert entries == [ 512, self.EXTENT_HOLE|self.EXTENT_ZERO ]
+
+ def test_extents_2 (self):
+ """Test extents."""
+
+ offset = None
+ entries = []
+
+ def f(meta_context, o, e, err):
+ nonlocal offset, entries
+ if meta_context != "base:allocation": return
+ offset = o
+ entries = e
+
+ self.h.add_meta_context ("base:allocation")
+ self.connect ({"size": 2048,
+ "can_extents": True,
+ "extents":
+ [ (0, 512, self.EXTENT_HOLE|self.EXTENT_ZERO),
+ (512, 512, 0),
+ (1024, 1024, self.EXTENT_ZERO) ]})
+
+ self.h.block_status (2048, 0, lambda *args : f (*args))
+ assert offset == 0
+ assert entries == [ 512, self.EXTENT_HOLE|self.EXTENT_ZERO,
+ 512, 0,
+ 1024, self.EXTENT_ZERO ]
+
+ self.h.block_status (1024, 1024, lambda *args : f (*args))
+ assert offset == 1024
+ assert entries == [ 1024, self.EXTENT_ZERO ]
--
2.27.0
4 years, 4 months
[nbdkit PATCH] server: Permit - and _ in plugin names
by Eric Blake
Neither - nor _ need shell quoting, and as long as they are not the
first character, permitting them in a plugin or filter name may make
the command line easier to read. A restriction against a leading
digit is new, but hopefully does not break any existing plugins.
Signed-off-by: Eric Blake <eblake(a)redhat.com>
---
It turns out I can't name a filter 'tls-fallback' without a slight
tweak. Our code for is_short_name() already accepts such names, but
our documentation and backend_load did not.
I'm undecided on whether rejecting a plugin beginning with a digit is
a good idea; as written, this patch would forbid a plugin named '9p',
but it is a one-line tweak to allow it (ascii_isalnum instead of
ascii_isalpha).
docs/nbdkit-filter.pod | 3 ++-
docs/nbdkit-plugin.pod | 3 ++-
server/backend.c | 10 ++++++++--
3 files changed, 12 insertions(+), 4 deletions(-)
diff --git a/docs/nbdkit-filter.pod b/docs/nbdkit-filter.pod
index 3c7527d4..361a70f1 100644
--- a/docs/nbdkit-filter.pod
+++ b/docs/nbdkit-filter.pod
@@ -201,7 +201,8 @@ methods.
const char *name;
This field (a string) is required, and B<must> contain only ASCII
-alphanumeric characters and be unique amongst all filters.
+alphanumeric characters as well as dash or underscore, must begin with
+a letter, and be unique amongst all filters.
=head2 C<.longname>
diff --git a/docs/nbdkit-plugin.pod b/docs/nbdkit-plugin.pod
index 3d573b33..c961c116 100644
--- a/docs/nbdkit-plugin.pod
+++ b/docs/nbdkit-plugin.pod
@@ -471,7 +471,8 @@ an indication of failure. It has the following prototype:
const char *name;
This field (a string) is required, and B<must> contain only ASCII
-alphanumeric characters and be unique amongst all plugins.
+alphanumeric characters as well as dash or underscore, must begin with
+a letter, and be unique amongst all filters.
=head2 C<.version>
diff --git a/server/backend.c b/server/backend.c
index 8f4fed9d..046aacb4 100644
--- a/server/backend.c
+++ b/server/backend.c
@@ -98,10 +98,16 @@ backend_load (struct backend *b, const char *name, void (*load) (void))
program_name, b->filename, b->type);
exit (EXIT_FAILURE);
}
- for (i = 0; i < len; ++i) {
+ if (! ascii_isalpha (*name)) {
+ fprintf (stderr,
+ "%s: %s: %s.name ('%s') field must begin with an ASCII letter\n",
+ program_name, b->filename, b->type, name);
+ exit (EXIT_FAILURE);
+ }
+ for (i = 1; i < len; ++i) {
unsigned char c = name[i];
- if (! ascii_isalnum (c)) {
+ if (! ascii_isalnum (c) && c != '-' && c != '_') {
fprintf (stderr,
"%s: %s: %s.name ('%s') field "
"must contain only ASCII alphanumeric characters\n",
--
2.28.0
4 years, 4 months
[nbdkit PATCH 0/3] Content differentiation during --tls=on
by Eric Blake
Patch 3 still needs tests added, but it is at least working from
my simple command line tests.
Eric Blake (3):
server: Implement nbdkit_is_tls for use during .open
server: Expose final thread_model to filter's .get_ready
tlsdummy: New filter
docs/nbdkit-filter.pod | 21 +-
docs/nbdkit-plugin.pod | 34 ++-
docs/nbdkit-tls.pod | 8 +-
filters/log/nbdkit-log-filter.pod | 2 +-
filters/tlsdummy/nbdkit-tlsdummy-filter.pod | 72 ++++++
plugins/sh/nbdkit-sh-plugin.pod | 5 +-
include/nbdkit-filter.h | 5 +-
include/nbdkit-plugin.h | 1 +
configure.ac | 2 +
filters/tlsdummy/Makefile.am | 63 ++++++
server/internal.h | 3 +-
server/backend.c | 6 +-
server/filters.c | 10 +-
server/nbdkit.syms | 1 +
server/plugins.c | 14 +-
server/public.c | 16 ++
plugins/sh/methods.c | 1 +
filters/cow/cow.c | 2 +-
filters/exitlast/exitlast.c | 2 +-
filters/ext2/ext2.c | 2 +-
filters/extentlist/extentlist.c | 3 +-
filters/gzip/gzip.c | 2 +-
filters/limit/limit.c | 2 +-
filters/log/log.c | 18 +-
filters/partition/partition.c | 2 +-
filters/rate/rate.c | 4 +-
filters/retry/retry.c | 2 +-
filters/stats/stats.c | 2 +-
filters/tar/tar.c | 2 +-
filters/tlsdummy/tlsdummy.c | 235 ++++++++++++++++++++
filters/truncate/truncate.c | 2 +-
filters/xz/xz.c | 2 +-
tests/test-layers-filter.c | 4 +-
TODO | 13 +-
34 files changed, 504 insertions(+), 59 deletions(-)
create mode 100644 filters/tlsdummy/nbdkit-tlsdummy-filter.pod
create mode 100644 filters/tlsdummy/Makefile.am
create mode 100644 filters/tlsdummy/tlsdummy.c
--
2.28.0
4 years, 4 months
Long running nbdkit instances seem to leak memory
by Richard W.M. Jones
I'm using nbdkit + the file plugin to serve NBD root filesystems for
some machines, so I get to observe how it behaves for very long runs.
I think this indicates a memory leak?
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5038 root 20 0 4577220 16000 796 S 0.0 0.1 30:52.40 nbdkit
5592 root 20 0 4159224 11708 784 S 0.0 0.1 22:21.67 nbdkit
Not sure how to easily diagnose this. My thought was to force a
coredump by sending a signal, but I don't know if these processes have
a suitable rlimit. I could also attach gdb. Perhaps by looking at a
dump of the memory we could get an idea of what structure leaks.
Unfortunately also it's a very old version:
$ rpm -qf /usr/sbin/nbdkit
nbdkit-1.8.0-1.el7.x86_64
Rich.
--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines. Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top
4 years, 4 months
[PATCH nbdkit] plugins: file: More standard cache mode names
by Nir Soffer
The new cache=none mode is misleading since it does not avoid usage of
the page cache. When using shared storage, we may get stale data from
the page cache. When writing, we flush after every write which is
inefficient and unneeded.
Rename the cache modes to:
- writeback - write complete when the system call returned, and the data
was copied to the page cache.
- writethrough - write completes when the data reach storage. read marks
pages as not needed to minimize use of the page cache.
These terms are more aligned with other software like qemu or LIO and
common caching concepts.
[1] https://www.linuxjournal.com/article/7105
[2] https://www.qemu.org/docs/master/system/invocation.html
(see cache=cache)
Signed-off-by: Nir Soffer <nsoffer(a)redhat.com>
---
plugins/file/file.c | 22 +++++++++++-----------
plugins/file/nbdkit-file-plugin.pod | 19 +++++++++++++------
tests/test-gzip.c | 2 +-
3 files changed, 25 insertions(+), 18 deletions(-)
diff --git a/plugins/file/file.c b/plugins/file/file.c
index f6f91955..deac9b7f 100644
--- a/plugins/file/file.c
+++ b/plugins/file/file.c
@@ -76,7 +76,7 @@ static int fadvise_mode =
;
/* cache mode */
-static enum { cache_default, cache_none } cache_mode = cache_default;
+static enum { cache_writeback, cache_writethrough } cache_mode = cache_writeback;
/* Any callbacks using lseek must be protected by this lock. */
static pthread_mutex_t lseek_lock = PTHREAD_MUTEX_INITIALIZER;
@@ -140,10 +140,10 @@ file_config (const char *key, const char *value)
}
}
else if (strcmp (key, "cache") == 0) {
- if (strcmp (value, "default") == 0)
- cache_mode = cache_default;
- else if (strcmp (value, "none") == 0)
- cache_mode = cache_none;
+ if (strcmp (value, "writeback") == 0)
+ cache_mode = cache_writeback;
+ else if (strcmp (value, "writethrough") == 0)
+ cache_mode = cache_writethrough;
else {
nbdkit_error ("unknown cache mode: %s", value);
return -1;
@@ -423,7 +423,7 @@ file_pread (void *handle, void *buf, uint32_t count, uint64_t offset,
#ifdef HAVE_POSIX_FADVISE
/* On Linux this will evict the pages we just read from the page cache. */
- if (cache_mode == cache_none)
+ if (cache_mode == cache_writethrough)
posix_fadvise (h->fd, orig_offset, orig_count, POSIX_FADV_DONTNEED);
#endif
@@ -441,11 +441,11 @@ file_pwrite (void *handle, const void *buf, uint32_t count, uint64_t offset,
uint32_t orig_count = count;
uint64_t orig_offset = offset;
- /* If cache=none we want to force pages we have just written to the
- * file to be flushed to disk so we can immediately evict them from
- * the page cache.
+ /* If cache=writethrough we want to force pages we have just written
+ * to the file to be flushed to disk so we can immediately evict them
+ * from the page cache.
*/
- if (cache_mode == cache_none) flags |= NBDKIT_FLAG_FUA;
+ if (cache_mode == cache_writethrough) flags |= NBDKIT_FLAG_FUA;
#endif
while (count > 0) {
@@ -464,7 +464,7 @@ file_pwrite (void *handle, const void *buf, uint32_t count, uint64_t offset,
#ifdef HAVE_POSIX_FADVISE
/* On Linux this will evict the pages we just wrote from the page cache. */
- if (cache_mode == cache_none)
+ if (cache_mode == cache_writethrough)
posix_fadvise (h->fd, orig_offset, orig_count, POSIX_FADV_DONTNEED);
#endif
diff --git a/plugins/file/nbdkit-file-plugin.pod b/plugins/file/nbdkit-file-plugin.pod
index e2ae0b1b..6a5bd088 100644
--- a/plugins/file/nbdkit-file-plugin.pod
+++ b/plugins/file/nbdkit-file-plugin.pod
@@ -5,7 +5,7 @@ nbdkit-file-plugin - nbdkit file plugin
=head1 SYNOPSIS
nbdkit file [file=]FILENAME
- [cache=default|none] [fadvise=normal|random|sequential]
+ [cache=writeback|writethrough] [fadvise=normal|random|sequential]
=head1 DESCRIPTION
@@ -18,12 +18,19 @@ It serves the named C<FILENAME> over NBD. Local block devices
=over 4
-=item B<cache=default>
+=item B<cache=writeback>
-=item B<cache=none>
+=item B<cache=writethrough>
-Using C<cache=none> tries to prevent the kernel from keeping parts of
-the file that have already been read or written in the page cache.
+Using C<cache=writeback>, write will be considered as completed as soon
+as the the data is store in the kernel page cache, before it reached the
+actual stoarge.
+
+Using C<cache=writethrough>, write will be considered as completed only
+when the data reach actual storage, minimizing use of kernel page cache.
+reading will try to minimize use of the page cache.
+
+The default is C<writeback>.
=item B<fadvise=normal>
@@ -73,7 +80,7 @@ the file sequentially one time (eg for making a single copy or backup)
then this will stop other processes from being evicted from the page
cache:
- nbdkit file disk.img fadvise=sequential cache=none
+ nbdkit file disk.img fadvise=sequential cache=writethrough
=head2 Files on tmpfs
diff --git a/tests/test-gzip.c b/tests/test-gzip.c
index 8f81c5b7..b1685098 100644
--- a/tests/test-gzip.c
+++ b/tests/test-gzip.c
@@ -62,7 +62,7 @@ main (int argc, char *argv[])
/* Test the new filter. */
if (test_start_nbdkit ("file", "--filter=gzip", disk,
- "fadvise=sequential", "cache=none",
+ "fadvise=sequential", "cache=writethrough",
NULL) == -1)
exit (EXIT_FAILURE);
do_test ();
--
2.25.4
4 years, 4 months
[PATCH nbdkit] plugins: python: Fix imageio example instructions
by Nir Soffer
Fix the instructions so they should work now on CentOS 8.2:
- Add note about oVirt setup
- List all required packages
- Add missing -W flag to the qemu-img command
I tested this on Fedora 32, with ovirt-engine-sdk built from source,
since the package is not available on Fedora 31 or 32, and the Fedora
30 package cannot be installed on Fedora 32.
Signed-off-by: Nir Soffer <nsoffer(a)redhat.com>
---
plugins/python/examples/imageio.py | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)
diff --git a/plugins/python/examples/imageio.py b/plugins/python/examples/imageio.py
index f8ade7a2..38ad5ac4 100644
--- a/plugins/python/examples/imageio.py
+++ b/plugins/python/examples/imageio.py
@@ -4,10 +4,18 @@
#
# Upload and download images to oVirt with nbdkit and qemu-img.
#
-# Install ovirt-imageio-client
+# This assumes you have an oVirt environment. If you don't please see
+# https://ovirt.org/ for info on how to install oVirt.
#
-# dnf copr enable nsoffer/ovirt-imageio-preview
-# dnf install ovirt-imageio-client
+# Install ovirt-release package:
+#
+# dnf install http://resources.ovirt.org/pub/yum-repo/ovirt-release-master.rpm
+#
+# Install required packages:
+#
+# dnf install ovirt-imageio-client python3-ovirt-engine-sdk4
+#
+# Note: python3-ovirt-engine-sdk4 is not available yet for Fedora 31 and 32.
#
# To upload or download images, you need to start an image transfer. The
# easiest way is using oVirt image_transfer.py example:
@@ -33,7 +41,7 @@
#
# To upload an image run:
#
-# qemu-img convert -n -f raw -O raw disk.img \
+# qemu-img convert -n -f raw -O raw -W disk.img \
# nbd+unix:///\?socket=/tmp/nbd.sock
#
# Downloading image is not efficient with this version, since we don't report
--
2.26.2
4 years, 4 months