execvp() [01] is powerful:
- it performs PATH search [02] if necessary,
- it falls back to executing the file with the shell as a shell script in
case the underlying execv() or execve() fails with ENOEXEC.
However, execvp() is not async-signal-safe [03], and so we shouldn't call
it in a child process forked from a multi-threaded parent process [04],
which the libnbd client application may well be.
Introduce three APIs for safely emulating execvp() with execve():
- nbd_internal_execvpe_init(): to be called in the parent process, before
fork(). Allocate and format candidate pathnames for execution with
execve(), based on PATH. Allocate a buffer for the longer command line
that the shell script fallback needs.
- nbd_internal_fork_safe_execvpe(): to be called in the child process. Try
the candidates formatted previously in sequence with execve(), as long
as execve() fails with errors related to pathname resolution /
inode-level file type / execution permission. Stop iterating on fatal
errors; if the fatal error is ENOEXEC, activate the shell script
fallback. As a small added feature, take the environment from an
explicit parameter (effectively imitating the GNU-specific execvpe()
function -- cf. commit dc64ac5cdd0b, "states: Use execvp instead of
execvpe", 2019-11-15).
- nbd_internal_execvpe_uninit(): to be called in the parent process, after
fork(). Release the resources allocated with
nbd_internal_execvpe_init().
The main idea with the above distribution of responsibilities is that any
pre-scanning of PATH -- which was suggested by Eric Blake -- should not
include activities that are performed by execve() anyway. I've considered
and abandoned two approaches:
- During scanning (i.e., in nbd_internal_execvpe_init()), check each
candidate with access(X_OK) [05].
I dropped this because the informative APPLICATION USAGE section of the
same specification [06] advises against it, saying "[a]n application
should instead attempt the action itself and handle the [EACCES] error
that occurs if the file is not accessible".
- During scanning, open each candidate with O_EXEC [07] until one open()
succeeds. Save the sole file descriptor for
nbd_internal_fork_safe_execvpe(). In the latter, call fexecve() [08],
which is free of all error conditions that necessitate iteration over
candidates. Still implement the ENOEXEC (shell script) fallback.
I dropped this because (a) fexecve() is a quite recent interface
(highlighted by Eric); (b) for an application to open a file for
execution with fexecve(), it's more robust to pass O_EXEC to open() than
not to pass it, per APPLICATION USAGE [09], but on Linux/glibc, O_EXEC
does not seem supported, only O_PATH does [10].
Thus the chosen approach -- pre-generate filenames -- contains a small
TOCTTOU race (highlighted by Eric) after all, but it should be harmless.
Implementation-defined details:
- If PATH searching is necessary, but PATH is absent [01], then fall back
to confstr(_CS_PATH) [11], similarly to Linux/glibc execvpe() [12].
If PATH is set but empty ("set to null") [02], or PATH is unset and
confstr(_CS_PATH) fails or returns no information or returns the empty
string, we fail the initial scanning (!) with ENOENT.
Details chosen for unspecified behavior:
- Hardwire "/bin/sh" as the shell's pathname [01].
[01]
https://pubs.opengroup.org/onlinepubs/9699919799/functions/execvp.html
[02]
https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#...
[03]
https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html...
[04]
https://pubs.opengroup.org/onlinepubs/9699919799/functions/fork.html
[05]
https://pubs.opengroup.org/onlinepubs/9699919799/functions/access.html
[06]
https://pubs.opengroup.org/onlinepubs/9699919799/functions/access.html#ta...
[07]
https://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html
[08]
https://pubs.opengroup.org/onlinepubs/9699919799/functions/fexecve.html
[09]
https://pubs.opengroup.org/onlinepubs/9699919799/functions/fexecve.html#t...
[10]
https://man7.org/linux/man-pages/man2/open.2.html
[11]
https://pubs.opengroup.org/onlinepubs/9699919799/functions/confstr.html
[12]
https://man7.org/linux/man-pages/man3/execvpe.3.html
Suggested-by: Eric Blake <eblake(a)redhat.com>
Signed-off-by: Laszlo Ersek <lersek(a)redhat.com>
Reviewed-by: Eric Blake <eblake(a)redhat.com>
Reviewed-by: Richard W.M. Jones <rjones(a)redhat.com>
---
Notes:
v4:
- remove double space typo from commit message [Eric]
- remove the allusion to bash compatibility in the
"Implementation-defined details" part of the commit message, where the
latter discusses PATH being "set to null" [Eric]
- pick up R-b's from Eric and Rich
- keep the #include "..." list sorted -- #include
"checked-overflow.h"
above "minmax.h", not below it
- in get_path(), replace the FIXME comments with notes that explain why
we don't lock the environment [Eric]
lib/internal.h | 22 ++
lib/utils.c | 355 ++++++++++++++++++++
2 files changed, 377 insertions(+)
diff --git a/lib/internal.h b/lib/internal.h
index 7543f545cd95..35cb5e8994ee 100644
--- a/lib/internal.h
+++ b/lib/internal.h
@@ -368,6 +368,18 @@ struct command {
uint32_t error; /* Local errno value */
};
+struct execvpe {
+ string_vector pathnames;
+
+ /* Note: "const_string_vector" is not a good type for "sh_argv"
below. Even if
+ * we reserved enough space in a "const_string_vector",
+ * const_string_vector_append() would still not be async-signal-safe, due to
+ * the underlying const_string_vector_insert() calling assert().
+ */
+ char **sh_argv;
+ size_t num_sh_args;
+};
+
/* Test if a callback is "null" or not, and set it to null. */
#define CALLBACK_IS_NULL(cb) ((cb).callback == NULL && (cb).free == NULL)
#define CALLBACK_IS_NOT_NULL(cb) (! CALLBACK_IS_NULL ((cb)))
@@ -542,4 +554,14 @@ extern void nbd_internal_fork_safe_assert (int result, const char
*file,
__func__, #expression))
#endif
+extern int nbd_internal_execvpe_init (struct execvpe *ctx, const char *file,
+ size_t num_args)
+ LIBNBD_ATTRIBUTE_NONNULL (1, 2);
+extern void nbd_internal_execvpe_uninit (struct execvpe *ctx)
+ LIBNBD_ATTRIBUTE_NONNULL (1);
+extern int nbd_internal_fork_safe_execvpe (struct execvpe *ctx,
+ const string_vector *argv,
+ char * const *envp)
+ LIBNBD_ATTRIBUTE_NONNULL (1, 2, 3);
+
#endif /* LIBNBD_INTERNAL_H */
diff --git a/lib/utils.c b/lib/utils.c
index a8c6c217e2e4..22b466d48258 100644
--- a/lib/utils.c
+++ b/lib/utils.c
@@ -29,6 +29,7 @@
#include <sys/uio.h>
#include "array-size.h"
+#include "checked-overflow.h"
#include "minmax.h"
#include "internal.h"
@@ -505,3 +506,357 @@ nbd_internal_fork_safe_assert (int result, const char *file, long
line,
assertion, "' failed.\n", (char *)NULL);
abort ();
}
+
+/* Returns the value of the PATH environment variable -- falling back to
+ * confstr(_CS_PATH) if PATH is absent -- as a dynamically allocated string. On
+ * failure, sets "errno" and returns NULL.
+ */
+static char *
+get_path (void)
+ LIBNBD_ATTRIBUTE_ALLOC_DEALLOC (free)
+{
+ char *path;
+ bool env_path_found;
+ size_t path_size, path_size2;
+
+ /* Note: per POSIX, here we should lock the environment, even just for
+ * getenv(). However, glibc and any other high-quality libc will not be
+ * modifying "environ" during getenv(), and no sane application should
modify
+ * the environment after launching threads.
+ */
+ path = getenv ("PATH");
+ if ((env_path_found = (path != NULL)))
+ path = strdup (path);
+ /* This is where we'd unlock the environment. */
+
+ if (env_path_found) {
+ /* This handles out-of-memory as well. */
+ return path;
+ }
+
+ errno = 0;
+ path_size = confstr (_CS_PATH, NULL, 0);
+ if (path_size == 0) {
+ /* If _CS_PATH does not have a configuration-defined value, just store
+ * ENOENT to "errno".
+ */
+ if (errno == 0)
+ errno = ENOENT;
+
+ return NULL;
+ }
+
+ path = malloc (path_size);
+ if (path == NULL)
+ return NULL;
+
+ path_size2 = confstr (_CS_PATH, path, path_size);
+ assert (path_size2 == path_size);
+ return path;
+}
+
+/* nbd_internal_execvpe_init() and nbd_internal_fork_safe_execvpe() together
+ * present an execvp() alternative that is async-signal-safe.
+ *
+ * nbd_internal_execvpe_init() may only be called before fork(), for filling in
+ * the caller-allocated, uninitialized "ctx" structure. If
+ * nbd_internal_execvpe_init() succeeds, then fork() may be called.
+ * Subsequently, in the child process, nbd_internal_fork_safe_execvpe() may be
+ * called with the inherited "ctx" structure, while in the parent process,
+ * nbd_internal_execvpe_uninit() must be called to uninitialize (evacuate) the
+ * "ctx" structure.
+ *
+ * On failure, "ctx" will not have been modified, "errno" is set, and
-1 is
+ * returned. Failures include:
+ *
+ * - Errors forwarded from underlying functions such as strdup(), confstr(),
+ * malloc(), string_vector_append().
+ *
+ * - ENOENT: "file" is an empty string.
+ *
+ * - ENOENT: "file" does not contain a <slash> "/" character,
the PATH
+ * environment variable is not set, and confstr() doesn't associate a
+ * configuration-defined value with _CS_PATH.
+ *
+ * - ENOENT: "file" does not contain a <slash> "/" character,
and: (a) the PATH
+ * environment variable is set to the empty string, or (b) PATH is not
+ * set, and confstr() outputs the empty string for _CS_PATH.
+ *
+ * - EOVERFLOW: the sizes or counts of necessary objects could not be expressed.
+ *
+ * - EINVAL: "num_args" is less than 2.
+ *
+ * On success, the "ctx" structure will have been filled in, and 0 is
returned.
+ *
+ * - "pathnames" member:
+ *
+ * - All strings pointed-to by elements of the "pathnames" string_vector
+ * member are owned by "pathnames".
+ *
+ * - If "file" contains a <slash> "/" character, then the
sole entry in
+ * "pathnames" is a copy of "file".
+ *
+ * - If "file" does not contain a <slash> "/" character:
+ *
+ * Let "system path" be defined as the value of the PATH environment
+ * variable, if the latter exists, and as the value output by confstr() for
+ * _CS_PATH otherwise. Per the ENOENT specifications above, "system path"
is
+ * a non-empty string. Let "system path" further be of the form
+ *
+ * <prefix_0> [n = 1]
+ *
+ * or
+ *
+ * <prefix_0>:<prefix_1>:...:<prefix_(n-1)> [n >= 2]
+ *
+ * where for each 0 <= i < n, <prefix_i> does not contain the
<colon> ":"
+ * character. In the (n = 1) case, <prefix_0> is never empty (see ENOENT
+ * above), while in the (n >= 2) case, any individual <prefix_i> may or may
+ * not be empty.
+ *
+ * The "pathnames" string_vector member has n elements; for each 0 <= i
< n,
+ * element i is of the form
+ *
+ * suffix(curdir(<prefix_i>))file
+ *
+ * where
+ *
+ * curdir(x) := "." if x = ""
+ * curdir(x) := x otherwise
+ *
+ * and
+ *
+ * suffix(x) := x if "x" ends with a <slash> "/"
+ * suffix(x) := x "/" otherwise
+ *
+ * This transformation implements the POSIX XBD / PATH environment variable
+ * semantics, creating candidate pathnames for execution by
+ * nbd_internal_fork_safe_execvpe(). nbd_internal_fork_safe_execvpe() will
+ * iterate over the candidate pathnames with execve() until execve()
+ * succeeds, or fails with an error that is due to neither pathname
+ * resolution, nor the candidate not being a regular file, nor the candidate
+ * lacking execution permission.
+ *
+ * - The "sh_argv" array member will have at least (num_args + 1) elements
+ * allocated, and none populated.
+ *
+ * (The minimum value of "num_args" is 2 -- see EINVAL above. According to
+ * POSIX, "[t]he argument /arg0/ should point to a filename string that is
+ * associated with the process being started by one of the /exec/ functions",
+ * plus "num_args" includes the null pointer that terminates the argument
+ * list.)
+ *
+ * This allocation is made in anticipation of execve() failing for a
+ * particular candidate inside nbd_internal_fork_safe_execvpe() with ENOEXEC
+ * ("[t]he new process image file has the appropriate access permission but
+ * has an unrecognized format"). While that failure terminates the iteration,
+ * the failed call
+ *
+ * execve (pathnames[i],
+ * { argv[0], argv[1], ..., NULL }, // (num_args >= 2) elements
+ * { envp[0], envp[1], ..., NULL })
+ *
+ * must be repeated as
+ *
+ * execve (<shell-path>,
+ * { argv[0], pathnames[i], // ((num_args + 1) >= 3)
+ argv[1], ..., NULL }, // elements
+ * { envp[0], envp[1], ..., NULL })
+ *
+ * for emulating execvp(). The allocation in the "sh_argv" member makes it
+ * possible just to *link* the original "argv" elements and the
"pathnames[i]"
+ * candidate into the right positions.
+ *
+ * (POSIX leaves the shell pathname unspecified; "/bin/sh" should be good
+ * enough.)
+ *
+ * The shell *binary* will see itself being executed under the name
"argv[0]",
+ * will receive "pathnames[i]" as the pathname of the shell *script* to read
+ * and interpret ("command_file" in POSIX terminology), will expose
+ * "pathnames[i]" as the positional parameter $0 to the script, and will
+ * forward "argv[1]" and the rest to the script as positional parameters $1
+ * and onward.
+ */
+int
+nbd_internal_execvpe_init (struct execvpe *ctx, const char *file,
+ size_t num_args)
+{
+ int rc;
+ char *sys_path;
+ string_vector pathnames;
+ char *pathname;
+ size_t num_sh_args;
+ char **sh_argv;
+ size_t sh_argv_bytes;
+
+ rc = -1;
+
+ if (file[0] == '\0') {
+ errno = ENOENT;
+ return rc;
+ }
+
+ /* First phase. */
+ sys_path = NULL;
+ pathnames = (string_vector)empty_vector;
+
+ if (strchr (file, '/') == NULL) {
+ size_t file_len;
+ const char *sys_path_element, *scan;
+ bool finish;
+
+ sys_path = get_path ();
+ if (sys_path == NULL)
+ return rc;
+
+ if (sys_path[0] == '\0') {
+ errno = ENOENT;
+ goto free_sys_path;
+ }
+
+ pathname = NULL;
+ file_len = strlen (file);
+ sys_path_element = sys_path;
+ scan = sys_path;
+ do {
+ assert (sys_path_element <= scan);
+ finish = (*scan == '\0');
+ if (finish || *scan == ':') {
+ const char *sys_path_copy_start;
+ size_t sys_path_copy_size;
+ size_t sep_copy_size;
+ size_t pathname_size;
+ char *p;
+
+ if (scan == sys_path_element) {
+ sys_path_copy_start = ".";
+ sys_path_copy_size = 1;
+ } else {
+ sys_path_copy_start = sys_path_element;
+ sys_path_copy_size = scan - sys_path_element;
+ }
+
+ assert (sys_path_copy_size >= 1);
+ sep_copy_size = (sys_path_copy_start[sys_path_copy_size - 1] != '/');
+
+ if (ADD_OVERFLOW (sys_path_copy_size, sep_copy_size, &pathname_size) ||
+ ADD_OVERFLOW (pathname_size, file_len, &pathname_size) ||
+ ADD_OVERFLOW (pathname_size, 1u, &pathname_size)) {
+ errno = EOVERFLOW;
+ goto empty_pathnames;
+ }
+
+ pathname = malloc (pathname_size);
+ if (pathname == NULL)
+ goto empty_pathnames;
+ p = pathname;
+
+ memcpy (p, sys_path_copy_start, sys_path_copy_size);
+ p += sys_path_copy_size;
+
+ memcpy (p, "/", sep_copy_size);
+ p += sep_copy_size;
+
+ memcpy (p, file, file_len);
+ p += file_len;
+
+ *p++ = '\0';
+
+ if (string_vector_append (&pathnames, pathname) == -1)
+ goto empty_pathnames;
+ /* Ownership transferred. */
+ pathname = NULL;
+
+ sys_path_element = scan + 1;
+ }
+
+ ++scan;
+ } while (!finish);
+ } else {
+ pathname = strdup (file);
+ if (pathname == NULL)
+ return rc;
+
+ if (string_vector_append (&pathnames, pathname) == -1)
+ goto empty_pathnames;
+ /* Ownership transferred. */
+ pathname = NULL;
+ }
+
+ /* Second phase. */
+ if (num_args < 2) {
+ errno = EINVAL;
+ goto empty_pathnames;
+ }
+ if (ADD_OVERFLOW (num_args, 1u, &num_sh_args) ||
+ MUL_OVERFLOW (num_sh_args, sizeof *sh_argv, &sh_argv_bytes)) {
+ errno = EOVERFLOW;
+ goto empty_pathnames;
+ }
+ sh_argv = malloc (sh_argv_bytes);
+ if (sh_argv == NULL)
+ goto empty_pathnames;
+
+ /* Commit. */
+ ctx->pathnames = pathnames;
+ ctx->sh_argv = sh_argv;
+ ctx->num_sh_args = num_sh_args;
+ rc = 0;
+ /* Fall through, for freeing temporaries. */
+
+empty_pathnames:
+ if (rc == -1) {
+ free (pathname);
+ string_vector_empty (&pathnames);
+ }
+
+free_sys_path:
+ free (sys_path);
+
+ return rc;
+}
+
+void
+nbd_internal_execvpe_uninit (struct execvpe *ctx)
+{
+ free (ctx->sh_argv);
+ ctx->num_sh_args = 0;
+ string_vector_empty (&ctx->pathnames);
+}
+
+int
+nbd_internal_fork_safe_execvpe (struct execvpe *ctx, const string_vector *argv,
+ char * const *envp)
+{
+ size_t pathname_idx;
+
+ NBD_INTERNAL_FORK_SAFE_ASSERT (ctx->pathnames.len > 0);
+
+ pathname_idx = 0;
+ do {
+ (void)execve (ctx->pathnames.ptr[pathname_idx], argv->ptr, envp);
+ if (errno != EACCES && errno != ELOOP && errno != ENAMETOOLONG
&&
+ errno != ENOENT && errno != ENOTDIR)
+ break;
+
+ ++pathname_idx;
+ } while (pathname_idx < ctx->pathnames.len);
+
+ if (errno == ENOEXEC) {
+ char **sh_argp;
+ size_t argv_idx;
+
+ NBD_INTERNAL_FORK_SAFE_ASSERT (ctx->num_sh_args >= argv->len);
+ NBD_INTERNAL_FORK_SAFE_ASSERT (ctx->num_sh_args - argv->len == 1);
+
+ sh_argp = ctx->sh_argv;
+ *sh_argp++ = argv->ptr[0];
+ *sh_argp++ = ctx->pathnames.ptr[pathname_idx];
+ for (argv_idx = 1; argv_idx < argv->len; ++argv_idx)
+ *sh_argp++ = argv->ptr[argv_idx];
+
+ (void)execve ("/bin/sh", ctx->sh_argv, envp);
+ }
+
+ return -1;
+}