On 7/14/23 11:53, Richard W.M. Jones wrote:
On Thu, Jul 13, 2023 at 07:10:52PM +0200, Laszlo Ersek wrote:
> On QEMU 7.2.0+, if "passt" is available, ask QEMU for passt
("stream")
> rather than SLIRP ("user") networking.
>
> For this, we need to run passt ourselves. Given that passt daemonizes by
> default, start it with our traditional function guestfs_int_cmd_run(). Ask
> passt to save its PID file, because in case something goes wrong before
> we're completely sure the appliance (i.e. QEMU) is up and running, we'll
> need to kill passt, the *grandchild*, ourselves.
>
> Pass "--one-off" to passt (same as libvirt). This way, once we have proof
> that QEMU has connected to passt (because the appliance shows signs of
> life), we need not clean up passt ourselves -- once QEMU exits, passt will
> see an EOF on the unix domain socket, and exit as well.
>
> Passt is way more flexible than SLIRP, and passt normally intends to
> imitate the host environment in the guest as much as possible. This means
> that, when switching from SLIRP to passt, the guest would see changes to
> the following:
>
> - guest IP address,
>
> - guest subnet mask,
>
> - host (= gateway) IP address,
>
> - host (= gateway) MAC address.
>
> Extract the SLIRP defaults into the new macros NETWORK_GW_IP and
> NETWORK_GW_MAC, and pass them explicitly to passt. In particular,
> "tests/rsync/test-rsync.sh" fails without setting the host address
> (NETWORK_GW_IP) properly.
What really matters is that programs like 'dnf' and 'apt' can be run.
Nothing that uses libguestfs networking ought to depend on the exact
IP address or other details of the current SLIRP implementation. So
these changes are fine. (rsync is a weird feature that we should have
pushed back on and not added ...)
OK, dnf does work (just tested it with "virt-builder --update" under the
cover letter).
> (These artifacts can be verified in the appliance with "virt-rescue
> --network", by running "ip addr", "ip route", and "ip
neighbor" at the
> virt-rescue prompt. There are four scenarios: two libguest backends, times
> passt being installed or not installed.)
>
> Bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=2184967
> Signed-off-by: Laszlo Ersek <lersek(a)redhat.com>
> ---
> lib/guestfs-internal.h | 26 ++++
> lib/launch-direct.c | 147 +++++++++++++++++++-
> 2 files changed, 168 insertions(+), 5 deletions(-)
>
> diff --git a/lib/guestfs-internal.h b/lib/guestfs-internal.h
> index 9ba4d4ad46cf..9f4f800e6d6e 100644
> --- a/lib/guestfs-internal.h
> +++ b/lib/guestfs-internal.h
> @@ -158,6 +158,32 @@ cleanup_mutex_unlock (pthread_mutex_t **ptr)
> #define NETWORK_ADDRESS "169.254.2.15"
> #define NETWORK_PREFIX "16"
>
> +/* The IP address and the MAC address of the host, as seen from the appliance.
> + *
> + * NETWORK_GW_IP should be the same as NETWORK_ADDRESS, only replacing
".15" in
> + * the rightmost octet with ".2". NETWORK_GW_MAC cannot be changed.
These
> + * restrictions are a consequence of the following landscape:
> + *
> + * libguestfs backend userspace network stack restrictions
> + * ------------------ ----------------------- --------------------------------
> + * direct passt None; both NETWORK_GW_IP and
> + * NETWORK_GW_MAC can be set on the
> + * passt command line.
> + *
> + * direct SLIRP SLIRP hard-codes NETWORK_GW_MAC.
> + *
> + * libvirt passt The domain XML does not expose
> + * either knob (RHBZ#2222766), even
> + * though passt could accept both.
> + *
> + * libvirt SLIRP The domain XML does not expose
> + * either knob (RHBZ#2222766), and
> + * SLIRP hard-codes NETWORK_GW_MAC
> + * anyway.
> + */
> +#define NETWORK_GW_IP "169.254.2.2"
> +#define NETWORK_GW_MAC "52:56:00:00:00:02"
> +
> /* Guestfs handle and associated structures. */
>
> /* State. */
> diff --git a/lib/launch-direct.c b/lib/launch-direct.c
> index 3f46f0509736..8d6ad025a4e1 100644
> --- a/lib/launch-direct.c
> +++ b/lib/launch-direct.c
> @@ -32,6 +32,7 @@
> #include <errno.h>
> #include <fcntl.h>
> #include <sys/types.h>
> +#include <sys/wait.h>
> #include <sys/time.h>
> #include <sys/resource.h>
> #include <sys/stat.h>
> @@ -48,6 +49,7 @@
> #include "guestfs-internal.h"
> #include "guestfs_protocol.h"
> #include "qemuopts.h"
> +#include "ignore-value.h"
>
> /* Per-handle data. */
> struct backend_direct_data {
> @@ -333,6 +335,110 @@ add_drives (guestfs_h *g, struct backend_direct_data *data,
> return 0;
> }
>
> +/**
> + * Launch passt such that it daemonizes.
> + *
> + * On error, -1 is returned; C<passt_pid> and C<sockpath> are not
modified.
> + *
> + * On success, 0 is returned. C<passt_pid> contains the PID of the passt
> + * background process. C<sockpath> contains the pathname of the unix domain
> + * socket where passt will accept a single connection.
> + */
> +static int
> +launch_passt (guestfs_h *g, long *passt_pid, char (*sockpath)[UNIX_PATH_MAX])
> +{
> + int rc;
> + char sockpath_local[sizeof *sockpath];
> + char *pid_path;
> + struct command *cmd;
> + int passt_status;
> + int passt_exit;
> + char *pid_str;
> + long passt_pid_local;
> + char *endptr;
> +
> + rc = -1;
> + if (guestfs_int_create_socketname (g, "passt.sock", &sockpath_local)
== -1)
> + return rc;
> +
> + pid_path = guestfs_int_make_pid_path (g, "passt");
> + if (pid_path == NULL)
> + return rc;
> +
> + cmd = guestfs_int_new_command (g);
> + if (cmd == NULL)
> + goto free_pid_path;
> +
> + guestfs_int_cmd_add_arg (cmd, "passt");
> + guestfs_int_cmd_add_arg (cmd, "--one-off");
> + guestfs_int_cmd_add_arg (cmd, "--socket");
> + guestfs_int_cmd_add_arg (cmd, sockpath_local);
> + guestfs_int_cmd_add_arg (cmd, "--pid");
> + guestfs_int_cmd_add_arg (cmd, pid_path);
> + guestfs_int_cmd_add_arg (cmd, "--address");
> + guestfs_int_cmd_add_arg (cmd, NETWORK_ADDRESS);
> + guestfs_int_cmd_add_arg (cmd, "--netmask");
> + guestfs_int_cmd_add_arg (cmd, NETWORK_PREFIX);
> + guestfs_int_cmd_add_arg (cmd, "--mac-addr");
> + guestfs_int_cmd_add_arg (cmd, NETWORK_GW_MAC);
> + guestfs_int_cmd_add_arg (cmd, "--gateway");
> + guestfs_int_cmd_add_arg (cmd, NETWORK_GW_IP);
> +
> + passt_status = guestfs_int_cmd_run (cmd);
> + if (passt_status == -1)
> + /* guestfs_int_cmd_run() reports errors internally, so just bail here */
> + goto close_cmd;
> +
> + if (WIFSIGNALED (passt_status)) {
> + error (g, _("passt was killed with signal %d"), WTERMSIG
(passt_status));
> + goto close_cmd;
> + }
> +
> + assert (WIFEXITED (passt_status));
This has a bit of a smell to it. There are definitely other
non-signalled ways that the wait could have failed. I think
we should turn this into an internal error, something like:
if (!WIFEXITED (passt_status)) {
error (g, "internal error: unexpected exit status from passt (%d)",
passt_status);
goto close_cmd;
}
Good idea; will do.
> + passt_exit = WEXITSTATUS (passt_status);
> + if (passt_exit != 0) {
> + error (g, _("passt exited with status %d"), passt_exit);
> + goto close_cmd;
> + }
> +
> + /* At this point passt has forked into the background, dropped privileges, and
> + * written a PID file. Due to "--one-off", passt will exit once our
QEMU
> + * appliance disappears (forcibly or cleanly); however, we still need the
> + * passt PID *temporarily*, so we can kill passt in case we encounter an error
> + * *before* starting the appliance.
> + */
> + if (guestfs_int_read_whole_file (g, pid_path, &pid_str, NULL) == -1)
> + /* Any error has been reported internally, so just bail. We can't kill
> + * passt here because we've failed to get its PID in the first place...
> + */
> + goto close_cmd;
Does passt definitely write the PID before exiting in the parent? I
suppose it must, but if not then we would have a race condition here,
and would need to instead loop for a bit waiting for the PID to
appear.
I've been trying to avoid looking at the passt source code, but now
you're forcing me to :)
The central repo is at <
https://passt.top/passt>...
So the answer to your question is "yes". What passt does, when
"--foreground" is *not* passed to it, is:
- it opens/creates the PID file before forking,
- it forks,
- the parent process writes the child's PID to the PID file via
the previously opened file descriptor,
- the parent process exits.
So it's not the child process that writes out its own PID.
> + errno = 0;
> + passt_pid_local = strtol (pid_str, &endptr, 10);
> + if (endptr == pid_str || (*endptr != '\0' && *endptr !=
'\n') || errno != 0 ||
> + passt_pid_local <= 1) {
> + /* Same thing, we can't kill passt just yet. */
> + error (g, _("failed to parse passt PID from '%s'"),
pid_path);
> + goto free_pid_str;
> + }
> +
> + /* We're done. */
> + *passt_pid = passt_pid_local;
> + ignore_value (strcpy (*sockpath, sockpath_local));
> + rc = 0;
> +
> +free_pid_str:
> + free (pid_str);
> +
> +close_cmd:
> + guestfs_int_cmd_close (cmd);
> +
> +free_pid_path:
> + free (pid_path);
> +
> + return rc;
> +}
> +
> static int
> launch_direct (guestfs_h *g, void *datav, const char *arg)
> {
> @@ -340,6 +446,7 @@ launch_direct (guestfs_h *g, void *datav, const char *arg)
> struct qemuopts *qopts = NULL;
> int daemon_accept_sock = -1, console_sock = -1;
> int r;
> + long passt_pid = -1;
> int flags;
> int sv[2];
> struct sockaddr_un addr;
> @@ -658,11 +765,30 @@ launch_direct (guestfs_h *g, void *datav, const char *arg)
>
> /* Enable user networking. */
> if (g->enable_network) {
> - start_list ("-netdev") {
> - append_list ("user");
> - append_list ("id=usernet");
> - append_list ("net=" NETWORK_ADDRESS "/" NETWORK_PREFIX);
> - } end_list ();
> + /* If qemu is 7.2.0+ and "passt" is available, ask for passt rather
> + * than SLIRP. RHBZ#2184967.
> + */
> + if (guestfs_int_version_ge (&data->qemu_version, 7, 2, 0) &&
> + guestfs_int_passt_runnable (g)) {
> + char passt_sock[UNIX_PATH_MAX];
> +
> + if (launch_passt (g, &passt_pid, &passt_sock) == -1)
> + goto cleanup0;
> +
> + start_list ("-netdev") {
> + append_list ("stream");
> + append_list ("id=usernet");
> + append_list ("addr.type=unix");
> + append_list_format ("addr.path=%s", passt_sock);
> + } end_list ();
> + }
> + else {
> + start_list ("-netdev") {
> + append_list ("user");
> + append_list ("id=usernet");
> + append_list ("net=" NETWORK_ADDRESS "/"
NETWORK_PREFIX);
> + } end_list ();
> + }
> start_list ("-device") {
> append_list (VIRTIO_DEVICE_NAME ("virtio-net"));
> append_list ("netdev=usernet");
> @@ -893,6 +1019,15 @@ launch_direct (guestfs_h *g, void *datav, const char *arg)
>
> debug (g, "appliance is up");
>
> + /* From this point onward, even if we fail, QEMU terminating (forcefully or
> + * gracefully) will cause passt to go away as well. Note that we can't
> + * precisely tell whether QEMU managed to open the passt socket before QEMU
> + * failed. Therefore, err on the side of killing passt needlessly, rather
> + * than not killing it when needed -- that's why we re-set
"passt_pid" to (-1)
> + * only this late during QEMU startup verification.
> + */
> + passt_pid = -1;
It's a reasonable trade-off, but shame we cannot open the Unix fd and
pass it to qemu.
Technically speaking, we could do that; I didn't want to do it.
According to the passt(1) and qrap(1) manuals, initially, QEMU's
"stream" netdev backend *only* supported a received fd, and didn't
support opening a unix domain socket by pathname. That was what the
temporary wrapper qrap(1) was invented for. IIUC, qrap(1) would launch
passt, connect to the unix domain socket itself, and then execute QEMU,
passing it the file descriptor.
For this series however, connecting to the unix domain socket ourselves
looked more cumbersome (sockaddr massaging, a connect() call that could
potentially block forever, tweaking the close_file_descriptors() filter,
etc).
> /* This is possible in some really strange situations, such as
> * guestfsd starts up OK but then qemu immediately exits. Check for
> * it because the caller is probably expecting to be able to send
> @@ -924,6 +1059,8 @@ launch_direct (guestfs_h *g, void *datav, const char *arg)
> data->qemu_data = NULL;
>
> cleanup0:
> + if (passt_pid != -1)
> + kill (passt_pid, SIGTERM);
> if (qopts != NULL)
> qemuopts_free (qopts);
> if (daemon_accept_sock >= 0)
Looks generally good to me. If there are no issues with comments I
made above then you can consider it reviewed.
I'd like to address all comments in a v2; except the fd-passing, if
possible.
Thanks!
Laszlo
Rich.