I managed to get a reproduction with udevd --debug.

I just waited for g.inspect_os() (per g.launch()) to fail return anything (/dev/rhel/root symlink doesn't exist) in 32 concurrent executions

Attached output (~300KB).

Thanks!
Sam

On Tue, Apr 6, 2021 at 7:29 PM Sam Eiderman <sameid@google.com> wrote:
FYI, in debian the rules are a bit in a different flavor:

01-md-raid-creating.rules
50-firmware.rules
50-udev-default.rules

55-dm.rules
56-lvm.rules

60-block.rules
60-cdrom_id.rules
60-drm.rules
60-evdev.rules
60-input-id.rules
60-persistent-alsa.rules
60-persistent-input.rules
60-persistent-storage-dm.rules
60-persistent-storage-tape.rules
60-persistent-storage.rules
60-persistent-v4l.rules
60-sensor.rules
60-serial.rules
63-md-raid-arrays.rules
64-btrfs-dm.rules
64-btrfs.rules
64-md-raid-assembly.rules
69-lvm-metad.rules
69-md-clustered-confirm-device.rules
70-joystick.rules
70-mouse.rules
70-power-switch.rules
70-touchpad.rules
70-uaccess.rules
71-seat.rules
73-seat-late.rules
73-special-net-names.rules
73-usb-net-by-mac.rules
75-net-description.rules
75-probe_mtd.rules
78-sound-card.rules
80-debian-compat.rules
80-drivers.rules
80-net-setup-link.rules
82-gfs2-withdraw.rules
85-hwclock.rules
95-dm-notify.rules
99-systemd.rules

So maybe there is some difference here.

Sam

On Mon, Apr 5, 2021 at 9:31 PM Richard W.M. Jones <rjones@redhat.com> wrote:
On Mon, Apr 05, 2021 at 05:49:17PM +0300, Sam Eiderman wrote:
> Yes, attached below (it's quite long)

Thanks - just looking at the lines where the /dev/rhel/swap
device appears or its various synonyms:

> 2021-03-07 10:58:46 T libguestfs - 0 - appliance - drwxr-xr-x 2 0 0 60 Mar  7 10:58 rhel
[...]
> 2021-03-07 10:58:47 T libguestfs - 0 - appliance - /dev/mapper:
> 2021-03-07 10:58:47 T libguestfs - 0 - appliance - total 0
> 2021-03-07 10:58:47 T libguestfs - 0 - appliance - crw------- 1 0 0 10, 236 Mar
>  7 10:58 control
> 2021-03-07 10:58:47 T libguestfs - 0 - appliance - lrwxrwxrwx 1 0 0       7 Mar
>  7 10:58 rhel-root -> ../dm-2
> 2021-03-07 10:58:47 T libguestfs - 0 - appliance - lrwxrwxrwx 1 0 0       7 Mar
>  7 10:58 rhel-swap -> ../dm-1

This is fairly early on during appliance boot where the init script is
listing out /dev/ recursively
(https://github.com/libguestfs/libguestfs/blob/b18ac489db76a700f2168ae6eb64e9d450613a27/appliance/init#L159).
This is also after we did udevadm settle in the init script.  It's
interesting that at this point /dev/mapper/rhel-swap exists correctly,
but /dev/rhel/ is an empty directory.

That's very much not expected ...

A bit later in the init script we list out PVs/VGs/LVs and everything
looks normal:

> 2021-03-07 10:58:47 T libguestfs - 0 - appliance - + lvm pvs
> 2021-03-07 10:58:47 T libguestfs - 0 - appliance -   PV         VG      Fmt
>  Attr PSize   PFree
> 2021-03-07 10:58:47 T libguestfs - 0 - appliance -   /dev/sda2  rhel    lvm2
> a--  <15.00g    0
> 2021-03-07 10:58:47 T libguestfs - 0 - appliance -   /dev/sdb   vg_myvg lvm2
> a--  496.00m    0
> 2021-03-07 10:58:47 T libguestfs - 0 - appliance - + lvm vgs
> 2021-03-07 10:58:47 T libguestfs - 0 - appliance -   VG      #PV #LV #SN Attr  
> VSize   VFree
> 2021-03-07 10:58:47 T libguestfs - 0 - appliance -   rhel      1   2   0 wz--n-
> <15.00g    0
> 2021-03-07 10:58:47 T libguestfs - 0 - appliance -   vg_myvg   1   1   0 wz--n-
> 496.00m    0
> 2021-03-07 10:58:47 T libguestfs - 0 - appliance - + lvm lvs
> 2021-03-07 10:58:48 T libguestfs - 0 - appliance -   LV     VG      Attr      
> LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
> 2021-03-07 10:58:48 T libguestfs - 0 - appliance -   root   rhel    -wi-a-----
>  13.39g
> 2021-03-07 10:58:48 T libguestfs - 0 - appliance -   swap   rhel    -wi-a-----
>   1.60g
> 2021-03-07 10:58:48 T libguestfs - 0 - appliance -   lv_var vg_myvg -wi-a-----
> 496.00m

A little bit further on while we're executing the guestfs_inspect_os()
API call, we have this snippet:

> 2021-03-07 10:58:49 T libguestfs - 0 - appliance - command: lvm 'lvs' '-o'
> 'vg_name,lv_name' '-S' 'lv_role=public && lv_skip_activation!=yes'
> '--noheadings' '--separator' '/'
> 2021-03-07 10:58:49 T libguestfs - 0 - appliance - command: lvm returned 0
> 2021-03-07 10:58:49 T libguestfs - 0 - appliance - command: lvm: stdout:
> 2021-03-07 10:58:49 T libguestfs - 0 - appliance -   rhel/root
> 2021-03-07 10:58:49 T libguestfs - 0 - appliance -   rhel/swap
> 2021-03-07 10:58:49 T libguestfs - 0 - appliance -   vg_myvg/lv_var
> 2021-03-07 10:58:49 T libguestfs - 0 - appliance - command: blkid '-c' '/dev/
> null' '-o' 'value' '-s' 'TYPE' '/dev/rhel/root'
> 2021-03-07 10:58:49 T libguestfs - 0 - appliance - command: blkid returned 0
> 2021-03-07 10:58:49 T libguestfs - 0 - appliance - command: blkid: stdout:
> 2021-03-07 10:58:49 T libguestfs - 0 - appliance - xfs
> 2021-03-07 10:58:49 T libguestfs - 0 - appliance - command: blkid '-c' '/dev/
> null' '-o' 'value' '-s' 'TYPE' '/dev/rhel/swap'
> 2021-03-07 10:58:49 T libguestfs - 0 - appliance - command: blkid returned 2
> 2021-03-07 10:58:49 T libguestfs - 0 - appliance - command: blkid '-c' '/dev/
> null' '-o' 'value' '-s' 'TYPE' '/dev/vg_myvg/lv_var'
> 2021-03-07 10:58:49 T libguestfs - 0 - appliance - command: blkid returned 0
> 2021-03-07 10:58:49 T libguestfs - 0 - appliance - command: blkid: stdout:
> 2021-03-07 10:58:49 T libguestfs - 0 - appliance - xfs

If you look closely it's clear that /dev/rhel/root has been created,
/dev/vg_myvg/lv_var has been created (that was also an empty directory
earlier on), but /dev/rhel/swap has *not* been created--notice how the
blkid command exits with code 2.

And obviously by the time we get down here ...

> 2021-03-07 10:58:53 T libguestfs - 0 - appliance -   rhel/root
> 2021-03-07 10:58:53 T libguestfs - 0 - appliance -   rhel/swap
> 2021-03-07 10:58:53 T libguestfs - 0 - appliance -   vg_myvg/lv_var
> 2021-03-07 10:58:53 T libguestfs - 0 - appliance - guestfsd: error: stat: /dev/
> rhel/swap: No such file or directory

... it still doesn't exist.

We ran a udevadm --debug settle command before this, so if it wasn't
going to be created because of a delayed udev event then I'd expect it
should have done.

I'm not much closer to understanding why the /dev/VG/LV nodes are
created so much later/delayed than the /dev/mapper/VG-LV nodes.  As
far as I can tell they should be created by udev
(/usr/lib/udev/rules.d/11-dm-lvm.rules in Fedora).

Would suggest enabling udevadm debugging (uncomment #--debug in
appliance/init).

It would also be good to have some more concrete reproducer: you might
have appliance/init kill PID 1 if /dev/rhel/swap does not exist, then
repeatedly run g.launch() to see if you can make it fail, all the time
while trying to capture the log of the failure with udev debugging
turned on.

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/