On Wed, Nov 02, 2011 at 10:28:12AM -0700, Jeff Schroeder wrote:
As part of the oVirt Node[1] project, I'm going to P.O.C. a version of
the node using febootstrap 2.x instead of livecd-creator. The main
reasons for doing this are to allow a r/w root filesystem without
some hacky overlayfs, and just to see if it can be done. On IRC,
rwmjones mentioned a few issues he ran into with febootstrap 2.x where
some rpms had %post scripts which aren't really safe inside a chroot.
It would be nice to get some more detail about exactly what problems
have been encountered and what the best way of mitigating them is.
It is a lot trickier to set up, but perhaps lxc would be better
than chroot for this type of thing?
The Node project is also interested in creating nodes using other
Linux distributions such as Debian or Ubuntu. That is why febootstrap
3.x really excited me, until I realized it is for supermin appliances
only. What issues and challenges have you already encountered doing
something like this? It would be nice, though perhaps impossible, to
find a mostly distribution-neutral way of doing these things. Creating
a "chrootable" filesystem and then running image-minimizer seems like
a decent plan of attack to me.
[1] https://fedoraproject.org/wiki/Ovirt_Node
As a general comment you might want to look at Oz, written by former
Red Hat'ter Chris Lalancette:
http://clalance.blogspot.com/2011/09/oz-070-release.html
Assumptions
-----------
Firstly, so that we're all on the same page, here are some assumptions
I'm making about the appliance builder tool. These are all absolute
requirements for libguestfs, but maybe oVirt will not care about some
of them ...
(1) Must run as non-root.
    This is so we can build an appliance as part of ./configure && make.
(2) Must work across major Linux distros (Fedora, Debian at a minimum).
    We want to work across the widest range of Linux distros.
(3) Must not require network access, but can use it if available.
    This is so that we can run it inside Koji / Debian's build system.
(4) The result must be small enough that we can distribute it easily.
    This is the basic idea behind the supermin appliance: you only
    need to distribute a few Kbytes, instead of 100s of MBytes.
(5) Must be able to distribute security fixes to the appliance
    distro easily.
    Supermin appliances give us this for free. Monolithic 500MB disk
    images are harder to update.
febootstrap history
-------------------
febootstrap 2.x ran yum in a chroot. Since yum doesn't normally want
to run as non-root, we had to run it using 'fakeroot'. Since chroot
requires root, we also used 'fakechroot' (patched heavily by myself).
On Debian in the 2.x days, we didn't use febootstrap, but we used a
very similar tool called 'debirf'.
The first issue is %pre and %post scripts, and the Debian equivalents.
These can do all sorts of bad stuff, like trying to send signals to
daemons, operations that assume root, and so on. Running them is
usually a bad idea, and getting package maintainers to understand our
niche use-case is a losing proposition.
[As an aside, it would be good to minimize the use of %pre and %post
as far as possible. There are many, many Fedora packages whose
scripts just run ldconfig, or go through a lot of hassle to create a
user or install a systemd unit. RPM would do well to grow support for
this, eg. some sort of %ldconfig directive. This would eliminate
the vast majority of scripts, and make the whole system a lot more
reliable.]
The second issue is that yum is really slow.
The third issue is that debirf+debootstrap was very buggy.
In febootstrap 3.x, we take a different approach. We examine the RPMs
themselves and produce the supermin appliance directly from them.
There is no yum or RPM installation involved, so no need for fakeroot
or fakechroot. All we need to do are some basic operations on RPMs --
eg. resolving dependencies, listing files (rpm -ql), extracting config
files -- and we can directly create the supermin appliance.
Since the operations we require are generic, we also wrote backends
for Debian (apt/dpkg) and ArchLinux, and you can extend this easily to
other distros assuming reasonable package management.
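To make "generic operations" a bit more concrete, here is a minimal
sketch in Python. It is not the febootstrap code: the PackageBackend
interface, the FakeBackend class and the package data are all made up
for illustration, and a real backend would query rpm, dpkg or pacman
rather than a dictionary.

```python
from typing import Dict, List, Protocol, Set


class PackageBackend(Protocol):
    """Hypothetical interface: the two queries a distro backend must answer."""

    def requires(self, package: str) -> List[str]: ...

    def files(self, package: str) -> List[str]: ...


class FakeBackend:
    """In-memory stand-in for a real rpm/dpkg/pacman backend."""

    def __init__(self, deps: Dict[str, List[str]], files: Dict[str, List[str]]):
        self._deps = deps
        self._files = files

    def requires(self, package: str) -> List[str]:
        return self._deps.get(package, [])

    def files(self, package: str) -> List[str]:
        return self._files.get(package, [])


def resolve(backend: PackageBackend, roots: List[str]) -> Set[str]:
    """Return the roots plus everything they depend on, transitively."""
    seen: Set[str] = set()
    todo = list(roots)
    while todo:
        pkg = todo.pop()
        if pkg in seen:
            continue
        seen.add(pkg)
        todo.extend(backend.requires(pkg))
    return seen


def file_list(backend: PackageBackend, roots: List[str]) -> List[str]:
    """List every file owned by the resolved package set, sorted."""
    paths: Set[str] = set()
    for pkg in resolve(backend, roots):
        paths.update(backend.files(pkg))
    return sorted(paths)


# Made-up package data, purely for illustration.
backend = FakeBackend(
    deps={"bash": ["glibc"], "coreutils": ["glibc"]},
    files={
        "bash": ["/bin/bash"],
        "coreutils": ["/bin/ls"],
        "glibc": ["/lib/libc.so.6"],
    },
)
```

The point of the split is that resolve() and file_list() never need to
know which distro they are running on; only the backend does.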
You'll note there is a two step process:
pile of RPMs  -- build -->  supermin appliance  -- run -->  appliance
              (febootstrap)                 (febootstrap-supermin-helper)
The reason for having two steps might not be obvious if you are only
interested in appliances. However it is so that we can distribute
something very small (the supermin appliance, typically 100K - 1MB),
instead of distributing the huge appliance (typically 300-500MB).
The issue with febootstrap 3.x is with supermin appliances themselves:
the format [described here:
http://libguestfs.org/febootstrap.8.html#supermin_appliances] bakes in
the list of files contained in each RPM. This makes it fragile if an
RPM is updated and a file moves. The classic case is where a library
moves from /lib to /usr/lib. This should be a "no op" change, but it
breaks supermin appliances, since the path to the library is hard
coded.
febootstrap 4.x
---------------
So what I'd like to do for the next gen febootstrap 4.x is to move the
current logic of listing files from the build step to the run step.
The supermin appliance format would be changed so that it just lists
the RPMs (more precisely, the roots). Depsolving and listing of files
would move to the run step. An updated package would not break this
new format supermin appliance, since the file list would be worked out
afresh from the installed packages every time the run step happens.
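A toy illustration of the difference, in Python. This is not the real
supermin format, just the shape of the idea: the dictionaries standing
in for the package database and the names build_v3, run_v3, build_v4
and run_v4 are all hypothetical, and depsolving is left out to keep it
short.

```python
from typing import Dict, List, Set

# Toy "package database": package name -> files it currently owns.
PackageDB = Dict[str, List[str]]


def build_v3(db: PackageDB, roots: List[str]) -> List[str]:
    """3.x style: bake the file list into the supermin appliance at build time."""
    return sorted(path for pkg in roots for path in db[pkg])


def run_v3(baked_files: List[str], host_files: Set[str]) -> List[str]:
    """3.x style run step: fails if any baked-in path has gone from the host."""
    missing = [p for p in baked_files if p not in host_files]
    if missing:
        raise FileNotFoundError(", ".join(missing))
    return baked_files


def build_v4(roots: List[str]) -> List[str]:
    """4.x style: the supermin appliance records only the root packages."""
    return sorted(roots)


def run_v4(roots: List[str], db: PackageDB) -> List[str]:
    """4.x style run step: ask the package database for files at run time."""
    return sorted(path for pkg in roots for path in db[pkg])


old_db: PackageDB = {"glibc": ["/lib/libc.so.6"]}
# After an update the library has moved from /lib to /usr/lib.
new_db: PackageDB = {"glibc": ["/usr/lib/libc.so.6"]}
new_host = {"/usr/lib/libc.so.6"}

baked = build_v3(old_db, ["glibc"])  # built before the update
roots = build_v4(["glibc"])          # built before the update
```

With this data, run_v3(baked, new_host) raises FileNotFoundError because
the old /lib path is hard coded, while run_v4(roots, new_db) simply
picks up the new location.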
Config files still have to be packaged up separately, but packages can
usually deal with old config files (since that situation happens
anyway during package updates), so that's not so much of a problem.
%pre and %post scripts
----------------------
On the issue of %pre and %post scripts, I think we'd want to do what
debootstrap does: don't run them, but save them into a directory on
the appliance, and run them on first boot of the appliance.
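As a rough sketch of the "save now, run on first boot" idea (in Python,
and very much an assumption about how it could look rather than how
any existing tool does it): SCRIPTS_DIR, save_scripts and first_boot
are hypothetical names, and ordering scripts by package name is a
simplification, since a real tool would need to respect install order.

```python
import os
import subprocess
from typing import Callable, Dict, List

# Hypothetical location inside the appliance; not a path febootstrap defines.
SCRIPTS_DIR = "var/lib/firstboot-scripts"


def save_scripts(appliance_root: str, scripts: Dict[str, str]) -> List[str]:
    """Write each package's script to the appliance instead of running it."""
    target = os.path.join(appliance_root, SCRIPTS_DIR)
    os.makedirs(target, exist_ok=True)
    saved = []
    for index, (package, body) in enumerate(sorted(scripts.items())):
        path = os.path.join(target, f"{index:04d}-{package}.sh")
        with open(path, "w") as handle:
            handle.write(body)
        saved.append(path)
    return saved


def run_with_sh(path: str) -> int:
    """Default runner: execute a script with /bin/sh, return its exit status."""
    return subprocess.call(["/bin/sh", path])


def first_boot(appliance_root: str,
               run: Callable[[str], int] = run_with_sh) -> List[str]:
    """Run saved scripts in name order; delete each one that succeeds.

    Returns the names of the scripts that failed.
    """
    target = os.path.join(appliance_root, SCRIPTS_DIR)
    if not os.path.isdir(target):
        return []
    failed = []
    for name in sorted(os.listdir(target)):
        path = os.path.join(target, name)
        if run(path) == 0:
            os.remove(path)      # done; do not run again on the next boot
        else:
            failed.append(name)  # keep it around for inspection
    return failed
```

At build time you would call save_scripts() on the appliance tree; in
the booted appliance, first_boot("/") would run from an init script.
The run parameter is only there so the logic can be exercised without
really executing anything.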
9pfs and making chroots
-----------------------
At the moment febootstrap-supermin-helper 3.x can only build a disk
image. However we should be able to build a chroot. There is
essentially no reason why not; it's just that we don't have a backend
for doing this at the moment.
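To show how little a chroot backend would have to do in principle,
here is a minimal Python sketch. build_chroot is a hypothetical
function, not part of febootstrap-supermin-helper, and it only handles
regular files: symlinks, directories, device nodes and ownership are
all ignored here.

```python
import os
import shutil
from typing import Iterable, List


def build_chroot(host_root: str, paths: Iterable[str], chroot_dir: str) -> List[str]:
    """Copy the listed files from host_root into chroot_dir, keeping their paths."""
    copied = []
    for path in sorted(paths):
        relative = path.lstrip("/")
        source = os.path.join(host_root, relative)
        destination = os.path.join(chroot_dir, relative)
        os.makedirs(os.path.dirname(destination), exist_ok=True)
        # copy2 keeps mode bits and timestamps; ownership is not preserved,
        # which is all a non-root user can do anyway.
        shutil.copy2(source, destination)
        copied.append(destination)
    return copied
```

On a real host you would pass host_root="/" and the file list computed
from the packages; the host_root parameter exists mainly so the
function can be tried out against a scratch directory.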
Equally we'd like to start using virtio-9p in qemu instead of building
a disk image, since that should be faster. Again, this is just a
matter of making a backend to do this (or using the chroot backend).
We have also discussed with IBM what we'd need to make virtio-9p
easier to use for this application.
Rich.
--
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones
New in Fedora 11: Fedora Windows cross-compiler. Compile Windows
programs, test, and build Windows installers. Over 70 libraries supprt'd
http://fedoraproject.org/wiki/MinGW http://www.annexia.org/fedora_mingw