This is something I've been working on: Reworking inspection so it's
not a big mess of ad hoc C code, but instead uses a well-defined
domain-specific language to describe how we inspect guests.
The best introduction to this is the manual page, which I include
below (it's also included in patch 2/3).
Rich.
----------------------------------------------------------------------
NAME
guestfs-inspection - guestfs inspection program
SYNOPSIS
guestfs-inspection
NOTE
This man page documents the guestfs inspection program. If you want to
read about guestfs inspection then this is the wrong place. See
"INSPECTION" in guestfs(3) instead.
DESCRIPTION
guestfs-inspection is a standalone program that performs inspection on
the local disks, to find out what operating system(s) are installed. It
normally runs inside the libguestfs appliance, started by guestfsd(8),
when the caller uses the guestfs_inspect_os API (see
guestfs-internals(1) and "guestfs_inspect_os" in guestfs(3)). You
should never need to run this program by hand.
The program examines all disks attached to the appliance, looking for
filesystems that might belong to operating systems. It may mount these
temporarily to examine them for Linux configuration files, Windows
Registries, and so on. It then tries to determine what operating
system(s) are installed on the disks. It is able to detect many
different Linux distributions, Windows, BSD, and others. The currently
mounted root filesystem is ignored, since when running under
libguestfs, that filesystem is part of the libguestfs appliance (this
is the main difference compared to programs like facter).
guestfs-inspection is written in C, but most of the C is generated by a
rules compiler from a set of inspection rules written in a more
compact, declarative, Prolog-inspired language. If you want to write or
modify the rules, see "WRITING RULES" below.
OPTIONS
-?
--help
Display brief help.
-v
--verbose
Enable verbose messages for debugging.
WRITING RULES
Inspection is performed according to a set of rules written in a
compact, declarative, Prolog-inspired language. This section explains
how this language works, so you can write your own rules to detect
other operating systems.
The rules can be found starting in inspection/inspection.rules (in the
libguestfs sources). The rules are compiled down to C and linked into
the guestfs-inspection program, together with a bit of extra C code to
provide runtime support.
Facts
Facts are what we try to determine about the operating system(s) we are
inspecting. They look like this:
Filesystem("/dev/sda1")
which means "/dev/sda1 is a filesystem".
File("/dev/sda1", "/etc/fstab")
which means "there exists a file called /etc/fstab on the /dev/sda1
filesystem".
Facts come in three flavours: true facts, false facts, and unknown
facts. False facts are written like this:
! File("/dev/sda1", "/etc/fstab")
which means "either /dev/sda1 is not a filesystem or there does not
exist a file called /etc/fstab on this filesystem".
Unknown facts are facts that we have not yet determined to be either
true or false.
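To make the three flavours concrete, here is a minimal C sketch of how facts and the two fact sets might be represented. The struct fact type, the fixed-size sets and the lookup function are hypothetical, chosen only for illustration. They are not taken from the guestfs-inspection runtime. A fact found in neither set is reported as unknown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical representation: a fact name plus up to two string arguments. */
struct fact {
  const char *name;
  size_t nr_args;
  const char *args[2];
};

enum truth { FACT_UNKNOWN, FACT_TRUE, FACT_FALSE };

#define MAX_FACTS 64

struct fact_set {
  size_t n;
  struct fact facts[MAX_FACTS];
};

static int
fact_equal (const struct fact *a, const struct fact *b)
{
  if (strcmp (a->name, b->name) != 0 || a->nr_args != b->nr_args)
    return 0;
  for (size_t i = 0; i < a->nr_args; ++i)
    if (strcmp (a->args[i], b->args[i]) != 0)
      return 0;
  return 1;
}

static int
set_contains (const struct fact_set *s, const struct fact *f)
{
  for (size_t i = 0; i < s->n; ++i)
    if (fact_equal (&s->facts[i], f))
      return 1;
  return 0;
}

/* Adds f to s.  Returns 1 if f was new, 0 if it was already present. */
static int
set_add (struct fact_set *s, const struct fact *f)
{
  if (set_contains (s, f))
    return 0;
  assert (s->n < MAX_FACTS);
  s->facts[s->n++] = *f;
  return 1;
}

/* A fact that is in neither set is unknown. */
static enum truth
lookup (const struct fact_set *true_facts, const struct fact_set *false_facts,
        const struct fact *f)
{
  if (set_contains (true_facts, f))
    return FACT_TRUE;
  if (set_contains (false_facts, f))
    return FACT_FALSE;
  return FACT_UNKNOWN;
}
```

For example, after adding ! File("/dev/sda1", "/etc/fstab") to the false set, lookup returns FACT_FALSE for that fact and FACT_UNKNOWN for File("/dev/sda3", "/etc/fstab"), which appears in neither set.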
Rules
Rules are used to generate more facts. A simple rule for generating
File facts might look like this:
File(fs, filename) :-
Filesystem(fs),
{{
// some C code to mount 'fs' and check for 'filename'
}}.
You can read this as: "For all fs & filename, if fs is a filesystem,
and running the C code with parameters fs and filename returns true,
then File(fs, filename) is a true fact".
In the Prolog-inspired language, a comma (,) is the AND operator. A
semicolon (;) is the OR operator. :- is a backwards if-statement (the
condition is on the right, the conclusion is on the left). Also notice
the dot (.) which must follow each rule.
Uppercase identifiers are facts. Lowercase identifiers are variables.
All identifiers are case-sensitive.
Everything in {{ ... }} is embedded C code. In this case the C code
returns a true/false/error indication, but embedded C code can also do
more complicated things and return strings and lists as we'll see
later.
You can use parentheses (...) for grouping expressions on the right
hand side of the :- operator.
Program evaluation
Let's take a simple set of rules which you might use to detect a Fedora
root filesystem:
File(fs, filename) :-
Filesystem(fs),
{{
// some C code to mount 'fs' and check for 'filename'
}}.
Fedora(rootfs) :-
Filesystem(rootfs),
File(rootfs, "/etc/fedora-release").
When evaluating this program, there are two sets of facts, the true
facts and the false facts. Let's start with the false facts set being
empty, and let's seed the true facts set with some Filesystem facts:
true_facts = { Filesystem("/dev/sda1"), Filesystem("/dev/sda3") }
false_facts = { } // empty set
Unknown facts are facts which don't appear in either set.
Evaluating the program works like this: We consider each rule in turn,
and see if we can find new true or false facts from it. These new facts
are added to the true or false facts sets. After looking at each rule
in the program, as long as at least one new fact was added to the true
facts set, we go back to the start of the rules and repeat. We do
this until we can no longer add any new true facts, and then we're
done.
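The evaluation loop can be sketched in C for the two-rule Fedora program above. This is a hypothetical, hand-written illustration, not the output of the rules compiler. Facts are stored as plain string keys. check_file is a stand-in for the embedded C code that pretends only /dev/sda3 contains /etc/fedora-release. The filename variable is only tried with "/etc/fedora-release", the single string literal in the program.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical store: each fact is kept as a string key,
   e.g. "File(/dev/sda3,/etc/fedora-release)". */
#define MAX_FACTS 32
#define KEY_LEN 128

struct set { int n; char keys[MAX_FACTS][KEY_LEN]; };
static struct set true_facts, false_facts;

static int
contains (const struct set *s, const char *key)
{
  for (int i = 0; i < s->n; ++i)
    if (strcmp (s->keys[i], key) == 0) return 1;
  return 0;
}

/* Adds key to s; returns 1 only if it was not already there. */
static int
add (struct set *s, const char *key)
{
  if (contains (s, key)) return 0;
  assert (s->n < MAX_FACTS);
  snprintf (s->keys[s->n++], KEY_LEN, "%s", key);
  return 1;
}

/* If key is "Filesystem(X)", copies X into out and returns 1. */
static int
filesystem_arg (const char *key, char *out, size_t len)
{
  const char *prefix = "Filesystem(";
  size_t plen = strlen (prefix), klen = strlen (key);
  if (strncmp (key, prefix, plen) != 0 || key[klen - 1] != ')')
    return 0;
  snprintf (out, len, "%.*s", (int) (klen - plen - 1), key + plen);
  return 1;
}

/* Stand-in for the {{ ... }} code: pretend only /dev/sda3 has the file. */
static int
check_file (const char *fs, const char *filename)
{
  return strcmp (fs, "/dev/sda3") == 0 &&
         strcmp (filename, "/etc/fedora-release") == 0;
}

/* File(fs, filename) :- Filesystem(fs), {{ ... }}.
   filename is only tried with "/etc/fedora-release", the one literal. */
static int
rule_file (void)
{
  char fs[KEY_LEN], key[KEY_LEN];
  int added = 0, n = true_facts.n;
  for (int i = 0; i < n; ++i) {
    if (!filesystem_arg (true_facts.keys[i], fs, sizeof fs)) continue;
    snprintf (key, sizeof key, "File(%s,/etc/fedora-release)", fs);
    if (check_file (fs, "/etc/fedora-release"))
      added += add (&true_facts, key);
    else
      add (&false_facts, key);
  }
  return added;
}

/* Fedora(rootfs) :- Filesystem(rootfs), File(rootfs, "/etc/fedora-release"). */
static int
rule_fedora (void)
{
  char fs[KEY_LEN], file[KEY_LEN], key[KEY_LEN];
  int added = 0, n = true_facts.n;
  for (int i = 0; i < n; ++i) {
    if (!filesystem_arg (true_facts.keys[i], fs, sizeof fs)) continue;
    snprintf (file, sizeof file, "File(%s,/etc/fedora-release)", fs);
    snprintf (key, sizeof key, "Fedora(%s)", fs);
    if (contains (&true_facts, file))
      added += add (&true_facts, key);
    else if (contains (&false_facts, file))
      add (&false_facts, key);
    /* else File(...) is still unknown, so Fedora(...) stays unknown */
  }
  return added;
}

/* Try every rule in turn; repeat while any new true fact was added. */
static void
evaluate (void)
{
  int (*rules[]) (void) = { rule_file, rule_fedora };
  int added;

  do {
    added = 0;
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; ++i)
      added += rules[i] ();
  } while (added > 0);
}
```

Seeding true_facts with Filesystem(/dev/sda1) and Filesystem(/dev/sda3) and calling evaluate() takes two passes. The first pass adds File(/dev/sda3,/etc/fedora-release) and Fedora(/dev/sda3) as true facts. The second pass adds nothing, so the loop stops.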
In the case of this program, we start with the File rule, and we
substitute (theoretically) every possible string for fs and filename.
For example, this substitution:
File("/dev/sda1", "/etc/fedora-release") :-
Filesystem("/dev/sda1"),
{{ // checks for file and returns false }}.
turns out to be false (because the C code doesn't find
/etc/fedora-release on /dev/sda1), so that yields a new false fact:
! File("/dev/sda1", "/etc/fedora-release")
But this substitution turns out to be true:
File("/dev/sda3", "/etc/fedora-release") :-
Filesystem("/dev/sda3"),
{{ // checks for file and returns true }}.
so that yields a new true fact:
File("/dev/sda3", "/etc/fedora-release")
In theory every possible string is tried, e.g. File("ardvark",
"foo123654"). That would take literally forever to run, but luckily the
rules compiler is smarter.
Looking now at the second rule, we try this substitution:
Fedora("/dev/sda3") :-
Filesystem("/dev/sda3"),
File("/dev/sda3", "/etc/fedora-release").
which yields another new true fact:
Fedora("/dev/sda3")
Because we added several new true facts to the set, we go back and
repeat the whole process. But after trying all the rules for a second
time, no more true facts can be added, so now we're done.
At the end, the set of true facts is:
true_facts = { Filesystem("/dev/sda1"), Filesystem("/dev/sda3"),
File("/dev/sda3", "/etc/fedora-release"),
Fedora("/dev/sda3") }
We don't care about the false facts -- they are discarded at the end of
the program.
The summary of inspection is that /dev/sda3 contains a Fedora root
filesystem.
Of course real inspection is much more complicated than this, but the
same program evaluation order is followed.
Some caveats with the language
It's easy to look at an expression like:
Fedora(rootfs) :-
Filesystem(rootfs),
File(rootfs, "/etc/fedora-release"). /* line 3 */
and think that line 3 is "calling" the "File function". This is not
what is happening! Rules are not functions. Rules are considered in
isolation. Rules don't "call" other rules. Instead when trying to find
possible values that can be substituted into a rule, we only look at
the rule and the current sets of true and false facts.
When searching for values to substitute, in theory the compiler would
have to look at every possible string. In practice of course it can't
and doesn't do that. Instead it looks at the current sets of true and
false facts to find strings to substitute. In the following rule:
File(fs, filename) :-
Filesystem(fs),
{{ // C code }}.
suitable choices for fs are found by looking at any Filesystem facts in
either the true or false sets.
In some cases, this doesn't work, as in the example above where we have
no clues for the filename variable. In that case the compiler tries
every string literal from every rule in the program. This can be
inefficient, but by modifying the rule slightly you can avoid this. In
the following program, only the strings /etc/fstab and
/etc/fedora-release would be tried:
Filename("/etc/fstab").
Filename("/etc/fedora-release").
File(fs, filename) :-
Filesystem(fs),
Filename(filename),
{{ // C code }}.
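The C sketch below illustrates drawing substitution candidates from existing facts rather than from every possible string. The unary_fact type and both functions are hypothetical. For simplicity, a single array stands for the union of the true and false sets. With two Filesystem facts and the two Filename facts above, only four substitutions of the File rule are tried.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fact with a name and a single string argument. */
struct unary_fact { const char *name; const char *arg; };

/* Collects the distinct arguments of all facts called name (true and
   false alike) into out[].  Returns how many were found. */
static size_t
candidates (const struct unary_fact *facts, size_t nr_facts,
            const char *name, const char **out, size_t max)
{
  size_t n = 0;

  for (size_t i = 0; i < nr_facts; ++i) {
    int seen = 0;

    if (strcmp (facts[i].name, name) != 0)
      continue;
    for (size_t j = 0; j < n; ++j)
      if (strcmp (out[j], facts[i].arg) == 0)
        seen = 1;
    if (seen)
      continue;
    assert (n < max);
    out[n++] = facts[i].arg;
  }
  return n;
}

/* Prints and counts the substitutions tried for
   File(fs, filename) :- Filesystem(fs), Filename(filename), {{ ... }}. */
static size_t
file_rule_substitutions (const struct unary_fact *facts, size_t nr_facts)
{
  const char *fss[16], *names[16];
  size_t nr_fss = candidates (facts, nr_facts, "Filesystem", fss, 16);
  size_t nr_names = candidates (facts, nr_facts, "Filename", names, 16);
  size_t tried = 0;

  for (size_t i = 0; i < nr_fss; ++i)
    for (size_t j = 0; j < nr_names; ++j) {
      printf ("try File(\"%s\", \"%s\")\n", fss[i], names[j]);
      tried++;
    }
  return tried;
}
```

Duplicate facts (such as a Filesystem fact present twice) contribute only one candidate, so the number of substitutions grows with the number of distinct values, not with every possible string.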
C expressions returning boolean
Simple C code enclosed in {{ ... }} as shown above should return a
true, false or error status only. It returns true by returning any
integer ≥ 1. It should return 0 to indicate false, and it should return
-1 to indicate an error (which stops the program and causes inspection
to fail with a user-visible error).
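A sketch of how a caller might interpret that integer, using hypothetical interpret and record helpers. The text defines only -1 as an error; treating every negative value the same way is an assumption made here.

```c
#include <assert.h>
#include <stdio.h>

enum outcome { OUTCOME_FALSE, OUTCOME_TRUE, OUTCOME_ERROR };

/* Maps the int returned by a boolean {{ ... }} expression onto the three
   outcomes.  Treating every negative value as an error is an assumption:
   the text only defines -1. */
static enum outcome
interpret (int r)
{
  if (r >= 1) return OUTCOME_TRUE;
  if (r == 0) return OUTCOME_FALSE;
  return OUTCOME_ERROR;
}

/* Hypothetical caller: records one substitution's result.  Returns -1
   when inspection must stop with an error, otherwise 0. */
static int
record (const char *fact, int r)
{
  switch (interpret (r)) {
  case OUTCOME_TRUE:
    printf ("true fact: %s\n", fact);
    return 0;
  case OUTCOME_FALSE:
    printf ("false fact: ! %s\n", fact);
    return 0;
  case OUTCOME_ERROR:
    break;
  }
  fprintf (stderr, "error while evaluating %s\n", fact);
  return -1;
}
```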
Here is an example of a simple C expression returning a boolean:
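File(fs, filename) :-
    Filesystem(fs),
    {{
      int r, saved_errno;
      char *relative_filename;

      /* Anything other than 1 from get_mount is passed straight back. */
      r = get_mount (fs, filename, &relative_filename);
      if (r != 1) return r;

      r = access (relative_filename, F_OK);
      saved_errno = errno;      /* free() may change errno on some systems */
      free (relative_filename);
      if (r == -1) {
        /* A missing file means false; any other error (e.g. EIO) is fatal. */
        if (saved_errno == ENOENT || saved_errno == ENOTDIR)
          return 0;
        errno = saved_errno;
        perror ("access");
        return -1;
      }
      return 1;
    }}.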
Notice that fs and filename are passed into the C code as local
variables.
You can see that dealing with errors is a bit involved: a missing file
must produce false, but any other error (such as EIO) must make
inspection fail hard.
C expressions returning strings
C expressions can also return strings or tuples of strings. This is
useful where you want to parse the content of external files.
The syntax for this sort of C expression is:
(var1, var2, ...)={{ ... }}
where var1, var2, etc. are outputs from the C code.
In the following example, a lot of error checking has been omitted for
clarity:
ProductName(fs, product_name) :-
    Unix_root(fs),
    Distro(fs, "RHEL"),
    (product_name)={{
      int r;
      char *line = NULL;
      size_t n = 0;             /* getline allocates the buffer */
      char *relative_filename;

      /* Checks on r, fp and getline omitted for clarity. */
      r = get_mount (fs, "/etc/redhat-release", &relative_filename);
      FILE *fp = fopen (relative_filename, "r");
      free (relative_filename);
      getline (&line, &n, fp);
      fclose (fp);
      line[strcspn (line, "\n")] = '\0';   /* strip the trailing newline */
      set_product_name (line);
      free (line);
      return 0;
    }}.
The C code passes its result back by calling the function
set_product_name, which the compiler generates.
The return value from the C code should be 0 if everything was OK, or
-1 if there is an error (which stops the whole program).
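The compiler-generated set_product_name is not shown in this manual, so the C sketch below is only a guess at its general shape. The storage slot, the run_string_expr wrapper and the sample release string are all hypothetical. The setter copies its argument, because the example above frees line right after the call. Treating "set_product_name was never called" as "no fact" is likewise an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical slot behind a generated set_product_name. */
static char *product_name_value;

/* Copies the value, since the caller may free its buffer straight away. */
static void
set_product_name (const char *value)
{
  size_t len = strlen (value) + 1;
  char *copy = malloc (len);

  if (copy == NULL) abort ();
  memcpy (copy, value, len);
  free (product_name_value);
  product_name_value = copy;
}

/* Sample body in the style of the RHEL rule, reading from a string
   instead of /etc/redhat-release. */
static int
rhel_body (void)
{
  char line[] = "Red Hat Enterprise Linux Server release 7.1 (Maipo)\n";

  line[strcspn (line, "\n")] = '\0';   /* drop the trailing newline */
  set_product_name (line);
  return 0;
}

/* Hypothetical runtime wrapper: runs the body and hands back the value.
   Returns -1 on error, 0 if nothing was set, 1 if *result was set. */
static int
run_string_expr (int (*body) (void), char **result)
{
  free (product_name_value);
  product_name_value = NULL;
  if (body () == -1) return -1;
  if (product_name_value == NULL) return 0;
  *result = product_name_value;
  product_name_value = NULL;
  return 1;
}
```

On success, the caller would then add ProductName(fs, value) to the true facts, using the fs the rule was applied to.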
C expressions returning multiple results
Finally it is possible for C code to return multiple results.
The syntax is:
[var1, var2, ...]={{ ... }}
where var1, var2, etc. are outputs. Unlike the previous rules, these
rules may generate multiple facts from a single string substitution.
This is how we populate the initial list of true facts about
filesystems:
Filesystem(fs) :-
[fs]={{
int i;
for (i = 0; i < nr_filesystems; ++i) {
set_fs (fs[i]);
}
return 0;
}}.
In this case, the C code repeatedly calls a function set_fs (that the
compiler generates) for each new filesystem discovered. Multiple
Filesystem facts can be generated as a result of one application of
this rule.
The return value from the C code should be 0 if everything was OK, or
-1 if there is an error (which stops the whole program).
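Here is a hypothetical C sketch of how a generated set_fs could differ from a single-output setter: each call appends a value, and one run of the C body then yields one Filesystem fact per value. The filesystems array and nr_filesystems are assumed stand-ins for whatever list the runtime provides. Only pointers are stored, so the values must outlive the call; string literals do.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_RESULTS 16

/* Hypothetical storage behind a generated set_fs for [fs]={{ ... }}:
   each call appends one value rather than replacing the previous one.
   Only pointers are kept, so values must outlive the call. */
static const char *fs_results[MAX_RESULTS];
static size_t nr_fs_results;

static void
set_fs (const char *value)
{
  assert (nr_fs_results < MAX_RESULTS);
  fs_results[nr_fs_results++] = value;
}

/* Assumed stand-in for the runtime's list of filesystems. */
static const char *filesystems[] = { "/dev/sda1", "/dev/sda2", "/dev/sda3" };
static const int nr_filesystems = 3;

/* The body of the Filesystem rule: returns 0 on success, -1 on error. */
static int
filesystem_body (void)
{
  for (int i = 0; i < nr_filesystems; ++i)
    set_fs (filesystems[i]);
  return 0;
}

/* One application of the rule: run the body once, then turn every value
   it produced into a separate Filesystem(fs) fact.  Returns the number
   of facts, or -1 on error. */
static int
apply_filesystem_rule (void)
{
  nr_fs_results = 0;
  if (filesystem_body () == -1)
    return -1;
  for (size_t i = 0; i < nr_fs_results; ++i)
    printf ("true fact: Filesystem(\"%s\")\n", fs_results[i]);
  return (int) nr_fs_results;
}
```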
Type checking
The current language treats every value as a string. Every expression
is a boolean. One possible future enhancement is to handle other types.
There is still some minimal type checking applied:
* A fact name which appears on the right-hand side of any rule must
also appear on the left-hand side of some rule. This is mainly for
catching typos.
* A fact must have the same number of arguments ("arity") each time
it appears in the source.
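Both checks are simple to express over a list of fact-name occurrences. The C sketch below is hypothetical: the occurrence type and check_types are not from the rules compiler. It reports each problem on stderr and returns the number of errors. Arity is compared against the first occurrence of each name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record of one place a fact name appears in the source. */
struct occurrence {
  const char *name;
  int arity;      /* number of arguments at this occurrence */
  int on_lhs;     /* 1 if it is a rule head, 0 if in a rule body */
  int line;       /* source line, for error messages */
};

/* Applies both checks.  Returns the number of errors found. */
static int
check_types (const struct occurrence *occ, size_t n)
{
  int errors = 0;

  for (size_t i = 0; i < n; ++i) {
    /* Same arity as the first occurrence of this name. */
    for (size_t j = 0; j < i; ++j) {
      if (strcmp (occ[j].name, occ[i].name) != 0)
        continue;
      if (occ[j].arity != occ[i].arity) {
        fprintf (stderr, "line %d: %s has %d argument(s), but %d on line %d\n",
                 occ[i].line, occ[i].name, occ[i].arity,
                 occ[j].arity, occ[j].line);
        errors++;
      }
      break;
    }

    /* A name used on a right-hand side must appear on some left-hand side. */
    if (!occ[i].on_lhs) {
      int defined = 0;

      for (size_t j = 0; j < n; ++j)
        if (occ[j].on_lhs && strcmp (occ[j].name, occ[i].name) == 0)
          defined = 1;
      if (!defined) {
        fprintf (stderr, "line %d: %s is never on a left-hand side\n",
                 occ[i].line, occ[i].name);
        errors++;
      }
    }
  }
  return errors;
}
```

For example, a body that mistypes File as Fil, and another that uses File with one argument instead of two, produce one error each.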
Debugging
You can debug the evaluation of inspection programs by calling
guestfs_set_verbose (or setting $LIBGUESTFS_DEBUG=1) before launching
the handle.
This causes guestfsd(8) to pass the --verbose parameter to this
inspection program, which in turn causes the inspection program to
print information about what rules it is trying and what true/false
facts it has found. These are passed back to libguestfs and printed on
stderr (or sent to the event system if you are using that).
You can also print debug messages from C code embedded in {{...}}
expressions. These are similarly sent upwards through to libguestfs and
will appear on stderr.
EXIT STATUS
This program returns 0 if successful, or non-zero if there was an
error.
SEE ALSO
guestfsd(8), guestfs-hacking(1), guestfs-internals(1), "INSPECTION" in
guestfs(3),
http://libguestfs.org/.
AUTHOR
Richard W.M. Jones
http://people.redhat.com/~rjones/
COPYRIGHT
Copyright (C) 2009-2015 Red Hat Inc.
LICENSE
This program is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 2 of the License, or (at your
option) any later version.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
----------------------------------------------------------------------