Sorry for the late reply to this ...
On Tue, Apr 30, 2019 at 06:28:01PM +0200, Pino Toscano wrote:
On Friday, 9 February 2018 19:01:53 CEST Richard W.M. Jones wrote:
> My contention is that the libguestfs git repository is too large and
> unwieldy. There are too many separate, unrelated projects and as a
> result of that the source has too many dependencies and takes too long
> to build and test.
>
> The project divides (sort of) naturally into layers -- the library,
> the bindings, the various virt tools -- and could be split along those
> lines into separate projects which can then be released and evolve at
> their own pace.
As other replies to this email also say, splitting the tools and bindings
may be very complex, so for now it is still too distant a goal.
However...
> My suggested split would be something like this:
>
> [...]
> virt-v2v and virt-p2v
I'd rather split virt-p2v into its own repository. There are various
reasons for this:
- it does not use libguestfs (the library), just the tools for testing
stuff
- the communication with virt-v2v is done via network, and its
capabilities are dynamically probed (so theoretically virt-p2v and
virt-v2v can be used even when their versions differ)
- it is written only in C
However, even if it looks simple, in reality there are a number of common
things used from the rest of the libguestfs tree:
1) gnulib
We hardly use gnulib in virt-p2v. I think it's only used for
ignore-value.h, getprogname.h, and c-ctype.h, all of which are likely
to be easily worked around.
2) some build system bits (e.g. m4/guestfs-v2v.m4)
Right, although this in itself should be split up, so no bad thing.
3) auto-cleanup bits (e.g. CLEANUP_FREE), although only a few are used
(CLEANUP_FREE, CLEANUP_FREE_STRING_LIST, CLEANUP_PCLOSE,
CLEANUP_FCLOSE, and CLEANUP_XMLFREETEXTWRITER)
4) other internal macros, i.e. guestfs-utils.h
Common code is a bit trickier, as is ...
5) the list of credits generated by the generator
(i.e. generator/authors.ml)
6) the p2v configuration generated by the generator
(i.e. generator/p2v_config.ml)
... the generator and ...
7) test images/data (phony images, and virt-tools)
test data.
8) the miniexpect module, right now outside the p2v subdirectory
This is only used by virt-p2v I think, so it could go with virt-p2v or
be made into a separate project.
Possible solutions might be:
1) add its own gnulib submodule (using its own set of modules)
I think we should ditch gnulib as much as possible, so see above.
2) copy/implement them locally: luckily there are not many, so
inlining them in configure.ac will not be a problem; the common
bits (e.g. the distro detection from os-release) can be split into
their own module in libguestfs, and copied into p2v
3/4) have a local version of them; not pretty, although they are not
that many
5) this list is reflected in two places: the p2v/about-authors.c file,
and the AUTHORS file (theoretically mandatory for automake, unless
"foreign" is used, which it is); my idea was to go back to a manually
written about-authors.c file without the libguestfs credits, leaving
the few p2v ones easy to manage; the same for the AUTHORS file
6) this is a bit more complex: my idea was to keep it as an OCaml script
to run at build time, instead of being statically shipped at dist
time
7) create their own versions at test time using guestfish/virt-builder;
maybe use a Fedora image, instead of a phony Windows one (which avoids
needing hivex for the tests)
8)
So while I'm not a massive fan of git submodules, now that I have used
them a few times with riscv stuff, they do solve a certain problem as
long as they are managed carefully. I think the common code and the
generator are cases where a submodule or two would work.
Does this mean we need to move immediately to a submodule if just
splitting virt-p2v, or copy code as you suggest? Maybe not, because
you can imagine for just this project copying the code needed from the
common/ directory, and creating a new "mini-generator" for the project
which handles the little bits that need to be generated in virt-p2v.
However in the long term if we split up everything a submodule or two
does seem to make sense, so maybe we should start there?
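
To make the submodule idea concrete, here is a minimal sketch on
throwaway repositories. The names "common" and "p2v" and the file
utils.c are stand-ins I made up, and a real setup would point at a
hosted URL rather than a local path, so treat this as an illustration
and not as a proposal for the actual layout:

```shell
#!/bin/sh
# Sketch only: scratch repos stand in for a shared "common" repository
# and a standalone virt-p2v repository (both names are hypothetical).
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

scratch=$(mktemp -d)
cd "$scratch"

# The shared code lives in its own repository.
git init -q common
( cd common
  echo "/* shared helpers */" > utils.c
  git add utils.c
  git commit -q -m "common: add utils" )

# The split-out project pulls the shared code in as a submodule.
git init -q p2v
cd p2v
# protocol.file.allow is only needed because the "remote" is a local path.
git -c protocol.file.allow=always submodule --quiet add "$scratch/common" common
git commit -q -m "p2v: add common as a submodule"

# Show the commit of "common" that p2v is pinned to.
git submodule status
```

The superproject records one specific commit of the submodule, which is
the "managed carefully" part: every update of the shared code has to be
committed explicitly in each project that uses it.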
The other problem is how to split the repository, as the various bits
are in different places:
a) git filter-branch --subdirectory-filter p2v
+ very small repo with the current p2v subdirectory
+ preserves the history of the p2v subdirectory, with branches and tags
- missing all the other bits, which will have no history
- not usable to build older releases (e.g. for bisecting)
I'm not exactly sure what this does. Is this something to do with
preserving the history? TBH I don't think we need to bother with the
history -- it exists still in libguestfs.git.
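
For reference, my understanding is that --subdirectory-filter rewrites
history so that only the commits touching p2v/ survive, and p2v/ becomes
the top-level directory. A minimal sketch on a scratch repository (the
file names and commit messages are invented for the demo, not taken from
libguestfs.git):

```shell
#!/bin/sh
# Sketch only: demonstrates --subdirectory-filter on a scratch repo.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's warning pause
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

scratch=$(mktemp -d)
cd "$scratch"
git init -q demo
cd demo

# Three commits: one touching only lib/, two touching p2v/.
mkdir lib p2v
echo lib > lib/lib.c
git add lib
git commit -q -m "lib: add library"
echo p2v > p2v/main.c
git add p2v
git commit -q -m "p2v: add main"
echo more >> p2v/main.c
git commit -q -a -m "p2v: extend main"

# Rewrite all refs so that p2v/ becomes the root of the tree.
git filter-branch --subdirectory-filter p2v -- --all >/dev/null 2>&1

# Only the p2v commits should remain, with main.c at the top level.
commits=$(git rev-list --count HEAD)
files=$(git ls-files)
git log --oneline
echo "commits: $commits, files: $files"
```

After the rewrite the log should list only the two p2v commits, which is
the "preserves the history of the p2v subdirectory" point; anything that
lived outside p2v/ is gone from the rewritten repository.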
b) create a work branch in libguestfs, then in that branch move/copy all
the stuff making the p2v subdirectory build standalone there, and then
import the content of the p2v subdirectory of that branch in a new empty
repo
+ very small repo with the current p2v subdirectory
- no history, no tags nor branches
+ using a graft it is possible to "stitch" the history of the new repo
with the work branch in libguestfs
c) git filter-branch to remove all the bits not related to p2v from all
the commits
+ not that big a repo
+ preserves the history of all the content, with branches and tags
- will take a very long time to create (e.g. iterate over and over to
find out what to remove)
- not usable to build older releases (e.g. for bisecting)
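
On the graft mentioned under (b): one way to do the "stitching" is
git replace --graft, which gives the root commit of the new repository a
parent taken from the old history. A minimal sketch, assuming two
scratch repositories named "old" and "new" that stand in for
libguestfs.git and the new virt-p2v repository (all names and contents
here are invented):

```shell
#!/bin/sh
# Sketch only: stitch a history-less repo onto an older history.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

scratch=$(mktemp -d)
cd "$scratch"

# "old" stands in for libguestfs.git with its work branch.
git init -q old
( cd old
  echo a > main.c
  git add main.c
  git commit -q -m "old: first"
  echo b >> main.c
  git commit -q -a -m "old: second (work branch tip)" )

# "new" stands in for the fresh repository, imported without history.
git init -q new
cd new
echo c > main.c
git add main.c
git commit -q -m "new: initial import"
before=$(git rev-list --count HEAD)

# Fetch the old objects, then make the new root commit a child of the
# old tip.
git fetch -q ../old HEAD
old_tip=$(git rev-parse FETCH_HEAD)
new_root=$(git rev-list --max-parents=0 HEAD)
git replace --graft "$new_root" "$old_tip"

after=$(git rev-list --count HEAD)
echo "commits visible before graft: $before, after: $after"
```

Note that replace refs live under refs/replace/ and are not fetched by
default, so anyone who wants the stitched view has to set up the graft
(or fetch those refs) themselves.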
Rich.
--
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones