Re: [Libguestfs] [libguestfs/nbdkit] Building Windows Binaries (#14)
by Richard W.M. Jones
On Fri, Jul 16, 2021 at 03:12:42PM -0700, Nathan Shearer wrote:
> I am trying to build some Windows binaries of nbdkit, as I have some files on a
> Windows host that I need to export as NBD block devices to another system.
>
> I first attempted to configure and build nbdkit under Cygwin with these
> packages installed on a Windows 10 computer:
>
> • automake 11-1
> • gcc-core 10.2.0-1
> • gcc-g++ 10.2.0-1
> • libtool 2.4.6-7
> • make 4.3-1
> • pkg-config 1.6.3-1
>
> The autoconf and configure steps work fine; however, the make step fails due
> to compile-time bugs in the code relating to conflicting definitions in some
> of the included headers.
>
> Can you please include some more detail on how to cross-compile Windows
> binaries of nbdkit on a Linux host? I assume your document is referring to
> Debian/Ubuntu? If you used a different distro, please include that too, as it
> will help me reproduce the process.
I've only built nbdkit using mingw (not cygwin), and only using
cross-compilation from Linux. It ought to work from Windows, but
you'd have to use mingw.
The full details of how to do that are in the README file under
"WINDOWS":
https://gitlab.com/nbdkit/nbdkit/-/blob/master/README
Rich.
--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines. Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v
[PATCH] v2v: rhv-upload-plugin: Fix waiting for finalize
by Nir Soffer
Waiting for image transfer finalization is complex. In the past we tried to
simplify the process by waiting on the disk status, but it turns out that,
due to the way oVirt locks the disk, this is not reliable.
This is the finalize success flow:
1. User asks to finalize the transfer
2. oVirt sets transfer phase to FINALIZING_SUCCESS
3. oVirt finalizes the transfer
4. oVirt sets disk status to OK
5. oVirt unlocks the disk and changes transfer phase to FINISHED_SUCCESS
6. oVirt removes the transfer
In oVirt logs we can see that the disk status changes to OK about 3 seconds
before the disk is actually unlocked. This is a very old problem that is
unlikely to be fixed.
The only way to wait for transfer completion is to poll the transfer
phase, but oVirt makes this hard by removing the transfer shortly after
it completes, so we may not be able to see the FINISHED_SUCCESS phase.
If the transfer was removed before we got one of the final phases, we
need to check the disk status to understand the result of the transfer.
oVirt 4.4.7 made polling the transfer phase easier by keeping the transfer
after completion, but we need to support older versions, so we must have
generic code that works with any version.
To make debugging easier, we log the transfer phase during polling. Here
is a typical transfer log when finalizing a transfer:
finalizing transfer 59e545f3-db1f-4a6b-90b1-80ac66572faa
transfer 59e545f3-db1f-4a6b-90b1-80ac66572faa is finalizing_success
transfer 59e545f3-db1f-4a6b-90b1-80ac66572faa is finalizing_success
transfer 59e545f3-db1f-4a6b-90b1-80ac66572faa is finalizing_success
transfer 59e545f3-db1f-4a6b-90b1-80ac66572faa is finalizing_success
transfer 59e545f3-db1f-4a6b-90b1-80ac66572faa is finished_success
transfer 59e545f3-db1f-4a6b-90b1-80ac66572faa finalized in 5.153 seconds
Signed-off-by: Nir Soffer <nsoffer(a)redhat.com>
---
v2v/rhv-upload-plugin.py | 102 +++++++++++++++++++++++++++------------
1 file changed, 71 insertions(+), 31 deletions(-)
diff --git a/v2v/rhv-upload-plugin.py b/v2v/rhv-upload-plugin.py
index 07e879c9..11050358 100644
--- a/v2v/rhv-upload-plugin.py
+++ b/v2v/rhv-upload-plugin.py
@@ -607,17 +607,29 @@ def finalize_transfer(connection, transfer, disk_id):
"""
Finalize a transfer, making the transfer disk available.
- If finalizing succeeds, transfer's phase will change to FINISHED_SUCCESS
- and the transer's disk status will change to OK. On errors, the transfer's
- phase will change to FINISHED_FAILURE and the disk status will change to
- ILLEGAL and it will be removed. In both cases the transfer entity will be
- removed shortly after.
-
- If oVirt fails to finalize the transfer, transfer's phase will change to
- PAUSED_SYSTEM. In this case the disk's status will change to ILLEGAL and it
- will not be removed.
-
- For simplicity, we track only disk's status changes.
+ If finalizing succeeds, the transfer's disk status will change to OK
+ and the transfer's phase will change to FINISHED_SUCCESS. Unfortunately,
+ the disk status is modified before the transfer finishes, and oVirt
+ may still hold a lock on the disk at this point.
+
+ The only way to make sure that the disk is unlocked is to wait
+ until the transfer phase switches to FINISHED_SUCCESS. Unfortunately,
+ oVirt makes this hard because the transfer is removed shortly
+ after switching to the final phase. However, if the
+ transfer was removed, we can be sure that the disk is not locked,
+ since oVirt releases the locks before removing the transfer.
+
+ On errors, the transfer's phase will change to FINISHED_FAILURE and
+ the disk status will change to ILLEGAL and it will be removed. Again
+ the transfer will be removed shortly after that.
+
+ If oVirt fails to finalize the transfer, transfer's phase will
+ change to PAUSED_SYSTEM. In this case the disk's status will change
+ to ILLEGAL and it will not be removed.
+
+ oVirt 4.4.7 made waiting for the transfer easier by keeping transfers
+ after they complete, but we must support older versions, so we have
+ generic code that works with any version.
For more info see:
- http://ovirt.github.io/ovirt-engine-api-model/4.4/#services/image_transfer
@@ -632,34 +644,62 @@ def finalize_transfer(connection, transfer, disk_id):
transfer_service.finalize()
- disk_service = (connection.system_service()
- .disks_service()
- .disk_service(disk_id))
-
while True:
time.sleep(1)
try:
- disk = disk_service.get()
+ transfer = transfer_service.get()
except sdk.NotFoundError:
- # Disk verification failed and the system removed the disk.
- raise RuntimeError(
- "transfer %s failed: disk %s was removed"
- % (transfer.id, disk_id))
+ # Transfer was removed (ovirt < 4.4.7). We need to check the
+ # disk status to understand if the transfer was successful.
+ # Due to the way oVirt does locking, we know that the disk
+ # is unlocked at this point so we can check only once.
- if disk.status == types.DiskStatus.ILLEGAL:
- # Disk verification failed or transfer was paused by the system.
- raise RuntimeError(
- "transfer %s failed: disk is ILLEGAL" % transfer.id)
+ debug("transfer %s was removed, checking disk %s status"
+ % (transfer.id, disk_id))
- if disk.status == types.DiskStatus.OK:
- debug("transfer %s finalized in %.3f seconds"
- % (transfer.id, time.time() - start))
- break
+ disk_service = (connection.system_service()
+ .disks_service()
+ .disk_service(disk_id))
+
+ try:
+ disk = disk_service.get()
+ except sdk.NotFoundError:
+ raise RuntimeError(
+ "transfer %s failed: disk %s was removed"
+ % (transfer.id, disk_id))
+
+ debug("disk %s is %s" % (disk_id, disk.status))
+
+ if disk.status == types.DiskStatus.OK:
+ break
- if time.time() > start + timeout:
raise RuntimeError(
- "timed out waiting for transfer %s to finalize"
- % transfer.id)
+ "transfer %s failed: disk is %s" % (transfer.id, disk.status))
+ else:
+ # Transfer exists, check if it reached one of the final
+ # phases, or we timed out.
+
+ debug("transfer %s is %s" % (transfer.id, transfer.phase))
+
+ if transfer.phase == types.ImageTransferPhase.FINISHED_SUCCESS:
+ break
+
+ if transfer.phase == types.ImageTransferPhase.FINISHED_FAILURE:
+ raise RuntimeError(
+ "transfer %s has failed" % (transfer.id,))
+
+ if transfer.phase == types.ImageTransferPhase.PAUSED_SYSTEM:
+ raise RuntimeError(
+ "transfer %s was paused by system" % (transfer.id,))
+
+ if time.time() > start + timeout:
+ raise RuntimeError(
+ "timed out waiting for transfer %s to finalize, "
+ "transfer is %s"
+ % (transfer.id, transfer.phase))
+
+ debug("transfer %s finalized in %.3f seconds"
+ % (transfer.id, time.time() - start))
def transfer_supports_format():
--
2.26.3
supermin root: race condition with multiple drives
by Brian Candler
Hi,
I discovered an issue when using libguestfs with large numbers of
attached disks. I submitted the details to github:
https://github.com/libguestfs/libguestfs/issues/69
... and then discovered that the mailing list is the right place, not
github. Sorry about that!
The problem is: I have batches of 40 or 50 qcow2 images to write files
to. It is very slow to start a separate libguestfs appliance for each
one, so what I do is to start a single one with 40 or 50 disks attached,
and then mount, upload and unmount each one in turn.
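Roughly, the per-batch loop looks like this (a simplified sketch using the
libguestfs Python bindings; the image names and the uploaded file are
placeholders, and it assumes each image has a filesystem directly on the
device):

    import guestfs

    images = ["disk%02d.qcow2" % i for i in range(40)]  # placeholder names

    g = guestfs.GuestFS(python_return_dict=True)
    for path in images:
        g.add_drive_opts(path, format="qcow2")   # attach all disks up front
    g.launch()                                   # one appliance boot per batch

    # list_devices() returns the drives in the order they were added
    for dev in g.list_devices():
        g.mount(dev, "/")
        g.upload("payload.tar", "/payload.tar")  # placeholder for the real files
        g.umount_all()

    g.shutdown()
    g.close()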
What I find is that sometimes the disks are attached in the wrong order,
such that the kernel tries to use one of these qcow2 files as its root
disk, instead of the supermin appliance image. This seems to happen
more often when the system is under load, such as when running multiple
libguestfs instances concurrently (I have 15 or 20 different versions of
these batches of 40-50 disks to create, so to speed things up, I run
them concurrently).
This is all "userland" stuff, so it ought to work fine under load, but the
supermin kernel booting issue messes it up intermittently.
Anyway, the github ticket has full details, including standalone scripts
which can reproduce the problem on my system. I'd be grateful if
someone could take a look.
Many thanks,
Brian Candler.
Figuring out some failing tests for libnbd
by Martin Kletzander
I am preparing more patches for CI to run check-valgrind and fix ongoing
errors, but there are two issues where I cannot identify why they are
failing.
- On debian-10, info/info-can.sh started failing, and the error
message is just one of those I saw earlier in other places:
libnbd: debug: nbd1: nbd_opt_abort: leave: error="nbd_opt_abort:
invalid state: READY: the handle must be negotiating: Invalid
argument"
- On Fedora rawhide I hit a random issue where a port in a URI was
translated to its service name, and looking at the code I cannot find how
this could have happened (see the snippet after this list for one way such
a translation can occur). Until this is fixed, the test suite is unreliable
and notification fatigue will cause everyone to start ignoring any
future failures.
/builds/nertpinx/libnbd/tests/.libs/aio-connect: actual URI
nbd://127.0.0.1:altova-lm/ != expected URI nbd://127.0.0.1:35355/
- Both openSUSE builds are failing to run check-valgrind and it looks
like it might be unrelated to libnbd, although it would be nice for
someone else to confirm that. For now I have disabled check-valgrind
on those platforms in my branch.
- Similarly to openSUSE, Ubuntu 20.04 fails in the valgrind tests, but
somewhere down the GnuTLS rabbit hole, which I presume is unrelated
too, so I disabled check-valgrind on that one as well.
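Regarding the Fedora rawhide port issue: I do not know where in libnbd (or
the test) the translation happens, but 35355 is a registered service name
("altova-lm"), so any code path that formats the socket address with
getnameinfo() without passing NI_NUMERICSERV would produce exactly that URI.
A quick illustration in Python (assuming the local /etc/services carries the
IANA entry for 35355):

    import socket

    addr = ("127.0.0.1", 35355)

    # Without NI_NUMERICSERV the port is looked up in the services
    # database and replaced by its name.
    print(socket.getnameinfo(addr, socket.NI_NUMERICHOST))
    # ('127.0.0.1', 'altova-lm')

    # With NI_NUMERICSERV the numeric port is kept.
    print(socket.getnameinfo(addr, socket.NI_NUMERICHOST | socket.NI_NUMERICSERV))
    # ('127.0.0.1', '35355')

If something like that is the cause, it would also explain why the failure
is random: it only triggers when the ephemeral port the kernel hands out
happens to have a name in the services database.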
I will send the patches once they are cleaned up, but I wanted to let
everyone know what the current status is because eliminating all random
issues is essential to properly consuming CI results.
Thanks,
Martin
Commit "v2v: Remove -o rhv-upload -oa preallocated"
by Nir Soffer
For some reason I did not see this change in the mailing list:
https://github.com/libguestfs/virt-v2v/commit/18084f90d9dd9092831cb348703...
The commit message claims:
> Using -oa preallocated with -o rhv-upload always gave an error. We
> should be able to implement this properly in modular virt-v2v, but as
> this option did nothing here remove it to simplify things.
But I used -oa preallocated and -oa sparse and it worked fine after removing
the unhelpful validation, and using:
sparse=params["output_sparse"]
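In other words, the allocation policy maps directly onto the disk we add;
roughly something like this (a simplified sketch with placeholder names,
not the plugin's exact code):

    import ovirtsdk4.types as types

    def create_disk(connection, name, size, storage_domain, disk_format, sparse):
        # disk_format: types.DiskFormat.RAW or types.DiskFormat.COW
        # sparse: the -oa choice passed straight through
        #         (params["output_sparse"]); note that raw + sparse is not
        #         supported on block-based storage domains (iSCSI/FC).
        disks_service = connection.system_service().disks_service()
        return disks_service.add(
            types.Disk(
                name=name,
                format=disk_format,
                sparse=sparse,
                provisioned_size=size,
                storage_domains=[types.StorageDomain(name=storage_domain)],
            )
        )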
The code was already broken even before this change with block storage domains
(iSCSI/FC) and raw format (the default), since raw-sparse is not supported on
block-based storage, but now there is no way to fix the code.
We can always use qcow2-sparse; this combination is the most useful in RHV,
but using a raw preallocated volume can give better performance or reliability,
and raw is still the default format for block storage in RHV.
Since RHV does not support selecting the image format and allocation policy
for the user, and does not make the available combinations or the system
defaults available via the API, virt-v2v must leave the decision to the user
of the program.
Nir