On Thu, Aug 8, 2019 at 5:05 PM Richard W.M. Jones <rjones(a)redhat.com> wrote:
On Mon, Mar 18, 2019 at 06:51:26PM +0200, Daniel Erez wrote:
> After invoking transfer_service.finalize, check operation
> status by examining DiskStatus.
> This is done instead of failing after a predefined timeout
> regardless the status.
>
> * not verified *
>
> Bug-Url:
https://bugzilla.redhat.com/show_bug.cgi?id=1680361
> ---
> v2v/rhv-upload-plugin.py | 19 +++++++++++++------
> 1 file changed, 13 insertions(+), 6 deletions(-)
>
> diff --git a/v2v/rhv-upload-plugin.py b/v2v/rhv-upload-plugin.py
> index 2a950c5ed..2cefe94b0 100644
> --- a/v2v/rhv-upload-plugin.py
> +++ b/v2v/rhv-upload-plugin.py
> @@ -523,16 +523,23 @@ def close(h):
> # waiting for the transfer object to cease to exist, which
> # falls through to the exception case and then we can
> # continue.
> - endt = time.time() + timeout
> + disk_id = disk.id
> + start = time.time()
> try:
> while True:
> time.sleep(1)
> - tmp = transfer_service.get()
> - if time.time() > endt:
> - raise RuntimeError("timed out waiting for transfer "
> - "to finalize")
> + disk_service = h['disk_service']
> + disk = disk_service.get()
> + if disk.status == types.DiskStatus.LOCKED:
> + if time.time() > start + timeout:
> + raise RuntimeError("timed out waiting for
transfer "
> + "to finalize")
> + continue
> + if disk.status == types.DiskStatus.OK:
> + debug("finalized after %s seconds", time.time() -
start)
Maybe this was the real error? (using , instead of %)
> + break
> except sdk.NotFoundError:
> - pass
> + raise RuntimeError("transfer failed: disk %s not found" %
disk_id)
>
> # Write the disk ID file. Only do this on successful
completion.
> with builtins.open(params['diskid_file'], 'w') as fp:
Ilanit tested this patch for us. Unfortunately it fails for reasons I
don't really understand but seem to be deep inside the RHV API:
nbdkit: python[1]: error: /var/tmp/v2v.sZm0my/rhv-upload-plugin.py: close:
error: ['Traceback (most recent call last):
', ' File "/var/tmp/v2v.sZm0my/rhv-upload-plugin.py", line 577, in
close
', ' File "/var/tmp/v2v.sZm0my/rhv-upload-plugin.py", line 516, in
delete_disk_on_failure
This looks like failure flow - but there is no info here on the real
failure. We need to understand
what was that failure.
', ' File
"/usr/lib64/python2.7/site-packages/ovirtsdk4/services.py",
line 37334, in remove
self._internal_remove(headers, query, wait)
', ' File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py",
line
271, in _internal_remove
return future.wait() if wait else future
', ' File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py",
line
55, in wait
return self._code(response)
', ' File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py",
line
268, in callback
self._check_fault(response)
', ' File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py",
line
132, in _check_fault
self._raise_error(response, body)
', ' File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py",
line
118, in _raise_error
raise error
', 'Error: Fault reason is "Operation Failed". Fault detail is
"[Cannot
remove Virtual Disk. Related operation is currently in progress. Please try
again later.]". HTTP response code is 409.
']
I think this error means that the disk is currently busy, probably because
of the real error.
Ilanit, can you share complete virt-v2v and daemon logs?
There is a surprising behavior in the image transfer API - you create a
disk, and start
a transfer session with that disk. However once you started a the transfer,
it owns
the disk, and it will delete it in case of failures, so you don't need to
delete it yourself.
I guess you need something like
h["owns_disk"] = True
After creating the disk, and once the transfer was started, set it to False.
In your cleanup flow, delete the disk only if you still own it.
The other issue here is that the error during close is hiding the real
error. This is
a common problem in python 2, and the solution is typically to catch all
exceptions
in the cleanup flow and log them, and make sure the original error is
raised and
logged by the top error handler.
Nir