On Wed, Nov 15, 2017 at 09:41:20PM +0100, Gandalf Corvotempesta wrote:
2017-11-15 21:29 GMT+01:00 Richard W.M. Jones
<rjones(a)redhat.com>:
> Gandalf, is there an XVA file publically available (pref. not
> too big) that we can look at?
I can try to provide one, but it's simple:
# tar tvf 20160630_124823_aa72_.xva.gz | head -n 50
---------- 0/0 42353 1970-01-01 01:00 ova.xml
---------- 0/0 1048576 1970-01-01 01:00 Ref:175/00000000
---------- 0/0 40 1970-01-01 01:00 Ref:175/00000000.checksum
---------- 0/0 1048576 1970-01-01 01:00 Ref:175/00000001
---------- 0/0 40 1970-01-01 01:00 Ref:175/00000001.checksum
---------- 0/0 1048576 1970-01-01 01:00 Ref:175/00000003
---------- 0/0 40 1970-01-01 01:00 Ref:175/00000003.checksum
---------- 0/0 1048576 1970-01-01 01:00 Ref:175/00000004
---------- 0/0 40 1970-01-01 01:00 Ref:175/00000004.checksum
---------- 0/0 1048576 1970-01-01 01:00 Ref:175/00000005
---------- 0/0 40 1970-01-01 01:00 Ref:175/00000005.checksum
---------- 0/0 1048576 1970-01-01 01:00 Ref:175/00000006
---------- 0/0 40 1970-01-01 01:00 Ref:175/00000006.checksum
---------- 0/0 1048576 1970-01-01 01:00 Ref:175/00000007
---------- 0/0 40 1970-01-01 01:00 Ref:175/00000007.checksum
---------- 0/0 1048576 1970-01-01 01:00 Ref:175/00000009
---------- 0/0 40 1970-01-01 01:00 Ref:175/00000009.checksum
---------- 0/0 1048576 1970-01-01 01:00 Ref:175/00000010
---------- 0/0 40 1970-01-01 01:00 Ref:175/00000010.checksum
You can ignore the ova.xml and just use the "Ref:175" directory.
Inside the XVA you'll find one "Ref" directory for each virtual disk
(the ref number is different for each disk).
Inside each Ref directory you'll find tons of 1 MB files, corresponding
to chunks of the raw image.
You have to merge these files into a single raw file, with just one
exception: the numbers are not sequential.
As you can see above, we have 00000000, 00000001, 00000003.
00000002 is missing because it's totally blank. XenServer doesn't
export empty blocks, so it skips the corresponding 1 MB file.
When building the raw image, you have to replace each missing block
with 1 MB of zeros.
This is (in addition to the tar extraction) the most time-consuming
part. Currently I'm rebuilding a 250 GB image, with tons of files to
be merged.
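
The merge described above could be sketched in Python roughly as
follows. This is only an illustration, not XenServer's or virt-v2v's
code; the `total_chunks` parameter (the virtual disk size in 1 MB
chunks, which in a real XVA would come from ova.xml) is an assumption.
Seeking past missing chunk numbers leaves holes, so the output ends up
sparse and no literal zeros need to be written:

```python
import os
import re

CHUNK = 1024 * 1024  # XVA stores each disk as 1 MiB chunk files


def merge_ref_dir(ref_dir, out_path, total_chunks):
    """Reassemble the chunk files of one Ref:NNN directory into a raw
    disk image.  Missing chunk numbers are all-zero blocks; seeking
    over them leaves sparse holes instead of writing zeros.
    total_chunks is assumed to come from ova.xml (hypothetical here)."""
    with open(out_path, "wb") as out:
        for name in sorted(os.listdir(ref_dir)):
            if not re.fullmatch(r"\d{8}", name):
                continue            # skip the .checksum files
            idx = int(name)         # chunk file names are decimal
            with open(os.path.join(ref_dir, name), "rb") as f:
                out.seek(idx * CHUNK)
                out.write(f.read())
        # extend the file so trailing missing chunks read as zeros
        out.truncate(total_chunks * CHUNK)
```

This still requires extracting the tarball first, which is why the
nbdkit approach discussed below the quoted text avoids it entirely.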
If qemu-img could be patched to convert this kind of format
automatically, I could save about 3 hours (30 minutes for extracting
the tarball, and about 2 hours to merge a 250-300 GB image).
I guess the nbdkit approach would be better, given the many small and
missing files within the tar file.
You'll have to use ‘tar tRvf file.xva’ to extract the offset and size
of each file. (See the function ‘find_file_in_tar’ in virt-v2v source
for exactly how).
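
If the plugin were written in Python, the same offset/size index could
be built without shelling out to tar at all: Python's tarfile module
exposes each member's data offset directly as TarInfo.offset_data.
This is only a sketch, assuming an uncompressed .xva (a .xva.gz would
have to be decompressed first, since the offsets must be seekable):

```python
import tarfile


def index_xva(path):
    """Build {chunk_number: (data_offset, size)} for every chunk file
    in the XVA.  Equivalent to parsing `tar tRvf` output: offset_data
    is the byte offset of the member's data within the tar file."""
    chunks = {}
    with tarfile.open(path) as t:
        for m in t:
            parts = m.name.split("/")
            # keep only Ref:NNN/dddddddd members, skip .checksum files
            if len(parts) == 2 and parts[0].startswith("Ref:") \
                    and parts[1].isdigit():
                chunks[int(parts[1])] = (m.offset_data, m.size)
    return chunks
```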
This will give you one offset/size/filename tuple for each file. The
plugin will then simply need to calculate which file to access to
resolve each virtual file range (or substitute zeroes for missing
files).
Note that nbdkit lets you write plugins in high-level languages like
Perl or Python which would be ideally suited to this kind of task.
You can then use "captive nbdkit" (see the manual) like this:
nbdkit -U - \
    perl script=/path/to/xva-parser.pl file=/path/to/file.xva \
    --run 'qemu-img convert $nbd -O qcow2 output.qcow2'
At no point does the tar file need to be fully unpacked and it should
all be reasonably efficient.
Rich.
--
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones
Read my programming and virtualization blog:
http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html