Re: [Libguestfs] [PATCH v2 4/5] v2v: ova: don't extract files from OVA if it's not needed

Thursday, 24 November 2016

On Wednesday, 23 November 2016 16:16:44 CET Tomáš Golembiovský wrote:
...
 On Mon, 21 Nov 2016 16:21:02 +0100
 Pino Toscano <ptoscano(a)redhat.com&gt; wrote:

 > On Saturday, 12 November 2016 16:37:52 CET Tomáš Golembiovský wrote:
 > > We don't have to always extract all files from the OVA archive. The OVA,
 > > as defined in the standard, is plain tar. We can work directly over the
 > > tar archive if we use correct 'offset' and 'size' options when
defining
 > > the backing file for QEMU.
 > > 
 > > This leads to improvements in speed and puts much lower requirement on
 > > available disk space.
 > > 
 > > Signed-off-by: Tomáš Golembiovský <tgolembi(a)redhat.com&gt;
 > > ---
 > >  v2v/input_ova.ml | 177
+++++++++++++++++++++++++++++++++++++++++++++++--------
 > >  1 file changed, 154 insertions(+), 23 deletions(-)
 > > 
 > > diff --git a/v2v/input_ova.ml b/v2v/input_ova.ml
 > > index f76fe82..7a3e27b 100644
 > > --- a/v2v/input_ova.ml
 > > +++ b/v2v/input_ova.ml
 > > @@ -39,17 +39,23 @@ object
 > >  
 > >    method source () =
 > >  
 > > -    let untar ?(format = "") file outdir =
 > > -      let cmd = [ "tar"; sprintf "-x%sf" format; file;
"-C"; outdir ] in
 > > +    (* Untar part or all files from tar archive. If [path] is specified it is
 > > +     * a path in the tar archive.
 > > +     *)
 > > +    let untar ?(format = "") ?(path) file outdir =  
 > 
 > No need for parenthesis around "path".
 > Also, tar supports extracting more files at the same time, so I'd
 > change "path" into "paths", and just call this once later on
...

 You mean pass an array/list instead of single string? Yeah, why not. 
Right.

...
 > > +      let rec loop lines =
 > > +        match lines with
 > > +        | [] -> raise Not_found
 > > +        | line :: lines -> (
 > > +          (* Lines have the form:
 > > +           * block <offset>: <perms> <owner>/<group>
<size> <mdate> <mtime> <file>
 > > +           *)
 > > +          let rex = Str.regexp "^block \\([0-9]+\\): \\([^ \\t]+\\) \\([^
\\t]+\\) +\\([^ \\t]+\\) \\([^ \\t]+\\) \\([^ \\t]+\\) \\(.+\\)$" in  
 > 
 > Whoa how unreadable :/ The alternative would be using String.nsplit,
 > but it would play badly with filenames with spaces inside the archive.
 > Maybe a faster alternative could be:
 > a) find the 6th empty space, and pick the substring of all the rest
 >    of the string after it
 > b) filter the lines by picking only the wanted file, checking using (a)
 > c) once a single line is found, use String.nsplit and parse only parts
 >    1 (removing the ':' suffix) and 4
 > 
 > It feels slightly more complex, but this way we can avoid running the
 > regex until we find the right filename.

 The problem is there are multiple spaces used to align the column with
 file sizes. Hence the complicated regexp. :( 
I see... in any case, I'd prefer to avoid running the regex on every
line if possible. An idea could be improve our String.nsplit to ignore
empty fields, and thus
  String.nsplit " " "foo  bar" -> ["foo";
"bar"]

-- 
Pino Toscano

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

Re: [Libguestfs] [PATCH v2 4/5] v2v: ova: don't extract files from OVA if it's not needed