On Wednesday, 23 November 2016 16:16:44 CET Tomáš Golembiovský wrote:
On Mon, 21 Nov 2016 16:21:02 +0100
Pino Toscano <ptoscano(a)redhat.com> wrote:
> On Saturday, 12 November 2016 16:37:52 CET Tomáš Golembiovský wrote:
> > We don't have to always extract all files from the OVA archive. The OVA,
> > as defined in the standard, is plain tar. We can work directly over the
> > tar archive if we use correct 'offset' and 'size' options when
defining
> > the backing file for QEMU.
> >
> > This leads to improvements in speed and puts much lower requirement on
> > available disk space.
> >
> > Signed-off-by: Tomáš Golembiovský <tgolembi(a)redhat.com>
> > ---
> > v2v/input_ova.ml | 177
+++++++++++++++++++++++++++++++++++++++++++++++--------
> > 1 file changed, 154 insertions(+), 23 deletions(-)
> >
> > diff --git a/v2v/input_ova.ml b/v2v/input_ova.ml
> > index f76fe82..7a3e27b 100644
> > --- a/v2v/input_ova.ml
> > +++ b/v2v/input_ova.ml
> > @@ -39,17 +39,23 @@ object
> >
> > method source () =
> >
> > - let untar ?(format = "") file outdir =
> > - let cmd = [ "tar"; sprintf "-x%sf" format; file;
"-C"; outdir ] in
> > + (* Untar part or all files from tar archive. If [path] is specified it is
> > + * a path in the tar archive.
> > + *)
> > + let untar ?(format = "") ?(path) file outdir =
>
> No need for parenthesis around "path".
> Also, tar supports extracting more files at the same time, so I'd
> change "path" into "paths", and just call this once later on
...
You mean pass an array/list instead of single string? Yeah, why not.
Right.
> > + let rec loop lines =
> > + match lines with
> > + | [] -> raise Not_found
> > + | line :: lines -> (
> > + (* Lines have the form:
> > + * block <offset>: <perms> <owner>/<group>
<size> <mdate> <mtime> <file>
> > + *)
> > + let rex = Str.regexp "^block \\([0-9]+\\): \\([^ \\t]+\\) \\([^
\\t]+\\) +\\([^ \\t]+\\) \\([^ \\t]+\\) \\([^ \\t]+\\) \\(.+\\)$" in
>
> Whoa how unreadable :/ The alternative would be using String.nsplit,
> but it would play badly with filenames with spaces inside the archive.
> Maybe a faster alternative could be:
> a) find the 6th empty space, and pick the substring of all the rest
> of the string after it
> b) filter the lines by picking only the wanted file, checking using (a)
> c) once a single line is found, use String.nsplit and parse only parts
> 1 (removing the ':' suffix) and 4
>
> It feels slightly more complex, but this way we can avoid running the
> regex until we find the right filename.
The problem is there are multiple spaces used to align the column with
file sizes. Hence the complicated regexp. :(
I see... in any case, I'd prefer to avoid running the regex on every
line if possible. An idea could be improve our String.nsplit to ignore
empty fields, and thus
String.nsplit " " "foo bar" -> ["foo";
"bar"]
--
Pino Toscano