On 5/16/23 14:17, Richard W.M. Jones wrote:
On Mon, May 15, 2023 at 07:49:24PM +0200, Laszlo Ersek wrote:
> Introduce unescape_device_mapper_lvm() for turning
>
> /dev/mapper/VG-LV
>
> key IDs into
>
> /dev/VG/LV
>
> ones, unescaping doubled hyphens to single hyphens in both VG and LV in
> the process. Libguestfs uses the /dev/VG/LV format internally, for
> identifying devices, but the user might want to specify the
> /dev/mapper/VG-LV ID format with the "--key ID:..." options.
>
> Call unescape_device_mapper_lvm() from key_store_import_key(). That is,
> translate the ID as soon as the "--key" option is processed -- let the
> keystore only know about the usual /dev/VG/LV format, for later matching.
>
> Bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=2168506
> Signed-off-by: Laszlo Ersek <lersek(a)redhat.com>
> ---
>
> Notes:
> regcomp() must definitely allocate memory dynamically, and we
> (intentionally) never free that -- we never call regfree(). I assume
> valgrind will catch this as a leak, so we might have to extend
> "valgrind-suppressions" in each dependent superproject. However,
I'm
> unable to run "make check-valgrind" successfully in e.g. virt-v2v even
> before these patches; see also
> <
https://listman.redhat.com/archives/libguestfs/2023-May/031496.html>.
>
> options/keys.c | 86 ++++++++++++++++++++
> 1 file changed, 86 insertions(+)
>
> diff --git a/options/keys.c b/options/keys.c
> index bc7459749276..f0164a6ed987 100644
> --- a/options/keys.c
> +++ b/options/keys.c
> @@ -27,6 +27,7 @@
> #include <errno.h>
> #include <error.h>
> #include <assert.h>
> +#include <regex.h>
>
> #include "guestfs.h"
>
> @@ -260,6 +261,90 @@ key_store_add_from_selector (struct key_store *ks, const char
*selector)
> return key_store_import_key (ks, &key);
> }
>
> +/* A /dev/mapper/ escaped character is:
> + * - either any character except a slash or a hyphen,
> + * - or a double hyphen.
> + *
> + * Note that parens are deliberately not included here -- they will be included
> + * in the full pattern, for making them more easily countable.
> + *
> + * Also note that the bracket expression below does not contain a range
> + * expression, therefore it should not be locale-sensitive.
> + */
> +#define ESC_CH "[^-/]|--"
> +
> +/* Turn /dev/mapper/VG-LV into /dev/VG/LV. */
> +static void
> +unescape_device_mapper_lvm (char *id)
> +{
> + static bool compiled;
> + int rc;
> + static regex_t regex;
> + regmatch_t match[5];
> + char *output;
> + const char *vg_start, *vg_end, *lv_start, *lv_end;
> + const char *input;
> +
> + if (!compiled) {
> + /* Recognize /dev/mapper/VG-LV pathnames, where VG and LV don't contain
> + * slashes, plus any original hyphens in them are doubled for escaping.
> + */
> + static const char pattern[] = "^/dev/("
> + "mapper/"
> + "((" ESC_CH ")+)"
> + "-"
> + "((" ESC_CH ")+)"
> + ")$";
> +
> + rc = regcomp (®ex, pattern, REG_EXTENDED);
> + if (rc != 0) {
> + char errbuf[256];
> +
> + /* When regcomp() fails, the contents of "regex" are undefined, so
we
> + * cannot pass "regex" to regerror().
> + */
> + (void)regerror (rc, NULL, errbuf, sizeof errbuf);
> + error (EXIT_FAILURE, 0,
> + _("%s: Failed to compile pattern (%s). Please report a bug for
"
> + "libguestfs -- refer to guestfs(3) section BUGS."),
> + __func__, errbuf);
> + }
> + compiled = true;
> + }
Any reason not to use PCRE and the COMPILE_REGEXP macro?
I didn't even think of PCRE -- regexes are standard (i.e., in POSIX).
This is a relatively simple regex at that.
We also have macros like CLEANUP_PCRE2_MATCH_DATA_FREE to free match
data automaticaly.
That's different, I think. If I understand "THE MATCH DATA BLOCK"
section <
https://www.pcre.org/current/doc/html/pcre2api.html#SEC26>
correctly, it implies that dynamic memory needs to be allocated for each
separate matching attempt, not just for compiling the pattern.
Here we only allocate memory when compiling the pattern; regexec() needs
no dynamic memory. (POSIX says that if it fails, it can only fail with
REG_NOMATCH, and not (for example) with REG_ESPACE.) Furthermore, the
offsets of the subexpressions are placed in the local variable "match",
which doesn't even provide enough slots for all the subexpressions -- we
don't care for the last subexpression.
In general PCRE regexps are a lot more usable than the POSIX stuff.
I think PCRE is much more capable than POSIX regexes, regarding parsing
power / expressivity.
On the other hand, I find the PCRE API way more complex than the POSIX
one -- I have never *not* struggled with the PCRE documentation. POSIX
is just one specification page:
https://pubs.opengroup.org/onlinepubs/9699919799/functions/regcomp.html
and for this RE, it is sufficient.
If we're supposed to use pcre2 in common/ for consistency with
libguestfs, that sounds like a valid argument (consistency is
important). However, in that case, I'm tempted to rewrite this without
regular expressions in the first place. That's possible; I just found
regexes more comfortable, specifically due to having been used to the
POSIX RE API. The argument went like, "POSIX regex is easy, and then the
loops become even easier, relying on the regex-pre-check". However, the
PCRE2 API is something I have no experience with, and I find the
documentation baroque... So that kind of defeats the entire original
argument.
After some digging in libguestfs... my point is illustrated by the fact
that we have the macro COMPILE_REGEXP(), and functions like
guestfs_int_match2() :(
Let me take a stab at a rewrite without a regular expression. I wonder
if I'll dislike that enough to sweeten pcre2 to me.
Thanks
Laszlo
Rich.
> + rc = regexec (®ex, id, 5, match, 0);
> +
> + /* If there's no match, just leave "id" alone. */
> + if (rc != 0)
> + return;
> +
> + /* Start outputting after "/dev/". */
> + output = id + match[1].rm_so;
> + vg_start = id + match[2].rm_so;
> + vg_end = id + match[2].rm_eo;
> + lv_start = id + match[4].rm_so;
> + lv_end = id + match[4].rm_eo;
> +
> + /* Each of the following two loops produces at most as many bytes as it
> + * consumes, so we unescape "id" in-place.
> + */
> + input = vg_start;
> + while (input < vg_end)
> + if ((*output++ = *input++) == '-')
> + input++;
> +
> + *output++ = '/';
> +
> + input = lv_start;
> + while (input < lv_end)
> + if ((*output++ = *input++) == '-')
> + input++;
> +
> + *output++ = '\0';
> +}
> +
> +#undef ESC_CH
> +
> struct key_store *
> key_store_import_key (struct key_store *ks, struct key_store_key *key)
> {
> @@ -278,6 +363,7 @@ key_store_import_key (struct key_store *ks, struct key_store_key
*key)
> error (EXIT_FAILURE, errno, "realloc");
>
> ks->keys = new_keys;
> + unescape_device_mapper_lvm (key->id);
> ks->keys[ks->nr_keys] = *key;
> ++ks->nr_keys;
>
> _______________________________________________
> Libguestfs mailing list
> Libguestfs(a)redhat.com
>
https://listman.redhat.com/mailman/listinfo/libguestfs