On Tue, May 16, 2023 at 10:55:25AM -0500, Eric Blake wrote:
> +static int
> +evil_config (nbdkit_next_config *next, nbdkit_backend *nxdata,
> +             const char *key, const char *value)
> +{
> +  if (strcmp (key, "evil") == 0 || strcmp (key, "evil-mode") == 0) {
> +    if (strcmp (value, "cosmic-rays") == 0 ||
> +        strcmp (value, "cosmic") == 0) {
> Do we want strcasecmp on these? Do we want to allow _ as a synonym to -?
We're a bit random about this. There are a few places where we use
ascii_strcasecmp, but mostly we use strcmp ...
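
If we do want to be generous here, something along these lines could
work (a rough sketch only; evil_key_equal is a made-up helper, not
existing nbdkit code):

static int
evil_key_equal (const char *key, const char *name)
{
  for (; *key && *name; key++, name++) {
    char k = *key, n = *name;

    /* Treat '-' and '_' as interchangeable. */
    if ((k == '-' || k == '_') && (n == '-' || n == '_'))
      continue;

    /* ASCII-only case folding, avoiding locale-dependent tolower(3). */
    if (k >= 'A' && k <= 'Z') k += 'a' - 'A';
    if (n >= 'A' && n <= 'Z') n += 'a' - 'A';
    if (k != n)
      return 0;
  }
  return *key == '\0' && *name == '\0';
}

The checks above would then become
evil_key_equal (key, "evil") || evil_key_equal (key, "evil-mode").
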
[...]
> +  /* Choose the block size based on the probability, so that at least
> +   * 100 bits are expected to be corrupted in the block. Block size
> +   * must be a power of 2.
> +   */
> +  block_size = next_power_of_2 ((uint64_t) (100. / evil_probability));
> ...and I'm worried that your block_size computation reaches a point
> where it does not do what we want if the probability gets too close to
> zero. Even if it is not division by zero, division by a subnormal
> float can easily produce overflow to infinity, which converts to
> UINT64_MAX, and next_power_of_2(UINT64_MAX) was untested in patch 3's
> unit tests, but looks like it will be '1 << (64 - 64)' == 1, which
> isn't really desirable.
Yes this is definitely a bug. I meant to avoid divide by zero, and
did in one place, but not everywhere.
I think it's worthwhile allowing the probability to be zero, not least
because it avoids a special case for the user. However I will need to
add a test to make sure I've avoided all the problems.
Also, as you say, very small but non-zero probabilities can make
block_size blow up.
Let me fix all of that & add tests.
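
Something like this might be enough (a rough sketch, not the final
fix; MAX_BLOCK_SIZE is a made-up cap here and the real handling of
probability 0 may end up looking different):

/* Hypothetical cap so that tiny probabilities cannot make the block
 * size blow up (64M is an arbitrary power of 2).
 */
#define MAX_BLOCK_SIZE (UINT64_C (64) * 1024 * 1024)

if (evil_probability <= 0) {
  /* Probability 0: nothing is ever corrupted, so any sane block size
   * will do, and above all we must not divide by the probability.
   */
  block_size = MAX_BLOCK_SIZE;
}
else {
  double bs = 100. / evil_probability; /* aim for >= 100 corrupted bits */

  if (bs >= (double) MAX_BLOCK_SIZE)   /* clamps infinity & huge values */
    block_size = MAX_BLOCK_SIZE;
  else
    block_size = next_power_of_2 ((uint64_t) bs);
}
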
> +
> + uint64_t offs, intvl, i, rand;
> + const uint64_t dinvp = (uint64_t) (2.0 * (1.0 / evil_probability));
> [1] Ah, I see - you did mean multiplication, not exponentiation. Like
> you commented elsewhere, it is not quite a true uniform distribution,
> but certainly seems sane enough for the intent.
I will clarify the comment.
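
Perhaps something along these lines (suggested wording only, and it
assumes the gap to the next corrupted bit really is drawn uniformly
from [0, dinvp)):

/* dinvp = 2/P: the gap from one corrupted bit to the next is chosen
 * uniformly in [0, dinvp), so the mean gap is 1/P and the expected
 * density of corrupted bits is P.  This is not the geometric gap
 * distribution that a true per-bit Bernoulli(P) process would give,
 * but it is close enough for our purposes.
 */
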
> +++ b/tests/test-evil-cosmic.sh
> +
> +f="test-evil-cosmic.out"
> +rm -f $f
> +cleanup_fn rm -f $f
> +
> +# 80 million zero bits in the backing disk, and the filter will
> +# randomly flip (ie. set high) 1 in 800,000 bits, or about 100.
> +
> +# XXX Actually the number of set bits clusters around 80. There could
> +# be a mistake in my calculations or the interval algorithm we use
> +# might be biased.
> I have not closely inspected the math yet to see if there's an obvious
> bias (treat this as a first-pass quick review looking for obvious
> code/shell bugs, as opposed to more insidious deep analysis bugs).
> But that comment does mean we probably ought to double-check things
> prior to v2.
Interestingly, the other tests are nicely distributed around the
expected values; it's just this one test where things are out of
whack. I'll add a bit of debugging to the interval calculation etc.
to see if I can see what's going on.
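
A quick throwaway check along these lines might help tell a maths
error apart from a bias in the interval algorithm itself (a sketch
only, assuming uniform gaps in [0, dinvp); it is not part of the
patch):

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

int
main (void)
{
  const double p = 1.0 / 800000;
  const uint64_t dinvp = (uint64_t) (2.0 * (1.0 / p));
  /* Simulate 1000 disks of 80 million bits each. */
  const uint64_t total_bits = UINT64_C (80000000) * 1000;
  uint64_t pos = 0, flips = 0;

  srand (1);
  while (pos < total_bits) {
    /* Uniform gap in [0, dinvp), mean dinvp/2 = 1/p. */
    pos += (uint64_t) ((double) rand () / ((double) RAND_MAX + 1) * dinvp);
    flips++;
  }
  printf ("average %.1f flipped bits per 80000000-bit disk "
          "(expect ~100)\n", (double) flips / 1000);
  return 0;
}

If this prints ~100 then the uniform-gap model itself is fine and the
bias is coming from somewhere else in the filter.
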
> +export f
> +nbdkit -U - null 10000000 \
> + --filter=evil --filter=noextents \
> + evil=cosmic-rays evil-probability=1/800000 \
> + --run 'nbdcopy "$uri" $f'
> +
> +# This will give an approximate count of the number of set bits.
> +
> +zbytes="$( od -A n -w1 -v -t x1 < $f | sort | uniq -c |
> +           $SED -n -E -e 's/([0-9]+)[[:space:]]+00[[:space:]]*$/\1/p' )"
> +nzbits=$(( 10000000 - zbytes )); # looks wrong but actually correct ...
> Interesting computation!
I can probably change this to use Python code. Because I wanted to
use nbdcopy (to allow the filter to be hit by multiple threads), I
wasn't using nbdsh, so Python wasn't immediately available ...
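
For what it's worth, the exact count is tiny in any language; here it
is as a standalone C sketch for illustration (a made-up helper, not
something in the tree, and __builtin_popcount assumes GCC or Clang):

#include <stdio.h>
#include <stdlib.h>

int
main (int argc, char *argv[])
{
  FILE *fp;
  int c;
  unsigned long long bits = 0;

  if (argc != 2) {
    fprintf (stderr, "usage: %s FILE\n", argv[0]);
    exit (EXIT_FAILURE);
  }
  fp = fopen (argv[1], "rb");
  if (fp == NULL) {
    perror (argv[1]);
    exit (EXIT_FAILURE);
  }
  while ((c = getc (fp)) != EOF)
    bits += __builtin_popcount ((unsigned) c);
  fclose (fp);
  printf ("%llu\n", bits);
  return 0;
}

(The zero-byte trick above works because, at 1 in 800,000, a non-zero
byte almost certainly has exactly one set bit.)
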
> +# Expect about 50 stuck-high bits.
> +buf = h.pread(10000000, 0)
> +bits = count_bits(buf)
> +print("stuck high bits: %d (expected 50)" % bits)
> +assert(bits > 20 and bits < 80)
> +
> +# If we read subsets they should match the contents of the buffer.
> +buf1 = h.pread(1000, 1000)
> +assert(buf1 == buf[1000:2000])
> +
> +buf1 = h.pread(10000, 999)
> +assert(buf1 == buf[999:10999])
> Should we also assert that there is at least one stuck bit in those
> two ranges, and/or pick a different or larger range to improve the
> chances of that being the case?
So I think you mean pick a range where we know there is a stuck bit
before doing the read + assert. I will see. With a 1:800,000 chance,
it's not likely the ranges as written would have stuck bits, so we're
comparing zeroes ...
> +++ b/tests/test-evil-stuck-wires.sh
> +
> +# Run nbdkit with the evil filter.
> +start_nbdkit -P $pidfile -U $sock \
> + --filter=evil --filter=noextents \
> + null 1G evil=stuck-wires evil-probability=1/10000
> +
> +# Reads from the filter should have 1:10,000 bits stuck high or low.
> +# However we don't see stuck low bits are we are always reading
> s/are we are/since we are/
Will fix.
> Overall looks like an interesting idea for a filter.
Thanks,
Rich.
--
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones
Read my programming and virtualization blog:
http://rwmj.wordpress.com
nbdkit - Flexible, fast NBD server with plugins
https://gitlab.com/nbdkit/nbdkit