On Tue, Sep 11, 2018 at 03:02:47PM -0500, Eric Blake wrote:
[Lots of issues]
I'll fix all these, thanks.
> files="blocksize1.img blocksize1.log blocksize1.sock
blocksize1.pid
>@@ -72,27 +73,13 @@ cleanup ()
> trap cleanup INT QUIT TERM EXIT ERR
> # Run two parallel nbdkit; to compare the logs and see what changes.
>-nbdkit -P blocksize1.pid -U blocksize1.sock \
>+start_nbdkit -P blocksize1.pid -U blocksize1.sock \
> --filter=log file logfile=blocksize1.log blocksize1.img
>-nbdkit -P blocksize2.pid -U blocksize2.sock --filter=blocksize \
>+start_nbdkit -P blocksize2.pid -U blocksize2.sock --filter=blocksize \
> --filter=log file logfile=blocksize2.log blocksize2.img \
> minblock=1024 maxdata=512k maxlen=1M
If the first process starts but the second fails, then you have not
set pid1=, and leak the first process. This is a regression from
the old code, which managed to capture pid1 before triggering the
trap to cleanup(), and thus killed the successful nbdkit. Several
tests are impacted.
So one other idea I had is to have start_nbdkit create a trap function
to clean up the nbdkit process (we'd still need trap functions in the
main program for deleting temporary files). I don't know if this
would work, but if it does it would fix the above problem.
Less importantly, the old code performed startup in parallel (kick
off both nbdkit processes, then start the loop; if a pid file was
not produced by either one of the nbdkit processes, the test
determined that within 10 seconds); now startup is serial (you don't
start a second nbdkit until the first one has succeeded). For a test
with two nbdkit processes, that could expand typical test time from
1 second to 2 (if the pid file creation is not instantaneous, but
the first sleep 1 was sufficient for both processes to be ready);
and expand worst-case failure detection under heavy load from 10
into 20 seconds (if the first succeeded on the last iteration, but
the second timed out). I don't see anything wrong with the change,
although it might be worth a mention in the commit message as
intentional.
I'm not too worried about this since it should cause only a minor
increase in time.
I'm also seeing a common pattern of assigning pid=$(cat
file.pid)
right after calling start_nbdkt. Would it be worth changing the
signature of start_nbdkit() to also populate a pid variable on
success, as in:
start_nbdkit blocksize1.pid pid1 myarg1
start_nbdkit()
{
pidfile=$1
pidvar=$2
shift 2
...
pid=$(cat $pidfile)
eval $pidvar=$pid
It was nice that start_nbdkit has the same "signature" as
a regular nbdkit command.
which would also have the added benefit of avoiding the regression
of pid1 not being set if the second nbdkit process doesn't start?
But overall, I like the idea of refactoring the common code for less
repetition.
I'll have a think about the trap function stuff, but most
likely tomorrow.
Thanks!
Rich.
--
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones
Read my programming and virtualization blog:
http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW