Re: [Libguestfs] [libnbd PATCH v2] lib/errors.c: Fix assert fail in exit path in multi-threaded code

Wednesday, 8 March 2023

On Wed, Mar 08, 2023 at 10:55:01PM +0000, Richard W.M. Jones wrote:
...
 > @@ -87,16 +92,23 @@ free_errors_key (void *vp)
 >  static struct last_error *
 >  allocate_last_error_on_demand (void)
 >  {
 > -  struct last_error *last_error = pthread_getspecific (errors_key);
 > +  struct last_error *last_error = NULL;
 > 
 > -  if (!last_error) {
 > -    last_error = calloc (1, sizeof *last_error);
 > -    if (last_error) {
 > -      int err = pthread_setspecific (errors_key, last_error);
 > -      if (err != 0) {
 > -        /* This is not supposed to happen (XXX). */
 > -        fprintf (stderr, "%s: %s: %s\n", "libnbd", "pthread_setspecific",
 > -                 strerror (err));
 > +  if (key_use_count) {
 > +    last_error = pthread_getspecific (errors_key);
 > +    if (!last_error) {
 > +      last_error = calloc (1, sizeof *last_error);
 > +      if (last_error) {
 > +        int err = pthread_setspecific (errors_key, last_error);
 > +        key_use_count++;
 > +        if (err != 0) {
 > +          /* This is not supposed to happen (XXX). */
 > +          fprintf (stderr, "%s: %s: %s\n", "libnbd", "pthread_setspecific",
 > +                   strerror (err));
 > +          free (last_error);
 > +          last_error = NULL;
 > +          key_use_count--;
 > +        }
 >        }
 >      }
 >    }

 I suspect this may not be completely race condition free unless
 there's a lock.  Otherwise key_use_count could be > 0 when we enter
 this code and then another thread could run the destructor at the same
 time.

Hmm.  I agree that the common path (pthread_get/setspecific) must NOT
lock.  But the !last_error path is not the common case (it runs only once
per thread), and likewise the data destructor runs only once per thread.

There's also the question of how much work it is to make this
bulletproof, and yet how difficult it is to test that it is actually
correct.  Going with -z nodelete is certainly less maintenance.

...

 However I don't want to add locking around these functions since
 they (especially the one setting context) are highly contended and this
 could kill performance.

 I think the worst that might happen is we would get EINVAL from
 pthread_setspecific, print a message, but at least we won't crash
 because the assert has been removed.

 So soft ACK, but I'm still wondering if there's a better way to do this.

I'll be thinking about locking only the slow paths, while leaving the
fast paths unlocked.
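
To make that idea concrete, here is a minimal sketch of what locking only the slow paths could look like.  It is not the libnbd code: every sketch_* name is hypothetical, and it assumes the key stays valid for as long as any thread can reach the fast path, which is exactly the property still under discussion in this thread.

```c
#include <pthread.h>
#include <stdlib.h>

/* All sketch_* names are hypothetical, not libnbd's real symbols. */
struct sketch_last_error {
  char *error;
  int errnum;
};

static pthread_key_t sketch_key;
static pthread_mutex_t sketch_lock = PTHREAD_MUTEX_INITIALIZER;
static int sketch_use_count;    /* protected by sketch_lock */

/* Per-thread destructor: slow path, runs at most once per thread. */
static void
sketch_free_key (void *vp)
{
  struct sketch_last_error *le = vp;

  pthread_mutex_lock (&sketch_lock);
  sketch_use_count--;
  pthread_mutex_unlock (&sketch_lock);

  if (le) {
    free (le->error);
    free (le);
  }
}

/* Create the key; the count starts at 1 on behalf of the library. */
static int
sketch_init (void)
{
  int err = pthread_key_create (&sketch_key, sketch_free_key);

  if (err == 0) {
    pthread_mutex_lock (&sketch_lock);
    sketch_use_count = 1;
    pthread_mutex_unlock (&sketch_lock);
  }
  return err;
}

static struct sketch_last_error *
sketch_get_last_error (void)
{
  /* Fast path: no lock once this thread has its slot. */
  struct sketch_last_error *le = pthread_getspecific (sketch_key);

  if (le)
    return le;

  /* Slow path: first use in this thread, so take the lock. */
  pthread_mutex_lock (&sketch_lock);
  if (sketch_use_count > 0) {
    le = calloc (1, sizeof *le);
    if (le) {
      if (pthread_setspecific (sketch_key, le) == 0)
        sketch_use_count++;
      else {
        free (le);
        le = NULL;
      }
    }
  }
  pthread_mutex_unlock (&sketch_lock);
  return le;
}
```

A caller would run sketch_init() once and then call sketch_get_last_error() from any thread.  Only the first call in each thread and the per-thread destructor take the mutex, so the contended path stays lock-free.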

...

 Is simply never calling pthread_key_delete a good idea?  The worst is
 it leaks slots in glibc's thread-local storage array.

That thinking is precisely why libvirt uses -z nodelete: the moment a
thread has thread-local data whose destructor points into libnbd code,
and libnbd can be removed from memory without calling
pthread_key_delete(), you get a SEGV when the thread exits and tries to
run that destructor.  A memory leak is better than a SEGV, so we either
have to enable '-z nodelete' or unconditionally call
pthread_key_delete() when libnbd is unloaded.
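
As a rough illustration of the second option, the sketch below pairs key creation and deletion with library load and unload.  The demo_* names are hypothetical, and it assumes GCC/Clang constructor and destructor attributes.  It deliberately ignores the harder question of threads that still hold data at unload time: pthread_key_delete() does not run destructors, so whatever those threads stored is leaked rather than freed.

```c
#include <pthread.h>
#include <stdlib.h>

/* All demo_* names are hypothetical, for illustration only. */
static pthread_key_t demo_key;
static int demo_key_created;
static int demo_destructor_calls;  /* only read after pthread_join */

/* Per-thread destructor, called when a thread holding data exits. */
static void
demo_free (void *vp)
{
  free (vp);
  demo_destructor_calls++;
}

/* Runs when the library (or program) is loaded. */
__attribute__ ((constructor))
static void
demo_load (void)
{
  if (pthread_key_create (&demo_key, demo_free) == 0)
    demo_key_created = 1;
}

/* Runs on unload (dlclose or exit): always delete the key so that no
 * thread can later jump to demo_free after its code is unmapped.
 */
__attribute__ ((destructor))
static void
demo_unload (void)
{
  if (demo_key_created) {
    /* pthread_key_delete does not call destructors, so release the
     * unloading thread's own value by hand first.
     */
    void *p = pthread_getspecific (demo_key);

    pthread_key_delete (demo_key);
    demo_key_created = 0;
    free (p);
  }
}

/* Worker that stores thread-local data and exits. */
static void *
demo_worker (void *arg)
{
  void *p = malloc (16);

  (void) arg;
  if (p && pthread_setspecific (demo_key, p) != 0)
    free (p);
  return NULL;
}

/* Start one worker and wait for it; returns 0 on success. */
static int
demo_run_worker (void)
{
  pthread_t t;

  if (pthread_create (&t, NULL, demo_worker, NULL) != 0)
    return -1;
  return pthread_join (t, NULL);
}
```

With -z nodelete the unload hook never matters, because the destructor code stays mapped.  Without it, the hook above is the minimum needed to keep an exiting thread from calling into unmapped memory.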

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
