This version simply removes the call to pthread_key_destroy. It fixes
the problem and allows us to leave the asserts alone so we can still
catch real errors.
Of course this leaks pthread_key_t in the case where you dlclose() the
library. glibc has a limit of 128 thread-specific keys (and the first
32 are somehow "fast" in a way I could quite follow), so that's a
thing.
Rich.