Hi Daniel,
Thanks for the detailed report!
On 10/13/22 03:33, Daniel P. Berrangé wrote:
On Thu, Oct 13, 2022 at 09:49:09AM +0100, Richard W.M. Jones wrote:
> On Wed, Oct 12, 2022 at 02:00:21PM -0500, Eric Blake wrote:
>>> Job #3163966643 ( https://gitlab.com/nbdkit/libnbd/-/jobs/3163966643/raw )
>>>
>>> Stage: builds
>>> Name: x86_64-opensuse-leap-153-prebuilt-env
>>
>> This one is still failing because of a bug in gnutls; the log is
>> reporting:
>>
>> libnbd: debug: nbd1: nbd_connect_command: transition: NEWSTYLE.OPT_STARTTLS.RECV_REPLY_PAYLOAD -> NEWSTYLE.OPT_STARTTLS.CHECK_REPLY
>> free(): invalid pointer
>> libnbd: debug: nbd1: nbd_connect_command: transition: NEWSTYLE.OPT_STARTTLS.CHECK_REPLY -> NEWSTYLE.OPT_STARTTLS.TLS_HANDSHAKE_READ
>> libnbd: debug: nbd1: nbd_connect_command: transition: NEWSTYLE.OPT_STARTTLS.TLS_HANDSHAKE_READ -> DEAD
>> libnbd: debug: nbd1: nbd_connect_command: leave: error="nbd_connect_command: gnutls_handshake: Error in the pull function. (-1/1)"
>>
>> That libc message about invalid free() is scary; I'm not yet sure
>> whether it is a bug in opensuse-leap's gnutls package or something
>> we're doing wrong in libnbd.
>
> I had a look into this. Unfortunately I only have OpenSUSE Tumbleweed
> available. It doesn't fail for me in Tumbleweed. (It also doesn't
> fail in the CI pipeline for Tumbleweed.)
Anyone has access to the CI env. Line 9 of the build log
shows the container env used:
Using docker image sha256:e4a8e52b0bbb712a544a90d21b21010daad8ab3e85a768cfea38571461ec85fc for registry.gitlab.com/nbdkit/libnbd/ci-opensuse-leap-153:latest with digest registry.gitlab.com/nbdkit/libnbd/ci-opensuse-leap-153@sha256:11179119130...
...
You just need to launch the same container, clone the git repo and
then run the build commands.
IOW, on your local machine do:
$ podman run -it registry.gitlab.com/nbdkit/libnbd/ci-opensuse-leap-153:latest
# git clone https://gitlab.com/nbdkit/libnbd
# cd libnbd
# autoreconf -if
# ./configure --enable-gcc-warnings --with-gnutls --with-libxml2 --enable-fuse --enable-ocaml --enable-python --enable-golang
# make -j 20
# cd tests
# ./connect-tls-psk
requires nbdkit --tls-verify-peer -U - null --run 'exit 0'
nbdkit: pattern: error: failed to set TLS session priority to
@NBDKIT,SYSTEM:+ECDHE-PSK:+DHE-PSK:+PSK: The request is invalid.
nbd_connect_command: gnutls_handshake: Error in the push function. (-1/1)
What's interesting here is that this shows the real error
message about the TLS session priority.
If you set MALLOC_CHECK_=1, however, then we lose the useful
error message:
# MALLOC_CHECK_=1 MALLOC_PERTURB_=146 ./connect-tls-psk
requires nbdkit --tls-verify-peer -U - null --run 'exit 0'
free(): invalid pointer
nbd_connect_command: gnutls_handshake: Error in the pull function. (-1/1)
which was unfortunate for debuggability.
I confirmed it is nbdkit that is crashing and it appears to be
in gnutls code.
Looking at the image there is no /etc/crypto-policies directory,
and nor is there any 'crypto-policies' package available in the
distro.
Indeed. Leap 15.4 and newer include the crypto-policies package. Should the
container move to a 15.4 base?
So they have mis-built nbdkit in Leap 15.3 with a TLS priority
string of @NBDKIT,SYSTEM, despite not having support for that
in their distro.
I'll fix this in our downstream packages. Thanks a lot for bringing it to my
attention.
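For the downstream fix, one possible approach is to choose the priority string at build time instead of hard-coding @NBDKIT,SYSTEM. This is only a sketch under stated assumptions: it assumes nbdkit's configure accepts --with-tls-priority, and that gnutls resolves @KEYWORD priority strings through the crypto-policies back-end file /etc/crypto-policies/back-ends/gnutls.config (the path may differ per distro). The helper name choose_tls_priority is hypothetical.

```shell
#!/bin/sh
# Sketch: pick nbdkit's TLS priority string at build time.
# Assumption: gnutls looks up @KEYWORD priority strings in the
# crypto-policies back-end file below; the default path is a guess
# and may differ per distro, so it can be overridden via $1.
choose_tls_priority() {
    policy_file=${1:-/etc/crypto-policies/back-ends/gnutls.config}
    if [ -f "$policy_file" ]; then
        # A system-wide policy exists: defer to it (the @NBDKIT,SYSTEM form).
        echo '@NBDKIT,SYSTEM'
    else
        # No crypto-policies (e.g. Leap 15.3): fall back to the plain
        # gnutls NORMAL keyword, which needs no system config file.
        echo 'NORMAL'
    fi
}

# Usage (assumes nbdkit's configure accepts --with-tls-priority):
# ./configure --with-tls-priority="$(choose_tls_priority)"
```

On a Leap 15.3 image without crypto-policies this would select NORMAL, while 15.4 and newer would keep using @NBDKIT,SYSTEM.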
Regards,
Jim