On Thu, Jun 09, 2022 at 04:24:12PM +0100, Richard W.M. Jones wrote:
On Thu, Jun 09, 2022 at 03:20:02PM +0100, Daniel P. Berrangé wrote:
> > + go test -count=1 -v
> > === RUN Test010Load
> > --- PASS: Test010Load (0.00s)
> > === RUN TestAioBuffer
> > --- PASS: TestAioBuffer (0.00s)
> > === RUN TestAioBufferFree
> > --- PASS: TestAioBufferFree (0.00s)
> > === RUN TestAioBufferBytesAfterFree
> > SIGABRT: abort
> > PC=0x3fdf6f9bac m=0 sigcode=18446744073709551610
>
> So suggesting TestAioBufferBytesAfterFree is as fault, but quite
> odd as that test case is trivial and whle it allocates some
> native memory it doesn't seem to write to it. Unless the problem
> happened in an earlier test case and we have delayed detection ?
>
> I guess I'd try throwing darts at the wall by chopping out bits
> of test code to see what makes it disappear.
>
> Perhaps also try swapping MakeAioBuffer with MakeAioBufferZero
> in case pre-existing data into the C.malloc()d block is confusing
> Go ?
Interestingly if I remove libnbd_020_aio_buffer_test.go completely,
and disable GODEBUG, then the tests pass. (Reproducer commands at end
of email). So I guess at least one of the problems is confined to
this test and/or functions it calls in the main library.
Unfortunately this test is huge.
At your suggestion, replacing every MakeAioBuffer with
MakeAioBufferZero in that test, but it didn't help. Also tried
replacing malloc -> calloc in the aio_buffer.go implementation which
didn't help.
I'll try some more random things ...
Adding a few printf's shows something interesting:
=== RUN TestAioBufferBytesAfterFree
calling Free on 0x3fbc1882b0
calling C.GoBytes on 0x3fbc1882b0
SIGABRT: abort
PC=0x3fe6aaebac m=0 sigcode=18446744073709551610
goroutine 21 [running]:
gsignal
:0
abort
:0
runtime.throwException
../../../libgo/runtime/go-unwind.c:128
runtime.unwindStack
../../../libgo/go/runtime/panic.go:535
panic
../../../libgo/go/runtime/panic.go:750
runtime.panicmem
../../../libgo/go/runtime/panic.go:210
runtime.sigpanic
../../../libgo/go/runtime/signal_unix.go:634
_wordcopy_fwd_aligned
:0
__GI_memmove
:0
runtime.stringtoslicebyte
../../../libgo/go/runtime/string.go:155
__go_string_to_byte_array
../../../libgo/go/runtime/string.go:509
_cgo_23192bdcbd72_Cfunc_GoBytes
./cgo-c-prolog-gccgo:46
This is a simple use after free because the Free function in
aio_buffer.go frees the array and then the Bytes function attempts to
copy b.Size bytes from the NULL pointer.
I didn't write this test so I'm not quite sure what it's trying to
achieve. It seems to be deliberately trying to cause a panic, but
causes a segfault instead? (And why only on RISC-V?)
func TestAioBufferBytesAfterFree(t *testing.T) {
buf := MakeAioBuffer(uint(32))
buf.Free()
defer func() {
if r := recover(); r == nil {
t.Fatal("Did not recover from panic calling Bytes() after
Free()")
}
}()
buf.Bytes()
}
Since this only happens on RISC-V I guess it might be something to do
with the golang implementation on this architecture being unable to
turn segfaults into panics.
Removing all three *AfterFree tests fixes the tests.
It seems a bit of an odd function however. Wouldn't it be better to
changes the Bytes function so that it tests if the pointer is NULL and
panics?
NB: this _does not_ address the other problem where GODEBUG=cgocheck=2
complains about "fatal error: Go pointer stored into non-Go memory".
Rich.
--
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones
Read my programming and virtualization blog:
http://rwmj.wordpress.com
libguestfs lets you edit virtual machines. Supports shell scripting,
bindings from many languages.
http://libguestfs.org