Garbage collector stops the game [or probably disk access]

Post by no_login_found »

I still can't get it to run, even on 64-bit. It segfaults for no obvious reason. gdb shows only unhelpful frames like love::Object::retain() or love::thread::sdl::Thread::thread_runner.
If I call collectgarbage("stop") in all threads, it survives long enough to show something and create region files, but then it still crashes.
If I make os.exit() the first line of the thread, it manages to exit before it segfaults. If I add a line that prints something before exiting, it segfaults.
I tried jit.off(), since I read somewhere that it used to crash when JIT-compiled; still no result.
I tried reinstalling, both from the love2d.org dbg files and from source (just in case, because people say that old gcc had issues with threads); same result.
Probably I'm doing something completely wrong.
-----
Update: I checked with the 32-bit Windows version of love2d and it works. So the issues on 32-bit Debian are more likely OS-specific than bitness-specific.
Update 2: it even 'works' on an old Android device (10-15 fps with very slow generation, and it crashes when a keyboard is attached or removed, but that is still much better than segfaults).

Post by pgimeno »

I've done some debugging. For me it is crashing in the loop within Chunk:fromBlockIds(). Indeed, the access to data[0] is what triggers the segfault, so it's probably pointing to an invalid location. Again, it smells like a thread-related race condition.

EDIT:
I've found something that may or may not be the cause. 'ffi' is an upvalue in the current file, but the function is called from another thread, which has a different environment. I don't know if you can do that. Using threads like that sounds risky.

Changing ffi to require"ffi" inside Chunk:fromBlockIds fixed the crash for me.

Code:

function Chunk:fromBlockIds(dataPointer)
  local data = require"ffi".cast("double *", dataPointer)
  ...
  require"ffi".C.free(data) -- no crash here but just in case

EDIT2:
Actually, *anything* that delays the call to ffi.cast fixes it, so I no longer think that the above was the cause. It looks more and more like ffi.cast is not thread-safe. I used this to print the value of 'data' within the loop:

Code:

if x == self.fromPoint.x then print(dataPointer, data, ffi.cast("double *", dataPointer)) end
The result, just before the crash:

Code:

-1307548544	cdata<double *>: NULL	cdata<double *>: 0xb2106480
My advice would be to grab this: https://github.com/slime73/love-mutex and protect the calls to ffi.cast, and report it to Mike Pall.
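
A side note on the output above, as a small arithmetic check rather than a diagnosis: the printed number and the non-NULL pointer appear to be the same 32 bits read two ways. Assuming a 32-bit build (which the eight-hex-digit pointer suggests), wrapping the negative number into the unsigned 32-bit range gives exactly the address shown. The variable names are illustrative only.

```lua
-- Values copied from the debug output above.
local printed = -1307548544   -- dataPointer as it was printed
local shown   = 0xb2106480    -- address shown by the second ffi.cast

-- Reinterpret the signed value as an unsigned 32-bit quantity.
local address = printed % 2^32

-- Both lines should agree: the number is the signed view of the same address.
print(address == shown)
print(printed == shown - 2^32)
```

So dataPointer itself still carried the right address; only the conversion of a negative number back into a pointer sometimes came out as NULL.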

Post by no_login_found »

I can't confirm that synchronizing ffi.cast helps.
I've attached two files: the game with a synchronized cast, and a test project that tries to do what the game does: allocate a pointer in one thread, pass it through a channel, access it and free it. The game with the synchronized cast still crashes. The test project doesn't have any issue. I didn't use the "love-mutex" library @pgimeno suggested, so there's also a small chance that I didn't synchronize it properly.

However, introducing delays makes the crash less likely (though only in this particular case). So the more testing code you add, the more rarely it happens. For example, just printing something before the cast significantly reduces the chance of seeing the issue. Launching from the folder instead of the zip has the same effect. When I synchronized all the ffi calls I make, I never saw the issue. It was possible to reproduce it with only "cast" unsynchronized, but the opposite is also true. It's really hard to judge when it happens in fewer than 1% of game launches.
However, I can't draw any conclusions from that, since initially it was exactly the opposite: delays made it more likely to happen, and launching from a folder also increased the probability to almost 100%. Increasing the number of cores assigned to the virtual machine also made it happen less often.
The behaviour probably changed because I moved all initialization code into love.load instead of just doing it at the top level of main. Before I did that, the game crashed almost always.

Also, when launched with gdb and dmalloc, it immediately reports an error that something invalid is being freed, even if all the free calls made through ffi are removed from the Lua code. I'm not sure it means anything: it could equally mean that I didn't manage to set up gdb+dmalloc properly, that some third-party lib damages memory, or that the debugging setup itself corrupts memory. I also tried doing all free/malloc calls in a synchronized manner and checking the correctness of parameters, results, and memory accesses, and it never reported a single issue, which makes it less probable that there's something wrong in my allocation/access/free code.

I'm not sure there's any sense in reporting this as a LuaJIT bug when I don't know the exact conditions to reproduce it, and I'm not even sure whether it's a bug in my code, in LuaJIT, in love2d, or in some third-party lib that was built for Linux in a thread-unsafe way.
Attachments
threadtest.love (1.02 KiB): attempt to reproduce the 'cast' issue in a minimal project; not reproduced
game1.love (93 KiB): attempt to fix the 'cast' issue in the game; no success

Post by pgimeno »

Apologies, I'm not used to debugging multithreaded applications. But after more consideration, I think it's possible that you are writing out of the bounds of an array somewhere, probably within a thread, and overwriting the pointer. When I run it outside of gdb, I occasionally get a segfault at a different point. Within gdb, I have been unable to catch it at any other place.

Post by no_login_found »

It seems the issue happened because tonumber(ffi.cast("intptr_t", pointer)) can produce a negative value (especially on 32-bit machines), but ffi.cast("pointer", number) gives a null pointer for negative values.
So now I'm using the following code, which uses ffi.cast only to get a specific pointer type from void*.
Conversion between number and pointer is done with the help of a union; this operation is roughly equivalent to reinterpret_cast.
It seems to work fine.

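Code:

  -- Scratch union: the same bytes can be read as a double or as a pointer.
  local union = ffi.new("union { double d; void* ptr; }")
  -- Rebuild a typed pointer from a number produced by ffi.numberFromPointer.
  ffi.pointerFromNumber = function(ctype, value)
    union.d = value
    return ffi.cast(ctype, union.ptr)
  end
  -- Encode a pointer as a Lua number so it can be passed through a channel.
  ffi.numberFromPointer = function(value)
    union.ptr = value
    return tonumber(union.d)
  end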
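
For comparison, here is a minimal sketch of a different way to avoid the negative value: go through uintptr_t instead of intptr_t, so the number is never negative in the first place. This is not the code from the attached game; pointerToNumber and numberToPointer are hypothetical helper names, and the sketch assumes LuaJIT's ffi module and addresses small enough to be represented exactly in a double (below 2^53).

```lua
-- Sketch only: requires LuaJIT (the runtime LÖVE embeds) for the ffi module.
local ffi = require("ffi")

ffi.cdef[[
void *malloc(size_t size);
void free(void *ptr);
]]

-- Hypothetical helper: pointer -> plain Lua number, never negative
-- because the address is read as an unsigned integer.
local function pointerToNumber(ptr)
  return tonumber(ffi.cast("uintptr_t", ptr))
end

-- Hypothetical helper: number -> typed pointer, again via uintptr_t.
local function numberToPointer(ctype, n)
  return ffi.cast(ctype, ffi.cast("uintptr_t", n))
end

-- Round trip within one thread: allocate, encode, decode, read back, free.
local p = ffi.cast("double *", ffi.C.malloc(4 * ffi.sizeof("double")))
p[0] = 42.5
local n = pointerToNumber(p)
local q = numberToPointer("double *", n)
print(n >= 0, q == p, q[0])
ffi.C.free(q)
```

In the game, the number would be what gets pushed through the channel, and the receiving thread would call the second helper. Whether this behaves better than the union on the affected 32-bit Debian setup is untested here.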
Attachments
game1.love (93.12 KiB): fixed passing pointers through channels