Multicore/Threading help?

SirRanjid · Post by **SirRanjid** » Wed Jan 10, 2018 3:01 pm

Hello, I've been trying to wrap my mind around outsourcing tasks to other threads for efficiency reasons, but I've never done anything like it before and I can't get the basic concept straight in my head.

Suppose I have stuff being drawn whose data should be calculated on another thread.
My intuition would be something like this (semi-pseudocode all the way):

Code:

OBJECTS = {} --arbitrary list of drawable objects and their data

function love.draw()
	for k,DATA in ipairs(OBJECTS) do
		love.graphics.draw(DATA.object,unpack(DATA.data)) --DATA.data = {[1] = x,[2] = y, [3] = rot, [4] = scale_x, [5] = scale_y}
	end
end

--function to be replaced:
function love.update(dt)
	for k,DATA in ipairs(OBJECTS) do	--this should be outsourced
		modify(DATA.data) --do some stuff with the data in that table, like updating the position
	end
end

--to outsource it I would just do something like:
function love.update(dt)
	thread.assignto(2) --pseudofunction to assign work to another thread; 1 being the main thread
		for k,DATA in ipairs(OBJECTS) do	--this should now be calculated simultaneously on the 2nd thread
			modify(DATA.data)
		end
	thread.endassign() --pseudofunction declaring that the assignment ends at this point
end

The reality (setting aside my non-existent functions and logic), as the wiki puts it:

However, as they are separate environments, they cannot access the variables and functions of the main thread, and communication between threads is limited.

(https://love2d.org/wiki/love.thread)

Also the thread of examples (https://love2d.org/forums/viewtopic.php?f=4&t=76670) mentioned on the wiki is already a bit too advanced for me on this topic. :/

Some questions I have:
I get that threads run their own dedicated chunks of Lua that can't write to the main thread's memory the way I'd need, since the data wouldn't necessarily be in sync. So channels come in for communication, right? But how do they work?
Could someone solve my outsourcing problem with working code (if it's not too much work), so I have a direct comparison between how I'd like it to work and how it's actually possible?

I see functions like Channel:push/pop. Why are they needed? Does that imply channels are stacks, or objects in a stack?

The best example that gives me somewhat of an understanding is here: https://love2d.org/wiki/love.thread.getChannel ("Communication between main/thread")
But I don't get it. I guess it's meant to print "foo" and "bar" forever from different threads, each time the other thread writes something to the other channel.

erasio · Post by **erasio** » Wed Jan 10, 2018 6:00 pm

Alright. Let's begin with the basics.

The CPU is responsible for calculations of all kinds. It does that by getting commands which detail what calculations should be performed.

This alone is a more complicated process than you might imagine.

Accessing the hard drive takes incredibly long (from a software point of view).

This is why we have RAM: a place to store data temporarily during execution.

RAM is still too slow, though. So every CPU has some memory (cache) directly next to the cores. Three levels of it, actually, each with higher speed and smaller size the closer it sits to the core.

However, only data in RAM can be shared between cores (this is not entirely correct, but from our perspective we can accept it as a simplified fact).

This means data can never be shared easily between threads (or between cores).

There are a few techniques to work around this.

But they all share essentially the same idea: data is copied.

One way to do this is to create a queue or stack that holds references to data. We first copy the data and then simply push the reference onto the queue. That's what happens with channels. (Also, just FYI: a channel is a queue, not a stack. First in, first out.)

Another common idea is double buffering the data, meaning you keep two versions of the same data: one that is only written to and one that is only read from. One thread writes new data while the other thread reads the previously prepared data. Once the writing pass is done, the two data sets are swapped, so the reading thread now sees the new data and the writing thread can continue writing new data into the other buffer.
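The swap logic of double buffering can be sketched in plain Lua as below. `DoubleBuffer` is a hypothetical name, and this runs on a single thread: a real cross-thread version in LÖVE would need memory both threads can see (e.g. via the FFI, as mentioned next) plus synchronisation so the swap never happens while the reader is mid-read.

```lua
-- Hypothetical double buffer: the writer only touches `back`,
-- the reader only touches `front`, and swap() exchanges them.
local DoubleBuffer = {}
DoubleBuffer.__index = DoubleBuffer

function DoubleBuffer.new(size)
  local a, b = {}, {}
  for i = 1, size do a[i] = 0; b[i] = 0 end
  return setmetatable({ front = a, back = b }, DoubleBuffer)
end

-- Writer side: write into the back buffer only.
function DoubleBuffer:write(i, value)
  self.back[i] = value
end

-- Reader side: read from the front buffer only.
function DoubleBuffer:read(i)
  return self.front[i]
end

-- Called once the writer has finished a full pass.
function DoubleBuffer:swap()
  self.front, self.back = self.back, self.front
end

local buf = DoubleBuffer.new(3)
buf:write(1, 42)
print(buf:read(1)) -- 0: the reader still sees the previous data
buf:swap()
print(buf:read(1)) -- 42: the new data is visible after the swap
```

Note that after a swap the back buffer holds data from two passes ago, so the writer has to fill it completely again before the next swap.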

You can do this in LÖVE via the FFI. Disclaimer: this involves writing C code and is an advanced topic.

The tricky part is then communicating properly between the threads, and it is tricky indeed.

In short, my strong recommendation would be to use threads only for networking or file operations: elements of a game that take the most time while returning only simple data.

That is, unless you're running into serious performance issues that cannot be optimized away.

Setting up threading properly is far from easy, and 2D games usually don't need that amount of performance.

The runtime issues usually originate from poorly optimized code.

SirRanjid · Post by **SirRanjid** » Wed Jan 10, 2018 7:53 pm

Wow. Thanks for that explanation.
I think I kinda get it.

erasio wrote: ↑Wed Jan 10, 2018 6:00 pm
But they all share essentially the same idea: data is copied.

One way to do this is to create a queue or stack that holds references to data. We first copy the data and then simply push the reference onto the queue. That's what happens with channels. (Also, just FYI: a channel is a queue, not a stack. First in, first out.)

So I create a channel and push all the data onto it... Then there are a few ideas for getting a full table into a channel:
- keep the table structure limited to one flat array where every n elements hold the data of one object
- an array of arrays (one per object)
- serialise the table and push only one string onto the channel
- dynamically create sub-channels for tables inside tables, as a kind of semi-serialisation
(I'm not sure about the performance of the last two solutions.)
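The first option (one flat array with a fixed stride) might look like the plain-Lua sketch below, assuming each flake is stored as {x, y, vx, vy} as in the snowflake example further down. `pack_flakes`, `unpack_flakes` and `STRIDE` are hypothetical names and no channel is involved here; the point is that a whole batch becomes one array, so it could go through one push instead of one push per flake.

```lua
-- Assumed flake layout: {x, y, vx, vy}, i.e. 4 numbers per flake.
local STRIDE = 4

-- Flatten a list of flakes into one array: flake i occupies
-- indices (i-1)*STRIDE + 1 .. i*STRIDE.
local function pack_flakes(flakes)
  local flat = {}
  for i, f in ipairs(flakes) do
    local base = (i - 1) * STRIDE
    for j = 1, STRIDE do
      flat[base + j] = f[j]
    end
  end
  return flat
end

-- Rebuild the list of flake tables from the flat array.
-- Assumes every field is set (no nils), so #flat is a multiple of STRIDE.
local function unpack_flakes(flat)
  local flakes = {}
  for base = 0, #flat - 1, STRIDE do
    local f = {}
    for j = 1, STRIDE do
      f[j] = flat[base + j]
    end
    flakes[#flakes + 1] = f
  end
  return flakes
end

local flat = pack_flakes({ {10, 20, 0, 1}, {30, 40, 0, 1} })
print(#flat) -- 8
```

Whether this is actually faster than pushing many small tables would have to be measured; it mainly trades many messages for one larger one.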

erasio wrote: ↑Wed Jan 10, 2018 6:00 pm
The runtime issues usually originate from poorly optimized code.

It's more about outsourcing things that don't need to be computed on the main thread, like eye candy: for example, snowflakes in the foreground that have nothing to do with the game mechanics. I need many of them and don't want to pollute the main thread with them.

OK, sticking with the snowflakes example, I'd do it as follows.
The thread code:

Code:

snowflakes_thread = love.thread.newThread([[
	--a thread has no game loop, so a love.update defined here would never be called:
	--this chunk runs once, top to bottom, every time the thread is started
	local flakes_in = love.thread.getChannel("flakes_in")	--flakes sent by the main thread
	local flakes_out = love.thread.getChannel("flakes_out")	--calculated flakes sent back

	local function calc_snowflake(flake)
		--like applying flowfield dynamics to each flake
		return flake
	end

	local tmp_flakes = {}
	while true do	--pops and calculates the flakes until the input channel is empty
		local v = flakes_in:pop()
		if v == nil then
			break
		end
		table.insert(tmp_flakes, calc_snowflake(v))
	end

	for k,v in ipairs(tmp_flakes) do	--push calculated data onto the output channel
		flakes_out:push(v)
	end
]])

Main thread:

Code:

snowflakes_thread = love.thread.newThread([[...]])	--code above
local flakes_in = love.thread.getChannel("flakes_in")	--flakes sent to the thread
local flakes_out = love.thread.getChannel("flakes_out")	--calculated flakes coming back
local actual_flakes = {} --table of flakes

for i = 1, 500 do --create 500 flakes at random positions (default 800x600 window) with 0,0 velocity
	local new_flake = {love.math.random(0, 800), love.math.random(0, 600), 0, 0} --{x_pos, y_pos, x_vel, y_vel}
	table.insert(actual_flakes, new_flake)
end

function love.update(dt)
	if snowflakes_thread:isRunning() then
		return --only touch the channels while the thread is idle, so no batch is read half-finished
	end
	if flakes_out:peek() ~= nil then --peek returns the first queued value without removing it
		actual_flakes = {}
		while true do
			local v = flakes_out:pop()
			if v == nil then
				break
			end
			table.insert(actual_flakes, v)
		end
	end
	for k,v in ipairs(actual_flakes) do --send the current flakes; the channel stores copies
		flakes_in:push(v)
	end
	snowflakes_thread:start() --restart the thread once it's done
end

function love.draw()
	for i,v in ipairs(actual_flakes) do
		love.graphics.points(v[1], v[2]) --draw each flake
	end
end

This is supposed to calculate the snowflakes asynchronously.

I'd like to pass a custom dt into snowflakes_thread, like snowflakes_thread:start(custom_dt), but how do I get it inside the thread then?

Would it be possible this way? And if so, is this way optimal?

erasio wrote: ↑Wed Jan 10, 2018 6:00 pm
In short, my strong recommendation would be to use threads only for networking or file operations: elements of a game that take the most time while returning only simple data.

Thanks! I'll keep that in mind.

That raises a question: if I load something through love.filesystem on a thread other than the main one, does it belong to the game then? And how do I make the filesystem available on that thread so that everything loads correctly?

zorg · Post by **zorg** » Wed Jan 10, 2018 11:56 pm

To add to erasio's gigantic post: you can fill LÖVE's *Data objects with data and then use channels to pass their references to other threads. This won't make a copy in RAM, so passing the reference is fast, and both threads will point to the same object (though you'll have to make sure not to write to it simultaneously, and other synchronization issues may arise). I'm fairly certain that in 0.11, the most generic Data object will be an even bigger boon.

As for your reply: I think LÖVE already supports more than just flat tables in Channels, but that may still be limited to some extent. It also won't be the fastest option, since for Lua data types LÖVE has to make copies, and that means deep copies for nested tables.
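To get a feel for what "deep copies for nested tables" costs, here is a plain-Lua deep copy of simple data (booleans, numbers, strings and nested tables). It is only an illustration: LÖVE does its copying in C and its exact rules for supported types may differ. What it shows is that every sub-table has to be rebuilt, so the work grows with the total number of entries, and the copy no longer shares anything with the original.

```lua
-- Recursive deep copy of plain Lua data.
-- Assumes no cycles (a table that contains itself would recurse forever)
-- and ignores metatables, functions and userdata.
local function deepcopy(value)
  if type(value) ~= "table" then
    return value -- booleans, numbers and strings are copied by value
  end
  local copy = {}
  for k, v in pairs(value) do
    copy[deepcopy(k)] = deepcopy(v) -- every nested table is rebuilt
  end
  return copy
end

local original = { pos = { x = 1, y = 2 }, name = "flake" }
local copy = deepcopy(original)
copy.pos.x = 100
print(original.pos.x) -- 1: the nested table was copied, not shared
```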

You can pass variables to the thread when starting it (ideally channel references, imo), and then use those channels to communicate values like dt.

You can require almost all love.* modules in other threads (including love.filesystem), but some, like love.graphics (and maybe love.window; love.event also has one shortcoming, which is documented on the wiki), won't function for the most part.

Also, as I said before, if you load something in and treat it as a Lua type (boolean, number, string, table; functions too, though you can't pass those easily anyway), it will take time to copy it to the main thread. FFI data may or may not belong to one Lua instance; LÖVE objects certainly don't. So whether something loaded "belongs" to the game depends on what you mean: it belongs to the LÖVE project, but most of the time Lua values belong to the thread-specific Lua state. A FileData, for example, would be shared, so it'd be fast to access from multiple threads. Same with SoundData, ImageData, etc.

SirRanjid · Post by **SirRanjid** » Thu Jan 11, 2018 7:38 am

OK, thanks for this addition. Now I don't understand why I have to copy tables in Lua, since they're always passed by reference.

How exactly do I retrieve passed variables on the other thread?
I mean, I pass them with Thread:start(arg_1, ..., arg_n). Meh, found it in a note; I didn't think it would be as easy as putting 'local args = {...}' at the top of the thread's code.

Lastly, how do I require the filesystem module (on the main thread I just enable it in conf.lua)? Like "require("love.filesystem")", i.e. "require("love.<modulename>")"?

erasio · Post by **erasio** » Thu Jan 11, 2018 8:32 am

SirRanjid wrote: ↑Wed Jan 10, 2018 7:53 pm
Wow. Thanks for that explanation.
I think I kinda get it.

[...]

It's more about outsourcing things that don't need to be computed on the main thread, like eye candy: for example, snowflakes in the foreground that have nothing to do with the game mechanics. I need many of them and don't want to pollute the main thread with them.

[...]

This is supposed to calculate the snowflakes asynchronously.

I'd like to pass a custom dt into snowflakes_thread, like snowflakes_thread:start(custom_dt), but how do I get it inside the thread then?

Would it be possible this way? And if so, is this way optimal?

Possible? Probably. Optimal? I wouldn't say so.

Why is some foreground eye candy so complex that it takes a noticeable amount of CPU time in the first place?

Does it really need proper simulation? Because you can achieve very similar effects with a lot simpler methods.

Things that might belong into another thread:

File access
Network traffic handling
Advanced AI (not just pathfinding)
Large-scale object processing (Factorio-style, where tens of thousands of objects need to be simulated)

Those are the cases where I'd say threads are always justified even though not necessarily always needed.
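For the large-scale case, the usual first step is to split the objects into contiguous index ranges, one per worker thread, so each worker only touches its own slice. The sketch below is plain Lua and only computes the ranges; `partition` is a hypothetical helper, and the one-range-per-thread scheme is an assumption, not something from this thread. Each worker would still need its slice sent over and its results sent back through channels.

```lua
-- Split n objects into k contiguous, near-equal index ranges.
-- The first (n % k) ranges get one extra item; empty ranges are skipped.
local function partition(n, k)
  local ranges = {}
  local base = math.floor(n / k)
  local extra = n % k
  local start = 1
  for i = 1, k do
    local size = base + (i <= extra and 1 or 0)
    if size > 0 then
      ranges[#ranges + 1] = { first = start, last = start + size - 1 }
    end
    start = start + size
  end
  return ranges
end

-- 10 objects over 3 workers gives the ranges 1..4, 5..7 and 8..10.
for _, r in ipairs(partition(10, 3)) do
  print(r.first, r.last)
end
```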

In other cases, unless you know what you're doing, there's probably a better alternative that doesn't use threads.

And by better I mean more efficient, and also less time-consuming to implement.

Threads aren't a magic "make it faster" solution. They take more time to implement, make things harder to debug because results are no longer consistent from run to run, and don't give you twice the speed: the thread handling alone costs performance, and syncing delays all results.
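A standard way to put a number on "don't give you twice the speed" is Amdahl's law (general background, not something from this thread). If only a fraction p of the work can run in parallel on n cores, the best-case speedup, before counting any thread-handling or syncing overhead, is:

```latex
S(n) = \frac{1}{(1 - p) + \dfrac{p}{n}}
```

For example, if half of the update can be outsourced (p = 0.5) onto two cores (n = 2), then S = 1 / (0.5 + 0.25) = 4/3, roughly 1.33x, and the overhead described above only lowers that further.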

Threads should not be used too casually to outsource just any calculation.

SirRanjid · Post by **SirRanjid** » Thu Jan 11, 2018 11:39 am

The snowflake eye-candy example was just a simple example I made up, without testing, mostly to get the concept right. Hearing that it probably works means it served its purpose. But how is it suboptimal?
Suboptimal in the sense of "better not to use threads here", or can I actually improve the code above to do what it does faster?

Why so complex? For me it's also about having fun with the learning experience itself: sometimes implementing things that are just there because they work, and tinkering with how to make them better in as much detail as possible. That gives me even more problems to solve and learn from.

Networking: once I understand how to do it (making connections, sending serialized data of the right size to the right player, lag compensation/prediction, etc.; I'd try something more dynamic, like Overwatch does).
Advanced AI with some machine learning is on my to-do list.
And Factorio... Factorio.

They outsource render preparation to multiple cores, for example. That's what I also have in mind to implement somehow.

For me, threads are a viable solution wherever they save more time than I spend on thread handling. The future is heading toward CPUs with more and more cores, so I'd like to use them. (Today AMD's Threadripper has 16 cores and 2 threads per core; in 2 years this could be in the 400-500 dollar price range, while 8 cores become more of a standard and 4 cores a given.)
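That break-even rule can be written down under simplifying assumptions: the work takes t seconds on one core, it splits perfectly across n cores, and thread handling (copying data through channels, syncing, restarting threads) adds an overhead of h seconds. Outsourcing only pays off when:

```latex
\frac{t}{n} + h < t \quad\Longleftrightarrow\quad h < t\left(1 - \frac{1}{n}\right)
```

With two cores, for example, the overhead has to stay below half of the original time t before the thread saves anything at all.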

zorg · Post by **zorg** » Thu Jan 11, 2018 2:27 pm

SirRanjid wrote: ↑Thu Jan 11, 2018 7:38 am
OK, thanks for this addition. Now I don't understand why I have to copy tables in Lua, since they're always passed by reference.

It's true that tables are passed by reference, but a table belongs to one Lua state, and each Lua state is specific to one thread. Think of it like trying to use a reference to a memory area owned by another program/process: it will most likely error (maybe not with an access violation, but it won't work).

SirRanjid wrote: ↑Thu Jan 11, 2018 7:38 am
How exactly do I retrieve passed variables on the other thread?
I mean, I pass them with Thread:start(arg_1, ..., arg_n). Meh, found it in a note; I didn't think it would be as easy as putting 'local args = {...}' at the top of the thread's code.

Yep, it's that easy!

Although, since Lua supports true multiple arguments, you don't need to put ... into a table if you don't want to; if you know how many and which vars you pass to a thread, you could just do the following (hopefully I'm right here):

Code:

 local a,b,c,d = ...
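That works because `...` at the top level of any Lua chunk holds the arguments the chunk was called with; Thread:start(...) hands its arguments to the thread code the same way. The plain-Lua sketch below shows the mechanism without LÖVE: it compiles a string into a chunk and calls it directly on the current thread. `loadstring or load` covers both LuaJIT/Lua 5.1 (which LÖVE uses) and Lua 5.2+.

```lua
-- loadstring exists in Lua 5.1/LuaJIT; Lua 5.2+ uses load for strings.
local compile = loadstring or load

-- The chunk's top-level `...` receives whatever the chunk is called with,
-- just like the code string given to love.thread.newThread.
local chunk = assert(compile([[
  local a, b, c, d = ...
  return a + b, c .. d
]]))

local sum, text = chunk(1, 2, "snow", "flake")
print(sum, text) -- prints 3 and snowflake
```

Missing arguments simply arrive as nil, so the thread code should check for them if they are optional.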

SirRanjid wrote: ↑Thu Jan 11, 2018 7:38 am
Lastly, how do I require the filesystem module (on the main thread I just enable it in conf.lua)? Like "require("love.filesystem")", i.e. "require("love.<modulename>")"?

Code:

 require("love.filesystem") -- or require "love.filesystem"

But yes, love.<modulename>.

One more thing: if you want procedural audio (coming around version 0.11), running the code that generates it on another thread is also a viable option (OpenAL Soft already uses its own thread for internal processing).
