Some Questions About Love.thread

Questions about the LÖVE API, installing LÖVE and other support related questions go here.
Forum rules
Before you make a thread asking for help, read this.
Post Reply
User avatar
Roland_Yonaba
Inner party member
Posts: 1562
Joined: Tue Jun 21, 2011 6:08 pm
Location: Ouagadougou (Burkina Faso)
Contact:

Some Questions About Love.thread

Post by Roland_Yonaba » Thu Dec 13, 2012 4:49 pm

I have some questions about the love.thread module, and i'll probably end up asking some advises on how to design in a clean manner a very costful and cpu expensive loop.

Okay, first of all, the context:
I am actually working on a 2weeks project about raytracing. Initially, I wanted to explore the topic and and try to get the basics used to render those beautiful images. I made a first attempt with local illumination (simple raytracing), then I came accross this small global illumination path tracer from Kevin Beason, which was, by far, more realistic, and I wanted to give it try. Anyway I didn't want my port to be so compact, so I splitted the code in several files, made some design and added a bit of object orientation, to have something more understandable.
Here's the link to the Github repository: smallpt-lua

For one thing, it works fine. I didn't pushed yet any example of use, as I am still polishing some stuff on the background, but it works flawlessly.
Problem is, it runs reaally slow.

Possible explanations:
So far, I have to admit that global illumination rendering is a really expensive task. Actually, the program take some inputs (scene, primitives, intial viewpoint), performs some quick camera setup, then processes each pixel to compose the the output:

Here is the main loop skeleton:

Code: Select all

for y = 1, height do -- height
  for x = 1, width do -- width

    for sx = 0,aa-1 do -- for anti-aliasing
      for sy = 0,aa-1 do -- for anti-aliasing

        for samp = 1, spp do -- for sampling : we need at least 16~25 samples/per pixels to have a clean output 
          -- Processes pixel x,y, with possible recursive calls (around 5, in general)
          -- evaluate a partial radiance for each sampling, save it
        end
     end
   end
   -- cumulate all radiances to evaluate the final color for pixel x,y
 end
end
So, most of operations involded to evaluate radiance are vector-based operations. The are A LOT of vector operations, actually. I am using my own vector class, and I am conviced that this is one of the bottlenecks...
But, I can't thing of using a table-less structure here (in other words, multiple return values instead), as the code would no longer be readable nor clean enough to me: functions passing vectors to each others will take lots of args, syntax would be a lot more verbose...

So Threads ?
Then I started thinking of using love.thread as a possible way to gain speed.
Actually, I have a very basic understanding of love.thread module and threads in general (and don't laugh at me, i can see you :ultrashocked: ).

First of all, why can't we pass tables from one thread to another ? (Well that's more of a general question, was just wondering why).
Second, I know that multiple threads can be created with love.thread module. And they can run in parallel. This is really nice because, in my case, I can make it so that each <x,y> pixel sampling would be processed in its own thread. Then i'll just have to pass the scene description, plus a <x,y> couple matching a pixel to be evaluated to a thread, and it will return to the main thread the color for that pixel.

Let's say that I have to render a 200*100 sized-image, which makes 20000 pixels to render.
Well, it doesn't mean I have to create 20000 threads, right ?
And even if I could, they woudn't run all at the same time ?
So how much maximum threads can I run at the same time ? This depends on the processor, right ? (and stop laughing at me... :roll: :awesome:)
What would be the pros/cons of doing this, though ? Actually, what I've done so far whith love thread was loading resources in a separate thread, to preserve the framerate in the main thread. In this case actually, I will obviously have two ore more threads running, the main one for basic display, and the others for raytracing operations...But does this means necessarily a gain in terms of speed ?

I have some others questions left, but they can wait, actually.
Sorry for the long post.

Mathias
Prole
Posts: 5
Joined: Thu Dec 13, 2012 12:48 pm

Re: Some Questions About Love.thread

Post by Mathias » Thu Dec 13, 2012 5:38 pm

Creating a thread always takes some time, so its not useful to create a thread for each pixel. Also since your CPU has probably not more then 4-8 cores it doesn't make sense to create much more then that.
The usual way is, you create a few threads and they ask the main thread for a work unit, in your case it makes sense to render one or a few lines per thread. When they are finished, pass back the result and ask for a new work unit.

Keep in mind, path tracing is pretty slow. So using lua probably doesn't help either.

User avatar
bartbes
Sex machine
Posts: 4944
Joined: Fri Aug 29, 2008 10:35 am
Location: The Netherlands
Contact:

Re: Some Questions About Love.thread

Post by bartbes » Thu Dec 13, 2012 5:59 pm

Roland_Yonaba wrote:why can't we pass tables from one thread to another?

Because they're an extremely non-trivial data structure. Correctly copying tables is (practically) impossible. What do you do with loops, for instance?
Roland_Yonaba wrote:Let's say that I have to render a 200*100 sized-image, which makes 20000 pixels to render.
Well, it doesn't mean I have to create 20000 threads, right?
Please no, even a gpu doesn't have that many threads. What you could do, is have a bunch of worker threads, say 4, that all do a bunch of pixels, this means that every thread is doing roughly 5000 pixels.
Roland_Yonaba wrote:And even if I could, they woudn't run all at the same time?
No. But then again, you can almost never guarantee that, it will run mostly at the same time though. As in, it will pretty much act like they were.
Also, OS threads aren't light in any way, you shouldn't be spawning tons of them, especially since you have to load a lua state with the standard libraries on every one as well, you want to re-use threads as much as possible.
Roland_Yonaba wrote:But does this means necessarily a gain in terms of speed?
No, threads never do. That is, they can make it faster, but they can't guarantee it.

User avatar
Roland_Yonaba
Inner party member
Posts: 1562
Joined: Tue Jun 21, 2011 6:08 pm
Location: Ouagadougou (Burkina Faso)
Contact:

Re: Some Questions About Love.thread

Post by Roland_Yonaba » Thu Dec 13, 2012 6:09 pm

Mathias wrote:Keep in mind, path tracing is pretty slow. So using lua probably doesn't help either.
Totally. Well I'm not looking for some lightning, fast-paced stuff, but I am pretty sure something can be done to improve it in its currenht state, actually.
Mathias wrote:Also since your CPU has probably not more then 4-8 cores it doesn't make sense to create much more then that.
I agree on this. So this implies one thread per core (obviously). Well this raises another question then : assuming I create, let's say 4 threads, then I start running them all of them (thread:start):

Code: Select all

-- main.lua
local threads
function love.load()
  threads = {}
  for i = 1,4 do
    threads[i] = love.thread.newThread(('thread%d'):format(i),'somefile.lua')
    threads[i]:start()
  end
end

-- etc etc
What's likely to happen if i run this on a dual-core machine, for instance ?
Does this means that, along with the main thread, only 1 additional thread will be processed at at time ?
Like this:
main thread + thread1 then
main thread + thread2 then
main thread + thread3 then
main thread + thread4
main thread
main thread
...
EDIT: Well partially answered by Bartbes's previous post. But I still leave it open.

User avatar
Inny
Party member
Posts: 652
Joined: Fri Jan 30, 2009 3:41 am
Location: New York

Re: Some Questions About Love.thread

Post by Inny » Thu Dec 13, 2012 6:59 pm

So there's the misunderstanding about what threads are for, in general, amongst people in the computer science field. There's this assumption that threads make things faster, which is not at all what they were created for. Threads were created to bypass blocking code. Instead of having your code stop dead while waiting for the return result of some complex calculation or file i/o operation, that action could be moved to a thread so that you could continue with your calculations. On single-core machines that were all we had until the 90s, the only speed improvements a program would get were the ones that came from being able to do hard calculation and file i/o at the same time. It wasn't until multi-core machines before the actual speed improvements of threads arose. So, if you're doing threads for just speed, the important thing to remember is that speed is the side effect, not the intention of threads.

Now, as for the questions at hand

1. Love implements threads with two separate Lua states, and communicates between them via the Actor Model. As Tables are complex structures, Love doesn't make any attempt to move them between lua states, because it'd certainly do them wrong for somebody. You'll have to work around it via serialization and deserialization, or have the value "stream" between the two states. Here's an idea for a protocol which mimicks semaphors

Main Thread sends work to Worker Thread
Worker performs work
Worker begins sending back results in packets (named X1...Xn)
Main thread waits on value named X1
Main continues reading values until Xm where 1 < m < n is a nil
Main waits on Xm until Worker fills it
Main continues pulling values between Xm through Xn until complete

This doesn't work for complex types, but for array types it'd probably work. It'd probably need some working at to get right.

2. The maximum number of threads you can have depends on the system (which may be capped at like 4096 or something, it's hard to know), but the maximum number of threads that can run simultaneously is dependent on the number of physical threads the machine has, which is tied to the number of cores. For instance, I have a corei5 laptop with 4 cores, and each core has 2 threads, so 8 threads at once. And one is already taken for the Main thread, and other programs on the machine will want those threads as well (like the OS, the windowing system, firefox/chrome, etc).

But, in my opening paragraph, I said that threads existed before multicore machine. As long as you're happy with waiting, you can have more threads than what's physically available. Since you're doing hard calculations, you probably don't want to wait. So, you'll want less threads than the max, so that less threads are waiting for the machine to give up some physical cpu time. It sounds counterintuitive, but it has to do with cache-hits-and-misses and context-switching. The more things that are vying for computation time, the more the os has to waste cpu time on switching between them. Though, less threads means less work is being done, so you have to be johnny-on-the-spot with work to be performed, so design your threads around performing tasks, and have the main thread act as a dispatcher, sending in work to be done. That's called a "Thread Pool".

Let me take a moment to advertise my thread library, Hot Tub Task Machine. Maybe you can use it to give you ideas about how to get your tasks.

Hope this all helps.

User avatar
bartbes
Sex machine
Posts: 4944
Joined: Fri Aug 29, 2008 10:35 am
Location: The Netherlands
Contact:

Re: Some Questions About Love.thread

Post by bartbes » Thu Dec 13, 2012 7:08 pm

Roland_Yonaba wrote: Does this means that, along with the main thread, only 1 additional thread will be processed at at time ?
No, it means that your OS will be switching threads like crazy. One of the fundamentals with threading, it is non-deterministic (at least, practically), you can't tell what's going to run when, or where, and in what order.

EDIT: Do note that it always switches like crazy, because of multiple processes.

User avatar
Roland_Yonaba
Inner party member
Posts: 1562
Joined: Tue Jun 21, 2011 6:08 pm
Location: Ouagadougou (Burkina Faso)
Contact:

Re: Some Questions About Love.thread

Post by Roland_Yonaba » Fri Dec 14, 2012 8:34 am

Okay, thanks all, that was very useful piece of informations, and nice explanations.
Thanks Inny, for the HotTubTaskMachine, i'll sit around and take a closer look at the code soon.

Actually I just set a temporary solution. What I was aiming to do was the possibility to display (in love.draw) a basic interface, and monitor current progress, while (in the background) the task of rendering would be completed. So that the user can see the computation progress, instead of having an app that would look like blocked in the rendering for-do-end loops.

So for now on, I'll be dealing with coroutines, which seem to be perfect for that.
Taking the original loop, it'll now look like:

Code: Select all

local function render(...)
    for y = 1, height do -- height
      for x = 1, width do -- width

        for sx = 0,aa-1 do -- for anti-aliasing
          for sy = 0,aa-1 do -- for anti-aliasing

            for samp = 1, spp do -- for sampling : we need at least 16~25 samples/per pixels to have a clean output
              -- Processes pixel x,y, with possible recursive calls (around 5, in general)
              -- evaluate a partial radiance for each sampling, save it
            end
         end
       end
       -- cumulate all radiances to evaluate the final color for pixel x,y
       local progress = ... -- estimate the current progress
       coroutine.yield(false, progress)
     end
    end
   -- Assuming map is a 2d aray of listing all computed colors
    coroutine.yield(true, map)
end
Then I can just wrap this like:

Code: Select all

local renderer = coroutine.create(render)
And call it in the main file:

Code: Select all

function love.update(dt)
  local _, status, reseult = coroutine.resume(renderer,...)
end
Also, I can still try to commit what i've earlier suggested, using a table - less data structures for vector, just multiple values.
i'll probably won't be that happy with it, but the gain in terms of speed could be noticeable.

There's also a method i've recently discovered that can be applied to global illumination to faster it, called space filling curves (Hilbert curves, for instance).
Actually I still don't get how they really work (opened a thread on stackOverflow for ages, but seems nobody is willing to answer). I'll look into it too.

Post Reply

Who is online

Users browsing this forum: Bing [Bot], slime and 11 guests