Performance and screenSpace culling

Roland_Yonaba
Inner party member
Posts: 1563
Joined: Tue Jun 21, 2011 6:08 pm
Location: Ouagadougou (Burkina Faso)

Re: Performance and screenSpace culling

Post by Roland_Yonaba »

It might be ProFi. I am using it, and it works really well.
Otherwise, for Windows users, there's LuaProfiler.
kikito
Inner party member
Posts: 3153
Joined: Sat Oct 03, 2009 5:22 pm
Location: Madrid, Spain

Re: Performance and screenSpace culling

Post by kikito »

I don't remember which one I used. Any of the solutions posted above is better than blind guessing.

Good luck with the bottleneck hunting, and let us know what you find, monsieur_h!
When I write def I mean function.
monsieur_h
Citizen
Posts: 65
Joined: Tue Oct 30, 2012 4:43 pm

Re: Performance and screenSpace culling

Post by monsieur_h »

I used profiler.lua and KCacheGrind to visualize the logs. I found several issues with my approach:
- Recomputing every AABB on each frame, even if the object didn't move, is silly
- Using a costly method to check AABBs against each other

But after fixing those I ended up with a similar problem. Culling unseen objects is now slightly faster (so slightly that it is not actually worth doing). However, it is still slower when it comes to moving objects (since we have to recompute the AABBs every time).
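
The two fixes listed above could look roughly like the sketch below. It is not the code from the project: the object layout (`x`, `y`, `w`, `h`, `aabb`, `dirty`) and the function names are assumptions made up for illustration. The box is recomputed only when the object has actually moved, and the overlap test is four comparisons with no allocation.

```lua
-- Hypothetical object layout: position, size, a cached box and a dirty flag.
local function newObject(x, y, w, h)
  return {
    x = x, y = y, w = w, h = h,
    aabb = { minX = 0, minY = 0, maxX = 0, maxY = 0 },
    dirty = true, -- the box has not been computed yet
  }
end

-- Mark the box as stale only when the position really changes.
local function moveTo(obj, x, y)
  if obj.x ~= x or obj.y ~= y then
    obj.x, obj.y = x, y
    obj.dirty = true
  end
end

local recomputeCount = 0 -- demonstration counter, not needed in real code

-- Return the cached box, refreshing it in place if the object moved.
local function getAABB(obj)
  if obj.dirty then
    local box = obj.aabb
    box.minX, box.minY = obj.x, obj.y
    box.maxX, box.maxY = obj.x + obj.w, obj.y + obj.h
    obj.dirty = false
    recomputeCount = recomputeCount + 1
  end
  return obj.aabb
end

-- Cheap AABB-vs-AABB test: four comparisons, no tables created.
local function overlaps(a, b)
  return a.minX < b.maxX and b.minX < a.maxX
     and a.minY < b.maxY and b.minY < a.maxY
end
```

As noted above, this only helps for objects that stay still; anything that moves every frame still pays for the recomputation.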

I began to investigate quadtrees. They do look really fast, but I haven't figured out how to update them fast enough for them to be an interesting solution.
Hexenhammer
Party member
Posts: 175
Joined: Sun Feb 17, 2013 8:19 am

Re: Performance and screenSpace culling

Post by Hexenhammer »

monsieur_h wrote: I used profiler.lua and KCacheGrind to visualize the logs. I found several issues with my approach:
- Recomputing every AABB on each frame, even if the object didn't move, is silly
- Using a costly method to check AABBs against each other

But after fixing those I ended up with a similar problem. Culling unseen objects is now slightly faster (so slightly that it is not actually worth doing). However, it is still slower when it comes to moving objects (since we have to recompute the AABBs every time).
Please post your code (entire, run-able .love).
Hexenhammer
Party member
Posts: 175
Joined: Sun Feb 17, 2013 8:19 am

Re: Performance and screenSpace culling

Post by Hexenhammer »

I looked at the last version you uploaded, at just one file actually (RenderQueue.lua), and immediately noticed the following:

- You use table.insert() and table.sort()
These are slow functions which have no business in something called RenderQueue
Note that table.insert() will cause allocation, copying, freeing behind the scenes which brings me to my next point..

- You create tables/objects, a lot of creation (and thus allocation / garbage production) is going on actually. About the worst thing for performance is doing just that. Reuse (i.e. update) already created objects, don't create new ones. I dare to say not a single object should be created within such performance critical code.

To illustrate how much that matters: I recently optimized the drawing code of my tile-based engine by eliminating object creation within it completely. The unoptimized code created a lot of Position objects. The new one creates nothing. It runs twice as fast as the old one. And this is really all I changed there, just eliminating object creation.
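
A minimal sketch of that advice, assuming a queue that keeps its entry tables between frames and writes them by index instead of calling table.insert() with a fresh table. The names (`queue`, `push`, `clearQueue`) are hypothetical and not taken from RenderQueue.lua.

```lua
-- Hypothetical render queue that recycles its entry tables between frames.
local queue = { count = 0, entries = {} }
local allocations = 0 -- demonstration counter: how many tables were created

-- Start a new frame. Entries stay in place and are overwritten later.
local function clearQueue(q)
  q.count = 0
end

-- Add one draw request, reusing the entry table from a previous frame
-- when one already exists at this index.
local function push(q, drawable, x, y)
  local n = q.count + 1
  local entry = q.entries[n]
  if not entry then
    entry = {}
    q.entries[n] = entry
    allocations = allocations + 1
  end
  entry.drawable, entry.x, entry.y = drawable, x, y
  q.count = n
end
```

After the first frame, pushing the same number of items creates no new tables. One trade-off of this design is that entries past `count` keep referencing old drawables until they are overwritten.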

Finally two more minor things you may not be aware of:

- math.max and math.min are vararg functions in Lua (an unusual choice). Vararg functions are slow by design. However, that is almost certainly not a critical problem in your code.

- assert is an ordinary function in Lua; it runs, and causes the evaluation of its arguments, at runtime!
This is different from C/C++, where assert() calls can be compiled away and where we nowadays even have compile-time asserts.
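
A small sketch of the assert point. Because assert is a normal function call, its message argument is built on every call, even when the check passes. The `DEBUG` switch and the helper names are assumptions for illustration only.

```lua
local built = 0 -- demonstration counter: how often the message was built

local function describe(v)
  built = built + 1
  return "expected a positive number, got " .. tostring(v)
end

local DEBUG = false -- hypothetical switch, turned off for release builds

-- describe(v) runs on every call, even when v > 0 holds.
local function checkedAlways(v)
  assert(v > 0, describe(v))
  return v
end

-- The whole check, including the message, is skipped unless DEBUG is set.
local function checkedDebugOnly(v)
  if DEBUG then
    assert(v > 0, describe(v))
  end
  return v
end
```

Guarding with a plain `if` is one way to approximate the compiled-away behaviour of C asserts in hot code paths.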

Last but not least, I ran the code with LoveJIT. There is zero difference in that case, i.e. whether culling is on or off has no effect on FPS / rendering speed. LoveJIT reaches the vsync limit (60 FPS) in both cases. After I turned off vsync the demo ran at 67-69 FPS, again unaffected by culling on/off.
kikito
Inner party member
Posts: 3153
Joined: Sat Oct 03, 2009 5:22 pm
Location: Madrid, Spain

Re: Performance and screenSpace culling

Post by kikito »

Agreed with hexenhammer on everything except this:
Hexenhammer wrote: - math.max and math.min are vararg functions in Lua (an unusual choice). vararg functions are slow by design. However, that is almost certainly not a critical problem in your code
Someone in this forum asked, and tests were performed. It turns out that the speed loss due to vararg treatment is smaller than the speed gain from doing things natively. It was only very slightly slower than invoking a function with an if inside, and faster than doing and ... or ... tricks. It was a bit surprising.
When I write def I mean function.
Hexenhammer
Party member
Posts: 175
Joined: Sun Feb 17, 2013 8:19 am

Re: Performance and screenSpace culling

Post by Hexenhammer »

kikito wrote: Agreed with hexenhammer on everything except this:
Someone in this forum asked, and tests were performed. It turns out that the speed loss due to vararg treatment is smaller than the speed gain from doing things natively. It was only very slightly slower than invoking a function with an if inside, and faster than doing and ... or ... tricks. It was a bit surprising.
Min/max are probably implemented in C, right? I was thinking of pure-Lua varargs, which have nasty overhead. You must either create a throwaway table every time they are called or work directly with the vararg list (with select()), which is even slower according to benchmark results I have read. None of that probably applies to varargs handled by the C core, though.
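
For reference, the variants being compared in this exchange could be written as below. This is only a sketch of the shapes under discussion, with made-up function names; it makes no timing claims, since the relative speeds depend on the interpreter (plain Lua vs. LuaJIT) and are best measured with a profiler.

```lua
-- Pure-Lua vararg max that packs its arguments into a throwaway table.
local function maxViaTable(...)
  local args = { ... } -- a new table is created on every call
  local best = args[1]
  for i = 2, #args do
    if args[i] > best then best = args[i] end
  end
  return best
end

-- Pure-Lua vararg max that walks the argument list with select().
local function maxViaSelect(...)
  local best = (select(1, ...))
  for i = 2, select("#", ...) do
    local v = (select(i, ...)) -- each call passes the whole list again
    if v > best then best = v end
  end
  return best
end

-- Fixed-arity function "with an if inside".
local function max2(a, b)
  if a > b then return a end
  return b
end

-- The "and ... or ..." trick, wrapped in a function for comparison.
local function max2AndOr(a, b)
  return a > b and a or b
end
```

All four agree with the built-in `math.max` on numeric input, so any of them can be swapped in when profiling.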