It might be ProFi. I am using it, and it works really well.
Otherwise, for Windows users, there's LuaProfiler.
Performance and screenSpace culling
- Roland_Yonaba
- Inner party member
- Posts: 1563
- Joined: Tue Jun 21, 2011 6:08 pm
- Location: Ouagadougou (Burkina Faso)
- kikito
- Inner party member
- Posts: 3153
- Joined: Sat Oct 03, 2009 5:22 pm
- Location: Madrid, Spain
Re: Performance and screenSpace culling
I don't remember which one I used. Any of the solutions posted above is better than blind guessing.
Good luck with the bottleneck hunting, and let us know what you find, monsieur_h!
When I write def I mean function.
- monsieur_h
- Citizen
- Posts: 65
- Joined: Tue Oct 30, 2012 4:43 pm
Re: Performance and screenSpace culling
I used profiler.lua and KCacheGrind to visualize the logs. I found several issues with my approach:
- Recomputing every AABB on each frame even if the object didn't move is silly
- Using a costly method to check AABBs against each other
But after fixing those I ended up with a similar problem. It's now slightly faster to cull unseen objects (so slightly that it's actually not worth doing). However, it's still slower when it comes to moving objects (since we have to recompute their AABBs every time).
I began to investigate quadtrees. They do look really fast, but I haven't figured out how to update them fast enough for them to be an interesting solution.
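For anyone following along, the two fixes listed above could look roughly like this. It is only a sketch under assumptions: the object layout (x, y, w, h), the `dirty` flag and the names `newObject`, `moveTo`, `getAABB` and `overlaps` are made up for illustration and are not taken from the uploaded code. The `recomputed` counter exists only to make the caching visible.

```lua
-- Hypothetical object layout: position and size, plus a cached AABB.
local function newObject(x, y, w, h)
  return {
    x = x, y = y, w = w, h = h,
    aabb = { minX = 0, minY = 0, maxX = 0, maxY = 0 },
    dirty = true,    -- forces the first computation
    recomputed = 0,  -- illustration only: counts real recomputations
  }
end

-- Mark the cached AABB as stale only when the object actually moves.
local function moveTo(obj, x, y)
  if obj.x ~= x or obj.y ~= y then
    obj.x, obj.y = x, y
    obj.dirty = true
  end
end

-- Return the cached AABB, updating it in place if it is stale.
-- The same table is reused, so nothing is allocated per frame.
local function getAABB(obj)
  if obj.dirty then
    local b = obj.aabb
    b.minX, b.minY = obj.x, obj.y
    b.maxX, b.maxY = obj.x + obj.w, obj.y + obj.h
    obj.dirty = false
    obj.recomputed = obj.recomputed + 1
  end
  return obj.aabb
end

-- Cheap overlap test: four comparisons, no function calls, no allocation.
local function overlaps(a, b)
  return a.minX < b.maxX and b.minX < a.maxX
     and a.minY < b.maxY and b.minY < a.maxY
end
```

Objects that never move pay for one AABB computation in total. Objects that move every frame still recompute every frame, which matches the remaining slowdown described above.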
- Hexenhammer
- Party member
- Posts: 175
- Joined: Sun Feb 17, 2013 8:19 am
Re: Performance and screenSpace culling
monsieur_h wrote: I used profiler.lua and KCacheGrind to visualize the logs. I found several issues with my approach:
- Recomputing every AABB on each frame even if the object didn't move is silly
- Using a costly method to check AABBs against each other
But after fixing those I ended up with a similar problem. It's now slightly faster to cull unseen objects (so slightly that it's actually not worth doing). However, it's still slower when it comes to moving objects (since we have to recompute their AABBs every time).
Please post your code (entire, runnable .love).
- Hexenhammer
- Party member
- Posts: 175
- Joined: Sun Feb 17, 2013 8:19 am
Re: Performance and screenSpace culling
I looked at the last version you uploaded (just one file, actually: RenderQueue.lua) and immediately noticed the following:
- You use table.insert() and table.sort()
These are slow functions which have no business in something called RenderQueue.
Note that table.insert() can cause allocation, copying, and freeing behind the scenes whenever the table has to grow, which brings me to my next point.
- You create tables/objects. A lot of creation (and thus allocation and garbage production) is going on, actually. That is about the worst thing you can do for performance. Reuse (i.e. update) already created objects; don't create new ones. I dare say not a single object should be created within such performance-critical code.
To illustrate how much that matters: I recently optimized the drawing code of my tile-based engine by eliminating object creation within it completely. The unoptimized code created a lot of Position objects. The new one creates nothing. It runs twice as fast as the old one. And this is really all I changed there: just eliminating object creation.
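To make the "reuse, don't create" point concrete, here is a rough sketch of a queue that keeps its entry tables between frames. It is not the RenderQueue.lua from the upload; the `entries`, `count` and `allocated` fields and the `push`/`clear` methods are assumptions for illustration, and `allocated` is only there so the reuse can be observed.

```lua
local RenderQueue = {}
RenderQueue.__index = RenderQueue

function RenderQueue.new()
  return setmetatable({ entries = {}, count = 0, allocated = 0 }, RenderQueue)
end

-- Reset the queue without discarding the entry tables,
-- so the next frame can overwrite them instead of allocating.
function RenderQueue:clear()
  self.count = 0
end

-- Write into slot count + 1 directly (no table.insert),
-- creating a new entry table only if that slot was never used.
function RenderQueue:push(drawable, x, y)
  local n = self.count + 1
  local e = self.entries[n]
  if not e then
    e = {}
    self.entries[n] = e
    self.allocated = self.allocated + 1
  end
  e.drawable, e.x, e.y = drawable, x, y
  self.count = n
end
```

Calling `clear()` at the start of each frame and `push()` for every visible object means that, once the queue has reached its peak size, a frame allocates nothing. Two caveats: stale entries beyond `count` keep referencing their old drawables until they are overwritten, and this sketch does nothing about the sorting cost.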
Finally, two more minor things you may not be aware of:
- math.max and math.min are vararg functions in Lua (an unusual choice). Vararg functions are slow by design. However, that is almost certainly not a critical problem in your code
- assert is an ordinary function in Lua: it runs, and its arguments are evaluated, at runtime!
This is different from C/C++, where assert() calls can be compiled away, and where we nowadays even have compile-time asserts
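A small demonstration of the assert point. The `describe` helper and the `calls` counter are invented for this example; the point is only that the message argument is built even when the condition holds.

```lua
local calls = 0

-- Hypothetical "expensive" message builder; counts how often it runs.
local function describe(obj)
  calls = calls + 1
  return "invalid object: " .. tostring(obj)
end

local obj = { id = 1 }

-- The condition is true, yet describe(obj) is still evaluated,
-- because arguments are computed before assert is called.
assert(obj.id, describe(obj))

-- With a plain if, the message is only built on failure.
if not obj.id then
  error(describe(obj))
end
```

After running this, `calls` is 1: the assert line paid for the message, the if block did not. In hot code paths, keeping assert messages to constant strings (or using the if form) avoids that hidden cost.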
Last but not least, I ran the code with LoveJIT. There is zero difference in that case, i.e. whether culling is on or off has no effect on FPS / rendering speed. LoveJIT reaches the vsync limit (60 FPS) in both cases. After I turned off vsync the demo ran at 67-69 FPS, again unaffected by culling on/off.
- kikito
- Inner party member
- Posts: 3153
- Joined: Sat Oct 03, 2009 5:22 pm
- Location: Madrid, Spain
Re: Performance and screenSpace culling
Agreed with Hexenhammer on everything except this:
Hexenhammer wrote: - math.max and math.min are vararg functions in Lua (an unusual choice). vararg functions are slow by design. However, that is almost certainly not a critical problem in your code
Someone in this forum asked, and tests were performed. It turns out that the speed loss due to vararg treatment is smaller than the speed gain from doing things natively. It was only very slightly slower than invoking a function with an if inside, and faster than doing and ... or ... tricks. It was a bit surprising.
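For reference, the three variants being compared could be written like this. This is not the original benchmark from that thread, just a sketch: `maxIf`, `maxAndOr` and `timeIt` are names made up here, and the timings will differ between plain Lua and LuaJIT, so measure on your own setup.

```lua
-- Variant 1: the built-in, math.max (implemented in C).

-- Variant 2: a Lua function with an if inside.
local function maxIf(a, b)
  if a > b then return a end
  return b
end

-- Variant 3: the and ... or ... trick (safe here because numbers are never false/nil).
local function maxAndOr(a, b)
  return a > b and a or b
end

-- Crude timing helper: calls f(i, n - i) n times and returns
-- the elapsed CPU time plus a checksum so the work cannot be skipped.
local function timeIt(f, n)
  local t0 = os.clock()
  local acc = 0
  for i = 1, n do
    acc = acc + f(i, n - i)
  end
  return os.clock() - t0, acc
end

local N = 100000
local tMax, sumMax = timeIt(math.max, N)
local tIf, sumIf = timeIt(maxIf, N)
local tAndOr, sumAndOr = timeIt(maxAndOr, N)

print(string.format("math.max: %.4fs  if: %.4fs  and/or: %.4fs", tMax, tIf, tAndOr))
```

Note that the and/or form is written inline in real code, so wrapping it in a function here adds call overhead it would not normally have.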
- Hexenhammer
- Party member
- Posts: 175
- Joined: Sun Feb 17, 2013 8:19 am
Re: Performance and screenSpace culling
kikito wrote: Agreed with hexenhammer on everything except this:
Someone in this forum asked, and tests were performed. It turns out that the speed loss due to vararg treatment is smaller than the speed gain from doing things natively. It was only very slightly slower than invoking a function with an if inside, and faster than doing and ... or ... tricks. It was a bit surprising.
Min/max are probably implemented in C, right? I was thinking of pure Lua varargs, which have nasty overhead. You must either create a throwaway table every time they are called or work directly with the vararg list (with select()), which is even slower according to benchmark results I have read. None of that probably applies to varargs executed by the C core, though.
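The two pure-Lua vararg approaches mentioned above look roughly like this. Both function names are made up for the example; no timings are claimed here.

```lua
-- Approach 1: pack the arguments into a table.
-- Every call allocates a throwaway table that becomes garbage.
local function maxViaTable(...)
  local args = { ... }
  local best = args[1]
  for i = 2, #args do
    if args[i] > best then best = args[i] end
  end
  return best
end

-- Approach 2: walk the vararg list with select().
-- No table is created, but each select(i, ...) call is handed
-- the whole argument list again, so the loop does quadratic work.
local function maxViaSelect(...)
  local best = (select(1, ...))       -- parentheses keep only the first value
  for i = 2, select("#", ...) do
    local v = (select(i, ...))
    if v > best then best = v end
  end
  return best
end
```

For the common two-argument case, a fixed-arity function such as `function(a, b)` sidesteps both costs.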