The following warnings occurred:
Warning [2] Undefined array key "lockoutexpiry" - Line: 94 - File: global.php PHP 8.1.31 (Linux)
File Line Function
/global.php 94 errorHandler->error
/printthread.php 16 require_once
Warning [2] Undefined array key "lockoutexpiry" - Line: 573 - File: global.php PHP 8.1.31 (Linux)
File Line Function
/global.php 573 errorHandler->error
/printthread.php 16 require_once
Warning [2] Undefined variable $can_access_moderationqueue - Line: 752 - File: global.php PHP 8.1.31 (Linux)
File Line Function
/global.php 752 errorHandler->error
/printthread.php 16 require_once
Warning [2] Undefined array key "avatartype" - Line: 892 - File: global.php PHP 8.1.31 (Linux)
File Line Function
/global.php 892 errorHandler->error
/printthread.php 16 require_once
Warning [2] Undefined array key "avatartype" - Line: 892 - File: global.php PHP 8.1.31 (Linux)
File Line Function
/global.php 892 errorHandler->error
/printthread.php 16 require_once
Warning [2] Undefined variable $awaitingusers - Line: 34 - File: global.php(959) : eval()'d code PHP 8.1.31 (Linux)
File Line Function
/global.php(959) : eval()'d code 34 errorHandler->error
/global.php 959 eval
/printthread.php 16 require_once
Warning [2] Undefined array key "style" - Line: 1024 - File: global.php PHP 8.1.31 (Linux)
File Line Function
/global.php 1024 errorHandler->error
/printthread.php 16 require_once
Warning [2] Undefined property: MyLanguage::$lang_select_default - Line: 5308 - File: inc/functions.php PHP 8.1.31 (Linux)
File Line Function
/inc/functions.php 5308 errorHandler->error
/global.php 1024 build_theme_select
/printthread.php 16 require_once
Warning [2] Undefined array key 1 - Line: 1474 - File: inc/functions.php PHP 8.1.31 (Linux)
File Line Function
/inc/functions.php 1474 errorHandler->error
/inc/functions.php 1429 fetch_forum_permissions
/printthread.php 76 forum_permissions
Warning [2] Undefined property: MyLanguage::$archive_pages - Line: 2 - File: printthread.php(257) : eval()'d code PHP 8.1.31 (Linux)
File Line Function
/printthread.php(257) : eval()'d code 2 errorHandler->error
/printthread.php 257 eval
/printthread.php 117 printthread_multipage
Warning [2] Undefined array key "showimages" - Line: 160 - File: printthread.php PHP 8.1.31 (Linux)
File Line Function
/printthread.php 160 errorHandler->error
Warning [2] Undefined array key "showvideos" - Line: 165 - File: printthread.php PHP 8.1.31 (Linux)
File Line Function
/printthread.php 165 errorHandler->error
Warning [2] Undefined array key "showimages" - Line: 160 - File: printthread.php PHP 8.1.31 (Linux)
File Line Function
/printthread.php 160 errorHandler->error
Warning [2] Undefined array key "showvideos" - Line: 165 - File: printthread.php PHP 8.1.31 (Linux)
File Line Function
/printthread.php 165 errorHandler->error
Warning [2] Undefined array key "showimages" - Line: 160 - File: printthread.php PHP 8.1.31 (Linux)
File Line Function
/printthread.php 160 errorHandler->error
Warning [2] Undefined array key "showvideos" - Line: 165 - File: printthread.php PHP 8.1.31 (Linux)
File Line Function
/printthread.php 165 errorHandler->error
Warning [2] Undefined array key "showimages" - Line: 160 - File: printthread.php PHP 8.1.31 (Linux)
File Line Function
/printthread.php 160 errorHandler->error
Warning [2] Undefined array key "showvideos" - Line: 165 - File: printthread.php PHP 8.1.31 (Linux)
File Line Function
/printthread.php 165 errorHandler->error
Warning [2] Undefined array key "showimages" - Line: 160 - File: printthread.php PHP 8.1.31 (Linux)
File Line Function
/printthread.php 160 errorHandler->error
Warning [2] Undefined array key "showvideos" - Line: 165 - File: printthread.php PHP 8.1.31 (Linux)
File Line Function
/printthread.php 165 errorHandler->error
Warning [2] Undefined array key "showimages" - Line: 160 - File: printthread.php PHP 8.1.31 (Linux)
File Line Function
/printthread.php 160 errorHandler->error
Warning [2] Undefined array key "showvideos" - Line: 165 - File: printthread.php PHP 8.1.31 (Linux)
File Line Function
/printthread.php 165 errorHandler->error
Warning [2] Undefined array key "showimages" - Line: 160 - File: printthread.php PHP 8.1.31 (Linux)
File Line Function
/printthread.php 160 errorHandler->error
Warning [2] Undefined array key "showvideos" - Line: 165 - File: printthread.php PHP 8.1.31 (Linux)
File Line Function
/printthread.php 165 errorHandler->error
Warning [2] Undefined array key "showimages" - Line: 160 - File: printthread.php PHP 8.1.31 (Linux)
File Line Function
/printthread.php 160 errorHandler->error
Warning [2] Undefined array key "showvideos" - Line: 165 - File: printthread.php PHP 8.1.31 (Linux)
File Line Function
/printthread.php 165 errorHandler->error
Warning [2] Undefined array key "showimages" - Line: 160 - File: printthread.php PHP 8.1.31 (Linux)
File Line Function
/printthread.php 160 errorHandler->error
Warning [2] Undefined array key "showvideos" - Line: 165 - File: printthread.php PHP 8.1.31 (Linux)
File Line Function
/printthread.php 165 errorHandler->error
Warning [2] Undefined array key "showimages" - Line: 160 - File: printthread.php PHP 8.1.31 (Linux)
File Line Function
/printthread.php 160 errorHandler->error
Warning [2] Undefined array key "showvideos" - Line: 165 - File: printthread.php PHP 8.1.31 (Linux)
File Line Function
/printthread.php 165 errorHandler->error
Warning [2] Undefined array key "showimages" - Line: 160 - File: printthread.php PHP 8.1.31 (Linux)
File Line Function
/printthread.php 160 errorHandler->error
Warning [2] Undefined array key "showvideos" - Line: 165 - File: printthread.php PHP 8.1.31 (Linux)
File Line Function
/printthread.php 165 errorHandler->error
Warning [2] Undefined array key "showimages" - Line: 160 - File: printthread.php PHP 8.1.31 (Linux)
File Line Function
/printthread.php 160 errorHandler->error
Warning [2] Undefined array key "showvideos" - Line: 165 - File: printthread.php PHP 8.1.31 (Linux)
File Line Function
/printthread.php 165 errorHandler->error
Warning [2] Undefined array key 1 - Line: 866 - File: inc/class_parser.php PHP 8.1.31 (Linux)
File Line Function
/inc/class_parser.php 866 errorHandler->error
/inc/class_parser.php 938 postParser->mycode_parse_post_quotes
[PHP] postParser->mycode_parse_post_quotes_callback1
/inc/class_parser.php 816 preg_replace_callback
/inc/class_parser.php 451 postParser->mycode_parse_quotes
/inc/class_parser.php 201 postParser->parse_mycode
/printthread.php 179 postParser->parse_message
Warning [2] Undefined array key 1 - Line: 885 - File: inc/class_parser.php PHP 8.1.31 (Linux)
File Line Function
/inc/class_parser.php 885 errorHandler->error
/inc/class_parser.php 938 postParser->mycode_parse_post_quotes
[PHP] postParser->mycode_parse_post_quotes_callback1
/inc/class_parser.php 816 preg_replace_callback
/inc/class_parser.php 451 postParser->mycode_parse_quotes
/inc/class_parser.php 201 postParser->parse_mycode
/printthread.php 179 postParser->parse_message
Warning [2] Undefined array key "showimages" - Line: 160 - File: printthread.php PHP 8.1.31 (Linux)
File Line Function
/printthread.php 160 errorHandler->error
Warning [2] Undefined array key "showvideos" - Line: 165 - File: printthread.php PHP 8.1.31 (Linux)
File Line Function
/printthread.php 165 errorHandler->error
Warning [2] Undefined array key "showimages" - Line: 160 - File: printthread.php PHP 8.1.31 (Linux)
File Line Function
/printthread.php 160 errorHandler->error
Warning [2] Undefined array key "showvideos" - Line: 165 - File: printthread.php PHP 8.1.31 (Linux)
File Line Function
/printthread.php 165 errorHandler->error
Warning [2] Undefined array key "showimages" - Line: 160 - File: printthread.php PHP 8.1.31 (Linux)
File Line Function
/printthread.php 160 errorHandler->error
Warning [2] Undefined array key "showvideos" - Line: 165 - File: printthread.php PHP 8.1.31 (Linux)
File Line Function
/printthread.php 165 errorHandler->error



Forums
profiling - Printable Version

+- Forums (https://www.vdrift.net/Forum)
+-- Forum: Project (https://www.vdrift.net/Forum/forumdisplay.php?fid=4)
+--- Forum: Development (https://www.vdrift.net/Forum/forumdisplay.php?fid=9)
+--- Thread: profiling (/showthread.php?tid=1264)

Pages: 1 2


profiling - NaN - 03-25-2010

I have managed to compile vdrift on VC9. So I used the opportunity to test AMD CodeAnalyst on VDrift. ( I actually was interested in physics code profiling Smile )
Code:
VDrift, Windowed, 800x600, Shaders disabled, Car: 360, Track: Paul Ricard
Top 20 functions, 1215 instructions, 89.35% of samples in the module, 25.92% of total session samples
89.35% of samples in the module
70528 SCENENODE::GetCollapsedDrawList
19959 GRAPHICS_SDLGL::RENDER_INPUT_SCENE::FrustumCull
10908 GRAPHICS_SDLGL::RENDER_INPUT_SCENE::DrawList
5826 std::_Vb_const_iterator<unsigned int,int,std::vector<bool,std::allocator<bool> > >::operator+=
5093 SOUNDSOURCE::SampleAndAdvanceWithPitch16bit
5035 std::vector<bool,std::allocator<bool> >::operator[]
4333 std::_Vb_reference<unsigned int,int,std::vector<bool,std::allocator<bool> > >::_Getptr
4275 GRAPHICS_SDLGL::GLSTATEMANAGER::Set
4218 MATRIX4<float>::TransformVectorOut
3661 std::_Tree<std::_Tmap_traits<DRAWABLE_FILTER *, ...
2823 TEXTURE_GL::Activate
2227 GRAPHICS_SDLGL::RENDER_INPUT_SCENE::SelectTexturing
1894 SCENETRANSFORM::IsIdentityTransform
1829 GRAPHICS_SDLGL::RENDER_INPUT_SCENE::SelectFlags
1268 _ftol2_pentium4
1192 std::_Transform<SCENEDRAW const *, ...
1138 MATRIX4<float>::Equals
  997 QUATERNION<double>::operator*
  959 GRAPHICS_SDLGL::RENDER_INPUT_SCENE::SelectTransformStart
  891 MATRIX4<float>::Multiply
A closer look at GetCollapsedDrawList:
Code:
36.36% of module samples
14122  for (list <DRAWABLE>::const_iterator i = drawlist.begin(); i != drawlist.end(); ++i)
0        {
32113        if (mi->first->Matches(*i))    
0            {
15093            if (i->GetDrawEnable())
0                {
175                    mi->second.push_back(SCENEDRAW(*i, this_transform));
0                }
0            }
0        }
So it is run on all drawables(also the static geometry) per frame?


- NaN - 03-25-2010

50% of cpu time was spent in the gpu driver (169721 samples in atioglxx.dll).

The OpenGL API Trace looks pretty slick (Paul Ricard has 1309 objects + 133 textures):
Code:
408 glCallList
21 glDrawElements
Here an excerpt:
Code:
glActiveTexture,GL_TEXTURE0
glBindTexture,GL_TEXTURE_2D 65
glCallList,21
glActiveTexture,GL_TEXTURE0
glBindTexture,GL_TEXTURE_2D 65
glCallList,25
glActiveTexture,GL_TEXTURE0
glBindTexture,GL_TEXTURE_2D 65
glCallList,29
glActiveTexture,GL_TEXTURE0
glBindTexture,GL_TEXTURE_2D 65,
glCallList,36
glActiveTexture,GL_TEXTURE0
glBindTexture,GL_TEXTURE_2D 65
glCallList,67
glActiveTexture,GL_TEXTURE0
glBindTexture,GL_TEXTURE_2D 65
glCallList,83
glActiveTexture,GL_TEXTURE0
glBindTexture,GL_TEXTURE_2D 65
glCallList,89
glActiveTexture,GL_TEXTURE0
glBindTexture,GL_TEXTURE_2D 65
glCallList,96
glActiveTexture,GL_TEXTURE0
glBindTexture,GL_TEXTURE_2D 65
glCallList,112
glActiveTexture,GL_TEXTURE0
glBindTexture,GL_TEXTURE_2D 65
glCallList,133
glActiveTexture,GL_TEXTURE0
glBindTexture,GL_TEXTURE_2D 65
glCallList,137
So we are using glCallList for static geometry. One interesting thing is that we sort by the texture, but it is still bound per list.

I wonder if it's possible to set the texture once for a number of lists. Or even batch the meshes sharing the same texture?


- joevenzon - 03-25-2010

Hum, I know the scenegraph is a little heavier than it needs to be, but I'm surprised it's seeing so many samples in GetCollapsedDrawList. The only scenegraph nodes we should be collapsing per frame are the non-static objects. The static track objects only get collapsed once: at the end of GAME::LoadTrack. You're profiling the release optimized version, right? I wonder if the VC9 compiler is missing some optimization opportunities. You might want to try compiling with GCC and using the GNU profiler, then comparing the results. If you do "scons profiling=1" it'll compile with release optimizations, debugging symbols, and profiling all enabled (the framerate is lower with profiling enabled but it's still very playable). On my computer the frustum culling seems to take the longest. Moving to a hierarchical culling system would speed that up. I've also been thinking about how to make the scenegraph a little lighter weight.

About the texture stuff, I've always heard that OpenGL render state calls are pretty lightweight. Also, I assumed that the driver would take care of doing a NOOP instead of switching from a texture to an identical texture, but I could add that to VDrift pretty easily if it'd improve framerate (I'm not so sure it would). I've been meaning to get VDrift under perfHUD, but decided not to worry about it just yet since I'm planning on rewriting the rendering engine anyway.


- joevenzon - 03-25-2010

Here's what the profile looks like on my machine after 30 seconds in the menus and 5 minutes on Paul Ricard:

Code:
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total          
time   seconds   seconds    calls   s/call   s/call  name    
23.33      6.63     6.63 156017736     0.00     0.00  GRAPHICS_SDLGL::RENDER_INPUT_SCENE::FrustumCull(SCENEDRAW&) const
  9.41      9.31     2.67 90222322     0.00     0.00  QUATERNION<double>::operator*(QUATERNION<double> const&) const
  4.85     10.69     1.38 58049499     0.00     0.00  AABB<float>::Intersect(AABB<float>::RAY const&) const
  4.15     11.87     1.18  3331646     0.00     0.00  btQuantizedBvh::walkStacklessQuantizedTreeAgainstRay(btNodeOverlapCallback*, btVector3 const&, btVector3 const&, btVector3 const&, btVector3 const&, int, int) const
  4.08     13.03     1.16    14907     0.00     0.00  SCENENODE::GetCollapsedDrawList(std::map<DRAWABLE_FILTER*, std::vector<SCENEDRAW, std::allocator<SCENEDRAW> >, std::less<DRAWABLE_FILTER*>, std::allocator<std::pair<DRAWABLE_FILTER* const, std::vector<SCENEDRAW, std::allocator<SCENEDRAW> > > > >&, MATRIX4<float> const&) const
  3.50     14.03     0.99  6517888     0.00     0.00  CARDYNAMICS::GetWheelPositionAtDisplacement(WHEEL_POSITION, double) const
  3.20     14.94     0.91 223250134     0.00     0.00  DRAWABLE_FILTER::Matches(DRAWABLE const&) const
  3.13     15.82     0.89 94649152     0.00     0.00  GRAPHICS_SDLGL::GLSTATEMANAGER::Set(int, bool)
  2.99     16.68     0.85   432324     0.00     0.00  GRAPHICS_SDLGL::RENDER_INPUT_SCENE::DrawList(GRAPHICS_SDLGL::GLSTATEMANAGER&)
  2.76     17.46     0.79 11867239     0.00     0.00  GRAPHICS_SDLGL::RENDER_INPUT_SCENE::SelectTransformStart(SCENEDRAW&, GRAPHICS_SDLGL::GLSTATEMANAGER&)
  2.18     18.08     0.62 13613592     0.00     0.00  TEXTURE_GL::Activate() const
  1.90     18.62     0.54  1598723     0.00     0.00  CARDYNAMICS::GetWheelSteeringAndSuspensionOrientation(WHEEL_POSITION) const
  1.55     19.06     0.44 11866739     0.00     0.00  GRAPHICS_SDLGL::RENDER_INPUT_SCENE::SelectFlags(SCENEDRAW&, GRAPHICS_SDLGL::GLSTATEMANAGER&)
  1.44     19.47     0.41 13613938     0.00     0.00  std::basic_string<char, std::char_traits<char>, std::allocator<char> > std::operator+<char, std::char_traits<char>, std::allocator<char> >(char const*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
  1.37     19.86     0.39 35426224     0.00     0.00  void QUATERNION<double>::RotateVector<MATHVECTOR<double, 3u> >(MATHVECTOR<double, 3u>&) const
  1.34     20.24     0.38  1229774     0.00     0.00  CARDYNAMICS::ApplyWheelForces(double, double, int, MATHVECTOR<double, 3u> const&, MATHVECTOR<double, 3u>&, MATHVECTOR<double, 3u>&)
  1.23     20.59     0.35 19758101     0.00     0.00  BEZIER::Bernstein(float, MATHVECTOR<float, 3u>*) const
  1.23     20.94     0.35  1844588     0.00     0.00  ROTATIONALFRAME<double>::RecalculateSecondary()

... etc

The hierarchical view is more interesting but it's too long for here so I put the top 100 on pastebin:
http://pastebin.com/r5RfggmV


- joevenzon - 03-25-2010

Oh yeah, I almost forgot, if you run VDrift with the -profiling argument it will display some rough data about which subsystems are taking the longest.


- NaN - 03-26-2010

Oh, I forgot to mention. The data is from the bullet branch. But it shouldn't matter as I didn't modify the scenegraph code.

I am testing it on a Windows 7 x64 machine(Athlon 64 X2). The VC9 build is compiled with /Ox. It is in fact about 20-30% slower than the GCC4.4.3 -O3 build.

Here the GCC build profile:
Code:
time   seconds   seconds  calls ms/call ms/call  name    
26.43     23.14    23.14                             SCENENODE::GetCollapsedDrawList(std::map<DRAWABLE_FILTER*, std::vector<SCENEDRAW, std::allocator<SCENEDRAW> >, std::less<DRAWABLE_FILTER*>, std::allocator<std::pair<DRAWABLE_FILTER* const, std::vector<SCENEDRAW, std::allocator<SCENEDRAW> > > > >&, MATRIX4<float> const&) const
18.22     39.09    15.95                             GRAPHICS_SDLGL::RENDER_INPUT_SCENE::FrustumCull(SCENEDRAW&) const
  3.16     41.86     2.77                             GRAPHICS_SDLGL::RENDER_INPUT_SCENE::DrawList(GRAPHICS_SDLGL::GLSTATEMANAGER&)
  2.86     44.36     2.50                             GRAPHICS_SDLGL::GLSTATEMANAGER::Set(int, bool)
  2.79     46.80     2.44                             __gnu_cxx::__exchange_and_add(int volatile*, int)
  2.78     49.23     2.43                             GRAPHICS_SDLGL::RENDER_INPUT_SCENE::SelectTransformStart(SCENEDRAW&, GRAPHICS_SDLGL::GLSTATEMANAGER&)
  2.46     51.38     2.15                             GRAPHICS_SDLGL::RENDER_INPUT_SCENE::SelectFlags(SCENEDRAW&, GRAPHICS_SDLGL::GLSTATEMANAGER&)
  2.35     53.44     2.06                             std::_Rb_tree_increment(std::_Rb_tree_node_base*)
  2.27     55.43     1.99                             GRAPHICS_SDLGL::RENDER_INPUT_SCENE::SelectTexturing(SCENEDRAW&, GRAPHICS_SDLGL::GLSTATEMANAGER&)
  1.90     57.09     1.66                             std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned int)
  1.85     58.71     1.62                             std::string::append(std::string const&)
  1.60     60.11     1.40                             TEXTURE_GL::Activate() const
  1.31     61.26     1.15                             std::string::append(char const*, unsigned int)
  1.27     62.37     1.11                             std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&)
  1.23     63.45     1.08                             std::basic_string<char, std::char_traits<char>, std::allocator<char> > std::operator+<char, std::char_traits<char>, std::allocator<char> >(char const*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
  1.20     64.50     1.05                             __gnu_cxx::__atomic_add(int volatile*, int)
  1.07     65.44     0.94                             std::string::reserve(unsigned int)
Funny huh? Will profile vdrift trunk asap.

Quote:About the texture stuff ...
Yeah, I think it's just me not trusting the OpenGL drivers on Windows. :lol:


- NaN - 03-28-2010

Here the vdrift trunk profile(MinGW GCC4.4.3):
Code:
%    cumulative   self            self   total          
time   seconds   seconds  calls  s/call  s/call  name    
22.75     15.90    15.90    75346     0.00     0.00  SCENENODE::GetCollapsedDrawList(std::map<DRAWABLE_FILTER*, std::vector<SCENEDRAW, std::allocator<SCENEDRAW> >, std::less<DRAWABLE_FILTER*>, std::allocator<std::pair<DRAWABLE_FILTER* const, std::vector<SCENEDRAW, std::allocator<SCENEDRAW> > > > >&, MATRIX4<float> const&) const
14.74     26.20    10.30 95579158     0.00     0.00  GRAPHICS_SDLGL::RENDER_INPUT_SCENE::FrustumCull(SCENEDRAW&) const
  3.56     28.69     2.49 40361028     0.00     0.00  AABB<float>::Intersect(AABB<float>::RAY const&) const
  2.52     30.45     1.76   527415     0.00     0.00  GRAPHICS_SDLGL::RENDER_INPUT_SCENE::DrawList(GRAPHICS_SDLGL::GLSTATEMANAGER&)
  2.48     32.18     1.73 62275440     0.00     0.00  QUATERNION<double>::operator*(QUATERNION<double> const&) const
  2.40     33.86     1.68 122087853     0.00     0.00  GRAPHICS_SDLGL::GLSTATEMANAGER::Set(int, bool)
  1.99     35.25     1.39 17632958     0.00     0.00  GRAPHICS_SDLGL::RENDER_INPUT_SCENE::SelectFlags(SCENEDRAW&, GRAPHICS_SDLGL::GLSTATEMANAGER&)
  1.90     36.58     1.33                             __gnu_cxx::__exchange_and_add(int volatile*, int)
  1.85     37.87     1.29 17633034     0.00     0.00  GRAPHICS_SDLGL::RENDER_INPUT_SCENE::SelectTexturing(SCENEDRAW&, GRAPHICS_SDLGL::GLSTATEMANAGER&)
  1.77     39.11     1.24 17632896     0.00     0.00  GRAPHICS_SDLGL::RENDER_INPUT_SCENE::SelectTransformStart(SCENEDRAW&, GRAPHICS_SDLGL::GLSTATEMANAGER&)
  1.76     40.34     1.23                             std::_Rb_tree_increment(std::_Rb_tree_node_base*)
  1.59     41.45     1.11                             T.382
  1.59     42.56     1.11                             std::string::append(std::string const&)
  1.46     43.58     1.02                             std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned int)
  1.20     44.42     0.84                             std::string::reserve(unsigned int)
And I get similar results on my intel machine. So I guess it has to be OS or compiler related.


- joevenzon - 03-28-2010

Interesting! It's starting to sound like it might be operating system related. I'm using GCC 4.4.1 and an AMD X4 on Linux.

Does codeanalyst's line-by-line profiling information give you any ideas about optimizing this function? It's pretty cache unfriendly at the moment due to iterating over maps and lists, and it does a fair amount of data copying, but maybe there's something else that's an obvious reason for its slowness that I'm missing.

I'm thinking about how to rewrite the scenegraph system and I'm undecided about where to go with it. Right now it's a sort of scenegraph lite, with only two types of nodes. Should I make it a more conventional scenegraph system with nodes for texture changes, shader changes, etc? Should I try to make it only re-traverse branches of the tree that have changed since last time (but increasing the overhead of changing a node)? Should I add a hierarchical space partitioning system so culling is faster? If so, how to integrate that into the scenegraph and minimize render state changes while keeping the culling process as fast as possible?


- NaN - 03-28-2010

The profiling counters are inconspicuous. GetCollapsedDrawList has a CPI of two. It is the amount of retired instructions that bothers me. It looks like it is doing a lot of work!?
Code:
CS:EIP       Symbol + Offset                                     64-bit         CPU clocks     Ret inst     
0x18b6e0     SCENENODE::GetCollapsedDrawList                                10000          5542         
0x122080     GRAPHICS_SDLGL::RENDER_INPUT_SCENE::FrustumCull                3016           1799         

2 functions, 215 instructions, Total: 20357 samples, 51.96% of samples in the module, 13.95% of total session samples

The bad code segment:
Code:
    CPU clocks     Ret inst     
for (map <DRAWABLE_FILTER *, vector <SCENEDRAW> >::iterator mi = drawlist_output_map.begin(); mi != drawlist_output_map.end(); ++mi) 241            168          
{                                                                                                                                    0              0            
    for (list <DRAWABLE>::const_iterator i = drawlist.begin(); i != drawlist.end(); ++i)                                             2343           846          
    {                                                                                                                                0              0            
        if (mi->first->Matches(*i))                                                                                                  4608           3814         
        {                                                                                                                            0              0            
            if (i->GetDrawEnable())                                                                                                  1622           433          
            {                                                                                                                        0              0            
                mi->second.push_back(SCENEDRAW(*i,this_transform));                                                                  30             8            
            }                                                                                                                        0              0            
        }                                                                                                                            0              0            
    }                                                                                                                                0              0            
}                                                                                                                                    0              0
The Matches() function:
Code:
bool DRAWABLE_FILTER::Matches(const DRAWABLE & drawable) const
{
    return (drawable.filterspeedup.filtervalue & filtermask) == (filtervalue & filtermask);
}
I added a counter to see how often the sort in GetCollapsedDrawList is executed. It is run if drawlist or childlist of the scenenode are not empty:
245 times/frame in GUI
251 times/frame in car selection
253 times/frame in game

Edit:
It won't solve the problem with the sort loop, but maybe it's worth to move GUI into an own scenenode like the track. I don't understand why it is faster on linux. Will install a distro asap.

Edit2:
After rearranging the loop body:
Code:
CPU clocks     Ret inst
for (map <DRAWABLE_FILTER *, vector <SCENEDRAW> >::iterator mi = drawlist_output_map.begin(); mi != drawlist_output_map.end(); ++mi) 252            240
{                                                                                                                                    0              0
    for (list <DRAWABLE>::const_iterator i = drawlist.begin(); i != drawlist.end(); ++i)                                             2879           2586
    {                                                                                                                                0              0
        if (i->GetDrawEnable())                                                                                                      5389           1661
        {                                                                                                                            0              0
            if (mi->first->Matches(*i))                                                                                              68             75
            {                                                                                                                        0              0
                mi->second.push_back(SCENEDRAW(*i, this_transform));                                                                 25             15
            }                                                                                                                        0              0
        }                                                                                                                            0              0
    }                                                                                                                                0              0
}
It looks like there is a serious penalty accessing the drawlist elemets.


- joevenzon - 03-30-2010

Interesting. I knew lists were slower to iterate through than vectors, but that's the first time I've seen them actually be the bottleneck.


- NaN - 05-19-2010

I did another profiling run with focus on physics(360/ring2007).

8.6% in physics in total for a single car
4.2% in cardynamics
2.0% in bezier patch raytest
1.6% in bullets raytest

We spend half of our physics time doing ray tests. I hope to be able to switch to cylinder collision shapes for wheels. It should be faster than rays and more realistic. But we still will be running the bezier patch tests. Will see if I can optimize it a bit.


- joevenzon - 05-20-2010

NaN Wrote:I hope to be able to switch to cylinder collision shapes for wheels. It should be faster than rays and more realistic.

Cylinder collision faster than a raycast? That's surprising to me.


- NaN - 05-21-2010

I don't have any numbers yet. But that is what I gathered from bullet forum. The ray test is not optimized. It is traversing the whole scene. Collision objects are exploiting frame coherence to accelerate collision detection. If the object is not too fast and not too big the collision detection should be very cheap. I am not sure if the frame coherence is considered in the narrow phase(object vs object) but they are caching/reusing collision contacts from previous frames.


- joevenzon - 05-21-2010

Ah cool, very interesting. Their raycast is actually way faster than the one I was doing, so getting it faster still would be really cool.

We could probably use some frame coherence to speed up the bezier raycasts. Right now they are accelerated using an r-tree sort of data structure (aabb_space_partitioning.h) and we could probably start from the last frame leaf and traverse up the tree and then back down to the new leaf. Also, this data structure needs to be cache-optimized. I never dynamically add objects to it, and each level has only 2 branches, so it could easily be stored as a vector (like is commonly done with binary heaps). This would also speed up the object culling, which uses the same data structure.


- NaN - 07-19-2010

The current OpenGL api trace(shaders disabled) looks like:

Code:
614,OpenGL_1.0,glCallList,984,
615,OpenGL_1.3,glActiveTexture,GL_TEXTURE0,
616,OpenGL_1.1,glBindTexture,GL_TEXTURE_2D 177,
617,OpenGL_1.0,glTexEnvi,GL_TEXTURE_ENV GL_TEXTURE_ENV_MODE 8448,
618,OpenGL_1.0,glCallList,983,
619,OpenGL_1.3,glActiveTexture,GL_TEXTURE0,
620,OpenGL_1.1,glBindTexture,GL_TEXTURE_2D 124,
621,OpenGL_1.0,glTexEnvi,GL_TEXTURE_ENV GL_TEXTURE_ENV_MODE 8448,
622,OpenGL_1.0,glCallList,977,
623,OpenGL_1.3,glActiveTexture,GL_TEXTURE0,
624,OpenGL_1.1,glBindTexture,GL_TEXTURE_2D 170,
625,OpenGL_1.0,glTexEnvi,GL_TEXTURE_ENV GL_TEXTURE_ENV_MODE 8448,
The whole trace: http://pastebin.com/JW7Wr32z

8448 means GL_TEXTURE_ENV_MODE is GL_MODULATE. This is the texture color blending, right?