Ogre 2.0 uses synchronous threading for some of its operations. This means the main thread wakes up the worker threads, and waits for all worker threads to finish. It also means users don't have to worry about Ogre using CPU cores while the application is outside a renderOneFrame call.
The number of worker threads must be provided by the user when creating the SceneManager:
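For example (a sketch assuming the Ogre 2.x overload of Root::createSceneManager that takes the worker-thread count and the instancing culling method; the exact signature varies between 2.x releases):

```cpp
const size_t numThreads = 4;
Ogre::InstancingThreadedCullingMethod threadedCullingMethod =
        Ogre::INSTANCING_CULLING_THREADED;
mSceneMgr = mRoot->createSceneManager( Ogre::ST_GENERIC, numThreads,
                                       threadedCullingMethod,
                                       "ExampleSMInstance" );
```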
The other threading parameter, besides the number of threads, is the threading strategy used for Instancing, which can be single threaded or multithreaded.
The threading model is synchronous, and is meant for tasks that take roughly the same amount of time in each thread (which is a very important assumption!). The ideal number of worker threads is the number of physical cores exposed by the CPU (i.e. excluding hyperthreading's extra logical cores).
Spawning more threads than cores will oversubscribe the system and won't run faster; in fact, it should only slow things down.
If you plan to use a whole core for your own computations that run in parallel while renderOneFrame is working (e.g. one thread for physics) and take significant CPU time from that core, then the ideal number of threads becomes number_of_cores - 1.
Whether increasing the number of threads to include hyperthreading cores improves performance or not remains to be tested.
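If you want a runtime default, here is a portable sketch (note that std::thread::hardware_concurrency() counts logical cores, hyperthreading included, and may return 0 when the count is unknown):

```cpp
#include <thread>

const unsigned int reported = std::thread::hardware_concurrency();

// Leave one core free for e.g. a dedicated physics thread, per the advice
// above; on a hyperthreaded CPU you may additionally want to halve the
// reported count, since it includes hyperthreading's logical cores.
const size_t numThreads = reported > 1u ? reported - 1u : 1u;
```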
There are two Instancing techniques that perform culling of their own:

- HW Basic
- HW VTF
Frustum culling is highly parallelizable & scalable. However, we first cull InstanceBatches & regular entities, then ask the culled InstanceBatches to perform culling of the InstancedEntities they own.
This results in a performance boost, since large amounts of instanced entities are skipped when the whole batch isn't visible. However, it also means threading the frustum culling of instanced entities got harder.
There were four possible approaches to threading this.
Whether INSTANCING_CULLING_THREADED improves or degrades performance depends highly on your scene.
When to use INSTANCING_CULLING_SINGLETHREAD?
If your scene doesn't use the HW Basic or HW VTF instancing techniques, or you have very few InstancedEntities compared to the number of regular Entities.
Turning threading on means wasting time traversing the lists from multiple threads in search of InstanceBatchHW & InstanceBatchHW_VTF.
When to use INSTANCING_CULLING_THREADED?
If your scene makes intensive use of the HW Basic and/or HW VTF instancing techniques. Note that threaded culling is performed on SCENE_STATIC instances too. The advantage is greatest when the number of instances per batch is very high and when doing many PASS_SCENE passes, which require frustum culling multiple times per frame (e.g. PSSM shadows, multiple light sources with shadows, very advanced compositing, etc.).
Note that unlike the number of threads, you can switch between methods at any time at runtime.
The following tasks are partitioned into multiple threads:

- Frustum culling: highly parallelizable; each thread produces its own list of culled objects, and the lists are joined by the main thread afterwards.
- Culling the receiver's box: Very specific to shadow nodes. When a render_scene pass uses (for example) render queues 4 to 8, but the shadow node uses render queues 0 to 8, the shadow node needs the receivers' aabb data from RQs 0 to 3, which isn't available. It is very similar to frustum culling, except that the cull list isn't produced; only the aabb is calculated. Since aabb merges are associative, i.e. merge( merge( A, B ), C ) = merge( A, merge( B, C ) ), we can join the results from all threads after they're done. In fact, we even exploit this associative property to process them using SIMD (see the toy sketch after this list).
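As a toy illustration (a plain min/max struct, not Ogre's actual Aabb class, which stores a center and half-size): because merge is associative, the per-thread partial boxes can be folded in any grouping, whether sequentially, pairwise in a tree, or four at a time in SIMD registers.

```cpp
#include <algorithm>
#include <vector>

// Minimal axis-aligned bounding box, for illustration only.
struct Box
{
    float minX, minY, minZ;
    float maxX, maxY, maxZ;
};

Box merge( const Box &a, const Box &b )
{
    return Box{ std::min( a.minX, b.minX ), std::min( a.minY, b.minY ),
                std::min( a.minZ, b.minZ ), std::max( a.maxX, b.maxX ),
                std::max( a.maxY, b.maxY ), std::max( a.maxZ, b.maxZ ) };
}

// Fold the per-thread partial results; associativity guarantees the
// grouping doesn't affect the final aabb.
Box joinThreadResults( const std::vector<Box> &perThread )
{
    Box result = perThread.front();
    for( size_t i = 1u; i < perThread.size(); ++i )
        result = merge( result, perThread[i] );
    return result;
}
```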
While users may often want to use their own threading system, it is possible to ask Ogre to process a task of theirs using its worker threads. Users need to inherit from UniformScalableTask and call SceneManager::executeUserScalableTask.
The following example prints a message to the console from the multiple worker threads:
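A minimal sketch of such a task (assuming executeUserScalableTask takes a pointer to the task; include paths may vary across Ogre versions):

```cpp
#include <cstdio>
#include <Threading/OgreUniformScalableTask.h>

class MyThreadedTask : public Ogre::UniformScalableTask
{
public:
    // Invoked once from each worker thread.
    void execute( size_t threadId, size_t numThreads ) override
    {
        printf( "Hello world from thread %zu of %zu\n", threadId, numThreads );
    }
};

// ...
MyThreadedTask myThreadedTask;
mSceneManager->executeUserScalableTask( &myThreadedTask, true );
```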
Parameter threadId is guaranteed to be in range [0; numThreads), while parameter numThreads is the total number of worker threads spawned by that SceneManager.
executeUserScalableTask will block until all threads are done. If you do not wish to block, you can pass false to the second argument and then call waitForPendingUserScalableTask to block until done:
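A sketch of that non-blocking pattern (doSomethingElseOnMainThread is a hypothetical placeholder for your own work):

```cpp
mSceneManager->executeUserScalableTask( &myThreadedTask, false );

doSomethingElseOnMainThread(); // hypothetical: work that doesn't touch the scene

mSceneManager->waitForPendingUserScalableTask(); // block until the task finishes
```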
Attention!
You must call waitForPendingUserScalableTask after calling executeUserScalableTask( myThreadedTask, false ) before executeUserScalableTask can be called again. Otherwise deadlocks are bound to happen, and Ogre makes no integrity checks. Queuing or scheduling of multiple tasks is not supported. This system is for synchronous multithreading, not for asynchronous tasks.
In Ogre 1.x, SceneNodes weren't thread safe at all, not even setPosition or _getDerivedPosition.
In Ogre 2.x, the following operations are not thread safe:

- Creating or destroying SceneNodes. The node memory manager may have to grow its pool while another thread is using it (and std::vector invalidates all iterators when resizing). If that happens, all SceneNodes will be left in an inconsistent state. Inversely, if too many nodes have been removed, the manager may decide it's time for a cleanup, in which case many SceneNodes can be in an inconsistent state until the cleanup finishes. How large the pool reserve is can be tweaked, and how often the manager performs the cleanup can also be tweaked (NodeMemoryManager), though. If the user knows what he's doing, the race condition might be possible to avoid. Note that other SceneManager implementations may have to fulfill their own needs and introduce race conditions of their own that we can't predict.
- Calling _getDerivedPositionUpdated (and the other *Updated variants). SceneNodes are packed together in memory in blocks of four (the block size depends on ARRAY_PACKED_REALS). Calling this function could only be thread safe if all four nodes are in the same thread AND their parents are also on the same thread (parents may not share the same block, thus in the worst case scenario 4 * 4 = 16 parent nodes have to be in the same thread, not to mention their parents too, 4 * 4 * 4 = 64) AND the children of these parents are not calling _getDerivedPositionUpdated too from a different thread.

The following operations are thread-safe:

- Calling getPosition and setPosition to the same Node from different threads.

With Ogre 2.0, it is now possible to transfer the position & orientation from a physics engine to Ogre SceneNodes using a parallel for; Ogre 1.x's limitations forced this update to be done in a single thread.
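A minimal sketch of such a parallel transfer, using the executeUserScalableTask API described earlier; PhysicsBodyState and the parallel states/nodes arrays are hypothetical stand-ins for whatever your physics engine exposes (include paths may vary across Ogre versions):

```cpp
#include <vector>
#include <OgreSceneNode.h>
#include <OgreVector3.h>
#include <OgreQuaternion.h>
#include <Threading/OgreUniformScalableTask.h>

// Hypothetical snapshot of a rigid body's transform; substitute whatever
// your physics engine exposes.
struct PhysicsBodyState
{
    Ogre::Vector3    position;
    Ogre::Quaternion orientation;
};

class TransferPhysicsTask : public Ogre::UniformScalableTask
{
public:
    const std::vector<PhysicsBodyState> *states;  // states[i] drives nodes[i]
    const std::vector<Ogre::SceneNode *> *nodes;

    void execute( size_t threadId, size_t numThreads ) override
    {
        // Each thread gets its own contiguous slice, so no two threads
        // ever write to the same SceneNode.
        const size_t total = nodes->size();
        const size_t begin = threadId * total / numThreads;
        const size_t end   = ( threadId + 1 ) * total / numThreads;
        for( size_t i = begin; i < end; ++i )
        {
            ( *nodes )[i]->setPosition( ( *states )[i].position );
            ( *nodes )[i]->setOrientation( ( *states )[i].orientation );
        }
    }
};

// Once per frame, before rendering:
//   TransferPhysicsTask task;
//   task.states = &physicsStates;
//   task.nodes  = &sceneNodes;
//   sceneManager->executeUserScalableTask( &task, true );
```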