OGRE-Next  4.0.0unstable
Object-Oriented Graphics Rendering Engine
Threading

OgreNext uses synchronous threading for some of its operations. This means the main thread wakes up the worker threads, and waits for all worker threads to finish. It also means users don't have to be worried that Ogre is using CPU cores while the application is outside a renderOneFrame call.

Initializing

The number of worker threads must be provided by the user when creating the SceneManager:

const size_t numThreads = 4;
InstancingTheadedCullingMethod
threadedCullingMethod = INSTANCING_CULLING_THREADED;
mSceneMgr = mRoot->createSceneManager( ST_GENERIC,numThreads,
threadedCullingMethod,
"ExampleSMInstance" );
@ ST_GENERIC
Definition: OgreSceneManager.h:2828

The other threading parameter besides the number of threads, is the threading strategy used for Instancing, which can be single threaded or multithreaded.

Ideal number of threads

The threading model is synchronous, and meant to be used for tasks that take roughly the same amount of time in each thread (which is a very important assumption!). The ideal number of worker threads is the number of logical cores exposed by the CPU (excluding hyperthreading cores).

Spawning more threads than cores will oversubscribe the system and won't run faster. In fact it should only slow it down.

If you plan to use a whole core for your own computations that will run in parallel while renderOneFrame is working (i.e. one thread for physics) and take a significant cpu time from that core; then in this case the ideal number of threads becomes number_of_logical_cores – 1

Whether increasing the number of threads to include hyperthreading cores improves performance or not remains to be tested.

What tasks are threaded in Ogre

The following tasks are partitioned into multiple threads:

  • Frustum culling: The pool of all visible Entities, InstanceBatches, etc are frustum culled in multiple threads and added to a culled list, one per thread. When all threads are done, the main thread collects the results from all lists.
  • Culling the receiver's box: Very specific to shadow nodes. When a render_scene pass uses (for example) render queues 4 to 8, but the shadow node users render queues 0 to 8; the shadow node needs receiver's aabb data from RQs 0 to 3; which aren't available. It is very similar to frustum culling; except that the cull list isn't produced, only the aabb is calculated. Since aabb merges are associative:

    we can join the results from all threads after they're done. In fact, we even exploit this associative property to process them using SIMD.

  • Node transform updates: Updating all scene nodes' derived position and orientation by inheriting from their parent's derived position & orientation. We have to wait for every parent level depth due to data dependencies.
  • Updating all bounds: Updating the World AABB by applying the node's transform to the local aabbs. The World AABB is needed for correct frustum culling, among other things.
  • Frustum culling instanced entities: See previous section.

Using Ogre's threading system for custom tasks

While often users may want to user their own threading system; it is possible to ask Ogre to process their own task using its worker threads. Users need to inherit from UniformScalableTask and call SceneManager::executeUserScalableTask.

The following example prints a message to the console from the multipler worker threads:

class MyThreadedTask : public Ogre::UniformScalableTask
{
public:
virtual void execute( size_t threadId, size_t numThreads )
{
printf( "Hello world from thread %i",threadId );
}
};
int main()
{
// assumes Ogre is initialized and sceneMgr is a valid ptr
Ogre::SceneManager *sceneMgr;
MyThreadedTask myThreadedTask;
sceneMgr->executeUserScalableTask( myThreadedTask,true );
return 0;
}
Manages the organisation and rendering of a 'scene' i.e.
Definition: OgreSceneManager.h:235
void executeUserScalableTask(UniformScalableTask *task, bool bBlock)
Processes a user-defined UniformScalableTask in the worker threads spawned by SceneManager.
A uniform task is a highly parallelizable task that can be divided in many threads where all threads ...
Definition: OgreUniformScalableTask.h:46
virtual void execute(size_t threadId, size_t numThreads)=0
Overload this function to perform whatever you want.

Parameter threadId is guaranteed to be in range [0; numThreads) while parameter numThreads is the total number of worker threads spawned by that SceneManager.

executeUserScalableTask will block until all threads are done. If you do not wish to block; you can pass false to the second argument and then call waitForPendingUserScalableTask to block until done:

int main()
{
// assumes Ogre is intialized and sceneMgr is a valid ptr
SceneManager *sceneMgr;
MyThreadedTask myThreadedTask;
sceneMgr->executeUserScalableTask( myThreadedTask, false );
doSomeWork();
printf( "I am going to sleep now" );
sceneMgr->waitForPendingUserScalableTask();
printf( "All worker threads finished. Resuming execution of main thread" );
return 0;
}

Attention!

You must call waitForPendingUserScalableTask after calling executeUserScalableTask( myThreadedTask, false ) before executeUserScalableTask can be called again. Otherwise deadlocks are bound to happen and Ogre makes no integrity checks. Queuing or scheduling of multiples tasks is not supported. This system is for synchronous multithreading, not for asynchronous tasks.

Thread safety of SceneNodes

In Ogre 1.x; SceneNodes weren't thread safe at all, not even setPosition or _getDerivedPosition.

In Ogre 2.x, the following operations are not thread safe:

  • Creating or destroying nodes/entities. Don't create a SceneNode while there are scene nodes being used in other threads. It can screw the unique ID and their assignment into the global vector we use to keep track of created nodes. Furthermore The node's memory manager may ran out of memory in its pool and reconstruct the Transform's SoA pointers (it's similar to how std::vector invalidates all iterators when resizing). If that happens, all SceneNodes will be in an inconsistent state. Inversely, if too many nodes have been removed, the manager may decide it's time for a cleanup, in which case many SceneNodes can become in an inconsistent state until the cleanup finishes. How large the pool reserve is can be tweaked, and how often the manager performs can also be tweaked (NodeMemoryManager), though. If the user knows what he's doing the race condition might be possible to avoid. Note other SceneManager implementations may have to fulfill their own needs and introduce race conditions of their own we can't predict.
  • Changing parent / child relationships. Attaching/detaching a node from another one causes its SoA memory to migrate to a different node memory manager, which can trigger a cleanup and/or one of the managers rans out of memory and has to reconstruct.
  • Calling _getDerivedPositionUpdated & Co (all functions that end in "Updated"). These functions will update the derived transforms all way up to the ultimate parent (i.e. root). However in SIMD builds, these updates are performed on 4 nodes at a time (the actual number is not 4, but rather depends on ARRAY_PACKED_REALS). Calling this function could only be thread safe if all all four nodes are in the same thread AND their parents are also on the same thread (parents may not share the same block, thus worst case scenario 4 * 4 = 16 parent nodes have to be in the same thread, not to mention their parents too 4 * 4 * 4 = 64) AND the children of these parents are not calling _getDerivedPositionUpdated too from a different thread.

The following operations are thread-safe:

  • Calling getPosition & Co (getOrientation, getScale, getInheritOrientation, etc), _getDerivedPosition & Co (_getDerivedOrientation, etc) unless you're calling getPosition and setPosition to the same Node from different threads.
  • Calling setPosition, setOrientation, setScale. Note on SIMD builds, 4 Nodes can easily share the same 64 byte line, thus it is advisable that all 4 Nodes to be sent to the same thread to reduce the number of false cache sharing performance hits. Calling setPosition to the same Node from different threads is not supported.
  • Calling _setDerivedPosition, _setDerivedOrientation, _setDerivedScale (which assumes the derived transforms are up to date)

With Ogre 2.0; it is now possible to transfer the position & orientation from a physics engine to Ogre Scene Nodes using a parallel for. Ogre 1.x limitations forced this update to be done in a single thread.