OGRE-Next
4.0.0unstable
Object-Oriented Graphics Rendering Engine
Starting with OgreNext 4.0, the following features were added:
Actual support depends on RenderSystems & CMake build settings. The user can call Ogre::RenderSystem::supportsMultithreadedShaderCompilation to query whether it is currently supported.
The CMake option OGRE_SHADER_COMPILATION_THREADING_MODE was added. This option is independent of OGRE_CONFIG_THREAD_PROVIDER.
Option | Short Description | Long Description |
---|---|---|
0 | Disabled. | No multithreaded shader compilation will be supported. The macro OGRE_SHADER_THREADING_BACKWARDS_COMPATIBLE_API will be defined, thus the class Hlms will provide setProperty, getProperty, unsetProperty and co. API calls that do not ask for a tid argument. If the user calls the functions that ask for a tid argument, the value tid holds must be 0. |
1 | Use compatible API. Enable only if supported by compiler and OS. Default setting. | The macro OGRE_SHADER_THREADING_BACKWARDS_COMPATIBLE_API will be defined, thus the class Hlms will provide setProperty, getProperty, unsetProperty and co. API calls that do not ask for a tid argument. OgreNext will use compiler-assisted TLS (Thread Local Storage) to automatically determine the tid (thread ID) value. In practice OgreNext will only use TLS (and hence enable multithreading) in fully static builds (i.e. CMake option OGRE_STATIC = TRUE), and disable it in dynamic library builds. If OGRE_STATIC = TRUE, then functions called with a tid argument use that tid value, while functions called without a tid argument use TLS to determine the correct tid value. If OGRE_STATIC = FALSE, the behavior is the same as Disabled. |
2 | Force-enable. | The macro OGRE_SHADER_THREADING_BACKWARDS_COMPATIBLE_API will not be defined. The user must always call setProperty, getProperty and co. that ask for a tid argument. |
The default option is 1 because it provides multithreaded support when possible, while also providing backwards-compatible API functions that make porting from OgreNext 3.0 easier. Make sure to read the Porting tips (from <= 3.0) section.
New users are encouraged to use option 2 instead to make sure they can maximize performance and avoid confusion over which setProperty, getProperty & co. functions they should use.
Many functions provide or ask for a tid argument. When a function hands you a tid argument, you must pass that same value along to any setProperty/getProperty & co. call you make from it.
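The rule can be sketched with stand-in types instead of the real OgreNext headers. `MockHlms` and `onWorkerCallback` are hypothetical names introduced for illustration; only the idea of `setProperty`/`getProperty` taking a leading tid comes from the text above, and the real signatures may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for Ogre::Hlms: one property table per thread.
class MockHlms
{
    std::vector<std::map<std::string, std::int32_t>> mProps;

public:
    explicit MockHlms( std::size_t numThreads ) : mProps( numThreads ) {}

    void setProperty( std::size_t tid, const std::string &key, std::int32_t value )
    {
        assert( tid < mProps.size() );  // tid must be a valid thread slot
        mProps[tid][key] = value;
    }

    std::int32_t getProperty( std::size_t tid, const std::string &key,
                              std::int32_t defaultVal = 0 ) const
    {
        assert( tid < mProps.size() );
        auto it = mProps[tid].find( key );
        return it != mProps[tid].end() ? it->second : defaultVal;
    }
};

// Hypothetical listener callback: it receives a tid, so it forwards that same tid.
void onWorkerCallback( MockHlms &hlms, std::size_t tid )
{
    hlms.setProperty( tid, "my_property", 1 );
}
```

Calling `onWorkerCallback( hlms, 2 )` on a `MockHlms` built with four slots only touches the table of thread 2; the tables of the other threads stay untouched.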
Other functions such as Ogre::HlmsListener::preparePassHash don't ask for a tid. This always means the function is being called from the main rendering thread unless stated otherwise by the documentation. When calling setProperty/getProperty & co. from such a function, you must pass Ogre::Hlms::kNoTid as the tid.
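A minimal sketch of the main-thread case, again with stand-in types. `MockHlms` and `onPreparePassHash` are hypothetical; the constant mirrors the name `kNoTid` from the text, but the value 0 given to it here is an assumption of this sketch, not something the text confirms.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for Ogre::Hlms.
struct MockHlms
{
    // Assumption of this sketch: the main thread uses slot 0.
    static constexpr std::size_t kNoTid = 0u;

    std::vector<std::map<std::string, std::int32_t>> props;

    explicit MockHlms( std::size_t numThreads ) : props( numThreads ) {}

    void setProperty( std::size_t tid, const std::string &key, std::int32_t value )
    {
        assert( tid < props.size() );
        props[tid][key] = value;
    }

    std::int32_t getProperty( std::size_t tid, const std::string &key ) const
    {
        assert( tid < props.size() );
        auto it = props[tid].find( key );
        return it != props[tid].end() ? it->second : 0;
    }
};

// Hypothetical main-thread callback in the spirit of preparePassHash:
// no tid parameter is provided, so kNoTid is passed explicitly.
void onPreparePassHash( MockHlms &hlms )
{
    hlms.setProperty( MockHlms::kNoTid, "my_pass_property", 1 );
}
```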
If you're overriding the Ogre::Hlms class instead of Ogre::HlmsListener, the same rules apply. However, you might override a function that is expected to be called from a worker thread but does not pass a tid value, because the original Hlms implementations never needed it.
The rule is that anything called as part of Ogre::Hlms::createShaderCacheEntry needs a tid, while everything else is called from the main thread.
Option 1 (the compatible API) is used to facilitate porting from OgreNext 3.0.
All listener functions still provide a tid value. However, OgreNext also provides setProperty/getProperty functions that don't ask for one. Therefore it is possible to ignore the tid argument a listener receives and call the overloads that take no tid.
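How a tid-less overload can still reach the right per-thread table is sketched below using `thread_local`. Everything here is a hypothetical illustration (`MockHlms`, `bindCurrentThread`, `onWorkerCallback`); it is not OgreNext's implementation, only one way compiler-assisted TLS can supply the tid.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for Ogre::Hlms with a backwards-compatible overload.
class MockHlms
{
    // Each thread sees its own copy of this variable (thread-local storage).
    static thread_local std::size_t sTlsTid;

    std::vector<std::map<std::string, std::int32_t>> mProps;

public:
    explicit MockHlms( std::size_t numThreads ) : mProps( numThreads ) {}

    // Hypothetical hook: a worker would call this once before running jobs.
    static void bindCurrentThread( std::size_t tid ) { sTlsTid = tid; }

    // Explicit-tid version.
    void setProperty( std::size_t tid, const std::string &key, std::int32_t value )
    {
        assert( tid < mProps.size() );
        mProps[tid][key] = value;
    }

    // Compatible version: the tid is looked up from TLS.
    void setProperty( const std::string &key, std::int32_t value )
    {
        setProperty( sTlsTid, key, value );
    }

    std::int32_t getProperty( std::size_t tid, const std::string &key ) const
    {
        assert( tid < mProps.size() );
        auto it = mProps[tid].find( key );
        return it != mProps[tid].end() ? it->second : 0;
    }
};

thread_local std::size_t MockHlms::sTlsTid = 0;

// Hypothetical 3.0-style listener body: the tid is received but not forwarded.
void onWorkerCallback( MockHlms &hlms, std::size_t /*tid*/ )
{
    hlms.setProperty( "my_property", 1 );
}
```

If the calling thread was bound to slot 3, the value lands in table 3 even though the callback never mentions a tid.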
Hlms shaders are not randomly generated at any time from anywhere. It's not chaotic.
Shader generation requests can originate from 3 locations:
- render_scene compositor pass.
- warm_up compositor pass.

The cases other than render_scene are specifically designed to compile shaders and/or generate PSOs. That is their main function.
Those routines will collect as many shader generation requests as possible and then batch-compile them in worker threads.
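The first case, render_scene, is a little different. Extremely simplified, a render_scene pass in OgreNext 3.0 (e.g. see the source code for Ogre::RenderQueue::renderGL3) iterates through every Item, calls createNewPsoFor() whenever it encounters a PSO for the first time, and calls executeAllCommands() once all Items have been processed.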
When OgreNext is compiling in parallel, the function createNewPsoFor() will actually return immediately with a valid Pso pointer, but it's not ready to be used yet.
That's why we must call waitForHlmsJobs() before calling executeAllCommands().
Therefore OgreNext will process and iterate through many Items while worker threads compile shaders that were seen for the first time.
With a bit of luck, if the worker threads finish compiling before we reach waitForHlmsJobs(), then there will be no stalls at all. If not, then the wait has likely been reduced a little. Furthermore, if multiple new PSOs are encountered, they will be delivered to different worker threads and thus compile in parallel.
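The flow described above can be sketched with `std::async`. The function names `createNewPsoFor`, `waitForHlmsJobs` and `executeAllCommands` are taken from the text, but their bodies, the `Pso` and `Renderer` types, and the integer shader hash are hypothetical stand-ins, not OgreNext code.

```cpp
#include <cstddef>
#include <future>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical PSO: the pointer is valid at once, the payload arrives later.
struct Pso
{
    std::string compiledCode;  // stays empty until a worker finishes
};

class Renderer
{
    std::map<int, std::unique_ptr<Pso>> mPsoCache;  // keyed by a made-up shader hash
    std::vector<std::future<void>> mJobs;
    std::vector<const Pso *> mCommands;

    // Returns immediately; the compile itself runs on another thread.
    Pso *createNewPsoFor( int shaderHash )
    {
        auto pso = std::make_unique<Pso>();
        Pso *raw = pso.get();
        mPsoCache[shaderHash] = std::move( pso );
        mJobs.push_back( std::async( std::launch::async, [raw, shaderHash] {
            raw->compiledCode = "compiled#" + std::to_string( shaderHash );
        } ) );
        return raw;
    }

    // Blocks until every pending compile job has finished.
    void waitForHlmsJobs()
    {
        for( auto &job : mJobs )
            job.get();
        mJobs.clear();
    }

    // Counts the commands whose PSO is ready to be used.
    std::size_t executeAllCommands() const
    {
        std::size_t ready = 0;
        for( const Pso *pso : mCommands )
            ready += pso->compiledCode.empty() ? 0u : 1u;
        return ready;
    }

public:
    std::size_t renderScene( const std::vector<int> &itemShaderHashes )
    {
        mCommands.clear();
        for( int hash : itemShaderHashes )
        {
            auto it = mPsoCache.find( hash );
            Pso *pso = it != mPsoCache.end() ? it->second.get() : createNewPsoFor( hash );
            mCommands.push_back( pso );  // only the pointer is recorded here
        }
        waitForHlmsJobs();  // must precede executeAllCommands()
        return executeAllCommands();
    }
};
```

The main thread never reads `compiledCode` before `waitForHlmsJobs()` returns, which is what makes handing out the pointer early safe in this sketch. Items that reuse a cached hash do not spawn a new job.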
The range of the tid argument is [0; num_threads).
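One practical consequence, sketched under assumptions: because every tid falls in that half-open range, per-thread scratch data can live in an array with num_threads slots and be indexed by tid directly. `PerThreadCounters` is a hypothetical helper, not part of OgreNext.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-thread storage: one slot per tid in [0; num_threads).
class PerThreadCounters
{
    std::vector<unsigned> mCounts;

public:
    explicit PerThreadCounters( std::size_t numThreads ) : mCounts( numThreads, 0u ) {}

    // Each thread only ever writes to its own slot.
    void increment( std::size_t tid )
    {
        assert( tid < mCounts.size() );  // tid == num_threads is already out of range
        ++mCounts[tid];
    }

    unsigned countFor( std::size_t tid ) const
    {
        assert( tid < mCounts.size() );
        return mCounts[tid];
    }

    // Meant to be called after all workers are done.
    unsigned total() const
    {
        unsigned sum = 0u;
        for( unsigned c : mCounts )
            sum += c;
        return sum;
    }
};
```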