OGRE-Next  4.0.0unstable
Object-Oriented Graphics Rendering Engine
Compositor

Table of Contents

The Compositor is a Core and key component in Ogre 2.0. In 1.x, it was just used for fancy post-processing effects. In 2.0, it's the way to tell Ogre how you want to render the scene. Without setting it up, Ogre won't render to screen.

With the Compositor, the user stops having to deal with setting Viewports, RenderTargets, updating these RenderTargets every frame, etc.

Instead the user has now to setup Nodes and a Workspace. The workspace is the top level system where the user indicates which Nodes he wants to use, how they will be connected, which global render textures will be declared (which can be seen by all nodes from the same workspace), where to render the final output (i.e. RenderWindow, an offscreen RenderTexture) and which SceneManager to use. The user can have multiple workspaces active at the same time.

The new Compositor system was heavily inspired by Blender's Compositor system. Picture from Blender:

Compositor script syntax hasn't changed much, which should make porting them quite easy. Internally though, the new system was written from scratch (while reusing & reviewing some of the existing code); as the previous Compositor was stack-based, while the new one is node-based.

So, if you used to manipulate the Compositor directly from C++; porting efforts could be considerably bigger.

Nodes

A compositor node most likely resembles what used to be "a compositor" in 1.x

The following node clears the RT and draws everything that is in the render queue #50

compositor_node MyNode
{
in 0 Input_as_MyLocaName // Take input texture #0 and use the local name "Input_as_MyLocaName" for reference
target Input_as_MyLocaName
{
//Clear to violet
pass clear
{
colour_value 1 0 1 1
}
pass render_scene
{
visibility_mask 0xffffffff //Viewport's visibility mask
rq_first 50 //Inclusive
rq_last 51 //Not inclusive
}
}
out 0 Input_as_MyLocaName
}

Where is Input_as_MyLocaName defined? What is its resolution? Its bit depth? The RT comes from the input channel, so the answer is that it depends on how the Workspace will connect this node. The Workspace may pass a local RTT declared in a previous node or it could pass RenderWindow.

Input & output channels and RTTs

A node's texture may come from three different sources:

  1. It was locally declared.
  2. It comes from an input channel.
  3. It is a global texture declared in the Workspace. Global textures must use the global_ prefix

Locally declared textures

The following script declares a local texture of resolution 800x600, clears it to violet, and puts it in the output channel #0 (so other compositor nodes can use it as input):

compositor_node MyNode
{
texture rt0 800 600 PFG_RGBA8_UNORM_SRGB
target Input_as_MyLocaName
{
//Clear to violet
pass clear
{
colour_value 1 0 1 1
}
}
out 0 rt0
}
@ PFG_RGBA8_UNORM_SRGB
Definition: OgrePixelFormatGpu.h:78

You may have noticed the syntax for declaring the RTT is similar as it was in Ogre 1.x

For more information on the syntax see texture.

Textures coming from input channels

Input channels are numbered. An input channel must be given a name so that they can be referenced locally at node scope by all target passes. There can't be any gaps (i.e. you can't use channels 0 & 2 but not 1).

Output channels are also numbered and must be assigned an RTT. This number will be later used by the Workspace to perform the connections.

The workspace will be responsible for connecting node A's output channels with node B's input channels. In other words, channels are a way to send, receive and share RTTs between nodes.

The only restriction is that global textures can't be used neither as input or output (global textures are referenced directly). There can be more input channels than output channels, viceversa, and there may be no input nor output channels (i.e. when working with global textures alone).

The following node definition (which does no rendering):

  1. Takes input channel #0 and sends it to output channel #1
  2. Takes input channel #1 and sends it through output channel #0 (in other words it flips the channels #0 & #1).
  3. Takes an uninitialized texture created locally to channel #2
compositor_node MyNode
{
in 0 myFirstInput
in 1 mySecondInput
texture rt0 target_width_scaled 0.25 target_height_scaled 0.25 PFG_RGBA8_UNORM_SRGB
out 0 mySecondInput
out 1 myFirstInput
out 2 rt0
}

Drawing inspiration from Blender's compositor system, the little dots on the left would be the input channels, while the dots on right would be the output texture channels:

Global Textures

Global textures are declared at workspace scope and have the same syntax as local textures. There are a few restrictions:

  1. Global texture names must contain the global_ prefix. Inversely, any local texture or input channel trying to use a name that starts with global_ is illegal.
  2. They can't be used in input or output channels.
  3. Global textures can be seen by any node belonging to the same workspace, but they can't see a global texture that belongs to a different workspace.
  4. If a Node uses a global texture that the Workspace didn't declare, execution of the workspace will fail.

Global textures are as powerful/dangerous as global variables are in C++. The main reason of their existence is that many nodes may use temporary rtts to perform their work, and it's very wasteful to declare these intermediate rtts on every node when they can be shared and reused.

Sharing and reusage can also be achieved through input & output channels, however for temporary rtts (or rtts that are accessed very frequently, i.e. a deferred shader's G Buffer) it would lead to connection hell; and hence global textures are a much better fit.

compositor_node parameters

in

The input(s) to this node. This clause may be repeated. A node definition may have no inputs.

Format:
in <channel_id> <local_texture_name>
Parameters
channel_idchannel_id is a number in range [0; inf) but must be consecutive and continuous (no gaps, i.e. define channel 0, 2, but not 1).
local_texture_nameAn arbitrary & unique name, cannot start with global_.

out

The input(s) from this node. This clause may be repeated. A node definition may have no outputs.

Format:
out <channel_id> <local_texture_name>
Parameters
channel_idchannel_id is a number in range [0; inf) but must be consecutive and continuous (no gaps, i.e. define channel 0, 2, but not 1).
local_texture_nameAn arbitrary & unique name, cannot start with global_.

in_buffer

For UAV buffers. Same as with regular input textures, except you can reference global buffers, and global buffers don't have to start with global_.

If a local buffer and a global buffer have the same name, the local buffer takes precedence.

Format:
in_buffer <channel_id> <buffer_name>
Parameters
channel_idchannel_id is a number in range [0; inf) but must be consecutive and continuous (no gaps, i.e. define channel 0, 2, but not 1).
local_texture_nameAn arbitrary & unique name.

out_buffer

For UAV buffers. Same as with regular output textures, except you can reference global buffers, and global buffers don't have to start with global_.

If a local buffer and a global buffer have the same name, the local buffer takes precedence.

Format:
out_buffer <channel_id> <buffer_name>

For parameters, see in_buffer

custom_id

Custom string that will be hashed to identify this Node definition. Useful for classifying nodes into categories.

Multiple nodes can have the same custom_id.

Format:
custom_id <string>

From C++ you can use:

{
void passPreExecute( Ogre::CompositorPass *pass ) override
{
if( pass->getParentNode()->getDefinition()->mCustomIdentifier == "CustomString" )
{
// This is our node
}
}
}
IdString mCustomIdentifier
Definition: OgreCompositorNodeDef.h:103
const CompositorNodeDef * getDefinition() const
Definition: OgreCompositorNode.h:162
Abstract class for compositor passes.
Definition: OgreCompositorPass.h:77
const CompositorNode * getParentNode() const
Definition: OgreCompositorPass.h:187
Definition: OgreCompositorWorkspaceListener.h:47
virtual void passPreExecute(CompositorPass *pass)
Called when each pass is about to be executed.
Definition: OgreCompositorWorkspaceListener.h:65

Main RenderTarget

When creating the Workspace instance, the C++ code will ask for the RT which should be the ultimate target (i.e. the RenderWindow). This RT is very important as keywords like target_width_scaled and settings like default msaa & pixel format will be based on the attributes from this main RT. This feature will be seen in more detail in the Workspace section.

Note
By default you cannot use the main RenderTarget as a texture (because it's usually the RenderWindow and D3D and OpenGL don't allow it), and doing it may result in a crash.

It is possible to manually call node->connectFinalRT and supply a texture pointer (i.e. if the final RenderTarget is a RenderTexture) that can be bound. An automated way of doing this is not yet implemented.

Target

Targets include multiple passes. Their main purpose is define to which RenderTarget the passes will render to.

What's worth noting is that targets accept an optional parameter 'slice'.

The slice parameter is optional and refers to which slice or face from a 3D texture (or cubemap or 2D array) to use from the given texture. Valid values can be numeric or the hint '+X'. '-X', '+Y', '-Y', '+Z', and '-Z'.

Note
The word 'slice' must not be written. Just write 'target myTexture +X' to work.
When the target is a 3D/Cubemap/array texture, if the slice goes out of bounds, an exception will be raised. If the target is a 2D or 1D texture, this value is silently ignored. Default: slice = 0

target parameters

target_level_barrier

Format:
target_level_barrier [yes|no]

Default value is no.

When yes, this setting is meant to be used with skip_load_store_semantics for optimization on mobile/TBDR targets, but should be used with care.

skip_load_store_semantics allows OgreNext to execute multiple consecutive passes as if they were only one.

An obstacle for this is that barriers from all theses passes must be issued before the first pass executes; otherwise the passes need to be internally broken up and OgreNext will complain if skip_load_store_semantics on but we can't merge the passes.

target_level_barrier yes means that an extra hidden pass will be inserted at the beginning of the target and issue all necessary barriers from subsequent passes:

target rt_renderwindow
{
// By having this set to true or yes, a barrier will be issued
// to cover all 4 passes in this target, which happens
// before the first render_quad
target_level_barrier true
pass render_quad
{
load
{
all dont_care
}
store
{
all dont_care
colour store
}
material MyMaterial
}
pass render_scene
{
skip_load_store_semantics true
rq_first 0
rq_last 5
shadows MainCharacter reuse
}
pass render_quad
{
skip_load_store_semantics true
material Compositor/UpsampleDepth
input 0 worldRt_depth
}
pass render_scene
{
skip_load_store_semantics true
rq_first 14
rq_last 249
shadows NormalShadows reuse
}
}
@ Compositor
Created by compositor.
Definition: OgreTextureGpu.h:195

Note that not always target_level_barrier yes can be successful.

For example if pass A needs texture X to be in state Texture for sampling but pass C needs that same texture in state Uav then we have a contradiction as we can't transition to two states at the same time.

The only solution is to break up the passes and issue two barriers. Note that you can however issue break ups manually:

target rt_renderwindow
{
target_level_barrier true
// Pass A
pass render_quad {}
// Pass B
pass render_quad {skip_load_store_semantics true}
}
target rt_renderwindow
{
target_level_barrier true
// Pass C
pass render_quad {}
// Pass D
pass render_quad {skip_load_store_semantics true}
}

Passes

Types of passes:

pass parameters

All passes share the following script parameters:

pass

Format:
pass <type> [customId]
{
}
Parameters
typemust be one of the supported types: clear, quad, resolve, render_scene, stencil, custom, etc.
customIdOptional. It is used by custom passes to give the registered custom pass provider the means to identify multiple types, in case there are more than one type of custom passes.

num_initial

Number of times this pass will be executed. Default is -1, which means always execute.

When the execution count hits that value, it won't executed again until a d3d device reset, resize or workspace recreation (the execution count is reset and executed N times again)

Format:
num_initial <number>

This parameter replaces the only_initial parameter in Ogre 1.x.

flush_command_buffers

Whether to flush the command buffer at the end of the pass.

This can incur in a performance overhead (see OpenGL's glFlush and D3D11' ID3D11DeviceContext::Flush) for info.

Usually you want to leave this off. However for VR applications that must meet VSync, profiling may show your workload benefits from submitting earlier so the GPU can start right away executing rendering commands.

The main reason to use this is in CPU-bound scenarios where the GPU starts too late after sitting idle.

Format:
flush_command_buffers <off|on>

identifier

An arbitrary user-defined numeric ID used for identifying individual passes in the C++ code.

Format:
identifier <number>

From C++ you can access it via:

{
void passPreExecute( Ogre::CompositorPass *pass ) override
{
if( pass->getDefinition()->mIdentifier == 1234u )
{
// This is our pass
}
}
}
uint32 mIdentifier
Custom value in case there's a listener attached (to identify the pass)
Definition: OgreCompositorPassDef.h:138
const CompositorPassDef * getDefinition() const
Definition: OgreCompositorPass.h:185

execution_mask

8-bit hex value. Specifies the execution mask. For more information see Stereo and Split-Screen Rendering for more information. Default is 0xFF except for clear passes, which default to 0x01.

Format:
execution_mask <hex number>

viewport_modifier_mask

8-bit hex value. Specifies the viewport modifier mask. For more information see [Stereo and Split-Screen Rendering for more Stereo and Split-Screen Rendering for more information. Default is 0xFF except for clear passes, which default to 0x00.

Format:
viewport_modifier_mask <hex number>

colour_write

Disables colour writes. Useful for Z prepass passes; or pixel shaders that output to an UAV instead of a regular RenderTarget (like a Render Texture).

Default: on.

Format:
colour_write <off|on>

profiling_id

User defined text for identifying this pass by name in profilers and GPU debuggers such as RenderDoc.

Format:
profiling_id "name"

viewport

Specifies the viewport. Supported by many other passes (e.g. render_quad, render_scene). The default is 0 0 1 1 which covers the entire screen. Values should be between 0 and 1.

Format:
viewport [idx] <left> <top> <width> <height>
[<scissor_left> <scissor_top> <scissor_width> <scissor_height>]

When 4 parameters are suplied, the scissor box will match the viewport's. All 8 parameters allow to set a custom scissor box.

Note
Scissor testing must be enabled by Ogre::HlmsMacroblock::mScissorTestEnabled for it to work, we just set the size here.

When the optional 'idx' parameter is supplied at the beginning there will be either 5 or 9 parameters instead of 4 or 8 respectively.

This index allows you to set multiple viewports for e.g. instanced_stereo or for shaders that make use of gl_ViewportIndex/SV_ViewportArrayIndex. When not provided, this value defaults to 0. The value is in range [0; 16)

The Compositor will automatically share Viewport pointers between different passes to the same RenderTarget (even for different nodes) as long as they share the exact same parameters.

expose

Tells OgreNext that you plan on using texture_name for sampling during this pass.

Usually this can be inferred (e.g. render_quad's input parameter).

Format:
expose <texture_name>

Reasons for using expose:

  1. You plan on using an exposed texture from C++, e.g. via Ogre::SceneManager::getCompositorTextures.
  2. You set a low level material texture directly to that texture (either via name or pointer) and the Compositor isn't aware of it.

OgreNext must know which textures may or will be used during the pass so resource transitions and barriers can be issued in explicit APIs like DX12 and Vulkan (also OpenGL has rudimentary memory barriers that are needed if Compute Shaders are involved).

Without it, you may get exceptions when using the Vulkan RenderSystem, or Vulkan Validation Errors (when debug validation layers are active).

skip_load_store_semantics

When yes, the load and store semantics will be ignored. Use with care as improper usage may lead to rendering bugs or crashes.

Normally OgreNext tries to merge passes when possible but certain advanced uses are impossible or difficult to get automatically merged, thus this flag indicates we want.

Format:
skip_load_store_semantics [yes|no]

e.g.

compositor_node MyNode
{
in 0 rtt
target rtt
{
pass render_quad
{
load
{
// Do not load or clear anything to colour, this pass will overwrite everything
all clear
colour dont_care
}
store
{
// Only save colour, don't care about depth/stencil
all dont_care
colour store
}
material MaterialThatWritesToEveryColourPixel
}
pass render_scene
{
// Inherit the same semantics of the previous pass
skip_load_store_semantics true
rq_first 0
rq_last max
// We can NOT use recalculate or first, because
// doing so will end the pass, and we're explicitly
// asking Ogre to not close the pass.
shadows MyShadowNode reuse
}
}
}

The reason you should be careful is that we assume you know where you're drawing. Consider the following example:

target TargetA
{
pass render_quad {}
}
target UnrelatedTargetB
{
pass render_quad { skip_load_store_semantics true }
}

This script will not behave as expected because both render_quad passes will draw to TargetA!

This is because with skip_load_store_semantics you're telling Ogre not to set UnrelatedTargetB as the current target because OgreNext should assume it already is the current target.

Likewise there should be no barriers to execute because barriers force the pass to 'close' which means store semantics must be executed; and to open another pass we must execute load semantics again (see target_level_barrier to solve this problem).

We try to perform validation checks in Debug mode to avoid these type of errors, but we can't cover them all.

This setting is an optimization specifically aimed at mobile and you should pay attention to errors, the Ogre.log, Vulkan Validation Layers, and tools like RenderDoc to be sure rendering is happening as intended.

load

This is often referred as "Load Actions". To understand their relevancy please read Load Store semantics.

Format:
load
{
all clear
clear_colour 0.2 0.4 0.6 1
}

all (load)

Sets all RTV parameters (all colour targets in MRT + depth + stencil) to the same load action.

Format:
load
{
all [dont_care|clear|clear_on_tilers|load]
}
Parameters
dont_careAssume uninitialized values are present and will be completely overwritten. This is discouraged on depth & stencil buffers because it precludes optimizations (e.g. Hi-Z, Early Z, Depth Compression, etc.). See LoadAction::DontCare.
clearClear the whole texture to the set clear_colour/clear_depth/clear_stencil values. See LoadAction::Clear.
clear_on_tilersSee LoadAction::ClearOnTilers.
loadLoad the contents that were stored in the texture. See LoadAction::Load.

colour (load)

Sets the load action for each and/or all the colour textures. Does nothing if there are no colour textures.

Format:
load
{
colour [dont_care|clear|clear_on_tilers|load]
colour [N] [dont_care|clear|clear_on_tilers|load]
}
Parameters
NValue in range [0; num_mrt) to reference the colour target in an MRT. For non-MRT RTVs, use 0 or skip this parameter.
dont_care|clear|clear_on_tilers|loadSee Load Actions in all.

depth (load)

Sets the load action for the Depth Buffer. Does nothing if there is no depth buffer.

Format:
load
{
depth [dont_care|clear|clear_on_tilers|load]
}
Parameters
dont_care|etcSee Load Actions in all.
Note
For performance reasons, the load action of the depth buffer should be the same as the stencil one.

stencil (load)

Sets the load action for the Stencil Buffer. Does nothing if there is no stencil buffer.

Format:
load
{
stencil [dont_care|clear|clear_on_tilers|load]
}
Parameters
dont_care|etcSee Load Actions in all.
Note
For performance reasons, the load action of the depth buffer should be the same as the stencil one.

clear_colour / colour_value (load)

Sets the colour to set the texture to. The kayword colour_value is the same, and retained due to old naming conventions.

Note
IMPORTANT: The load action for this colour target must be set to clear or clear_on_tilers. Otherwise this value is ignored.
Format:
load
{
clear_colour [N] <r g b a>
}
Parameters
NValue in range [0; num_mrt) to reference the colour target in an MRT. When absent, it sets all colour targets in the MRT to the same value.
rgbaFour float values to set the colour target to. e.g. clear_colour 0.2 0.4 0.5 1.0.

clear_colour_reverse_depth_aware (load)

Sets the colour texture to either the clear_colour or White - clear_colour depending on whether OgreNext is configured to use reverse depth. See Ogre::RenderSystem::isReverseDepth.

This setting is useful when storing depth data into colour textures such as PFG_R32_FLOAT.

Note
IMPORTANT: The load action for this colour target must be set to clear or clear_on_tilers. Otherwise this value is ignored.
Format:
load
{
clear_colour_reverse_depth_aware [N] <r g b a>
}
Parameters
NValue in range [0; num_mrt) to reference the colour target in an MRT. When absent, it sets all colour targets in the MRT to the same value.
rgbaFour float values to set the colour target to. e.g. 0.2 0.4 0.5 1.0. When using reverse Z, the final colour value will be 0.8 0.6 0.5 0.0.

clear_depth (load)

Sets the value to set the depth buffer to.

Note
IMPORTANT: The load action for depth must be set to clear or clear_on_tilers. Otherwise this value is ignored.
Format:
load
{
clear_depth <value>
}
Parameters
valueFloating point value to set the depth to. Usually in range [0; 1], but not necessarily (depends on API support).
Note
If OgreNext is using reverse depth, the depth value will be set to 1.0 - value.

clear_stencil (load)

Sets the value to set the stencil buffer to.

Note
IMPORTANT: The load action for stencil must be set to clear or clear_on_tilers. Otherwise this value is ignored.
Format:
load
{
clear_stencil <value>
}
Parameters
valueInteger value to set the depth to. Usually in range [0; 255] since to date all known stencil formats are 8-bit.

warn_if_rtv_was_flushed (load)

See Ogre::CompositorPassDef::mWarnIfRtvWasFlushed

Default: False.

Format:
load
{
warn_if_rtv_was_flushed [on|off]
}

store

This is often referred as "Store Actions". To understand their relevancy please read Load Store semantics.

Format:
store
{
// These are the optimum settings unless you need to reuse more data
// (e.g. you need the depth buffer, or you want access to MSAA data before resolving).
colour store_or_resolve
depth dont_care
stencil dont_care
}

all (store)

Sets all RTV parameters (all colour targets in MRT + depth + stencil) to the same store action.

Format:
store
{
all [dont_care|store|resolve|store_and_resolve|store_or_resolve]
}
Parameters
dont_careDiscard, leaving the texture contents in an undefined state. You should not attempt to read from it after this. See StoreAction::DontCare.
storeDo not resolve. Useful if you have to interrupt rendering to a RenderTarget, switch to another RenderTarget, and come back to continue rendering; asuming you didn't need to sample from this texture (to fetch what has been rendered so far).
See StoreAction::Store.
resolveAlways resolve the texture once we're done rendering, and we do not care about the contents of the MSAA surface. This flag won't work on non-MSAA textures and will raise an exception. You should not continue rendering to this texture after this, unless you clear it.
See StoreAction::MultisampleResolve.
store_and_resolveAlways resolve the texture once we're done rendering, and we do care about the contents of the MSAA surface. It is valid to use this flag without an MSAA texture. This flag is mostly meant for explicit-resolve textures as Ogre users have no way of accessing MSAA contents. However it may be useful if you need to interrupt rendering to a RenderTarget, switch to another RenderTarget while also sampling what has been rendered so far, and then come back to continue rendering to MSAA.
See StoreAction::StoreAndMultisampleResolve.
store_or_resolveThis is the compositor's default. It behaves like 'Store' if the texture is not MSAA. It behaves like 'MultisampleResolve' if the texture is MSAA.
See StoreAction::StoreOrResolve. This is the default value.

colour (store)

Format:
store
{
colour [dont_care|store|resolve|store_and_resolve|store_or_resolve]
colour [N] [dont_care|store|resolve|store_and_resolve|store_or_resolve]
}
Parameters
NValue in range [0; num_mrt) to reference the colour target in an MRT. For non-MRT RTVs, use 0 or skip this parameter.
dont_care|etcSee Store Actions in all.

depth (store)

Sets the store action for the Depth Buffer.

Format:
store
{
depth [dont_care|store|resolve|store_and_resolve|store_or_resolve]
}
Parameters
dont_care|etcSee Store Actions in all.

stencil (store)

Sets the store action for the Stencil Buffer.

Format:
store
{
stencil [dont_care|store|resolve|store_and_resolve|store_or_resolve]
}
Parameters
dont_care|etcSee store Actions in all.

clear

non_tilers_only

When true, this pass must only execute if the current GPU is not a TBDR GPU. When false, this pass always executes. Default: false.

This setting is tightly related to Load Action clear_on_tilers.

Format:
non_tilers_only [yes|no]

buffers

Tells the pass which buffers to clear.

Format:
buffers [colour] [depth] [stencil]
Parameters
colourTells the pass to clear all colour target Does nothing if there is no colour buffer(s).
depthTells the pass to clear the depth buffer. Does nothing if there is no depth buffer.
stencilTells the pass to clear the stencil buffer. Does nothing if there is no stencil buffer.
Note
For performance reasons, if you clear the depth buffer you should also clear the stencil buffer.

generate_mipmaps

generate_mipmaps doesn't have special parameters other than the shared ones that are still relevant (i.e. identifier). They're useful for explicitly populating the lower mip levels after you've done rendering.

mipmap_method

Default is api_default which will ask the API or driver to generate them for you. If the API does not support it (e.g. DX12) then Compute will be used.

Format:
mipmap_method [api_default|compute|compute_hq]
Parameters
api_defaultUses whatever the API provides. Which is usually a bilinear downscale.
computeAt the moment, it is the same as compute_hq.
compute_hqUses a high quality gaussian filter. Useful for fast & high quality mipmap generation. Uses a compute shader.
Note
generate_mipmaps works at texture level. If you pass a Cubemap, it will generate mipmaps for all the faces. If you pass a 3D or a 2D array texture, it will generate mipmaps for all slices.
Note
In order for compute-based mipmap generation to work, OgreNext must be compiled with JSON support, and the user must include the resources included in OgreNext's repository at Samples/Media/2.0/scripts/materials/Common

kernel_radius

Integer value. Default is 8. Must be positive, even number. Defines the kernel radius of the compute_hq gaussian filter.

Format:
kernel_radius <8>

gauss_deviation

The standard deviation of the compute_hq gaussian filter. The default is 0.5.

Format:
gauss_deviation <0.5>

render_quad

Quad passes have the same syntax as 1.x; plus the following keywords have been added:

use_quad

Format:
use_quad [yes|no]

Default is no. When no; the compositor will draw a fullscreen triangle. Due to how modern GPUs work, using two rectangles wastes GPU processing power in the diagonal borders because pixels are processed at least in 2x2 blocks; and the results from the pixels out of the triangle have to be discarded. A single triangle is more efficient as all blocks are fill the viewport area, and when the rectangle goes out of the viewport, the gpu efficiently clips it.

When the viewport is not 0 0 1 1; this value is forced to yes. The following picture illustrates a fullscreen triangle:

Interpolation will cause that the effective UV coordinates will be in the [0; 1] range while inside the viewport area.

Using camera_far_corners_world_space will also force to use a quad instead of a tri (but camera_far_corners_view_space works with tris)

For an explanation of why this is a performance optimization, refer to Optimizing the basic rasterizer by Fabien Giesen.

material

Name of the low level material to use.

Format:
material <material_name>

hlms

Name of the Hlms datablock to use.

Format:
hlms <datablock_name>
Note
The datablock must be visible to the HlmsManager. See Ogre::HlmsManager::getDatablock.

input

Texture to bind to the material or datablock.

Format:
input <N> <texture_name>
Parameters
NMaterial's Texture Unit to set.
texture_nameName of the texture. Must be a local, input or global texture.

quad_normals

Sends to the vertex shaderthe camera's frustum corners in either world space or view space through the NORMALS semantic.

This is particularly useful for efficiently reconstructing position using only the depth and the corners (see Samples/2.0/Tutorials/Tutorial_ReconstructPosFromDepth).

Format:
quad_normals
[camera_far_corners_view_space|camera_far_corners_view_space_normalized|camera_far_corners_view_space_normalized_lh|camera_far_corners_world_space|camera_far_corners_world_space_centered|camera_direction]
Parameters
camera_far_corners_view_spaceSends camera frustum corners in view space.
camera_far_corners_view_space_normalizedSends camera frustum corners in view space. See Ogre::CompositorPassQuadDef::VIEW_SPACE_CORNERS_NORMALIZED. Note: It divides the whole vector by the far plane; causing dir.z to be always -1 (but the vector itself isn't unit-length).
camera_far_corners_view_space_normalized_lhSame as camera_far_corners_view_space_normalized but left-handed. Note: dir.z is always 1.
camera_far_corners_world_spaceSends camera frustum corners in world space.
camera_far_corners_world_space_centeredSends camera frustum corners direction, but it is centered around origin (0, 0, 0).
camera_directionSame as camera_far_corners_world_space_centered but normalized (values are in range [0;1]) which is useful for sky rendering & atmospheric scattering. See TutorialSky_Postprocess sample.

Interesting read: Reconstructing Position From Depth Part I, Part II, Part III

Note
In OgreNext you need to apply the world-view-proj matrix so that the pass being drawn compensates for texel-to-pixel aligning reads in Direct3D9. Failing to do so will not only cause the aforementioned alignment issue, but also will cause glitches when the viewport is not 0 0 1 1

In Ogre 1.x, only the proj matrix was necessary to fix texture flipping issues when rendering to FBOs in OpenGL.

camera

Sets the current camera to use if you want to use a different one from the default one.

camera <camera_name>

camera_cubemap_reorient

Rotates the camera 90°, -90° or 180° depending on the value of the slice index. See Ogre::CompositorPassQuadDef::mCameraCubemapReorient.

Format:
camera_cubemap_reorient [yes|no]

render_scene

This is the main pass to render most of the scene.

rq_first

Sets the first render queue ID to start rendering from. The default is 0. Must be a value in range [0 and 255].

The value is inclusive (Objects in ID = rq_first get rendered).

Format:
rq_first <id>

rq_last

The default is max which is a special parameter that implies the last active render queue ID. If numeric, value must be between 0 and 255.

The value is not inclusive (Objects in ID = rq_last do not get rendered).

Format:
rq_last <id>

lod_bias

Applies a bias multiplier to the lod. Valid values are in range [0; Inf).

A higher lod bias causes LODs to pop up sooner and should be preferred on slower devices.

Default: 1.0

Format:
lod_bias <bias>

lod_update_list

When no, the LOD list won't be updated, and will use the LOD lists of calculated by a previous pass. This saves valuable CPU time.

Useful for multiple passes using the same lod_camera (without a pass in the middle with a different lod_camera that would override the cached LOD lists).

If your application is extremely CPU bound, and hence you don't need LOD, turning this setting to false in all passes will effectively turn lodding off (and alleviate the CPU).

Default: yes; except for passes belonging to shadow nodes, which is forced to false unless lod_camera is a non-empty string.

Format:
lod_update_list [yes|no]

See Ogre::CompositorPassSceneDef::mUpdateLodLists.

cull_reuse_data

When true, the frustum culling is skipped in this pass. Data from the previous frustum culling execution is used instead.

This is useful if you execute two render_scene passes in a row that are almost identical (e.g. early depth pre-pass followed by a normal render); thus saving lots of valuable CPU time.

Format:
cull_reuse_data [yes|no]

visibility_mask

Visibility mask to be used by the pass' viewport. Those entities that fail the test 'entityMask & visibility_mask' will not be rendered.

There are no significant changes to Ogre 1.x, except that the script compiler now accepts hexadecimal values with the 0x prefix instead of decimal values.

Format:
visibility_mask <hex number> // e.g. visibility_mask 0xffff0000

light_visibility_mask

Visibility mask to be used by the pass for culling lights in Forward+.

Those entities that fail the test 'entityMask & light_visibility_mask' will not be used as non-shadow-casting lights during this pass.

This will give you a relatively efficient way to control with coarse granularity which lights affect which objects.

Note that this uses the mask set via light->setVisibilityMask, not the one set via light->setLightMask.

Format:
light_visibility_mask <hex number> // e.g. light_visibility_mask 0xffff0000

shadows

Specifies the shadow node to use for rendering with shadow maps. See section about *Shadow Nodes* for more information. When a shadow node's name is provided, the second parameter defaults to first.

Off by default.

Format:
shadows <off|shadow_node_name> [first|reuse|recalculate]
Parameters
offSet to off if it should not use shadows.
shadow_node_nameThe name of the shadow node to use. Mutually exclusive with off.
firstOptional. Default. Automatically calculates whether we should reuse or recalculate, based on whether this is the first node. Mutually exclusive with reuse & recalculate.
reuseOptional. Reuses existing rendered shadow map data, regardless of whether it's valid or if OgreNext believes it should be redrawn. Mutually exclusive with first & recalculate.
recalculateOptional. Forces to redraw the shadow map, even if OgreNext believes it should not. This is useful if you've modified the scene via listeners or rendering in a complex way that OgreNext has no way to know the shadow node should be redrawn. Mutually exclusive with first & reuse.

overlays

Whether to render Overlays from the OverlaySystem component. On by default for regular nodes, Off by default on shadow nodes. The goal is that eventually Overlays obey RenderQueue IDs like everything else, but it was too hard to port (Overlay system is tad bit complex...) so this hack/flag was created. It will be eventually removed.

Format:
overlays <off|on>

See Overlays for more info.

camera

When not specified, the default camera is used for rendering the pass (this default camera is specified when instantiating the workspace from C++).

When a name is given, the Compositor will look for this camera and use it. Very useful for reflection passes (mirrors, water) where the user wants to be in control of the camera, while the Compositor is associated with it. The Camera must be created by the user before the workspace is instantiated and remain valid until the workspace is destroyed.

Format:
camera <camera_name>

lod_camera

The camera point of view from which the LOD calculations will be based from (i.e. useful for shadow mapping, which needs the LOD to match that of the user camera). When an empty string is provided, OgreNext will assume the lod camera is the same as the current camera, except for shadow nodes in which it will assume it's the lod_camera from the normal pass the shadow node is attached to. Default: Empty string.

Format:
lod_camera <camera_name>

cull_camera

In VR we want to reuse the same cull list for both eyes. Additionally we'd like to calculate shadows for both eyes once, rather than once per eye. This setting allows setting a cull camera different from the rendering camera that should be placed in such a way that the cull camera's frustum encompases both left and right eye. When this string is empty, the regular camera is used. Default: Empty string.

Format:
cull_camera <camera_name>

camera_cubemap_reorient

When Yes, the camera will be reoriented for rendering cubemaps, depending on which slice of the render target we're rendering to (3D, Cubemaps and 2D-array textures only). Its original orientation is restored after the pass finishes. The rotations are relative to its original orientation, which can produce counter-intuitive results if the Camera wasn't set to identity (unless that's the desired effect). See Target section on how to indicate which slice should we render to. Default: No.

Note: if the target is not a cubemap, Ogre will still try to rotate the camera, often to unintended angles.

Format:
camera_cubemap_reorient [yes|no]

enable_forwardplus

When yes, this pass will use Forward3D/ForwardClustered (must be enabled first by the developer via C++, see Forward3D sample). When No, Forward3D will not be used for this pass, which can improve performance both CPU and GPU side (but many lights are likely not going to be drawn or used).

Default: Yes.

Details: CPU side, lights won't be culled against the camera (only a saving if F3D didn't already have a cache from a previous pass during the same frame, with the exact same camera and angle). GPU side, the pixel shaders will be lighter.

Format:
enable_forwardplus [yes|no]

flush_command_buffers_after_shadow_node

Very similar to flush_command_buffers. Does not do anything if 'shadows' is set to 'reuse' (or was set to 'first' and this node is not the first).

See Ogre::CompositorPassSceneDef::mFlushCommandBuffersAfterShadowNode.

Format:
flush_command_buffers_after_shadow_node [yes|no]

is_prepass

Indicates this is a prepass render. HlmsPbs implementation will render a GBuffer with normals and shadow mapping information. See ScreenSpaceReflections sample for an example on how to use it.

Format:
is_prepass [yes|no]

use_prepass

Indicates this pass will take advantage of the data generated during the prepass, which means depth buffer writes may be forced to off; normals will be sourced for the GBuffer. And if present, a reflection texture will be used for calculating SSR (Screen Space Reflections).

Format:
use_prepass <GBuffer> [reflectionBuffer]

gen_normals_gbuffer

See Ogre::CompositorPassSceneDef::mGenNormalsGBuf.

Format:
gen_normals_gbuffer [yes|no]

use_refractions

Tells the HlmsPbs implementation the necessary data to perform refractions.

See sample Samples/2.0/ApiUsage/Refractions on how to set it up.

Format:
use_refractions <depth_texture> <refractions_texture>
Parameters
depth_textureName of the depth texture used for rendering the scene without refractive objects.
refractions_textureName of the colour texture used for rendering the scene without refractive objects.
Note
If the depth_texture is the same as the depth buffer being rendered to, then the current RTV in the target pass must have the depth buffer to read_only.

By default use_refractions will always force-enable read_only, which means the current render_scene cannot write to the depth or the stencil buffers.

Due to this, trying to use skip_load_store_semantics for this pass (or the next one) may cause problems unless read-only depth buffers are desired.

uv_baking

Value in range [0; 254]. When set to 255, it is disabled.

See Samples/2.0/Tutorials/Tutorial_TextureBaking See Ogre::CompositorPassSceneDef::mUvBakingSet.

Format:
uv_baking <value>

uv_baking_offset

Float value in for offsetting U & V values, in pixels.

See Samples/2.0/Tutorials/Tutorial_TextureBaking See Ogre::CompositorPassSceneDef::mUvBakingOffset.

Format:
uv_baking_offset <offset_x> <offset_y>

bake_lighting_only

See Samples/2.0/Tutorials/Tutorial_TextureBaking See Ogre::CompositorPassSceneDef::mBakeLightingOnly.

Format:
bake_lighting_only [yes|no]

instanced_stereo

Whether to use instanced stereo, for VR rendering. See InstancedStereo and OpenVR samples. You will probably want to also set multiple viewports, at the very least viewports 0 and 1

Format:
instanced_stereo [yes|no]

shadows

This pass enables force-updating multiple shadow nodes in batch in its own pass

This is useful because shadow nodes may "break" a render pass in 3:

  • Normal rendering to RT
  • Shadow node update
  • Continue Normal rendering to the same RT

This is an unnecessary performance hit on mobile (TBDR) thus executing them earlier allows for a smooth:

  • Shadow node update (all of them? Up to you)
  • Normal rendering to RT

Don't forget to set shadow nodes to reuse in the pass scene passes or else you may overwrite them unnecessarily

Usage is simple:

// This pass supports being part of a nameless target
target
{
pass shadows
{
// We can update multiple shadow nodes
shadows MyShadowNode_0
shadows MyShadowNode_1
shadows MyShadowNode_2
// These 4 parameters are the same as in pass_scene
camera CameraName
lod_camera CameraName
cull_camera CameraName
visibility_mask 0xffffffff
}
}

stencil

Stencil passes are little more flexible than in Ogre 1.x; always remember to restore the stencil passes before leaving the node otherwise next nodes that will be executed may use unexpected stencil settings.

Most relevant changes are that two sided stencil can now be definted with more flexibility (it's not a boolean anymore), and that syntax has slightly changed to accomodate for this change:

pass stencil
{
check true
mask 0xff
read_mask 0xff
both
{
fail_op keep
depth_fail_op increment
pass_op decrement_wrap
}
}

The read mask is new, and now the fail_op, depth_fail_op & pass_op must be enclosed between brackets.

Valid values are 'both' 'front' and 'back'. 'both' is just a shortcut for defining front and back at the same time with less typing.

uav_queue

This is a new feature introduced in Ogre 2.1. s stands for Unordered Access Views, in D3D's jargon. OpenGL users know UAVs as the feature combination of image textures (imageLoad, imageStore) and SSBOs (Shader Storage Buffer Object). UAVs are powerful beasts because they allow random read and write access from a shader, and even support atomic operations. Proper use of them can achieve incredible results that couldn't be done without UAVs, but improper use can severely hurt performance.

There's quite a discrepancy between D3D11 & OpenGL in how they treat UAVs from API interface perspective. D3D11 equals UAVs to RenderTargets; while OpenGL equals them more like textures.

In fact, the D3D11 API call to bind UAVs must set RenderTargets at the same time. There is no API call to only set UAVs. To make things harder, D3D11 forces UAVs to share slots with RenderTargets; and there are up to 8 slots in total (64 when using D3D11.1 on Windows 8.1). Which means if you're using an MRT with 3 targets, you only have 5 slots left for UAVs.

We can guess for performance improvements: this way D3D11 can check for hazards when setting RTs and UAVs (i.e. make sure you don't bind the same resource as both RT and UAV) while they still use the same hazard checking they do for textures to check that you're not binding a texture at the same time it is bound as an RT/UAV.

If the UAV equals a texture, as in OpenGL; they would have to check textures against textures every time a texture changes, which is O( N! ) complexity; and also a very common operation. Considering past experiences, we're guessing OpenGL just simply skips the check and lets the hazard happen (which is cool when there are hardware extensions that allow you to read/write from these resources at the same time as long as you abide to certain rules).

Because D3D11 is more restrictive than OpenGL, our interface resemble's D3D11.

starting_slot

Offset for all UAV slots. For example if you bind an uav to slot 3 and the starting slot is 2; the uav will actually be bound to uav slot 5. When set to 255, the slot offset is ignore and leaves the last setting made.

Default: 255 (by default Ogre sets it to 1).

Format:
starting_slot <number>

uav

Sets a texture visible in the current compositor scope (i.e. global textures, input textures, local textures).

Format:
uav <slot> <texture_name> <read> <write> [pixel_format] [mipmap #]
Parameters
slot0-based index to bind to.
texture_nameName of the texture. Must be global, input or local texture.
readIf present, it states you want to read from this UAV in the next vertex/pixel shaders. If absent, you don't intend to read.
writeIf present, it states you want to write into this UAV in the next vertex/pixel shaders. If absent, you don't intend to write.
pixel_formatOptional. The pixel format to bind as. If absent, it uses the original pixel format. Note: the texture must have been declared with reinterpretable if reinterpreting as a different pixel format.
mipmapThe mipmap value must be following by a number (0-based). If not present, default value is 0.
Flag Combination Status
read valid
Read + write valid
write valid
<nothing> invalid

Example:

starting_slot 1
uav 2 global_myTexture read write mipmap 5

Will bind the global texture 'global_myTexture' to slot 3, will have both read & write access, and use mipmap level 5.

Note
starting_slot + slot = 1 + 2 = 3

If only the slot is specified, the slot will be cleared.

uav_external

Exactly the same as uav. But instead of sourcing the texture by name from the Compositor scope, the name is referencing a texture that can be accessed via TextureGpuManager::findTextureNoThrow.

uav_buffer

Sets an UAV buffer visible in the current compositor scope (i.e. global buffers, input buffers, local buffers).

Format:
uav__buffer <slot> <buffer_name> <read> <write> [offset_bytes] [size_bytes]
Parameters
slotSame as uav
readSame as uav
writeSame as uav
offset_bytesDefault: 0. Optional. Offset in bytes to bind.
size_bytesDefault: 0. Optional. Size in bytes to bind. When size_bytes = 0, we assume you want to bind from the offset until the end of the buffer.
starting_slot 1
uav_buffer 3 myUavBuffer read write 256 512

Note there may be HW alignment restriction on which offset you specify. Multiples of 256 bytes are a safe bet.

Note that uav_buffer slots are shared with uav texture's. Binding both to the same slot index will only result in one of them being available.

If only the slot is specified, the slot will be cleared.

keep_previous_uavs

When false, all previous UAVs in all slot will be cleared. When true, only the UAV slots modified by this pass will be affected. Default: true.

Format:
keep_previous_uavs [true|false]

compute

Compute passes let you run a compute job. It can read textures, read/write to UAV textures, and read/write to UAV buffers.

job

Sets the name of the compute job to run (an HlmsComputeJob).

Format:
job <job_name>

uav

Format:
uav <slot> <texture_name> [mrt #] <read> <write> [pixel_format] [<mipmap> #] [allow_write_after_write]

See uav_queue description. The presence of allow_write_after_write means the compositor will not insert a barrier between to consecutive passes that writes to the UAV without reading.

uav_buffer

Format:
uav_buffer <slot> <bufferName> <read> <write> [offsetBytes] [sizeBytes] [allow_write_after_write]

See uav_queue description. The presence of allow_write_after_write means the compositor will not insert a barrier between to consecutive passes that writes to the UAV without reading.

input

Format:
input <slot> <texture_name>

Binds a texture to the texture unit. Syntax is the same as render_quad's input. The slot is not shared with the uav's.

Compute passes don't really belong to a render target. However due to the Compositor's design, they must be specified within a render target. You may do so within a valid render target:

compositor_node MyNode
{
in 0 rt_renderwindow
texture myUavTexture target_width target_height PFG_RGBA8_UNORM depth_pool 0 uav
buffer myUavBuffer 1024 4
target rt_renderwindow
{
//Run compute job with myUavTexture & myUavBuffer bound.
pass compute
{
job myComputeJobName
uav 0 myUavTexture read write
uav_buffer 1 myUavBuffer read write
}
//Clear rt_renderwindow to violet
pass clear
{
colour_value 1 0 1 1
}
}
out 0 myUavTexture
out_buffer 0 myUavBuffer
}
@ PFG_RGBA8_UNORM
Definition: OgrePixelFormatGpu.h:77

Or to a null dummy render target, which occupies almost no memory:

compositor_node MyNode
{
texture nullDummy target_width target_height PFG_NULL
texture myUavTexture target_width target_height PFG_RGBA8_UNORM depth_pool 0 uav
buffer myUavBuffer 1024 4
target nullDummy
{
//Run compute job with myUavTexture bound.
pass compute
{
job myComputeJobName
uav 0 myUavTexture read write
uav_buffer 1 myUavBuffer read write
}
}
out 0 myUavTexture
out_buffer 0 myUavBuffer
}
@ PFG_NULL
Definition: OgrePixelFormatGpu.h:49
Note
Do NOT set UAV buffers to the compute job directly via C++ (the class HlmsComputeJob) bypassing the Compositor. The Compositor needs to evaluate memory barriers and resource transitions. Leaving inconsistent memory barriers can result in hazards/race conditions in some APIs. If in doubt, change the CompositorPassComputeDef instead.
Also setting Textures that are RenderTargets is dangerous. For RenderTargets, change the CompositorPassComputeDef instead.
Note
Don't interleave compute and graphics passes. For optimum performance, try to batch everything of the same type together.

texture

texture <name> <width> <height> [depth] <pixel_format> [msaa <N>] [msaa_auto]
[depth_texture] [depth_pool <poolId>] [uav] [2d_array|3d|cubemap] [mipmaps <numMips>] [no_automipmaps] [reinterpretable]
[explicit_resolve] [not_texture]
Parameters
nameA locally unique name must be assigned (and cannot start with global_ prefix).
width
heightThe dimensions of the render texture. You can either specify a fixed width and height, or you can request that the texture is based on the physical dimensions of the viewport to which the compositor is attached. The options for the latter are either of
  • target_width and target_height
  • target_width_scaled <factor> and target_height_scaled <factor>
where <factor> is the amount by which you wish to multiply the size of the main target to derive the dimensions.
Parameters
depthUsed by 2d_array and 3d textures. Specifies their depth / number of slices. It's automatically forced to 1 for 2d textures and 6 for cubemaps.
pixel_formatThe pixel format of the render texture. This affects how much memory it will take, what colour channels will be available, and what precision you will have within those channels. Most common options are PFG_RGBA8_UNORM_SRGB, PFG_RGBA8_UNORM, PFG_RGBA16_FLOAT, PFG_R16_FLOAT, PFG_RGBA16_FLOAT, PFG_R32_FLOAT. See Ogre::PixelFormatGpu for all options.
msaaExplicitly specifies the MSAA count. Valid values are: 1, 2, 4, 8, 16.
Note that this is subject to GPU support (e.g. some mobile GPUs don't support MSAA 2x and most consumer desktop GPUs don't support values > 8).
msaa_autoUses the same MSAA setting as the global setting for the workspace.
depth_poolWhen present, this directive has to be followed by an integer. This one sets from which Depth buffer pool the depth buffer will be chosen from. All RTs from all compositors with the same pool ID share the same depth buffers as long as it's possible (must have the same resolution, must have the same depth_texture setting). RenderWindows can**not** share their depth buffers due to API limitations on some RenderSystems. When the pool ID is 0, no depth buffer is used. This can be helpful for passes that don’t require a Depth buffer at all, potentially saving performance and memory. Default value is 1.
depth_textureWhen present, the RTT indicates you want to later access the depth buffer's contents as a texture in a shader. This feature is discouraged starting OgreNext >= 2.2 because using RTVs gives you better control.
uavWhen present, the texture can be used as an UAV (also called "STORAGE" in Vulkan lingo).
2d_array|3d|cubemapWhen present, the texture will be created as a 2d_array, 3d or cubemap. Mostly relevant for UAVs but is also useful for rendering. See Target slice parameter.
mipmapsDefault: 1; Indicates how many mipmaps to use. 1 for none. Use 0 to fill all mipmaps until 1x1
no_automipmapsWhen absent (default) and mipmaps are != 1, OgreNext will assume you may eventually use generate_mipmaps (not the Compute version) on this texture or call Ogre::TextureGpu::_autogenerateMipmaps from C++.
When present, it is not valid to use generate_mipmaps on this texture (except for the Compute version).
reinterpretableWhen present, it indicates the texture may be reinterpreted to a different pixel format. e.g. PFG_RGBA8_UNORM as PFG_RGBA8_UNORM_SRGB, PFG_RGBA8_UNORM as PFG_R32_UINT, etc.

Not all formats can be reinterpreted into any format. The bpp must be the same. Compressed blocks (e.g. BCn) have specific rules. The reinterpretation must be supported by the GPU and API.

Parameters
explicit_resolveWhen absent (default), MSAA textures will have an extra non-MSAA copy where the contents are always resolved (unless store actions are not set to resolve).
When present, there is no extra copy; and you must setup the RTV manually with a different non-MSAA texture where to resolve to. See MSAA with Explicit Resolves.
not_textureWhen absent (default), the texture can be bound as a regular texture for sampling (e.g. with point/bilinear filtering). When present, you can't sample from this texture. This flag is useful if you intend to use it together with uav or explicit_resolve.

MSAA: Explicit vs Implicit resolves

Not long ago, MSAA support was automatic, and worked flawlessly with forward renderers and no postprocessing. Direct3D 9 and OpenGL were not able to access the individual MSAA subsamples from shaders at all.

Fast forward to the present, MSAA resolving should be performed after HDR to avoid halos around edges, and deferred shading can't resolve the G-Buffer otherwise aliasing only gets worse.

Direct3D10 and GL 3.2 introduced the ability of access the MSAA subsamples from within a shader, also giving the ability to write custom resolves.

For those unaware what "resolving MSAA" means; a very brief explanation is that when rendering using 4xMSAA, we're actually rendering to a RT that is twice the resolution.

"Resolving" is the act of scaling down the resolution into the real RT (i.e. think of Photoshop or Gimp's downscale filter modes). See the Resources section at the end for links to detailed explanations of how MSAA works.

To cleanly deal with this new feature without breaking compatibility with D3D9 & older GL render systems while at the same time being able to effortlessly switch MSAA on and off; the notion of "Explicit" and "Implicit" resolves were added.

Implicit resolves

By default all RTTs are implicitly resolved. The behavior of implicitly resolved textures mimics Ogre 1.x (except for implementation and design issues in Ogre 1.x that could cause an RTT to resolve multiple times per frame unnecessarily).

When you render and store actions are set to store_or_resolve, we will automatically resolve to the internal texture.

Note
Cubemaps can't be implicitly resolved. They must use explicit resolves. See RTV Examples.

Explicit resolves

To perform explicit resolves you need to setup an RTV, see Advanced MSAA

Depth Textures

Since Ogre 2.1; depth textures are supported. It has been a feature missing from Ogre since a long time by now.

Depth textures are a bit particular because they may not "own" the depth buffer. They're just a null render target with a "view" on an already existing depth buffer. But... what does this mean?

Depth Buffers can be tricky. Suppose the following example:

compositor_node Example
{
texture myDepthTexture 512 512 PF_D32_FLOAT
texture finalSSAO 512 512 PF_R8G8B8
//Draw the depth
target myDepthTexture
{
pass clear {}
pass render_scene
{
}
}
target finalSSAO
{
pass clear {}
pass render_quad
{
material DepthAnalysis
input 0 myDepthTexture
}
}
out 0 finalSSAO
}

Which simply does "Render a depth only pass to myDepthTexture; and read the depth buffer contents with a render quad, and store the results in a coloured RTT called 'finalSSAO' ".

That one was easy. But what about this one?

compositor_node Example2
{
texture firstPass 512 512 PF_R8G8B8
texture finalColour 512 512 PF_R8G8B8
//Draw everything, colour and depth
target firstPass
{
pass clear {}
pass render_scene
{
}
}
target finalColour
{
pass clear {}
pass render_quad
{
material SSAO
input 0 ??? // Depth, firstPass' depth texture?
input 1 firstPass
}
}
out 0 finalColour
}

The first pass is a pass that includes both colour and depth. The second one, we want to just take the depth and colour buffers separately as input textures to the SSAO material pass.

But how do we take the depth buffer? For that, we need to do two steps:

  1. Request the original RTT that it wants to use a depth texture.
  2. Create a depth texture that will be a "view" to the depth buffer. Depth buffer sharing system should assign the same depth buffer to the RTT and the depth texture "view".

The solution is the following:

compositor_node Example2_fixed
{
//Instruct we want to use a depth texture (32-bit float). The "depth_texture" keyword is necessary.
//Specifying The depth format is optional and so is the depth pool. However recommended to specify
//them to avoid surprises.
texture firstPass 512 512 PF_R8G8B8 depth_format PF_D32_FLOAT depth_texture depth_pool 1
//Declare the depth texture view (which becomes so by using PF_D32_FLOAT as format).
//Settings MUST match (depth format, pools, resolution). Specifying the depth pool is necessary,
//otherwise the depth texture will get its own depth buffer, instead of becoming a view.
texture firstPassDepthTexture 512 512 PF_D32_FLOAT depth_pool 1
texture finalColour 512 512 PF_R8G8B8
//Draw everything, colour and depth
target firstPass
{
pass clear {}
pass render_scene
{
}
}
target finalColour
{
pass clear {}
pass render_quad
{
material SSAO
input 0 firstPassDepthTexture
input 1 firstPass
}
}
out 0 finalColour
}
Note
On a lot of Hardware, depth buffers are compressed (see Depth In Depth and ATI Radeon HD 2000 Programming Guide). Before AMD's GCN Taihiti hardware (AMD Radeon R9 280), depth buffers need to be decompressed when bound for sampling as a depth texture. Trying to use the depth texture as a depth buffer again without clearing it will degrade due to the lack of compression.

It is suggested to copy the depth texture to another depth texture if you desire to use one for sampling and another to keep rendering, in order to maximize performance.

The specifics of depth (de)compression for NVIDIA and Intel aren't known but it's probable they're bound to similar issues.

TODO: Compositor interface to copy depth texture to another depth texture automatically.

Shadow Nodes

The only way to have shadows in Ogre is through shadow nodes.

Stencil shadows and "textured shadows" have been removed from Ogre 2.0; only depth shadow maps are supported.

A shadow node is a special type of Node (in fact, the class inherits from CompositorNode) that is executed inside a regular node (normally, a render_scene pass) instead of being connected to other nodes.

It is possible however, to connect the output from a Shadow Node to a regular Node for further postprocessing (i.e. reflective shadow maps for real time Global Illumination), but Shadow Nodes cannot have input. This particular feature (output to regular nodes) is still a work in progress at the time of writing since ensuring the regular node is executed after the shadow node has been executed can be a bit tricky.

Setting up shadow nodes

Shadow nodes work very similar to regular nodes. Perhaps their most noticeable difference is how are RTTs defined. The following keywords are supposed at shadow node scope:

technique

Specifies which shadow technique to use for the subsequent shadow map declarations. The default is uniform.

Note
planeoptimal has also not been implemented yet.
Format:
technique <uniform|planeoptimal|focused|pssm>

num_splits

Only used by PSSM techniques. Specifies the number of splits per light. Can vary per shadow map. The number of splits must be greater than 2. Default is 3.

Format:
num_splits <num_splits>

num_stable_splits

PSSM tends to be very unstable to camera rotation changes. Just rotate the camera around without changing its position and the shadow mapping artifacts keep flickering.

Ogre::PSSMShadowCameraSetup::setNumStableSplits allows you to fix that problem by switching to ConcentricShadowCamera for the first N splits you specify while the remaining splits will use FocusedShadowCameraSetup.

We achieve rotation stability by sacrificing overall quality. Using ConcentricShadowCamera on higher splits means exponentially sacrificing a lot more quality (and even performance); thus the recommended values are num_stable_splits = 1 or num_stable_splits = 2.

The default is num_stable_splits = 0 which disables the feature.

Format:
num_stable_splits <num_stable_splits>

normal_offset_bias

Normal-offset bias is per cascade / shadow map to fight shadow acne and self shadowing artifacts. Very large values can cause misalignments between the objects and their shadows (if they're touching).

Default is 168.0.

Format:
normal_offset_bias <value>

constant_bias_scale

Constant bias is normally per material (tweak HlmsDatablock::mShadowConstantBias). This value lets you multiply it 'mShadowConstantBias * constantBiasScale' per cascade / shadow map

Large values cause peter-panning.

Default is 1.0.

constant_bias_scale <value>

pssm_lambda

Used only by PSSM techniques. Value should be in range [0; 1]. The default is 0.95. PSSM's lambda is a weight value for a linear interpolation between exponential and linear separation between each split. A higher lambda will use exponential distribution, thus closer shadows will improve quality. A lower lambda will use a linear distribution, pushing the splits further, improving the quality of shadows in the distance.

pssm_lambda <value>
Lambda Close Shadows Far Shadows
0.0 Low Quality High Quality
1.0 High Quality Low Quality

pssm_split_blend

Used only by PSSM techniques. Value in range [0; 1]. The default is 0.125; use 0 to disable it. PSSM's blend defines, in the closest N-1 splits, the blend band size. E.g., a value of 0.1 means that the farthest 10% of the first split is blended with the second split (and so on for the other splits). A higher blend reduces visible seams between splits at a cost of a slightly less defined shadow. See Blend between Cascades for additional info.

pssm_split_blend <value>

pssm_split_fade

  • pssm_split_fade <fade>

Used only by PSSM techniques. Value in range [0; 1]. The default is 0.313; use 0 to disable it. PSSM's fade defines how much of the last split will fade out. E.g., a value of 0.1 means that the farthest 10% of the last split will fade out. A higher fade makes the transition from shadowed to non shadowed areas (and viceversa) smoother at a cost of a less visible distant shadow.

pssm_split_fade <value>

shadow_map

shadow_map <number> <texture_name> [uv <left> <top> <width> <height>] light <light_index> [split <index>]
Parameters
numberThe number of the shadow map being defined.
texture_nameWhat texture to use which must have already been declared, where the shadow map contents will be stored.
uvOptional. After this keyword, 4 more numbers must follow (left, top, width, height).
An atlas allows you to use a region of a texture. Instead of using the whole atlas content, you can use a region of it. This allows you to have multiple shadow maps in the same texture.
left
topSpecifies the left & top origin of the shadow map inside the atlas. In UV space. in range [0; 1].
width
heightSpecifies the width & height of the shadow map inside the atlas. In UV space. in range [0; 1].
lightAfter light must follow the number for the light. Indicates which light index will be associated with this shadow map. This is useful for PSSM because multiple shadow maps may refer to the same light.
splitOptional. After split must follow the split index. Used by PSSM, indicates which split does this shadow map handle.

Example

Declaring a shadow map is not enough. You need to tell OgreNext what do you want to render to it and how. And for that you need render_scene passes.

Shadow nodes can be written with the regular target { pass render_scene {} } syntax. However when you have 6 shadow maps with the same exact pass settings, it's cumbersome to write the pass six times. Instead the shadow_map keyword repeats the passes for you.

The following is a basic script that will set a single shadow map with a focused setup:

compositor_node_shadow myShadowNode
{
technique focused
texture focusedTex 2048 2048 PFG_D32_FLOAT
shadow_map 0 focusedTex light 0
// Render shadow map "0"
shadow_map_target_type directional spot
{
shadow_map 0
{
pass render_scene
{
load
{
all clear
}
rq_first 0
rq_last max
}
}
}
}
@ PFG_D32_FLOAT
Definition: OgrePixelFormatGpu.h:89

The typical setup is to have one directional light for the sun. And then multiple point or spot lights.

This means directional light should use a PSSM setting for best quality while point & spot lights shadow maps could use focused or uniform.

The following script creates 3 shadow maps for 3 PSSM splits, and 3 additional ones for the remaining lights (which can be either directional or spot):

compositor_node_shadow myShadowNode
{
technique pssm
texture tex_for_pssm0 2048 2048 PFG_D32_FLOAT
texture tex_for_pssm1 1024 1024 PFG_D32_FLOAT
texture tex_for_pssm2 1024 1024 PFG_D32_FLOAT
texture tex_for_spot0 2048 2048 PFG_D32_FLOAT
texture tex_for_spot1 2048 2048 PFG_D32_FLOAT
num_splits 3
pssm_lambda 0.95
// Store 1st closest light, splits 0 1 & 2
shadow_map 0 tex_for_pssm0 light 0 split 0
shadow_map 1 tex_for_pssm1 light 0 split 1
shadow_map 2 tex_for_pssm2 light 0 split 2
// Change to focused from now on
technique focused
shadow_map 3 tex_for_spot0 light 1
shadow_map 4 tex_for_spot1 light 2
shadow_map_target_type directional
{
// Render shadow maps 0, 1 and 2.
// Can only be used by directional lights.
shadow_map 0 1 2
{
pass render_scene
{
load
{
all clear
}
}
}
}
shadow_map_target_type directional spot
{
// Render shadow maps 3 and 4
// Can only be used by either directional lights or spot lights.
shadow_map 3 4
{
pass render_scene
{
load
{
all clear
}
}
}
}
}

Shadow map atlas & Point Lights

Instead of rendering each PSSM split into a different texture, you can use an atlas:

compositor_node_shadow PssmWithAtlas
{
technique pssm
texture atlas 3072 2048 PFG_D32_FLOAT
//The splits are distributed in the atlas like this:
// -------------
// | | 2 |
// | 1 |-----|
// | | 3 |
// -------------
num_splits 3
pssm_lambda 0.95
shadow_map 0 atlas uv 0.000000000000000 0.0 0.666666666666667 1.0 light 0 split 0
shadow_map 1 atlas uv 0.666666666666667 0.0 0.333333333333333 0.5 light 0 split 1
shadow_map 2 atlas uv 0.666666666666667 0.5 0.333333333333333 0.5 light 0 split 2
// Before doing anything, clear the whole atlas in one go. This is not
// ideal on iOS & Android though; but recommended on Desktop.
target atlas
{
pass clear
{
colour_value 1 1 1 1
}
}
shadow_map_target_type directional
{
shadow_map 0 1 2
{
pass render_scene
{
// The viewport settings will be automatically
// adjusted to constrain to the atlas regions.
}
}
}
}

Point light shadow mapping must exploit the powerful compositor scripting capabilities: OgreNext uses DPSM (Dual Paraboloid Shadow Maps).

Please note we will be rendering to cubemaps, then converting to DPSM.

We won't be rendering directly as DPSM because testing shows it deforms too much when tessellation is low. We could support it, but it's not a priority. Thus OgreNext first needs to render to a cubemap, which can be shared by all shadow maps, and then a converter transforms it to DPSM.

The reason to use scene -> Cubemap -> DPSM is so that we keep a reasonable memory footprint and be atlas-friendly. If we use cubemaps directly and want to support 8 point lights at 1024x1024, then we would need 1024x1024x6x8 = 192MB.

However with DPSM it would be 8 DPSM and 1 cubemap: 1024x1024x4x8 + 1024x1024x4x6 = 56MB.

The following example setups a script to support two point lights (and only two point lights):

abstract target cubemap_target_shadow
{
pass render_scene
{
load
{
all clear
clear_colour_reverse_depth_aware 1 1 1 1
}
store
{
//We only care about the contents of the colour target with point shadows
depth dont_care
stencil dont_care
}
camera_cubemap_reorient true
}
}
compositor_node_shadow PointLight
{
technique pssm
texture pointLightTex0 2048 2048 PFG_D32_FLOAT
texture pointLightTex1 2048 2048 PFG_D32_FLOAT
texture tmpCubemap 1024 1024 PFG_R32_FLOAT cubemap depth_format PFG_D32_FLOAT
technique focused
shadow_map 0 pointLightTex0 light 0
shadow_map 1 pointLightTex1 light 1
shadow_map_target_type point
{
//shadow_map_repeat tells to repeat what's inside its body for shadow map 0 & 1
shadow_map_repeat 0 1
{
// Render to the cubemap with the camera settings of
// the currently iterated point light shadow map
target tmpCubemap +X : cubemap_target_shadow {}
target tmpCubemap -X : cubemap_target_shadow {}
target tmpCubemap +Y : cubemap_target_shadow {}
target tmpCubemap -Y : cubemap_target_shadow {}
target tmpCubemap +Z : cubemap_target_shadow {}
target tmpCubemap -Z : cubemap_target_shadow {}
// Render to the current shadow map being iterated.
shadow_map
{
pass render_quad
{
// This material can be found in Samples/Media/2.0/materials/Common
material Ogre/DPSM/CubeToDpsm
input 0 tmpCubemap
}
}
}
}
}
@ PFG_R32_FLOAT
Definition: OgrePixelFormatGpu.h:90
bswapNN may be defined as macros in <sys/endian.h> or <sys/bswap.h>
Definition: OgreAndroidLogListener.h:35

See Samples/Media/2.0/scripts/Compositors/ShadowMapDebugging.compositor for an example of a full script that can support directional, spot & point lights all in one, in a single atlas.

Reuse, recalculate and first

Each PASS_SCENE from regular nodes have three settings:

  1. SHADOW_NODE_REUSE
  2. SHADOW_NODE_RECALCULATE
  3. SHADOW_NODE_FIRST_ONLY

This affect when shadow nodes are executed and how they cache their results. The default value is SHADOW_NODE_FIRST_ONLY; in which means Ogre should manage this automatically; however there are times when SHADOW_NODE_REUSE could be useful.

It's easier to explain what they do with examples.

Suppose the user has two render_scene passes, both have the same shadow node associated:

  1. One for opaque geometry.
  2. Another for transparent geometry,

If using SHADOW_NODE_FIRST_ONLY, when the first pass is executed (opaque geometry), Ogre will first execute the shadow nodes, updating the shadow maps; then render the opaque geometry.

When the second pass is executed (transparent geometry), the shadow node won't be executed as the shadow maps are supposed to be up to date; hence the transparent geometry will reuse the results.

Another example: Suppose the user has three passes:

  1. One for opaque geometry.
  2. Another for reflections, seen from a different camera.
  3. The last pass for transparent geometry, rendered using the same camera as opaque geometry.

If using SHADOW_NODE_FIRST_ONLY; the shadow node will be executed before the opaque geometry pass.

Then the reflections' pass comes. It uses a different camera, which means there could be a different set of lights that will be used for shadow casting (since some techniques set shadow cameras relative to the rendering camera for optimum quality, pssm splits become obsolete, some lights are closer to this camera than they were to the player's camera, etc). Ogre has no choice but to recalculate and execute the shadow node again, updating the shadow maps.

When the third pass kicks in, the camera has changed again; thus we need to execute the shadow node... again!

Ogre will log a warning when it detects a suboptimal compositor setup such as this one. To be more specific, Ogre detects that the 3rd pass uses the same results as the 1st pass, but the shadow node is being forced to recalculate in the third one, instead of reusing.

There are several ways to solve this problem:

  1. Render reflections first: This is perhaps the most obvious one. If there are no data dependencies; first perform the reflection pass, and then the opaque & transparent passes; so the shadow node is executed twice instead of three times.
  2. Use two shadow nodes: When the first option isn't viable (i.e. there's a data dependency) using two shadow nodes will guarantee the results don't get overwritten. This option needs more VRAM though.
  3. Use SHADOW_NODE_REUSE in the reflection render_scene pass: This will force Ogre not to execute the shadow node. This assumes you know what you're doing or else you may experience glitches (i.e. pssm splits aren't fully usable from a camera with a different position). This is useful though, if you wish to maintain consistency in the light list being used (since recalculation may cause a different set of lights to be used for shadow maps, since it depends on proximity to the active camera). Another reason to force reusage could be performance: The shadow node is only being executed once.

The setting SHADOW_NODE_RECALCULATE forces Ogre to always recalculate. Ogre will not issue a warning if it detects your node setup is suboptimal because of passes using SHADOW_NODE_RECALCULATE.

Forcing recalculation only makes sense when the application makes relevant changes to the camera between passes that Ogre cannot detect (i.e. change the position or the orientation through listeners)

Shadow mapping setup types

Ogre supports 5 depth shadow mapping techniques. Although they're as old as Ogre 1.4 or older, they've never been mentioned in the manual, and the doxygen documentation is quite cryptic, assuming the reader is quite familiar with the original papers. Here each is technique explained.

Uniform shadow mapping

The oldest form of shadow mapping, and the most simple one. It's very basic and thus probably glitch-free. However it's quality is very bad, even on high resolutions.

The user needs to call SceneManager::setShadowDirectionalLightExtrusionDistance & SceneManager::getShadowFarDistance let Ogre know how far directional lights should be from camera (since theoretically they're infinitely distant). If the value is too low, some casters won't be included and thus won't cast a shadow. Too high and the quality will quickly degrade.

Most likely only useful for testing that shaders are working correctly, and shadows not showing up correctly is not an Ogre bug or the scene (i.e. casters with infinite aabbs can cause trouble for Focused techniques).

Focused

An improved form over uniform shadow mapping. The technique uses the AABB enclosing all casters, an AABB enclosing all receivers visible by the current camera and the camera's frustum to build a hull (which is the intersection of all three, also known as the "intersection body B"). With this hull's information, Focused shadow mapping is able to deduce the optimal extrusion distance (no need to set it like in uniform shadow mapping), and create a much tighter near and far plane, resulting in much superior quality.

SceneManager::getShadowFarDistance is still used, and it can cause major quality improvements, because the camera's frustum used to build the hull is first clipped at the shadow far distance (instead of using the camera's far plane)

Most of the time, this is one of the best choices for general shadow mapping.

PSSM / CSM

PSSM stands for Parallel Split Shadow Mapping aka. Cascaded Shadow Maps.

Shadow maps are divided into "cascades" or "splits"; in order to improve quality. So instead of getting one RTT per light, the user gets multiple RTTs per light. Usually the depth in camera space is determining factor to know which cascade/split to use.

There's a lot of resources on internet regarding PSSM / CSM:

The original technique was introduced by Fan Zhang, Hanqiu Sun, Leilei Xu & Lee Kit Lun

Plane Optimal

TBD

Workspaces

Nodes are useless without setting up a workspace.

A workspace defines what nodes are going to be used and how they're going to be connected. They also need to declare global textures. Declaration order is very important.

Nodes are automatically in use when their connection is specified.

connect

connect <node_name_1> [<output_ch #>] [<output_ch #>] <node_name_2> [<input_ch #>] [<input_ch #>]

Connects the Node "node_name_1" output channels to "node_name_2" input channels. This implicitly means "node_name_1" & "node_name_2" will be used and executed by the workspace (even if they're isolated and never reach the screen)

Parameters
node_name_1The name of the Node that will be executed before "node_name_2"
output_chChannel numbers from node_name_1's output channels that will be connected to node_name_2.
node_name_2The name of the Node that will be executed after "node_name_1"
input_chChannel numbers from node_name_2's inputs channels that will be connected from node_name_2 bindings.

Examples:

//Connect nodeA to nodeB
//A's output channel 0 ==> B's input channel 1
//A's output channel 1 ==> B's input channel 2
//A's output channel 2 ==> B's input channel 0
connect nodeA 0 1 2 nodeB 1 2 0
//Connect nodeA to nodeB
//A's output channel 0 ==> B's input channel 0
//A's output channel 2 ==> B's input channel 1
connect nodeA 0 2 nodeB 0 1
//Connect nodeC to nodeB
//C's output channel 3 ==> B's input channel 1
connect nodeC 3 nodeB 1

Not all output channels must be used. Take in mind that if an output is not used at all, it will still take CPU & GPU processing time.

Note
All nodes must have their input channels connected. If a node has a disconnected input channel, the workspace will fail to initialize and throw a warning.
Note
Nodes that have no input channels will be the first to be executed, regardless of declaration order (but nodes without input channels declared first should run before nodes declared later with no input channels). Take this in mind if you plan to use global textures as a means of passing information (usually a very bad idea).

connect_external

connect_external <external_channel #> <node_name> <input_channel #>

Connects the final render target (i.e. the RenderWindow) to the specified input channel from the node. Implicitly the node will be used and executed. The render target in external channel #0 is always used as the reference for target_width, target_width_scaled and all other parameters that are based on an external RTT. You can use connect_external as many times as you want. The external render targets are passed in C++ code when initializing the Workspace.

It is possible for a Workspace to not use this variable (though rather pointless)

Parameters
external_channelThe index to the external texture or UAV buffer passed to addWorkspace.
node_nameThe name of the Node that will receive the external texture.
input_channelThe number of the input channel from node_name.

Example:

//Pass the external texture to nodeA through channel #0
connect_external 0 nodeA 0
//Pass the external texture to nodeB through channel #0
connect_external 0 nodeB 0
//Pass a second external texture to nodeB through channel #1
connect_external 1 nodeB 1
connect_output <node_name> <input_channel #>

It's the same as connect_external 0 <node_name> <input_channel>.

Provided for compatibility reasons and convenience. Originally only one connect_output was allowed, but now you can use it as many as you need.

alias

alias <node_name> <aliased_name>

Normally, a Node is always reused. So, if node A connects to B and C; and D connects to A; it's always the same node A the one we're talking about. The definition is only instantiated once.

However there may be cases where you want to have multiple instances of the same node definition (i.e. because you want unique local textures, or because you want to repeat a process on a different set of nodes), and hence that's what node aliasing does. Once an alias is declared, the node will be instantiated with a different name (its aliased name), and will be possible to make connections with it.

Parameters
node_nameThe name of the original instance
aliased_nameThe alias name to give to this separate instance. The alias must be unique across the workspace, and must also be unique across the names of original node definitions.

Example:

workspace MyWorkspace
{
alias nodeA UniqueNode1 //Instantiate nodeA, calling it UniqueNode1
connect nodeA 0 UniqueNode1 0
connect nodeA 0 nodeB 0
connect UniqueNode1 0 nodeB 1
}

buffer

buffer <buffer_name> <num_elements> <bytes_per_element> [target_width] [target_width_scaled] [target_height] [target_height_scaled]

Creates an UAV buffer.

Parameters
buffer_nameThe name of the buffer. Unlike textures, there are no naming restrictions (i.e. no global_ prefix). If a buffer local to the node and a global buffer have the same name, the local one takes precedence and a warning is logged.
num_elementsThe number of elements in the UAV. Must be a number higher than 0.
bytes_per_elementBytes per element. Must be a number higher than 0.
target_width
target_height
target_width_scaled
target_height_scaledSee texture they work like their texture counterparts, and when present, will be multiplied against the number of elements.

The size of the UAV buffer is calculated as follows:

finalNumElements = numElements * bytesPerElement;
if( widthFactor > 0 )
finalNumElements *= (widthFactor * width);
if( heightFactor > 0 )
finalNumElements *= (heightFactor * height);

For example if you want to do 512 x height; just set numElements to 512 and target_height or target_height_scaled 1.

Since there are no pixel formats, the bytesPerElement controls such such thing (eg. 4 bytes for RGBA8_UNORM).

UAV Buffers are not just for storing contiguous texture data. For example if you run a compute shader that gathers all lights, you would store the following in an UAV buffer:

struct Lights
{
float3 position;
float3 direction;
float3 diffuse;
};
RWStructuredBuffer<Lights> myLights;
@ Lights
Definition: OgreSceneFormatBase.h:60

In this case, numElements = 16 means we can address up to myLights[15]; and bytesPerElement = 36.

bytesPerElement must account padding according to the HLSL rules (4 x 4 x 3 =>; 4 floats x sizeof(float) x 3).

Because calculation of bytesPerElement can get really tricky by hand (or may change dynamically at runtime), complex cases are best if the UAV is created in C++, and passed to the Workspace via connect_buffer_external.

Why consider UAV buffers for texture operations?

Regular textures have an optimized layout for adapting to most rasterization cases (filtering, stretching). Often these layout is swizzled or tiled (i.e. storing data as RRRR GGGG BBBB AAAA, or storing pixels in morton order). Sometimes they may even be losslessly compressed by the GPU.

When you're working with compute shaders (e.g. for postprocessing) and don't need filtering, your access patterns will likely be flat, linear and contiguous, and thus get higher performance by using an UAV Buffer.

This is not a rule of thumb. You'll need to experiment with both UAV textures and UAV buffers in your compute shaders to see what gives you the best performance.

connect_buffer

connect_buffer <node_name_1> [<output_ch #>] [<output_ch #>] <node_name_2> [<input_ch #>] [<input_ch #>]

Exactly the same as connect, but it connects UAV buffers instead of textures.

Example:

//Connect nodeA to nodeB
//A's output channel 0 ==> B's input channel 1
//A's output channel 1 ==> B's input channel 2
//A's output channel 2 ==> B's input channel 0
connect_buffer nodeA 0 1 2 nodeB 1 2 0
//Connect nodeA to nodeB
//A's output channel 0 ==> B's input channel 0
//A's output channel 2 ==> B's input channel 1
connect_buffer nodeA 0 2 nodeB 0 1
//Connect nodeC to nodeB
//C's output channel 3 ==> B's input channel 1
connect_buffer nodeC 3 nodeB 1

connect_buffer_external

connect_buffer_external <external_channel #> <node_name> <input_channel #>

Connects multiple external UAV buffer. External UAV buffers are provided when instantiating the Workspace via addWorkspace in C++.

It is possible for a Workspace to not use this variable (though rather pointless)

Parameters
external_channelThe index to the external UAV buffer passed to addWorkspace.
node_nameThe name of the Node that will receive the external UAV
input_channelThe number of the input channel from node_name.

Example:

//Pass the external UAV to nodeA through channel #0
connect_buffer_external 0 nodeA 0
//Pass the external UAV to nodeB through channel #0
connect_buffer_external 0 nodeB 0
//Pass a second external UAV to nodeB through channel #1
connect_buffer_external 1 nodeB 1

Data dependencies between nodes and circular dependencies

The Compostor will solve data dependencies and reorder node execution as necessary. It will also detect some circular dependencies (i.e. node A connecting to A; A connecting to B and B connecting to A) report the error and refuse to initialize the workspace, but it may not detect more complex cases (i.e. node A connecting to B, B to C, C to D, D to B) and attempting execution could result in crashes or graphical glitches.

If you happen to encounter a circular dependency that is not reported by Ogre, we would be intereste in knowing more about it. You can submit your bug report to JIRA

Setting up code

Initializing the workspace

To create the workspace, just call the following function with the name of the workspace:

CompositorManager2 *compositorManager = mRoot->getCompositorManager2();
compositorManager->addWorkspace( mSceneMgr, mWindow, mCamera, "MyOwnWorkspace", true );

You can have more than one Workspace instance of the same Workspace definition. This is mostly useful if you're trying to render to two or more different RTs (i.e. two Render Windows, a RenderWindow and an offscreen RTT, etc) or if you want to use completely different SceneManagers.

Simple bootstrap for beginners

If you're a user that doesn't want to deal with compositor nodes, you're a beginner, or you're in a rush, there is an utility function that will help you set up a basic workspace and a compositor node to render the whole scene:

const IdString workspaceName( "MyOwnWorkspace" );
CompositorManager2 *compositorManager = mRoot->getCompositorManager2();
if( !compositorManager->hasWorkspaceDefinition( workspaceName ) )
compositorManager->createBasicWorkspaceDef( workspaceName, ColourValue( 0.6f, 0.0f, 0.6f ) );
compositorManager->addWorkspace( mSceneMgr, mWindow, mCamera, workspaceName, true );

The workspace created by the utility function is equivalent to the following compositor script:

compositor_node MyOwnWorkspace_Node
{
in 0 renderwindow
target renderwindow
{
pass render_scene
{
load
{
all clear
clear_colour 0.6 0 0.6 1
}
rq_first 0
rq_last max
}
}
}
workspace MyOwnWorkspace
{
connect_output MyOwnWorkspace_Node 0
}

Advanced C++ users

Advanced C++ users who want to deal with the CompositorManager2 directly, may find the information in this section useful.

The CompositorManager2 uses a C++ pattern where there is an object Definition and an instance. For example; there is a class called CompositorPassSceneDef and a class called CompositorPassScene. The former is the definition, while the latter is the instance.

All instances share the same definition and have only read-access to them. Modifying the shared definition while there are instances active is undefined and could happen anything ranging from what the user expected, to glitches, crashes, or memory leaks. Only by analyzing the code it is possible to determine which changes are likely to be "safe" (like changing the visibility mask) and which ones require the instance to be destroyed and recreated.

The syntax of the compositor scripts translate almost 1:1 to definitions, rather than instances. Probably the most notable difference is that NodeDefs contain CompositorTargetDef, and these contain CompositorPassDef; while the instances, Targets and Passes are joined together, thus Nodes contain CompositorPasses directly.

Because the CompositorManager2 is still very new, we admit real time changes to nodes (especially channel connections) can be a bit troublesome to deal with unless destroying everything and recreating it, which could be suboptimal for live editing nodes.

We would love to hear your developer feedback on the forums regarding live editing the nodes and further improve the Compositor.

Stereo and Split-Screen Rendering

Rendering in Stereo ala Occulus Rift™ (or splitting the screen in multiple sections for multiplayer) has been made easier with the introduction of execution and viewport masks.

Normally, to render to the left side of the screen and then to the right side; you would need to create a clear pass to clear the whole render target, and two passes render_scene passes with different viewport settings, one for each eye.

With execution and viewport modifier masks, you no longer need to duplicate the number of passes per region of screen you want to draw to. You will have to create a workspace for each region though (i.e. one workspace per eye).

Per-Workspace offset and scale

Each workspace contains an offset and scale to be applied to each pass; passed as a Vector4 to CompositorManager2::addWorkspace. The XY components contain the offset, the ZW contain the scale.

On each pass, its final viewport is calculated this way:

Real left = mDefinition->mVpLeft + vpModifier.x;
Real top = mDefinition->mVpTop + vpModifier.y;
Real width = mDefinition->mVpWidth * vpModifier.z;
Real height = mDefinition->mVpHeight * vpModifier.w;
float Real
Software floating point type.
Definition: OgrePrerequisites.h:109

This means that to render to the left eye, you would need to specify Vector4( 0.0f, 0.0f, 0.5f, 1.0f ) and to render to the right eye you would specify Vector4( 0.5f, 0.0f, 0.5f, 1.0f ).

Viewport modifier mask

You don't want the modifier to affect all passes. The viewport modifer mask is a per-pass 8-bit value that is AND'ed with the workspace's mask. If the result is non-zero, the offset and scale is applied.

For example, you can apply postprocessing passes to entire screen instead of just a single eye.

The most common use for this mask is clearing: The GPU prefers that you clear the entire buffer in one go, rather than two partial clears. Therefore you can use the mask to prevent the clear's viewport from being affected, and end up affecting the whole screen.

There's still a problem though: You have two workspaces (one per eye). The first workspace will work as intended. However the workspace will execute the clear again, and remove the contents drawn to the left eye. The Execution Mask solves this problem.

Execution mask

The execution mask is per-pass 8-bit value that is AND'ed with the workspace's execution mask. When zero, the pass is skipped, when non-zero, the pass is executed.

Continuing the example from the previous section, you can use an execution mask to cause the clear to only be executed when rendering the first left eye; and the clear pass will not be performed when rendering the right eye.

As another example, you could use two render_pass to perform Anaglyph 3D, i.e. red tint on the left eye, cyan tint on the right eye. You would set the viewport modifier mask to 0 so that it's not be affected by the workspace's offset and scale; but set the execution masks so that the red tint pass only gets executed for the left eye's workspace, and the cyan pass only gets executed for the right eye's workspace.

Default values

By default execution and viewport masks default to 0xFF except for Clear passes where the defaults are:

//Override so that it only gets executed on the first execution on the
//whole screen (i.e. clear the whole viewport during the left eye pass)
mExecutionMask = 0x01;
mViewportModifierMask = 0x00;

This assumes that your first workspace (i.e. the left eye / first player in split screen) sets the execution mask to 1; and the other workspaces have the first bit unset for that mask.

Using the defaults, the following example splits the screen in 4 for multiplayer (i.e. Mario Kart™ games and similar) or also useful for editing in 4-POV modelling application; and the clear passes will apply to the whole screen with the first workspace:

for( int i=0; i<4; ++i )
{
Vector4 vpModifier( (i % 2) * 0.5f, (i >> 1) * 0.5f, 0.25f, 0.25f );
m_workspace[i] = mgr->addWorkspace( sceneManager, renderTarget,
playerCam[i], "MainWorkspace", true,
-1, vpModifier,
(1 << i), (1 << i) );
}

Advanced MSAA

What is MSAA?

MSAA (Multisample Antialiasing) is a very common antialiasing technique.

It is quite common to treat MSAA like black magic: it works automatically and makes those staircase effect aka jagged edges aka aliasing disappear.

However there are times where we need to understand how it works, what's going on and how to control it explicitly via Ogre.

Supersampling Antialiasing (SSAA) vs MSAA

It is best to explain what MSAA is by first explaining Supersampling Antialiasing (SSAA).

If we have to render 1920x1080 4xSSAA, then we would just render at 3840x2160 and downscale back to 1920x1080 (either with a basic bilinear filter or something more advanced e.g. bicubic, gaussian, etc).

Huh? What's next, you ask? That's it!

Note
1920x2 = 3840 and 1080x2 = 2160, twice the width and twice the height is 4x the area. Hence 4xSSAA.

Thus SSAA is not a complex algorithm: it's just rendering at higher resolution and then scaling down to blur each of the 4 pixels into 1, producing soft edges.

This operation of scaling down is known as Resolve

The problem with SSAA: it consumes a lot of bandwidth and processing power. 4x of everything to be exact (for 4xSSAA).

That's where MSAA comes in. MSAA in is basic form looks like SSAA: the GPU needs to allocate a 3840x2160 colour target and a 3840x2160 depth buffer. Thus it stll consumes 4x more memory.

MSAA approach to the problem

MSAA exploits a simple fact: only the triangle borders need antialiasing.

Thus for all pixels except the ones touching the border of a triangle, the GPU will only run the pixel shader once and broadcast the colour to all 4 pixels just as if it were rendering at 1920x1080. This saves 4x of colour bandwidth and 4x processing power, making it very efficient.

Note
Internally the GPU may have a hidden 2-bit mask. Thus if a triangle covers all 4 subpixels, it runs the pixel shader once and and sets the mask to 0x3 to indicate all 4 subpixels share the same value.
The specifics of how these hidden bit masks work depend on vendor and GPU model and are implementation details.

The depth however is still populated at 3840x2160.

Ideally it is only at the polygon edges that the pixel shader may run up to 4 times.

The major drawbacks from MSAA are two:

  • Performance is highly dependent on the geometry involved. A scene with lots of sharp & spiky triangles will force the GPU to run the pixel shader up to 4 times very often (grass blades often trigger this worst case scenario). Thus MSAA effectively becomes SSAA. While a scene with smoothly-connected triangles (i.e. barely any edges) will run very fast.
  • Only geometric aliasing (e.g. polygon edges) is fixed. There are other sources of aliasing (texture, shading) which are not considered. Texture aliasing is often fixed with mipmapping though. While fixing shading aliasing is still a hot topic.

The specific of how the GPU keeps the MSAA contents in memory are vendor and device-specific. However they're often not sampling-friendly (poor cache behavior, no bilinear filtering available) therefore we often resolve the contents before working on its data.

But there are exceptions where we need access to the MSAA subsamples before resolve:

  1. HDR tonemapping should happen before resolving, otherwise aliasing effects won't go away. That means an HDR tonemap needs direct access to the MSAA contents. See our HDR sample which deals with this issue.
  2. Resolving a depth buffer makes no sense. Averaging depth is meaningless. Depth is often required by postprocessing effects such as Screen Space Reflections, Depth of Field, SSAO. In order to do this, either the 1920x1080 4xMSAA depth buffer gets copied to a regular 3840x2160 texture (to make it cache- and sampling-friendly) or we pretend MSAA did not happen and just take of the values, or the minimum or maximum of each. There is no right answer and it is a very similar problem mixed resolution particle rendering has.

Resources

Ogre + MSAA with Implicit Resolves

Ogre by default uses implicit resolves. When you call:

texture = textureGpuManager->createTexture(
"MyRtt",
texture->setResolution( 1920, 1080 );
texture->setPixelFormat( PFG_RGBA8_UNORM_SRGB );
texture->setNumMipmaps( 1u );
texture->setSampleDescription( SampleDescription( 4u ) ); // 4x MSAA
texture->scheduleTransitionTo( GpuResidency::Resident );
@ Discard
When the resource is no longer resident, we just throw the GPU data.
Definition: OgreGpuResource.h:89
@ Resident
VRAM and other GPU resources have been allocated for this resource.
Definition: OgreGpuResource.h:66
@ RenderToTexture
Texture can be used as an RTT (FBO in GL terms)
Definition: OgreTextureGpu.h:88
@ Type2D
Definition: OgreTextureGpu.h:57

Ogre creates two textures:

  1. The MSAA surface. Generally you don't have direct access to it. This one occupies 1920x1080x4x4 = 31.64MB
  2. The implicitly resolved texture which can be used for sampling. This one occupies 1920x1080x4 = 7.91MB

StoreActions control when the texture is resolved.

Implicitly resolved textures is how Ogre traditionally worked before Ogre 2.2 Ogre 2.1 tried to implement this but it was very basic and often broken.

Ogre + MSAA with Explicit Resolves

You need to explicitly ask for explicit resolves. When you call:

texture = textureGpuManager->createTexture(
"MyRtt",
texture->setResolution( 1920, 1080 );
texture->setPixelFormat( PFG_RGBA8_UNORM_SRGB );
texture->setNumMipmaps( 1u );
texture->setSampleDescription( SampleDescription( 4u ) ); // 4x MSAA
texture->scheduleTransitionTo( GpuResidency::Resident );
@ MsaaExplicitResolve
MSAA rendering is an antialiasing technique.
Definition: OgreTextureGpu.h:113

OgreNext creates only one texture:

  1. The MSAA surface. This one occupies 1920x1080x4x4 = 31.64MB

Therefore:

  1. Binding this texture to a shader means the shader must access it via Texture2DMS (HLSL), sampler2DMS (GLSL), and texture2d_ms (Metal).
  2. Resolving must be done manually (assuming you want to resolve at all). This means setting up the RTV on the compositor manually.

In compositor scripts one would have to set the rtv like this:

compositor_node MyExplicitMsaaNode
{
// This is the explicit MSAA surface
texture myMsaaTex target_width target_height PFG_RGBA8_UNORM_SRGB msaa_auto explicit_resolve
// This is where myMsaaTex will be resolved to. It's just a regular texture
texture myResolvedResult target_width target_height PFG_RGBA8_UNORM_SRGB
// Create a custom RenderTargetView.
// Normally Ogre automatically generates one with the same name as the texture
// (We could also modify that auto-generated one by specifying 'rtv myMsaaTex')
// but we create another one to emphasize it's custom-made
rtv myCustomRtv
{
// Specify we want to render to myMsaaTex at slot[0] but we want to resolve to myResolvedResult
colour 0 myMsaaTex resolve myResolvedResult
}
target myCustomRtv
{
// Render...
}
}

Now you just rendered to myMsaaTex, resolved into myResolvedResult and can have direct access to MSAA samples (by binding myMsaaTex to a material).

See Samples/Media/2.0/scripts/Compositors/ScreenSpaceReflections.compositor for a specific example

RTV (RenderTargetView)

So far all examples tried to ignore RTVs.

The Compositor creates an rtv automatically with the same name for locally created textures and input textures.

However there are cases when you need to know them.

What is an RTV

When you render to a Texture, the GPU actually needs to setup a lot of things:

  • If it has multiple mipmaps, which mipmap.
  • If it is a Cubemap, a 3D or a 2D Array texture, which slice.
  • If it's MSAA, where to resolve the MSAA contents into (unless we don't want to resolve).
    • What MSAA mipmap to resolve into.
    • What MSAA slice to resolve into.
  • What depth buffer to use.
  • What stencil buffer to use.
  • If using MRT (Multiple Render Targets), you want to setup more than one colour texture at the same time.

An RTV contains the definition of all that.

RTV settings

colour

Format:
colour <texture_name>
colour <mrt_idx> <texture_name> [resolve <resolve_dst_texture_name>] [mip <#>] [resolve_mip <#>] [slice <#>] [resolve_slice <#>] [all_layers [true|false]]
Parameters
mrt_index0-based slot of the MRT (Multiple Render Target). Must be in range [0; Ogre::RenderSystemCapabilities::getNumMultiRenderTargets).
texture_nameName of the colour texture to bind at this slot.
mip
resolve_mipOptional. Mip level to render or resolve to. This keyword must be followed by a number.
slice
resolve_sliceOptional. Slice index to render or resolve to. This keyword must be followed by a number.
all_layersOptional. This keyword must be followed by a boolean. When true slice will be ignored and all slices will be attached instead. The shaders are expected to use gl_ViewportIndex or equivalent to dispatch geometry to the right slice.
See Ogre::RenderPassColourTarget::allLayers.

As a shortcut, if the user only types:

colour texture_name

it is the same as:

colour 0 texture_name resolve texture_name mip 0 resolve_mip 0 slice 0 resolve_slice 0 all_layers false
Note
texture_name MUST have a pixel format that is colour.

depth

Explicitly assigns a depth buffer to this RTV.

Format:
depth <texture_name> [mip <#>] [slice <#>]
Parameters
texture_nameName of the depth texture to bind at this slot.
mipOptional. Mip level to render to. This keyword must be followed by a number.
sliceOptional. Slice index to render to. This keyword must be followed by a number.
Note
Most APIs do not allow to create a 2D Array depth buffer, thus the slice parameter is always 0.
Note
texture_name MUST have a pixel format that is depth.

stencil

Explicitly assigns a stencil buffer to this RTV.

Format:
stencil <texture_name> [mip <#>] [slice <#>]
Parameters
texture_nameName of the stencil texture to bind at this slot.
mipOptional. Mip level to render to. This keyword must be followed by a number.
sliceOptional. Slice index to render to. This keyword must be followed by a number.
Note
Most APIs do not allow to create a 2D Array stencil buffer, thus the slice parameter is always 0.
Note
texture_name MUST have a pixel format that is stencil or depth_stencil.
Note
Only iOS supports setting a stencil texture different from a depth texture. For maximum compatibility always use depth and depth_stencil.

depth_stencil

This is the same as calling depth & stencil with the same parameters. e.g.

depth myDepthStencilTexture
stencil myDepthStencilTexture
// is equivalent to:
depth_stencil myDepthStencilTexture

depth_pool

If depth & stencil are left blank, this setting uses pool ID so that a generic depth buffer from the same pool ID that matches the colour resolution is generated or reused instead.

Set to 0 to disable.

Format:
depth_pool <pool_id>

depth_texture

If depth & stencil are left blank and depth_pool isn't 0, this setting uses indicates the texture from the pool must allow sampling.

This setting is discouraged and deprecated.

Format:
depth_pool <true|false>

depth_format

If depth & stencil are left blank and depth_pool isn't 0, this setting uses indicates the pixel format to prefer.

Format:
depth_format <pixel_format>

depth_read_only

It is invalid to bind a depth buffer as RTV and sample from it at the same time. Doing so is called Framebuffer Feeback and results in Undefined Behavior.

However there are exceptions. If the depth buffer is bound as read_only (which means depth writes are ignored even if Ogre::HlmsMacroblock::mDepthWrite is true) it is possible to use it for both rendering and sampling at the same time.

This can be useful for certain effects such as refractions.

Format:
depth_read_only <true|false>

stencil_read_only

See depth_read_only.

Format:
stencil_read_only <true|false>

RTV Examples

You can create new RTVs or modify autogenerated ones. The only exception is that you can't modify autogenerated ones from input channels, because input channels are evaluted too late during execution.

The following modifies an autogenerated RTV to use a specific depth buffer:

compositor_node RefractionsRenderingNode
{
texture depthBuffer target_width target_height PFG_D32_FLOAT msaa_auto
texture rtt01 target_width target_height target_format msaa_auto
// An RTV named "rtt01" was autogenerated that uses the texture "rtt01"
// and a depth buffer from its default pool_id.
//
// However we will now modify it so it uses a depth buffer of our choosing.
rtv rtt01
{
depth_stencil depthBuffer
}
target rtt01
{
pass render_scene
{
// ...
}
}
}

We can also generate a new RTV with a new name. For example here's an MRT example:

compositor_node ScreenSpaceReflectionsRenderingNode
{
texture gBufferNormals target_width target_height PFG_R10G10B10A2_UNORM msaa_auto explicit_resolve
texture gBufferShadowRoughness target_width target_height PFG_RG16_UNORM msaa_auto explicit_resolve
texture gBufferDepthBuffer target_width target_height PFG_D32_FLOAT msaa_auto
rtv mrtGBuffer
{
colour gBufferNormals gBufferShadowRoughness
depth gBufferDepthBuffer
}
target mrtGBuffer
{
pass render_scene
{
// ...
}
}
}
@ PFG_R10G10B10A2_UNORM
Definition: OgrePixelFormatGpu.h:73
@ PFG_RG16_UNORM
Definition: OgrePixelFormatGpu.h:84

Cubemaps can't be implicitly resolved. In fact cubemap MSAA textures don't exist. The API will refuse.

Here's an example where we create a single 2D MSAA surface to render MSAA, and a cubemap where we resolve to:

abstract target cubemap_target
{
pass render_scene
{
store
{
// This is the default. But change it to store_and_resolve if you wish
// to preserve the MSAA contents of all 6 faces.
colour store_or_resolve
}
// ...
}
}
compositor_node CubemapRendererNodeMsaa
{
in 0 cubemap
// Note: IF you wish to preserve the MSAA contents of all 6 faces, then use
// 'texture tmpMsaa target_width target_height 6 target_format 2d_array msaa 4 explicit_resolve'
// instead. Don't forget to use store_and_resolve (the default is store_or_resolve)
texture tmpMsaa target_width target_height target_format msaa 4 explicit_resolve
texture myCubemap target_width target_height target_format cubemap mipmaps 0
rtv mixedCubemapRtv
{
// Specify we want to render to tmpMsaa at slot[0] but we want to resolve to tmpCubemap
colour 0 tmpMsaa resolve myCubemap
}
target mixedCubemapRtv +X : cubemap_target {}
target mixedCubemapRtv -X : cubemap_target {}
target mixedCubemapRtv +Y : cubemap_target {}
target mixedCubemapRtv -Y : cubemap_target {}
target mixedCubemapRtv +Z : cubemap_target {}
target mixedCubemapRtv -Z : cubemap_target {}
}