Fast Fur Render Pipeline options

From Warren's Fast Fur Shader
Revision as of 19:09, 8 July 2023

WFFS currently uses the Unity Built-in Render Pipeline (BiRP). This graphics pipeline consists of 4 programmable shader stages: Vertex, Hull+Domain, Geometry, and Fragment.

(NOTE: WFFS does not currently support Unity's other render pipelines, namely the Universal Render Pipeline (URP), High Definition Render Pipeline (HDRP), and Scriptable Render Pipeline (SRP). Future support for these pipelines is planned, but not until the shader is more polished, reliable, and feature-complete.)

WFFS has 2 options (not including the experimental Super Pipeline) for how it will use the available BiRP stages:
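To keep the options straight, here is a small, hypothetical Python summary of which BiRP stages each pipeline runs and what the Hull + Domain stage is used for, based only on the descriptions on this page. `PipelineOption` and its fields are made up for illustration and are not part of WFFS.

```python
# Hypothetical summary of the WFFS pipeline options as described on this page.
# The class and field names are illustrative only; they are not part of WFFS.
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PipelineOption:
    name: str
    max_layers: int
    hull_domain_role: Optional[str]  # what Hull + Domain does, or None if unused
    public: bool

    def stages(self) -> List[str]:
        """Programmable BiRP stages this option runs, in pipeline order."""
        middle = ["Hull+Domain"] if self.hull_domain_role else []
        return ["Vertex", *middle, "Geometry", "Fragment"]


FALLBACK = PipelineOption("Fallback", 32, None, public=True)
TURBO = PipelineOption("Turbo", 32, "discard non-visible triangles", public=True)
SUPER = PipelineOption("Super", 264, "create only the needed layer copies", public=False)

for option in (FALLBACK, TURBO, SUPER):
    print(option.name, " -> ".join(option.stages()))
```

The Fallback Pipeline is the only option that skips the Hull + Domain stage entirely, which is why it remains usable where those shaders are unsupported.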

Fallback Pipeline (Slow)

The Fallback Pipeline relies on slow Geometry shaders to discard all non-visible fur.

The Fallback Pipeline is the slowest pipeline and also has the lowest quality. It can render up to 32 layers of fur.

Its lower performance comes from relying on Geometry shaders to discard all non-visible triangles. Geometry shaders are unfortunately relatively slow to load into GPU memory, and they must reserve enough memory in advance for every triangle they could possibly output, even if they later discard them. Any triangle that is too far away, back-facing, or outside the visible screen area still takes time to be discarded by the Geometry shader.
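As a rough illustration of the three rejection cases named above, the following Python sketch tests one triangle for distance, back-facing, and off-screen. It is not WFFS shader code: `triangle_visible`, `max_fur_distance`, and the counter-clockwise front-face convention are assumptions, and vertex positions are taken to be already projected to normalized device coordinates.

```python
# A minimal sketch (not WFFS code) of per-triangle rejection tests.
# Assumptions: positions are 2D normalized device coordinates (on-screen range
# [-1, 1]), `depths` are view-space distances from the camera, and
# counter-clockwise winding means front-facing.
from typing import Sequence, Tuple

Vec2 = Tuple[float, float]


def signed_area(a: Vec2, b: Vec2, c: Vec2) -> float:
    """Twice the signed area of triangle abc; positive when counter-clockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])


def is_back_facing(tri: Sequence[Vec2]) -> bool:
    # "<= 0" also rejects degenerate (zero-area) triangles.
    return signed_area(tri[0], tri[1], tri[2]) <= 0.0


def is_off_screen(tri: Sequence[Vec2]) -> bool:
    # Conservative: rejects only if all three vertices lie past the same edge.
    return (all(x < -1.0 for x, _ in tri) or all(x > 1.0 for x, _ in tri)
            or all(y < -1.0 for _, y in tri) or all(y > 1.0 for _, y in tri))


def is_too_far(depths: Sequence[float], max_fur_distance: float) -> bool:
    # Rejects only when even the nearest vertex is beyond the fur draw distance.
    return min(depths) > max_fur_distance


def triangle_visible(tri: Sequence[Vec2], depths: Sequence[float],
                     max_fur_distance: float = 10.0) -> bool:
    return not (is_too_far(depths, max_fur_distance)
                or is_back_facing(tri)
                or is_off_screen(tri))
```

In the Fallback Pipeline, the equivalent tests can only run inside the Geometry shader, so every rejected triangle has already paid the cost of launching it.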

Because even discarding triangles takes some time, selecting a maximum layer limit higher than 16 comes with a performance hit, even when fewer layers are actually being rendered.
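The cost of a high layer limit can be sketched with a hypothetical model of the Fallback Geometry shader. A real HLSL geometry shader declares its worst-case output with the `[maxvertexcount(N)]` attribute; here that is modeled as 3 vertices per possible layer. `GsWork` and `fallback_geometry_shader` are illustrative names, and the per-layer loop is an assumption about the general approach, not the actual WFFS code.

```python
# Hypothetical model of the Fallback Pipeline's Geometry shader stage.
# Output space is reserved for every possible layer, then each layer slot is
# either filled with a shell copy or skipped.
from dataclasses import dataclass


@dataclass
class GsWork:
    reserved_vertices: int  # output space declared up front (cf. [maxvertexcount])
    emitted_layers: int     # layer copies actually written out
    discarded_layers: int   # layer slots visited but left unused


def fallback_geometry_shader(visible: bool, layers_needed: int, max_layers: int) -> GsWork:
    emitted = discarded = 0
    for layer in range(max_layers):
        if visible and layer < layers_needed:
            emitted += 1    # would output the 3 vertices of this shell layer
        else:
            discarded += 1  # still costs a loop iteration
    # Space is reserved for every possible layer, even for invisible triangles.
    return GsWork(3 * max_layers, emitted, discarded)


for limit in (16, 32):
    print(limit, fallback_geometry_shader(True, layers_needed=8, max_layers=limit))
```

Raising `max_layers` from 16 to 32 doubles the reserved output and triples the discarded slots in this model, even though the same 8 layers are drawn.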

The Fallback Pipeline is not recommended. It is still faster than other fur shaders that use the same approach, such as XSFur, but it is included only for compatibility: it does not use Hull + Domain shaders, which may not be supported in some games.

Turbo Pipeline (Fast)

The Turbo pipeline uses fast Hull + Domain shaders to discard most non-visible fur.

Depending on view range, the Turbo Pipeline is typically 25% to 100% faster than the Fallback Pipeline, despite also having much higher quality. It can render up to 32 layers of fur.

This speed comes from using Hull + Domain shaders to discard any fully non-visible triangles before the Geometry shader stage. This prevents the relatively slow Geometry shaders from being loaded into GPU memory when they aren't needed.

Hull + Domain shaders are typically used for tessellation, and are built for speed. The Turbo Pipeline does not use them for tessellation, though. Instead, it uses the Hull + Domain shaders as a very fast all-or-nothing kill-switch. When it wants to discard an entire triangle, the Hull shader specifies a multiplier of 0, which discards it. Otherwise it specifies a multiplier of 1, and the Domain shader passes the triangle completely as-is to the Geometry shader. The Geometry shader is then responsible for making copies of the triangle for each visible fur layer, and discarding the rest.
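The kill-switch can be modeled in Python. The page's "multiplier" appears to correspond to what Direct3D calls the tessellation factor; in Direct3D 11, a patch with an edge tessellation factor of 0 is culled before the Domain shader runs. The function names below are hypothetical, and the sketch models only the control flow, not real GPU work.

```python
# Hypothetical model of the Turbo Pipeline's Hull + Domain "kill switch".
# A factor of 0 culls the triangle; a factor of 1 leaves it untessellated,
# so only surviving triangles reach the Geometry shader.
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def hull_tess_factor(is_visible: bool) -> float:
    """Patch-constant output: 0 kills the whole triangle, 1 leaves it untessellated."""
    return 1.0 if is_visible else 0.0


def domain_passthrough(tri: T) -> T:
    """With a factor of 1 the only domain points are the corners, so the triangle is unchanged."""
    return tri


def turbo_hull_domain(tris: Iterable[T], visible: Callable[[T], bool]) -> List[T]:
    survivors: List[T] = []
    for tri in tris:
        if hull_tess_factor(visible(tri)) <= 0.0:
            continue  # culled by the tessellator; the Geometry shader never runs for it
        survivors.append(domain_passthrough(tri))
    return survivors  # each survivor goes on to the Geometry shader for layer copies


print(turbo_hull_domain(["a", "b", "c"], lambda t: t != "b"))
```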

Because even discarding triangles takes some time, selecting a maximum layer limit higher than 16 comes with a performance hit, even when fewer layers are actually being rendered. However, this hit is not as severe as with the Fallback Pipeline.
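Why Turbo's penalty is smaller can be shown with a simple, hypothetical count of reserved-but-unused layer slots. The model assumes each Geometry shader invocation works through every slot up to the layer limit (as with a fixed-size output declaration), and that Turbo's Hull + Domain stage removes invisible triangles before that point. The numbers are illustrative, not measurements.

```python
# Rough, hypothetical cost model: count the layer slots each pipeline's
# Geometry shader visits without emitting anything.
from typing import Iterable, Tuple

Triangle = Tuple[bool, int]  # (visible?, layers actually needed)


def wasted_layer_slots(tris: Iterable[Triangle], max_layers: int, hull_cull: bool) -> int:
    wasted = 0
    for visible, needed in tris:
        if not visible and hull_cull:
            continue  # Turbo: killed by Hull + Domain, Geometry shader never runs
        used = min(needed, max_layers) if visible else 0
        wasted += max_layers - used
    return wasted


scene = [(True, 8), (False, 8), (False, 8), (True, 20)]
for limit in (16, 32):
    fallback = wasted_layer_slots(scene, limit, hull_cull=False)
    turbo = wasted_layer_slots(scene, limit, hull_cull=True)
    print(f"limit {limit}: fallback wastes {fallback}, turbo wastes {turbo}")
```

With the example scene, raising the limit from 16 to 32 increases the wasted slots from 40 to 100 for Fallback but only from 8 to 36 for Turbo: both pay for the higher limit, but Turbo pays less.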

The Turbo Pipeline is the recommended pipeline.

Super Pipeline (Fastest, capable of extremely high resolution, but not usable on some AMD GPUs)

The Super Pipeline uses fast Hull + Domain shaders to make the fur layers.

The Super Pipeline is NOT available for public use, due to AMD crashing issues.

The Super Pipeline is typically 30% faster than the Turbo Pipeline when rendering at the same quality (i.e. up to 32 layers), and it can also produce pixel-perfect screenshots with no visible gaps in the hairs. It can render up to 264 layers of fur.

The Super Pipeline's speed and resolution come from using the Hull + Domain shaders to make only as many copies of each triangle as needed. This means that unneeded triangles do not have to be discarded, because they are never created in the first place.
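The "only as many copies as needed" idea can be sketched as follows. How WFFS maps layer counts onto actual tessellation output is not described on this page, so this hypothetical Python model only shows the result: exactly `layers_needed` shell copies, each offset along the vertex normals as in typical shell-fur rendering. `super_layer_copies` and `fur_length` are illustrative names.

```python
# Hypothetical model of the Super Pipeline's result: exactly `layers_needed`
# shell copies are created, so there is nothing left over to discard.
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def super_layer_copies(tri: Sequence[Vec3], normals: Sequence[Vec3],
                       layers_needed: int, fur_length: float) -> List[Tuple[Vec3, ...]]:
    copies = []
    for layer in range(1, layers_needed + 1):
        # Shell height grows evenly from the skin out to the fur tip.
        h = fur_length * layer / layers_needed
        copies.append(tuple(
            (px + nx * h, py + ny * h, pz + nz * h)
            for (px, py, pz), (nx, ny, nz) in zip(tri, normals)))
    return copies


tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
up = ((0.0, 0.0, 1.0),) * 3
print(len(super_layer_copies(tri, up, layers_needed=4, fur_length=1.0)))
```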

Since the Super Pipeline doesn't need to discard triangles, there is no performance hit for having a higher layer limit. The Super Pipeline will, however, limit its maximum layers dynamically, according to the quality settings.
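A dynamic layer budget might look like the sketch below. The formula (layers shrinking with distance and growing with a `quality` value) is invented for illustration; the page states only that the limit is set dynamically from the quality settings and can reach 264 layers. Because no slots are reserved for skipped layers in this model, raising `layer_limit` above what the scene needs leaves the total work unchanged.

```python
# Hypothetical dynamic layer budget. The distance/quality formula is made up;
# only the 264-layer ceiling comes from this page.
from typing import Iterable

SUPER_MAX_LAYERS = 264


def layers_for(distance: float, quality: float, layer_limit: int = SUPER_MAX_LAYERS) -> int:
    """More layers up close and at higher quality, clamped to the layer limit."""
    if distance <= 0:
        return min(layer_limit, SUPER_MAX_LAYERS)
    wanted = int(quality * 32 / distance)
    return max(0, min(wanted, layer_limit, SUPER_MAX_LAYERS))


def super_work(distances: Iterable[float], quality: float,
               layer_limit: int = SUPER_MAX_LAYERS) -> int:
    """Total layer copies created; nothing is reserved for layers that are skipped."""
    return sum(layers_for(d, quality, layer_limit) for d in distances)


for limit in (64, 264):
    print(limit, super_work([1.0, 2.0, 4.0, 8.0], quality=1.0, layer_limit=limit))
```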

Offloading most of the copying workload from the much slower Geometry shaders gives the Super Pipeline a big performance boost. Unfortunately, it also runs afoul of AMD driver bugs on some of their GPUs. The Vega 64, 6600xt, 6700xt, and 6800xt have all been confirmed to crash (no Intel or Nvidia GPUs have been reported to crash). The Vega 64 crash was traced to a driver bug with nested arrays, and was fixed by modifying the shader code to use a different method. However, the other AMD GPUs appear to be crashing due to memory corruption, which cannot be fixed by modifying the shader code.

Unlike in the Turbo Pipeline, the Super Pipeline's Domain shaders copy and modify the data coming from the Vertex shaders before sending it to the Geometry shaders. Reserving and managing memory for these copies is handled by the GPU drivers, and this is where the AMD driver bugs seem to strike. These GPUs typically lock up after about 10 seconds.

Until driver support can be confirmed to be reliable on all GPUs, the Super Pipeline will not be available publicly. It currently only exists as an experimental Beta version.

The Super Pipeline must NEVER be used in public games!