When the same uniform set is bound multiple times within a render pass
(e.g., OPAQUE pass then ALPHA pass with different instance buffers),
the dynamic offset bits were being OR'd together, corrupting the packed
frame indices.
For example, if OPAQUE binds set 1 with frame_idx=1 (0x10) and ALPHA
binds set 1 with frame_idx=2 (0x20), the OR produces 0x30 instead of
the correct 0x20, causing the shader to read from the wrong buffer
offset.
This fix clears the relevant bits for each set being bound before
OR'ing the new values, ensuring only the latest frame indices are used.
Fixes rendering artifacts in the mobile renderer which uses two dynamic
uniforms (uniform buffer + storage buffer) in set 1, unlike the clustered
renderer which only has one.
This work is a heavily refactored and rewritten from TheForge's initial
code.
TheForge's original code had too many race conditions and was
fundamentally flawed as it was too easy to incur into those data races
by accident.
However they identified the proper places that needed changes, and the
idea was sound. I used their work as a blueprint to design this work.
This PR implements:
- Introduction of UMA buffers used by a few buffers
(most notably the ones filled by _fill_instance_data).
Ironically this change seems to positively affect PC more than it does
on Mobile.
Updates D3D12 Memory Allocator to get GPU_UPLOAD heap support.
Metal implementation by Stuart Carnie.
Co-authored-by: Stuart Carnie <stuart.carnie@gmail.com>
Co-authored-by: TheForge team
Introduce a specialised `texture_get_data` for `RenderDeviceDriver`,
which can retrieve the texture data from the GPU driver for shared
textures (`TEXTURE_USAGE_CPU_READ_BIT`).
Closes#108115
Remove cross-project includes from `hashfuncs.h`.
Improve hashing function for `Color` (based on values instead of `String`).
Move `Variant` comparison from `hash_map.h` to `dictionary.cpp` (`VariantComparatorDictionary`), where it's used.
Remove now unnecessary `HashableHasher`.
This change introduces a new protected type, `ReflectedShaderStage` to
`RenderingShaderContainer` that derived types use to access SPIR-V and
the reflected module, `SpvReflectShaderModule` allowing implementations
to use the reflection information to compile their platform-specific
module.
* Fixes memory leak in `reflect_spirv` that would not deallocate the
`SpvReflectShaderModule` if an error occurred.
* Removes unnecessary allocation when creating `SpvReflectShaderModule`
by passing `NO_COPY` flag to `spvReflectCreateShaderModule2`
constructor function.
* Replaces `VectorView` with `Span` for consistency
* Fixes unnecessary allocations in D3D12 shader container in
`_convert_spirv_to_nir` and `_convert_spirv_to_dxil` which implicitly
converted the old `VectorView` to a `Vector`
This allows Apple platforms to override the default stack size of
a thread in the WorkerThreadPool, which is 512KiB by default.
This must be increased, as SPIRV-Cross, used by the Metal driver, can
use deeply nested stacks, as can debug builds.