`compute_list_add_barrier` ends and begins a new compute list with the
same state, so calling this immediately before `compute_list_end` is
redundant, and ends up creating empty compute encoders in Metal.
This method is used to generate headers for embedding files into the binary
(think about the new `#embed` feature in C23 and C++26).
While the stringification step itself was plenty fast, it then proceeded
to wrap everything using the `textwrap` module. `textwrap` is *very*
slow, as it's apparently optimized for human text.
This patch reimplements the wrapping logic using a simple regex,
resulting in a tremendous speed improvement (~6x), and switches to `map`
for the stringification itself (thanks Rémi!)
It also removes a (practically) unused argument, `initial_indent`.
The generated files are pretty much the same, with a tiny difference in
line length (for some reason the old logic overshot the requested line
length)
This is useful for 3D games with a pixel art appearance, or when
using a resolution scale of `0.5` to improve performance without
compromising crispness too much when not using FSR 1.0.
The property hints now allow decreasing the scale further to accomodate
for pixel art use cases, as well as increased precision in the value
(useful for a scale of `0.3333`).
Co-authored-by: Daniel Savage <dansvg@gmail.com>
Co-authored-by: Kaleb Reid <78945904+Kaleb-Reid@users.noreply.github.com>
A number of headers in the codebase included `rendering_server.h` just for
some enum definitions. This means that any change to `rendering_server.h` or
one of its dependencies would trigger a massive incremental rebuild.
With this change, we decouple a number of classes from `rendering_server.h`,
greatly speeding up incremental rebuilds for that area.
On my machine, this reduces incremental compilation time after an edit of
`rendering_server.h` by 60s (from 2m57s).
Reduce the mipmap levels so that the border is pixel perfect
Always use the high quality downsample to reduce jaggies
Fix the jacobian approximation so that it actually helps account for the octahedral distortion
This allows the ARG to better optimize their usage for mobile devices.
In particular, this allows the ARG to avoid loading the contents of the texture into tile memory when rendering into the texture if the contents won't be used anyway.
Also optimize all tonemappers to perform less calculations per-pixel.
Note: unlike `white`, `agx_white` is limited to a minimum of `2.0` and defaults to `16.29`. When using a RGB10A2 render buffer, `agx_white` will be ignored and a value of `2.0` will be used instead to ensure good behavior on the Mobile renderer.
Mobile devices are typically bandwidth bound which means we need to do as few texture samples as possible.
They typically use TBDR GPUs which means that all rendering takes place on special optimized tiles. As a side effect, reading back memory from tile to VRAM is really slow, especially on Mali devices.
This commit uses a technique where you do a small blur while downsampling, and then another small blur while upsampling to get really high quality glow. While this doesn't reduce the renderpass count very much, it does reduce the texture read bandwidth by almost 10 times. Overall glow was more texture-read bound than memory write, bound, so this was a huge win.
A side effect of this new technique is that we can gather the glow as we upsample instead of gathering the glow in the final tonemap pass. Doing so allows us to significantly reduce the cost of the tonemap pass as well.
Additionally, change the minimum `tonemap_white` parameter to `1.0`; users can increase `tonemap_exposure` for a similar effect to decreasing `tonemap_white` below `1.0`.
Co-authored-by: Hei <40064911+Lielay9@users.noreply.github.com>
Co-authored-by: Hugo Locurcio <hugo.locurcio@hugo.pro>