Profiling with gprof showed the engine spending 10-20% of its time in
the _render_canvas_item_tree function, at about 0.09ms per call.
Replacing the loop with two memset() calls reduces the per-call time
to about 0.02ms.
Likewise, the render_canvas function was using ~10% of the time;
replacing the loop there dropped the per-call time from 0.22ms to 0.18ms.
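
For illustration, here is a minimal sketch of the kind of change
described. The original loop is not shown here, so this assumes it
cleared two arrays of pointers element by element. The names Item,
z_list, z_last_list and both function names are hypothetical stand-ins,
not the engine's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for the engine's canvas item type.
struct Item;

// Assumed shape of the original code: clear both lists one slot at a time.
void clear_lists_loop(Item **z_list, Item **z_last_list, std::size_t n) {
    for (std::size_t i = 0; i < n; i++) {
        z_list[i] = nullptr;
        z_last_list[i] = nullptr;
    }
}

// Replacement: one memset() per list. This relies on null pointers being
// represented as all-zero bytes, which holds on mainstream platforms but
// is not guaranteed by the C++ standard.
void clear_lists_memset(Item **z_list, Item **z_last_list, std::size_t n) {
    std::memset(z_list, 0, n * sizeof(Item *));
    std::memset(z_last_list, 0, n * sizeof(Item *));
}
```

Both functions leave the lists in the same state. Any speedup from the
memset() version depends on the compiler and optimization level, so it
is worth re-profiling after the change.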