Notes on fragment shader backend
I'm just using this page for now to collect my thoughts on how to write the backend for the fragment processor, and to collect links that may be useful for future reference. I'm focusing on the backend, since it's the most difficult part of the compiler due to the fragment shader's novel architecture.
- Retargetable Graph-Coloring Register Allocation for Irregular Architectures - used by various mesa backends, should work well for us
- Iterated Register Coalescing - standard technique for register coalescing
It seems to me that the main problem isn't scheduling (mostly an issue of finding the right heuristics and doing the actual grunt work to see if you can add an instruction to a packet) or register allocation (thinking of using the above-linked algorithm), but how the two should interact. Mainly, the issue has to do with how to deal with register coalescing and spills. Due to the architecture's pipelined nature and abundance of pipeline registers, scheduling has to be able to change the semantics of the program. Furthermore, scheduling would be constrained in how it could pipeline together operations (replacing normal registers with pipeline registers) if register allocation were to be performed first, because it would be harder to determine if it's legal to replace a normal register with a pipeline register. On the other side, scheduling will have the tendency to "hide" certain reads and writes, either because a register was replaced with a pipeline register, or because the instruction writes to a register that isn't the overall destination for an instruction packet (for example, a varying load unit when an ALU is also being used). Certainly, the register allocator will want to take advantage of that reduction in register pressure. Therefore, it seems that the best option is to have an instruction scheduling pass before register allocation.
The difficulty with that, though, is that modern graph-coloring register allocators expect to be able to change the program semantics, even in the middle of allocation. Iterated register coalescing, for example, interleaves register coalescing/copy folding passes into the process of reducing the interference graph. However, doing so changes the way that instructions can be ordered and gives new opportunities to the scheduler, and therefore can change the interference graph. Again, adding spill code (temporary reads and writes) can once again change the structure of the program and therefore the interference graph. This breaks the guarantee implicit in both stages that the interference graph won't be changed