An easy solution to handle multiple concurrent warp operations by staging half-operands in their own per-warp register. This might increase area requirement by quite a bit. TODO: Commit is not being handled correctly yet