This fixes correctness for TRANSPOSE_AT_PRODUCE/COLUMN=0/0, provided the matrices are already stored in the correct layout in GMEM.