kernels

Hansung Kim 21b6655c10 sgemm_impl: Implement fast coalesced wmma_store

Enables a fairer comparison between the core-coupled tensor core and the
Hopper tensor core. The latter benefits from coalesced, full-throughput
move-out to GMEM because it does not use the 1x2 interleaved register
mapping. As a result, the result matrix is stored swizzled in GMEM,
without breaking correctness.

2024-10-29 22:34:22 -07:00

kernel

generate_matrix.py: Rand [0,1); also save non-swizzled row-major B

2024-10-29 14:55:32 -07:00

opencl

convolution: Fix write_operand_file after upstream merge

2024-02-27 15:45:22 -08:00

regression

sgemm_impl: Implement fast coalesced wmma_store

2024-10-29 22:34:22 -07:00

riscv

Vortex 2.0 changes: