kernels/tests
Hansung Kim 88cddc2b66 sgemm_tcore: Support data move for fp16-packed elements
Since the core does not support memory accesses to non-word-aligned
addresses, pack fp16 elements in pairs into fp32 values, and do regular
tile movement with the column dimension compressed (halved) only when
elements are packed.
Perf appears unchanged for fp32 256x256.
2024-07-30 21:43:10 -07:00
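
The packing scheme described in the commit message can be sketched as follows. This is a minimal illustration, not the code from the commit: the names `pack_fp16_pair`, `packed_cols`, and `move_tile` are hypothetical, the low-half-first packing order is an assumption, and strides are assumed to be given in 32-bit words. fp16 values are treated as raw 16-bit patterns, so every load and store stays word-aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack two fp16 bit patterns into one 32-bit word.
 * Assumption: the first element goes in the low half. */
static inline uint32_t pack_fp16_pair(uint16_t lo, uint16_t hi) {
    return (uint32_t)lo | ((uint32_t)hi << 16);
}

/* Width of a tile row in 32-bit words: halved when elements are
 * fp16 pairs, unchanged for fp32. */
static inline size_t packed_cols(size_t cols, int is_fp16) {
    return is_fp16 ? cols / 2 : cols;
}

/* Copy a rows x cols tile one 32-bit word at a time.
 * `cols` counts logical elements; strides count 32-bit words. */
static void move_tile(uint32_t *dst, size_t dst_stride,
                      const uint32_t *src, size_t src_stride,
                      size_t rows, size_t cols, int is_fp16) {
    if (is_fp16) {
        /* Pairwise packing needs an even number of columns. */
        assert(cols % 2 == 0);
    }
    size_t word_cols = packed_cols(cols, is_fp16);
    for (size_t r = 0; r < rows; ++r) {
        for (size_t c = 0; c < word_cols; ++c) {
            dst[r * dst_stride + c] = src[r * src_stride + c];
        }
    }
}
```

In this sketch the fp32 path is untouched apart from one extra branch outside the copy loop, which is consistent with (though not proof of) the observation that fp32 performance did not change.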