Hansung Kim
88cddc2b66
sgemm_tcore: Support data move for fp16-packed elements
Since the core does not support memory accesses to non-word-aligned
addresses, pack fp16 elements in pairs into fp32 values, and do regular
tile movement with the column dimension conditionally compressed (halved
when the elements are fp16).
Perf seems to stay the same for the fp32 256x256 case.
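
The packing scheme described in the commit message can be illustrated with a minimal host-side sketch. This is not the kernel's actual code: `pack_fp16_pair`, `packed_cols`, and `move_tile` are hypothetical names, the low-half-first ordering within a word is an assumption, and the tile width and leading dimensions are assumed to be even when the elements are fp16.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: pack two fp16 bit patterns into one 32-bit word.
// Assumption: the even-indexed element sits in the low 16 bits.
inline uint32_t pack_fp16_pair(uint16_t lo, uint16_t hi) {
    return static_cast<uint32_t>(lo) | (static_cast<uint32_t>(hi) << 16);
}

// Hypothetical helper: width of a row measured in 32-bit words.
// For fp16 the column dimension is "compressed" by a factor of two.
inline std::size_t packed_cols(std::size_t cols, bool is_fp16) {
    return is_fp16 ? cols / 2 : cols;
}

// Copy a rows x cols tile using only word-aligned 32-bit accesses.
// cols, src_ld and dst_ld are given in elements (fp16 or fp32);
// they are converted to word counts before indexing.
inline void move_tile(const uint32_t* src, uint32_t* dst,
                      std::size_t rows, std::size_t cols,
                      std::size_t src_ld, std::size_t dst_ld,
                      bool is_fp16) {
    // Assumption: fp16 dimensions are even, so pairs never straddle a row.
    assert(!is_fp16 || (cols % 2 == 0 && src_ld % 2 == 0 && dst_ld % 2 == 0));

    const std::size_t wcols   = packed_cols(cols, is_fp16);
    const std::size_t src_wld = packed_cols(src_ld, is_fp16);
    const std::size_t dst_wld = packed_cols(dst_ld, is_fp16);

    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < wcols; ++c) {
            dst[r * dst_wld + c] = src[r * src_wld + c];
        }
    }
}
```

With this layout the same copy loop serves both precisions: only the word counts per row change, which is one plausible reason the fp32 path would see no performance difference.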
2024-07-30 21:43:10 -07:00