kernels/tests/regression/flash_attention
Hansung Kim 221d5f75c2 flash: Optimize smem alloc for tcore for 8banks
Divide into first half & last half for warpgroup 0 & 1, and
allocate Q/K and P/V in different banks for parallel access.
2024-09-19 21:31:39 -07:00