Commit Graph

4 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Hansung Kim | e809d25305 | flash: Fix rowsum and write fake exp. GEMM part is disabled for faster debugging, the kernel reads the result of A*B directly from input binary. | 2024-08-15 16:32:21 -07:00 |
| Hansung Kim | 53dfc690b9 | flash: Allocate smem properly for rowsum and scratch | 2024-08-14 21:50:20 -07:00 |
| Hansung Kim | 9cabe3413b | Fix overlapping smem in rowmax | 2024-08-14 21:09:53 -07:00 |
| Hansung Kim | 692d028afd | Add flash attention kernel skeleton | 2024-08-14 20:46:09 -07:00 |