kernels

Files

Hansung Kim 88cddc2b66 sgemm_tcore: Support data move for fp16-packed elements

Since the core does not support memory accesses to non-word-aligned
addresses, pack fp16 elements in pairs into fp32 values, and do regular
tile movement with conditionally compressed column dimensions.
Performance appears unchanged for the fp32 256x256 case.

2024-07-30 21:43:10 -07:00
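
The packing scheme described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the code in kernel.cpp or util.hpp: the names `packed_cols`, `pack_pair`, and `move_tile_words` are hypothetical, `uint32_t` stands in for the 32-bit (fp32-sized) word, and the choice of putting the even column in the low half of the word is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: number of 32-bit words per tile row.
// With fp16 data, two elements share one word, so the column
// dimension is halved; fp32 data keeps one element per word.
inline std::size_t packed_cols(std::size_t cols, bool is_fp16) {
    assert(!is_fp16 || cols % 2 == 0);  // fp16 rows must hold whole pairs
    return is_fp16 ? cols / 2 : cols;
}

// Pack two raw fp16 bit patterns into one 32-bit word.
// Assumption: the even column goes in the low 16 bits.
inline std::uint32_t pack_pair(std::uint16_t lo, std::uint16_t hi) {
    return static_cast<std::uint32_t>(lo) |
           (static_cast<std::uint32_t>(hi) << 16);
}

// Move a rows x cols tile using only word-aligned accesses.
// src and dst point at 32-bit words; strides are counted in words.
inline void move_tile_words(const std::uint32_t* src, std::size_t src_stride,
                            std::uint32_t* dst, std::size_t dst_stride,
                            std::size_t rows, std::size_t cols, bool is_fp16) {
    const std::size_t wcols = packed_cols(cols, is_fp16);
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < wcols; ++c) {
            dst[r * dst_stride + c] = src[r * src_stride + c];
        }
    }
}
```

In this sketch the copy loop is identical for both element types; only the column count changes, which is one way to read "conditionally compressed column dimensions".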

.gitignore

sgemm_tg: Use reg mapping functions

2024-05-12 22:22:54 -07:00

common.h

Add args.bin to ELF

2024-06-06 15:19:39 -07:00

half.hpp

sgemm_tcore: Support fp16 input generation in host code

2024-07-29 17:18:35 -07:00

kernel.4warps.cpp

sgemm_tcore: Rewrite with sgemm_Wg parametrization

2024-05-13 13:22:06 -07:00

kernel.activation.cpp

Use SWISH in activate_block for tcore and gemmini

2024-06-19 15:41:50 -07:00

kernel.cpp

sgemm_tcore: Support data move for fp16-packed elements

2024-07-30 21:43:10 -07:00

kernel.warpspecial_dma.cpp

sgemm_tcore: Replace hardcoded NUM_LANES with NUM_THREADS

2024-06-12 21:01:37 -07:00

kernel.warpspecial.cpp

sgemm_tcore: Replace hardcoded NUM_LANES with NUM_THREADS

2024-06-12 21:01:37 -07:00

main.cpp

sgemm_tcore: Template-ize kernel code

2024-07-29 20:11:51 -07:00

Makefile

sgemm_tcore: Skip load at last k-iter; do DMA by default

2024-07-19 16:37:51 -07:00

util.hpp

sgemm_tcore: Support data move for fp16-packed elements

2024-07-30 21:43:10 -07:00