Richard Yan
91a82c9f0f
merge kernel changes from kernels-asplos-ae
2025-01-29 22:11:25 -08:00
Richard Yan
a61bf257ff
modify makefile to point to new locations
2025-01-29 21:27:59 -08:00
Richard Yan
0d842a5930
more renaming and cleanup
2025-01-29 21:22:41 -08:00
Richard Yan
f98cd9bc22
remove old ci
2025-01-29 20:39:47 -08:00
Richard Yan
d4b78377a1
fix virgo kernel scripts
2025-01-29 20:19:42 -08:00
Richard Yan
0e6bcf51f1
cleanup
2025-01-29 18:38:49 -08:00
Richard Yan
5ba132e87b
regression restructure
2025-01-29 18:30:32 -08:00
Hansung Kim
3de51577ef
Check-in gemmini headers instead of submodule
2025-01-29 17:10:37 -08:00
Richard Yan
e86aac3a6f
Merge branch 'new-cisc' into kernels-asplos-ae
2025-01-29 17:03:54 -08:00
Richard Yan
24894b1712
Merge branch 'new-cisc' of https://github.com/hansungk/vortex into new-cisc
2025-01-29 17:03:05 -08:00
Richard Yan
d47ef75614
update idle kernel
2025-01-29 17:00:08 -08:00
Richard Yan
ec41200845
updated no dma gemmini kernel
2025-01-29 16:59:44 -08:00
Hansung Kim
c26558bc93
Add fence before rescale
2025-01-28 23:48:02 -08:00
Hansung Kim
198a25cb16
Set NUM_CORES to 8 for Volta/Ampere
2025-01-28 22:49:36 -08:00
Hansung Kim
f2b5a3409d
Merge branch 'new-cisc' into kernels-asplos-ae
2025-01-28 21:18:12 -08:00
Richard Yan
8c45b8b4b7
Merge branch 'new-cisc' of https://github.com/hansungk/vortex-private into new-cisc
2025-01-28 17:14:49 -08:00
Hansung Kim
e43f3c02a9
sgemm_impl: FP_SIZE to 16
2025-01-28 17:06:04 -08:00
Richard Yan
b1e6495630
update kernels
2025-01-28 16:39:17 -08:00
Hansung Kim
d98a414765
Change gemmini_mmio.h to fp16 GEMM setting
2025-01-28 16:36:55 -08:00
Hansung Kim
e4c0bbd039
sgemm: Check-in argument binaries
2025-01-28 16:04:56 -08:00
Hansung Kim
45e9407c99
sgemm: Check-in argument binaries
2025-01-28 15:58:27 -08:00
Hansung Kim
9894efe6c9
Update toolchain env paths for dork
2025-01-28 15:04:14 -08:00
Hansung Kim
5ef4c8023e
sgemm_impl: Disable wmma fast store
...
Doesn't seem to have a big impact on tcore util.
2024-11-11 14:06:15 -08:00
Hansung Kim
7d7cb5f60a
flash: Disable perf loop multiplier
2024-11-10 22:44:02 -08:00
Hansung Kim
4448f31fdc
fence: Fix moving fence to start of loop
...
For unknown reasons, guarding the fence with a tid == 0 branch causes a
TL source ID re-used assertion. Just call the fence from all
thread/warps as a workaround. At least, all threads in a warp will
coalesce into one request.
2024-11-09 22:04:45 -08:00
Hansung Kim
cb916ead39
Fix potential bitwidth bug in compute API
2024-11-09 20:59:58 -08:00
Hansung Kim
68054689c9
flash: Move fence to start of loop; wrap all MMIO in one tid=0 branch
2024-11-09 20:59:26 -08:00
Hansung Kim
fcd8b0b892
flash: Disable rescale flag check
...
GEMM-II finishes much earlier than softmax for this to be a problem.
2024-11-09 20:37:58 -08:00
Hansung Kim
1c9b022156
flash: Rename nowarpspec to default
2024-11-09 19:58:45 -08:00
Hansung Kim
8fe6d918f2
flash: Update tcore kernel to use new CISC
2024-11-09 19:49:20 -08:00
Hansung Kim
76a6aaf085
flash: doc update
2024-11-09 19:09:09 -08:00
Hansung Kim
673e07ed43
flash: Add non-warp-specialized gemmini flash kernel
2024-11-09 19:08:39 -08:00
Hansung Kim
ac42f2dbba
sgemm_gemmini_dma: Update with new compute API
2024-11-09 16:49:39 -08:00
Hansung Kim
ad75561efe
flash: Reduce fence calls to improve util
2024-11-09 16:44:17 -08:00
Hansung Kim
6990fcc1e6
Add compute-and-mvout-to-spad API
2024-11-09 16:43:45 -08:00
Hansung Kim
952b8debbb
flash: Update to use new CISC interface
2024-11-09 16:21:34 -08:00
Hansung Kim
dc89309ad0
Merge branch 'kernels-flash' into new-cisc
2024-11-09 14:42:46 -08:00
Hansung Kim
365b1d8e67
flash: Add begin end markers
2024-11-09 10:19:16 -08:00
Hansung Kim
1e3d476e70
Switch header configs to flash
2024-11-08 21:56:42 -08:00
Richard Yan
c114a7a4ab
new gemm kernel
2024-11-08 20:55:27 -08:00
Hansung Kim
4e087a8aab
flash: Fix loop iteration for gemmini
...
Kernel is software-pipelined around 2 GEMMs and softmax; it requires two
iterations to fully complete a tile.
2024-11-08 16:43:08 -08:00
Hansung Kim
4055255018
flash: Fix tcore kernel for CISC arg field changes
2024-11-08 16:40:16 -08:00
Hansung Kim
c001618fb9
sgemm_impl: Fix wrong next block_m logic for DMA
2024-10-29 22:35:56 -07:00
Hansung Kim
21b6655c10
sgemm_impl: Implement fast coalesced wmma_store
...
Enables a fairer comparison between core-coupled tensor core to Hopper
tensor core, where the latter benefits from coalesced full-throughput
moveout to GMEM because it does not use the 1x2 interleaved register
mapping. This means the result matrix will be stored swizzled in the
GMEM, without breaking correctness.
2024-10-29 22:34:22 -07:00
Hansung Kim
6b39a6fe70
Add convenience script for switching input/args binaries
2024-10-29 20:14:33 -07:00
Hansung Kim
8dadbdd42d
tensor: Do DMA mvin for next m/n loop at the last k iter
...
This increases util by pulling the DMA wait time out of the K-loop
wraparound (next N) and overlapping it with the last K iter.
2024-10-29 19:43:22 -07:00
Hansung Kim
367fa927f8
sgemm_impl: Fix default FP_SIZE to 16
2024-10-29 14:56:18 -07:00
Hansung Kim
e1b0fc3944
generate_matrix.py: Rand [0,1); also save non-swizzled row-major B
2024-10-29 14:55:32 -07:00
Hansung Kim
24064dc7a2
sgemm_impl: Do proper addr gen and store for wgmma
2024-10-29 01:31:55 -07:00
Hansung Kim
bd7a8e39b9
sgemm_impl: Split out smem addr gen to functions
...
so that the addr gen code can also be used for wgmma.
2024-10-29 01:30:48 -07:00