Hansung Kim
|
02feb36b12
|
idle: Use barriers instead to hang the core
|
2024-06-22 01:37:00 -07:00 |
|
Hansung Kim
|
11e6d34e1c
|
Add idle kernel
Only spawns 1 thread that does a busy wait up to a counter. Other cores
do not issue any instructions after the scheduling prologue.
|
2024-06-20 14:00:32 -07:00 |
|
Hansung Kim
|
63418a7496
|
sgemm_gemmini_dma: Skip mvout to scratchpad
Not necessary either for activation on gmem
|
2024-06-19 20:49:44 -07:00 |
|
Richard Yan
|
12a96d9c16
|
Merge branch 'kernels' of https://github.com/hansungk/vortex-private into kernels
|
2024-06-19 17:46:24 -07:00 |
|
Richard Yan
|
a1e165724f
|
skip move to spad
|
2024-06-19 17:45:58 -07:00 |
|
Richard Yan
|
c06cc40e59
|
make non dma gemmini use 64x64 tile size
|
2024-06-19 17:45:01 -07:00 |
|
Hansung Kim
|
bebdd3353e
|
Use SWISH in activate_block for tcore and gemmini
|
2024-06-19 15:41:50 -07:00 |
|
Hansung Kim
|
ae9e707280
|
sgemm_{gemmini_dma,tcore}: Separate activate_block
|
2024-06-19 14:50:22 -07:00 |
|
Hansung Kim
|
b586e0f881
|
sgemm_gemmini_dma: Update activation to match tcore
|
2024-06-18 15:30:12 -07:00 |
|
Hansung Kim
|
50b843d8c4
|
sgemm_tcore: Fix address overlap for DMA
Enforce square shapes of tiles in smem. TODO need to configure loop
bounds correctly.
|
2024-06-18 15:06:07 -07:00 |
|
Hansung Kim
|
36b02ad595
|
sgemm_tcore: Add warp-specialized kernel with activations
FIXME: only tested with WARP_SPECIALIZED == 0.
|
2024-06-17 19:14:33 -07:00 |
|
Hansung Kim
|
1a44063c5d
|
sgemm_gemmini_dma: Initial activation kernel with gemmini+DMA
Currently does spurious fmul's in repetition.
|
2024-06-17 16:56:29 -07:00 |
|
Hansung Kim
|
85cace9524
|
sgemm_tcore: Fix smem allocation for non-dma
|
2024-06-15 01:28:27 -07:00 |
|
Hansung Kim
|
cfb6ae4a91
|
sgemm_tcore: Fix wrong double-buf addr for wmma_load
|
2024-06-15 00:51:35 -07:00 |
|
Hansung Kim
|
9d6ff196b3
|
sgemm_tcore: Use old opcodes to match frozen rtl
|
2024-06-15 00:26:57 -07:00 |
|
Hansung Kim
|
095ccfd79a
|
sgemm_gemmini_duo: Check in serialized kernel as separate file
|
2024-06-12 22:44:14 -07:00 |
|
Hansung Kim
|
1f26b4ef10
|
Remove checked in binary
|
2024-06-12 22:03:29 -07:00 |
|
Hansung Kim
|
95e9adb2d0
|
sgemm_gemmini_duo: Fix device addr in main.cpp
|
2024-06-12 21:57:24 -07:00 |
|
Hansung Kim
|
f5d82f85e5
|
sgemm_gemmini_duo: Split per-gemmini code to function
|
2024-06-12 21:17:03 -07:00 |
|
Hansung Kim
|
ce4f3a24e3
|
sgemm_tcore: Replace hardcoded NUM_LANES with NUM_THREADS
|
2024-06-12 21:01:37 -07:00 |
|
Hansung Kim
|
91efc0fc14
|
Check in VX_config.h with 4core/8warp/8threads default
|
2024-06-12 20:52:08 -07:00 |
|
Hansung Kim
|
21452661f2
|
sgemm_tcore: Fix double-buffered addr for GEMMINI_DMA
|
2024-06-12 13:36:29 -07:00 |
|
Hansung Kim
|
635da96154
|
sgemm_tcore: Constify smem pointer for wmma_load
|
2024-06-12 13:36:29 -07:00 |
|
Richard Yan
|
f73029889b
|
oopsie
|
2024-06-12 13:34:19 -07:00 |
|
Richard Yan
|
5c6526e414
|
Merge branch 'kernels' of https://github.com/hansungk/vortex-private into kernels
|
2024-06-12 02:12:45 -07:00 |
|
Richard Yan
|
f37f5d5612
|
dual gemmini kernel + quad core vortex
|
2024-06-12 02:12:38 -07:00 |
|
Hansung Kim
|
32e31c51a4
|
sgemm_tcore: Blocksize 64; Fix kernel launch on larger dim
& fix addrgen assembly too large offset error
|
2024-06-11 22:27:12 -07:00 |
|
Hansung Kim
|
03d1df8f53
|
sgemm_tcore: Separate transpose control on AS read/write
Make separate control flags on transposed AS read/write to make it easy
to model bank-conflict-free GMEM _and_ SMEM access.
|
2024-06-11 21:16:23 -07:00 |
|
Hansung Kim
|
34eaab4c87
|
sgemm_tcore: Fix warp-specialized kernel for larger dim
|
2024-06-11 20:50:32 -07:00 |
|
Hansung Kim
|
9febfb9bdc
|
sgemm_tcore: Move global_dmem_load back to kernel.cpp
|
2024-06-11 20:12:30 -07:00 |
|
Hansung Kim
|
ca7fd84a83
|
sgemm_tcore: Split util functions to a header file
|
2024-06-11 19:06:22 -07:00 |
|
Hansung Kim
|
dab9d7c6fc
|
sgemm_tcore: Fix kernel launch for smaller TBs than cluster threads
E.g. bm32bn32bk32wm16wn8
|
2024-06-11 14:09:31 -07:00 |
|
Hansung Kim
|
e3c4a4d2f5
|
sgemm_tcore: Improve agen for !transpose_as smem load
|
2024-06-10 22:08:37 -07:00 |
|
Hansung Kim
|
dc7bd6b248
|
sgemm_tcore: Fix warp_row/col calculation bug
|
2024-06-10 19:52:37 -07:00 |
|
Hansung Kim
|
3b2f5a31de
|
sgemm_tcore: Improve write_result addr gen
|
2024-06-10 19:34:00 -07:00 |
|
Hansung Kim
|
a22762db94
|
sgemm_tcore: Add GEMMINI_DMA to non-warp-specialized mode
~63% util for 128x128; ~83% for the k-loop.
FIXME: result is not correct currently. Need to fix the transpose
|
2024-06-10 19:32:39 -07:00 |
|
Hansung Kim
|
51d9cffb2d
|
Merge remote-tracking branch 'origin/kernels' into kernels
|
2024-06-10 16:41:36 -07:00 |
|
Hansung Kim
|
39449ece37
|
Add warp-specialized{, and dma enabled} kernel
NOTE: warpspecial_dma is hacked, need to get rid of dma invocation in
the consumer code.
|
2024-06-10 16:39:49 -07:00 |
|
Richard Yan
|
357435bc96
|
Merge branch 'kernels' of https://github.com/hansungk/vortex-private into kernels
|
2024-06-09 14:35:02 -07:00 |
|
Richard Yan
|
c327474e3b
|
power specific code for kernel
|
2024-06-09 14:34:58 -07:00 |
|
Hansung Kim
|
e4eed20de3
|
sgemm_gemmini_dma: Fix device addr for operands
|
2024-06-09 13:35:16 -07:00 |
|
Hansung Kim
|
90662484b1
|
sgemm_tcore: inline left out
|
2024-06-07 20:34:29 -07:00 |
|
Hansung Kim
|
d61dd85872
|
sgemm_tcore: Remove unused SIMT core code
|
2024-06-07 20:32:02 -07:00 |
|
Hansung Kim
|
aaf4a89b57
|
Fix asm label already defined error
|
2024-06-07 19:55:28 -07:00 |
|
Hansung Kim
|
9e8988df6b
|
Patch args device address for dma kernel
|
2024-06-07 18:32:07 -07:00 |
|
Hansung Kim
|
fc8f0c99f0
|
Merge branch 'tensor_core' into kernels
|
2024-06-07 18:27:02 -07:00 |
|
Hansung Kim
|
800d9801b5
|
tensor: Test with multiple accumulators
|
2024-06-07 18:19:20 -07:00 |
|
Hansung Kim
|
080923e869
|
common.mk: Add more aggressive inline flag
|
2024-06-07 18:14:40 -07:00 |
|
Hansung Kim
|
2cac995db9
|
tensor: generate 8x8 in correctness script
|
2024-06-07 18:13:57 -07:00 |
|
Richard Yan
|
7cf59c9480
|
dma and demo kernels
|
2024-06-07 18:11:19 -07:00 |
|