| Author | Commit | Message | Date |
|---|---|---|---|
| Hansung Kim | 200fd3e08c | sgemm_tcore: Revert to packed smem alloc | 2024-05-25 22:47:59 -07:00 |
| Hansung Kim | bc7bd1a1dd | sgemm_tcore: Write reference C matrix to file | 2024-05-25 22:47:15 -07:00 |
| Hansung Kim | 0a884e1ead | tensor: spawn on all warps, 8 lanes | 2024-05-25 20:19:57 -07:00 |
| Hansung Kim | b892c22f00 | sgemm_tcore: Reflect WMITER/WNITER in threadblock size | 2024-05-16 23:31:52 -07:00 |
| Hansung Kim | 18ecebddc0 | sgemm_tcore: Fix round-down error with CORES_PER_CLUSTER | 2024-05-16 21:36:24 -07:00 |
| Hansung Kim | 78b2a318c1 | sgemm_tcore: Implement A transpose for coalesced smem access | 2024-05-16 20:22:15 -07:00 |
| Hansung Kim | 8f64fae7a7 | sgemm_tcore: Addr gen for local_k; add SIMT-only for reference | 2024-05-16 14:11:09 -07:00 |
| Hansung Kim | df1aa62916 | sgemm_tcore: Add warptiling parameters. FIXME: accumulation is done wrong | 2024-05-15 15:23:26 -07:00 |
| Hansung Kim | 5de8e7c33a | sgemm_tg: Fix device address to use ELF operands | 2024-05-13 23:09:57 -07:00 |
| Hansung Kim | 9d2b533d5c | sgemm_tg: Do operand elf stitching for kernel.elf as well | 2024-05-13 16:48:13 -07:00 |
| Hansung Kim | 09b23ffe87 | sgemm_tg: 1-octet 8-lane kernel | 2024-05-13 14:52:33 -07:00 |
| Hansung Kim | d848e88f72 | sgemm_tcore: Move C from regF->GMEM directly | 2024-05-13 14:00:50 -07:00 |
| Hansung Kim | 9e60b1834c | sgemm_tcore: Rewrite with sgemm_Wg parametrization | 2024-05-13 13:22:06 -07:00 |
| Hansung Kim | 5c298c81df | sgemm_tg: Use reg mapping functions | 2024-05-12 22:22:54 -07:00 |
| Hansung Kim | 8a521a1de8 | Add 8-lane operand mapping | 2024-05-10 23:23:11 -07:00 |
| Hansung Kim | 6af0c305ea | Fix path to OBJCOPY | 2024-05-08 13:27:11 -07:00 |
| Hansung Kim | 6ba6a1e2e5 | Merge branch 'kernels' into tensor_core | 2024-05-08 13:25:31 -07:00 |
| Hansung Kim | 5821bfd10d | Repeat vx_wmma issue & hardcode dst address | 2024-05-08 13:22:26 -07:00 |
| Hansung Kim | 7775830814 | Hardcode chipyard device addresses | 2024-05-07 16:30:30 -07:00 |
| Hansung Kim | b4c812f9f8 | Write expected_C to a binary file | 2024-05-05 18:27:56 -07:00 |
| joshua | 5bd25985c6 | i kinda forgot most of changes | 2024-05-04 23:01:47 -07:00 |
| Hansung Kim | a606a9ef42 | common.mk: properly handle unspecified CONFIG | 2024-04-29 17:14:28 -07:00 |
| Richard Yan | 01f4a69ae9 | dma mvout, double buffering & other opts | 2024-04-28 01:18:51 -07:00 |
| Richard Yan | d21e7b92c7 | internal accumulation, forced rematerialization, better unrolling | 2024-04-25 15:28:12 -07:00 |
| Richard Yan | a44edf2b65 | Merge branch 'kernels' of https://github.com/hansungk/vortex-private into kernels | 2024-04-24 22:10:40 -07:00 |
| Richard Yan | 6eafa2de54 | write operands to elf | 2024-04-24 22:09:30 -07:00 |
| Hansung Kim | df881fd69f | Generate separate ELF for radiance | 2024-04-24 21:10:21 -07:00 |
| Hansung Kim | 793779aa6c | sgemm_wg: 128x128 config | 2024-04-24 21:10:21 -07:00 |
| Hansung Kim | 689043b45e | Add regression flops | 2024-04-24 21:10:21 -07:00 |
| Hansung Kim | 6cbfbfb856 | sgemm_wg: Output CPU data to binary | 2024-04-24 21:10:21 -07:00 |
| Richard Yan | 4e9855dc33 | highly unrolled a/b load | 2024-04-16 22:19:30 -07:00 |
| Richard Yan | 449d99f0bb | dram gemm kernel | 2024-04-16 17:15:22 -07:00 |
| Richard Yan | 99621a0df9 | Merge branch 'kernels' of https://github.com/hansungk/vortex-private into kernels | 2024-04-15 10:22:19 -07:00 |
| Richard Yan | 041d49fb58 | update gemmini only kernel | 2024-04-15 10:22:00 -07:00 |
| Richard Yan | 0bb7aeb45b | add gpu+gemmini gemm kernel | 2024-04-15 10:13:37 -07:00 |
| Hansung Kim | 37a60b1141 | sgemm_wg: Output C result to binary | 2024-04-14 12:36:06 -07:00 |
| Hansung Kim | 3383b70732 | sgemm_wg: Hardcode device address | 2024-04-14 12:36:00 -07:00 |
| Richard Yan | 7bf72c9568 | cycle counting for fence | 2024-04-09 19:53:17 -07:00 |
| Hansung Kim | 93a00101ae | sgemm_wg: revert to faster params | 2024-04-04 21:06:14 -07:00 |
| Richard Yan | 84a31f3384 | thread parallel data loading for word strided bank | 2024-04-01 11:10:32 -07:00 |
| Richard Yan | e6db1a83af | Merge branch 'kernels' of https://github.com/hansungk/vortex-private into kernels | 2024-04-01 11:09:43 -07:00 |
| Hansung Kim | fa2b6e2ad0 | sgemm_wg: Explicitly limit unroll to reduce stack spilling. This needs to be done case-by-case for different BK/TM/TN combinations and examining the assembly. | 2024-03-29 02:48:29 -07:00 |
| Hansung Kim | 537b97eb20 | common.mk: Don't clean all *.elf | 2024-03-28 20:17:26 -07:00 |
| Hansung Kim | a9b0814211 | sgemm_wg: Document tiling parameter constraints | 2024-03-28 18:17:00 -07:00 |
| Hansung Kim | 9673db4e8c | sgemm_wg: Fix possible divide-by-0 | 2024-03-28 17:35:47 -07:00 |
| joshua | d8f9359fae | test case update | 2024-03-28 13:04:02 -07:00 |
| Hansung Kim | 9555b790e7 | sgemm_wg: ifdef-guard cluster specific code | 2024-03-27 22:45:51 -07:00 |
| Hansung Kim | 09822764e7 | sgemm_wg: Remove software-based barrier implementation. Intra-cluster barrier is now implemented in hardware, transparent to the ISA. | 2024-03-27 22:43:45 -07:00 |
| Hansung Kim | fa6adceb7e | vecaddx: Hardcode args/input device address to match chipyard. Don't use mem_alloc/mem_free API | 2024-03-27 15:15:52 -07:00 |
| joshua | e16584ddd9 | bleh still not work | 2024-03-27 00:26:04 -07:00 |