kernels

Author	SHA1	Message	Date
Richard Yan	91a82c9f0f	merge kernel changes from kernels-asplos-ae	2025-01-29 22:11:25 -08:00
Richard Yan	a61bf257ff	modify makefile to point to new locations	2025-01-29 21:27:59 -08:00
Richard Yan	0d842a5930	more renaming and cleanup	2025-01-29 21:22:41 -08:00
Richard Yan	f98cd9bc22	remove old ci	2025-01-29 20:39:47 -08:00
Richard Yan	d4b78377a1	fix virgo kernel scripts	2025-01-29 20:19:42 -08:00
Richard Yan	0e6bcf51f1	cleanup	2025-01-29 18:38:49 -08:00
Richard Yan	5ba132e87b	regression restructure	2025-01-29 18:30:32 -08:00
Hansung Kim	3de51577ef	Check-in gemmini headers instead of submodule	2025-01-29 17:10:37 -08:00
Richard Yan	e86aac3a6f	Merge branch 'new-cisc' into kernels-asplos-ae	2025-01-29 17:03:54 -08:00
Richard Yan	24894b1712	Merge branch 'new-cisc' of https://github.com/hansungk/vortex into new-cisc	2025-01-29 17:03:05 -08:00
Richard Yan	d47ef75614	update idle kernel	2025-01-29 17:00:08 -08:00
Richard Yan	ec41200845	updated no dma gemmini kernel	2025-01-29 16:59:44 -08:00
Hansung Kim	c26558bc93	Add fence before rescale	2025-01-28 23:48:02 -08:00
Hansung Kim	198a25cb16	Set NUM_CORES to 8 for Volta/Ampere	2025-01-28 22:49:36 -08:00
Hansung Kim	f2b5a3409d	Merge branch 'new-cisc' into kernels-asplos-ae	2025-01-28 21:18:12 -08:00
Richard Yan	8c45b8b4b7	Merge branch 'new-cisc' of https://github.com/hansungk/vortex-private into new-cisc	2025-01-28 17:14:49 -08:00
Hansung Kim	e43f3c02a9	sgemm_impl: FP_SIZE to 16	2025-01-28 17:06:04 -08:00
Richard Yan	b1e6495630	update kernels	2025-01-28 16:39:17 -08:00
Hansung Kim	d98a414765	Change gemmini_mmio.h to fp16 GEMM setting	2025-01-28 16:36:55 -08:00
Hansung Kim	e4c0bbd039	sgemm: Check-in argument binaries	2025-01-28 16:04:56 -08:00
Hansung Kim	45e9407c99	sgemm: Check-in argument binaries	2025-01-28 15:58:27 -08:00
Hansung Kim	9894efe6c9	Update toolchain env paths for dork	2025-01-28 15:04:14 -08:00
Hansung Kim	5ef4c8023e	sgemm_impl: Disable wmma fast store Doesn't seem to have a big impact on tcore util.	2024-11-11 14:06:15 -08:00
Hansung Kim	7d7cb5f60a	flash: Disable perf loop multiplier	2024-11-10 22:44:02 -08:00
Hansung Kim	4448f31fdc	fence: Fix moving fence to start of loop For unknown reasons, guarding the fence with a tid == 0 branch causes a TL source ID re-used assertion. Just call the fence from all thread/warps as a workaround. At least, all threads in a warp will coalesce into one request.	2024-11-09 22:04:45 -08:00
Hansung Kim	cb916ead39	Fix potential bitwidth bug in compute API	2024-11-09 20:59:58 -08:00
Hansung Kim	68054689c9	flash: Move fence to start of loop; wrap all MMIO in one tid=0 branch	2024-11-09 20:59:26 -08:00
Hansung Kim	fcd8b0b892	flash: Disable rescale flag check GEMM-II finishes much earlier than softmax for this to be a problem.	2024-11-09 20:37:58 -08:00
Hansung Kim	1c9b022156	flash: Rename nowarpspec to default	2024-11-09 19:58:45 -08:00
Hansung Kim	8fe6d918f2	flash: Update tcore kernel to use new CISC	2024-11-09 19:49:20 -08:00
Hansung Kim	76a6aaf085	flash: doc update	2024-11-09 19:09:09 -08:00
Hansung Kim	673e07ed43	flash: Add non-warp-specialized gemmini flash kernel	2024-11-09 19:08:39 -08:00
Hansung Kim	ac42f2dbba	sgemm_gemmini_dma: Update with new compute API	2024-11-09 16:49:39 -08:00
Hansung Kim	ad75561efe	flash: Reduce fence calls to improve util	2024-11-09 16:44:17 -08:00
Hansung Kim	6990fcc1e6	Add compute-and-mvout-to-spad API	2024-11-09 16:43:45 -08:00
Hansung Kim	952b8debbb	flash: Update to use new CISC interface	2024-11-09 16:21:34 -08:00
Hansung Kim	dc89309ad0	Merge branch 'kernels-flash' into new-cisc	2024-11-09 14:42:46 -08:00
Hansung Kim	365b1d8e67	flash: Add begin end markers	2024-11-09 10:19:16 -08:00
Hansung Kim	1e3d476e70	Switch header configs to flash	2024-11-08 21:56:42 -08:00
Richard Yan	c114a7a4ab	new gemm kernel	2024-11-08 20:55:27 -08:00
Hansung Kim	4e087a8aab	flash: Fix loop iteration for gemmini Kernel is software-pipelined around 2 GEMMs and softmax; it requires two iterations to fully complete a tile.	2024-11-08 16:43:08 -08:00
Hansung Kim	4055255018	flash: Fix tcore kernel for CISC arg field changes	2024-11-08 16:40:16 -08:00
Hansung Kim	c001618fb9	sgemm_impl: Fix wrong next block_m logic for DMA	2024-10-29 22:35:56 -07:00
Hansung Kim	21b6655c10	sgemm_impl: Implement fast coalesced wmma_store Enables a fairer comparison between core-coupled tensor core to Hopper tensor core, where the latter benefits from coalesced full-throughput moveout to GMEM because it does not use the 1x2 interleaved register mapping. This means the result matrix will be stored swizzled in the GMEM, without breaking correctness.	2024-10-29 22:34:22 -07:00
Hansung Kim	6b39a6fe70	Add convenience script for switching input/args binaries	2024-10-29 20:14:33 -07:00
Hansung Kim	8dadbdd42d	tensor: Do DMA mvin for next m/n loop at the last k iter This increases util by pulling the DMA wait time out of the K-loop wraparound (next N) and overlapping it with the last K iter.	2024-10-29 19:43:22 -07:00
Hansung Kim	367fa927f8	sgemm_impl: Fix default FP_SIZE to 16	2024-10-29 14:56:18 -07:00
Hansung Kim	e1b0fc3944	generate_matrix.py: Rand [0,1); also save non-swizzled row-major B	2024-10-29 14:55:32 -07:00
Hansung Kim	24064dc7a2	sgemm_impl: Do proper addr gen and store for wgmma	2024-10-29 01:31:55 -07:00
Hansung Kim	bd7a8e39b9	sgemm_impl: Split out smem addr gen to functions so that the addr gen code can also be used for wgmma.	2024-10-29 01:30:48 -07:00

1 2 3 4 5 ...

2667 Commits