vortex

Author	SHA1	Message	Date
Hansung Kim	f3afd4a6f9	Hardcode NUM_THREADS/.. only when SYNTHESIS. They're duplicately set in VX_config.vh, which is confusing.	2024-07-23 15:15:16 -07:00
Richard Yan	ed247e21bb	Merge branch 'rtl' of https://github.com/hansungk/vortex-private into rtl	2024-07-20 23:37:58 -07:00
Richard Yan	7d422cc9b0	pre-submission changes	2024-07-20 23:33:56 -07:00
Hansung Kim	14b811f334	Update doc	2024-07-19 16:39:05 -07:00
Hansung Kim	4b093e3ff7	tensor: Mark PARTIAL_BW on power impact	2024-06-26 14:25:26 -07:00
Hansung Kim	9a6fe79bd3	VX_operands_dup: Add counter for RF read/write accesses	2024-06-22 16:35:23 -07:00
Hansung Kim	fb973a51b6	core_wrapper: Only terminate when core 0 is finished; more slack time	2024-06-22 16:34:42 -07:00
Hansung Kim	46fe1897bf	VX_platform.vh: Undefine FIRESIM by default	2024-06-22 16:34:08 -07:00
Hansung Kim	d4f6f8a257	Set NUM_ALU_BLOCKS=2, NUM_FPU_BLOCKS=1	2024-06-22 16:33:42 -07:00
Hansung Kim	a9b75dd492	Set default to 4cores/8barriers in VX_config.{h,vh}	2024-06-12 20:51:15 -07:00
Hansung Kim	86deaa8e07	Give some slack time for other cores to finish	2024-06-12 09:47:21 -07:00
Richard Yan	1833e8a176	Merge branch 'rtl' of https://github.com/hansungk/vortex-private into rtl	2024-06-12 02:17:01 -07:00
Richard Yan	7947df8a6c	config change, move ucode	2024-06-12 02:15:08 -07:00
Hansung Kim	5218292b6f	core_wrapper: Use finished and !reset to determine termination	2024-06-11 16:28:05 -07:00
Hansung Kim	de10d5a957	Don't print from mem_scheduler in reset	2024-06-09 22:44:33 -07:00
Hansung Kim	5d5e4a468c	Merge remote-tracking branch 'refs/remotes/origin/rtl' into rtl	2024-06-09 15:58:32 -07:00
Richard Yan	a47389fc0e	Merge branch 'rtl' of https://github.com/hansungk/vortex-private into rtl	2024-06-09 15:15:31 -07:00
Richard Yan	67a13410fd	gate level sim changes	2024-06-09 15:15:01 -07:00
Hansung Kim	1bacbb839f	Add GPR_DUPLICATED to synthesis in VX_platform.vh	2024-06-09 14:00:34 -07:00
Hansung Kim	874a3bf194	Doc changes	2024-06-09 13:41:00 -07:00
Hansung Kim	12f8722dd5	Shush display	2024-06-03 13:04:09 -07:00
Hansung Kim	9caafb2d8a	tensor: Decode rd of macro-op to designate additional accumulator. This is useful when you want to have the tensor core output to multiple accumulator registers, e.g. when doing outer product within the RF.	2024-05-31 19:17:56 -07:00
Hansung Kim	0ebbb8e223	tensor: Fix perf counter; comment out dpi	2024-05-31 00:32:32 -07:00
Hansung Kim	73293061ea	tensor: Enlarge metadata queue	2024-05-30 23:21:23 -07:00
Hansung Kim	52bb827a46	Handle BLOCK_SIZE != 1 in dispatch_unit + change ALU and FPU unit to use it as well	2024-05-30 23:20:21 -07:00
Hansung Kim	a02773eb92	Add more efficient dispatch_unit. Instead of having a single candidate to be considered for dispatch (designated by 'batch_idx' counter), add a dispatch_unit variant that considers all `ISSUE_WIDTH dispatch signals and picks a valid one in a round-robin manner. This increases core utilization significantly due to better overlapping of smem/tensor ops.	2024-05-30 21:55:42 -07:00
Hansung Kim	574cc0e5f0	tensor: Document configuring queue depths	2024-05-30 18:33:15 -07:00
Hansung Kim	83f9f6d84f	tensor: Fix sync for dpu warp queue as well	2024-05-30 18:22:36 -07:00
Hansung Kim	0a032ab400	tensor: Fix out-of-sync enqueue to dpu and metadata queue	2024-05-30 18:03:04 -07:00
Hansung Kim	97f37b1c75	tensor: Add commit stall injection for debugging	2024-05-30 18:00:26 -07:00
Hansung Kim	06e0f901ff	tensor: Handle backpressure from metadata queue	2024-05-30 17:34:49 -07:00
Hansung Kim	dfb2276657	tensor: Remove redundant issue queue outside pdu	2024-05-30 17:29:59 -07:00
Hansung Kim	2743d32bd2	tensor: Handle wid queue backpressure in dpu	2024-05-30 15:25:00 -07:00
Hansung Kim	2e2decc8b6	Shrink size of D_half latch	2024-05-30 12:46:45 -07:00
Hansung Kim	73a2f5781e	Do two-cycle compute with 1 FEDP per lane	2024-05-30 12:41:41 -07:00
Hansung Kim	35273b3d74	Set correct dpu hmma latency	2024-05-29 17:14:54 -07:00
Hansung Kim	5ed6041e33	tensor: Properly stall dpu upon commit backpressure & better-reasoned queue depths	2024-05-29 17:05:53 -07:00
Hansung Kim	f5a9ca5bf3	tensor: Enqueue both insts in pair to issue queue. Otherwise the first-in-pair instructions can run ahead, latching their inputs for the next pair before the second-in-pair insts finish compute on the current one. Might introduce more frontend stalls; need more experimenting.	2024-05-29 14:47:25 -07:00
Hansung Kim	e9df173745	tensor: Use chisel-generated dpu module	2024-05-29 13:34:25 -07:00
Hansung Kim	c03a5b070c	tensor: Issue queue for dpu to improve utilization	2024-05-27 18:25:10 -07:00
Hansung Kim	28f6cd59b5	tensor: Improve commit efficiency by decoupling dpu with fifo	2024-05-26 22:00:25 -07:00
Hansung Kim	864265bda5	tensor: Fix consecutive commits to write to same warp ... by splitting the pending_uops queue across warps.	2024-05-25 20:04:31 -07:00
Hansung Kim	5a95eba1f5	tensor: Clear c_*_tile before compute. This didn't really cause any problem, but just to be sure.	2024-05-25 19:54:44 -07:00
Hansung Kim	8775458a8f	Stage half-operands per warp. An easy solution to handle multiple concurrent warp operations by staging half-operands in their own per-warp register. This might increase area requirement by quite a bit. TODO: Commit is not being handled correctly yet.	2024-05-25 19:09:56 -07:00
Hansung Kim	45d86b26a2	tensor: Add counter for dpu operations	2024-05-16 22:15:01 -07:00
Hansung Kim	5034d8d14b	tensor: Add buffer to hide 2cyc commit latency. Since operand and commit throughput are the same (2 cycles), it is unnecessary to stall the dpu during the multi-cycle commit. This enables the dpu to operate at full throughput of 1 operand every 2 cycles.	2024-05-16 20:09:08 -07:00
Hansung Kim	317695a8d0	Add perf counters on LSU resp valid tmasks	2024-05-16 15:34:54 -07:00
Hansung Kim	89e7d65926	tensor: Add ready signal to enforce 1 warp occupancy. Currently disabled as the timing behavior is already ~accurate.	2024-05-16 15:34:54 -07:00
Hansung Kim	1a1094b2bb	tensor: Add dispatch unit to narrow to BLOCK_SIZE=1	2024-05-16 15:34:54 -07:00
Hansung Kim	9f9ec10960	tensor: Enable scaling NUM_THREADS by octets. TODO: lane-to-octet mapping is arbitrary atm.	2024-05-16 15:34:50 -07:00


2469 Commits