vortex

Author	SHA1	Message	Date
Hansung Kim	4dcbc31a88	tensor: Separate async commit from tensor commit. With this we can prioritize commit of the async hgmma instructions over the "ghost" commits from the TC.	2024-10-11 21:32:20 -07:00
Hansung Kim	717fe7ff29	tensor: Fix FSM when commit not ready	2024-10-11 20:24:31 -07:00
Hansung Kim	2934b1bd94	tensor: Split execution module from pipeline logic	2024-10-11 20:09:09 -07:00
Hansung Kim	f7f23e0c05	tensor: Doc update	2024-10-11 18:00:36 -07:00
Hansung Kim	42b9d23f83	tensor: Write release logic for hgmma. Upon completion of an op, tensor_core_hopper sends a "ghost" commit signal down the pipeline with the `wb` and `tensor` bit set in commit_if. The scoreboard receives this signal via writeback_if and resets the inuse_tensor status bit back to zero, which unblocks the HGMMA_WAIT instruction.	2024-10-11 17:58:44 -07:00
Hansung Kim	408a9b5d2a	tensor: Write stall logic for hgmma_wait. The HGMMA_WAIT instruction stalls at issue when inuse_tensor is set, which is done by the previous HGMMA insn. Currently inuse_tensor is never set back to zero.	2024-10-11 17:18:01 -07:00
Hansung Kim	72f9dedce3	tensor: Disable micro-ops for hopper. Have a uarch FSM handle the stepping mechanism entirely.	2024-10-11 15:59:31 -07:00
Hansung Kim	100d69ef21	Doc update on accumulator regs	2024-10-11 15:47:58 -07:00
Hansung Kim	d9ad4809ec	Add 'tensor' bit to commit_if and writeback_if. For use in the asynchronous tensor instruction. When 1'b1, sets/unsets the inuse_tensor status bit in the scoreboard to signal kickoff/completion of the asynchronous tensor op.	2024-10-11 15:42:25 -07:00
Hansung Kim	58c9761829	Revert decode change for hopper. Share the same insn as non-hopper TC.	2024-10-09 21:53:04 -07:00
Hansung Kim	7ab14445f0	tensor: Test many-commit per execute with an FSM. The trick is to set commit_if.data.eop to 0, since the commit module only signals instruction completion to VX_schedule if the eop bit is 1. Otherwise it underflows the pending_instr buffer. The same eop trick works for VX_scoreboard, which works around the invalid rd writeback error.	2024-10-07 21:29:44 -07:00
Hansung Kim	e8ca4677df	Remove old code for pending_instr underflow fix	2024-10-07 20:21:35 -07:00
Hansung Kim	4cac1adf7d	Add dummy code for decoupled Hopper tensor core. Define EXT_T_HOPPER that, when EXT_T_ENABLE is defined, distinguishes whether to instantiate core-coupled Volta-style or decoupled Hopper-style Tensor Core.	2024-10-07 17:10:59 -07:00
Hansung Kim	da54162241	tensor: Add FP16 parameter and expose to VX_core	2024-09-10 15:32:17 -07:00
Richard Yan	3f8c28c7d6	sync rf, x0 fix	2024-09-05 16:49:05 -07:00
Hansung Kim	2b1a9b7c16	tensor: Rename & docs	2024-08-23 16:21:45 -07:00
Hansung Kim	45f6ae5aad	tensor: Doc comments	2024-08-20 14:46:40 -07:00
Hansung Kim	20faf87b80	tensor: Rename halves_buf to reduce confusion	2024-08-19 16:42:02 -07:00
Hansung Kim	789d873e19	Disable reduce_unit for timing optimization. Currently the critical path @1GHz is found at the accumulators inside reduce_unit.	2024-08-16 15:28:56 -07:00
Hansung Kim	1410b39143	Disable trace during the very start of simulation	2024-08-13 16:01:29 -07:00
Hansung Kim	d4d18c2823	tensor: spurious assert, doc, remove unused param	2024-07-29 16:06:55 -07:00
Hansung Kim	4e0dcdadac	tensor: Share B operand buffer between threadgroups. The two threadgroups use the same B fragment, so there is no need to store it twice in the operand buffer. To do this, pull the operand buffer out of the threadgroups to the octet-level.	2024-07-27 20:42:08 -07:00
Hansung Kim	01f6024a76	tensor: Split flops into structural module to get separate area/power numbers in hierarchical	2024-07-26 16:26:48 -07:00
Hansung Kim	7f43bab0aa	tensor: Parameterize result buffer depth	2024-07-25 16:31:45 -07:00
Hansung Kim	14b811f334	Update doc	2024-07-19 16:39:05 -07:00
Hansung Kim	4b093e3ff7	tensor: Mark PARTIAL_BW on power impact	2024-06-26 14:25:26 -07:00
Hansung Kim	9a6fe79bd3	VX_operands_dup: Add counter for RF read/write accesses	2024-06-22 16:35:23 -07:00
Hansung Kim	86deaa8e07	Give some slack time for other cores to finish	2024-06-12 09:47:21 -07:00
Richard Yan	7947df8a6c	config change, move ucode	2024-06-12 02:15:08 -07:00
Hansung Kim	874a3bf194	Doc changes	2024-06-09 13:41:00 -07:00
Hansung Kim	12f8722dd5	Shush display	2024-06-03 13:04:09 -07:00
Hansung Kim	9caafb2d8a	tensor: Decode rd of macro-op to designate additional accumulator. This is useful when you want to have the tensor core output to multiple accumulator registers, e.g. when doing outer product within the RF.	2024-05-31 19:17:56 -07:00
Hansung Kim	0ebbb8e223	tensor: Fix perf counter; comment out dpi	2024-05-31 00:32:32 -07:00
Hansung Kim	73293061ea	tensor: Enlarge metadata queue	2024-05-30 23:21:23 -07:00
Hansung Kim	52bb827a46	Handle BLOCK_SIZE != 1 in dispatch_unit + change ALU and FPU unit to use it as well	2024-05-30 23:20:21 -07:00
Hansung Kim	a02773eb92	Add more efficient dispatch_unit. Instead of having a single candidate to be considered for dispatch (designated by 'batch_idx' counter), add a dispatch_unit variant that considers all `ISSUE_WIDTH dispatch signals and picks a valid one in a round-robin manner. This increases core utilization significantly due to better overlapping of smem/tensor ops.	2024-05-30 21:55:42 -07:00
Hansung Kim	574cc0e5f0	tensor: Document configuring queue depths	2024-05-30 18:33:15 -07:00
Hansung Kim	83f9f6d84f	tensor: Fix sync for dpu warp queue as well	2024-05-30 18:22:36 -07:00
Hansung Kim	0a032ab400	tensor: Fix out-of-sync enqueue to dpu and metadata queue	2024-05-30 18:03:04 -07:00
Hansung Kim	97f37b1c75	tensor: Add commit stall injection for debugging	2024-05-30 18:00:26 -07:00
Hansung Kim	06e0f901ff	tensor: Handle backpressure from metadata queue	2024-05-30 17:34:49 -07:00
Hansung Kim	dfb2276657	tensor: Remove redundant issue queue outside pdu	2024-05-30 17:29:59 -07:00
Hansung Kim	2743d32bd2	tensor: Handle wid queue backpressure in dpu	2024-05-30 15:25:00 -07:00
Hansung Kim	5ed6041e33	tensor: Properly stall dpu upon commit backpressure & better-reasoned queue depths	2024-05-29 17:05:53 -07:00
Hansung Kim	f5a9ca5bf3	tensor: Enqueue both insts in pair to issue queue. Otherwise the first-in-pair instructions can run ahead, latching their inputs for the next pair before the second-in-pair insts finish compute on the current one. Might introduce more frontend stalls; needs more experimenting.	2024-05-29 14:47:25 -07:00
Hansung Kim	c03a5b070c	tensor: Issue queue for dpu to improve utilization	2024-05-27 18:25:10 -07:00
Hansung Kim	28f6cd59b5	tensor: Improve commit efficiency by decoupling dpu with fifo	2024-05-26 22:00:25 -07:00
Hansung Kim	864265bda5	tensor: Fix consecutive commits to write to same warp ... by splitting the pending_uops queue across warps.	2024-05-25 20:04:31 -07:00
Hansung Kim	8775458a8f	Stage half-operands per warp. An easy solution to handle multiple concurrent warp operations by staging half-operands in their own per-warp register. This might increase area requirement by quite a bit. TODO: Commit is not being handled correctly yet.	2024-05-25 19:09:56 -07:00
Hansung Kim	45d86b26a2	tensor: Add counter for dpu operations	2024-05-16 22:15:01 -07:00


139 Commits