vortex

Author	SHA1	Message	Date
Hansung Kim	c88fd89f1f	tensor: Don't make initiate_valid depend on ready	2024-10-24 19:29:21 -07:00
Richard Yan	b64e53ff02	Merge branch 'rtl' of github.com:hansungk/vortex-private into rtl	2024-10-24 16:51:22 -07:00
Richard Yan	155cbb0abc	tc rf read port	2024-10-24 16:51:15 -07:00
Hansung Kim	40565de8cd	tensor: Fix initiate sync with meta queue when !commit.ready	2024-10-24 16:41:54 -07:00
Hansung Kim	3ebeb43568	tensor: Fix inflight_tensor decrement, add under/overflow checks	2024-10-24 14:36:29 -07:00
Hansung Kim	8337488ed3	tensor: Don't check invalid writeback reg for ghost writes	2024-10-24 14:36:18 -07:00
Hansung Kim	e855a47295	Add missing commit_if.tensor bit inits	2024-10-24 13:28:30 -07:00
Hansung Kim	c77a25c968	tensor: Add missing HOPPER guard	2024-10-23 20:33:45 -07:00
Hansung Kim	78df981366	tensor: Simplify metadata queue. Enqueue all different-warp reqs into the queue. There is a slight chance that an HGMMA_WAIT might be blocked from commit when there are multiple different-warp HGMMAs blocking the dequeue end, but it should be uncommon.	2024-10-22 22:01:18 -07:00
Hansung Kim	69cbbdd89b	tensor: Consider inflight ops for HGMMA blocking. This allows for back-to-back issue of HGMMA past the scoreboard, which helps to minimize downtime in DPU activity between operations. HGMMA_WAIT now only unblocks when all previous HGMMAs have finished writeback.	2024-10-22 21:32:33 -07:00
Hansung Kim	98eb7cb594	tensor: Block both HGMMA/HGMMA_WAIT at scoreboard. If we let back-to-back HGMMAs pass at scoreboard, we can't accurately keep track of the busy state of the tensor core and block WAITs accordingly. TODO: Distinguish "ready-to-fire" from "ready-to-use-writeback".	2024-10-22 21:10:55 -07:00
Hansung Kim	83979c3341	tensor: Fully connect writeback IO	2024-10-22 20:17:00 -07:00
Hansung Kim	47dff74d3a	tensor: Fix commit/metadata logic for HGMMA. Block HGMMA commit until previous ones are all done; always commit HGMMA_WAIT after it passes the scoreboard.	2024-10-22 20:01:37 -07:00
Hansung Kim	3abaaff16f	tensor: Fix tag and data assignment for p0/p1 bus	2024-10-22 17:47:04 -07:00
Hansung Kim	8a8f682194	tensor: Bore smem IO from core to tensor core	2024-10-22 17:42:30 -07:00
Hansung Kim	9131558950	tensor: Connect Chisel-generated TensorCoreDecoupled module. Elaborates, but most of the IOs are tied to fake.	2024-10-22 15:16:24 -07:00
Hansung Kim	32ccdeef01	Merge branch 'tensor-decoupled' into rtl	2024-10-21 22:57:07 -07:00
Hansung Kim	0f06afc3ef	Update doc	2024-10-21 22:37:20 -07:00
Richard Yan	cde8da1f3b	add tag to tc smem interface	2024-10-17 14:48:39 -07:00
Hansung Kim	4dcbc31a88	tensor: Separate async commit from tensor commit. With this we can prioritize commit of the async hgmma instructions over the "ghost" commits from the TC.	2024-10-11 21:32:20 -07:00
Hansung Kim	717fe7ff29	tensor: Fix FSM when commit not ready	2024-10-11 20:24:31 -07:00
Hansung Kim	2934b1bd94	tensor: Split execution module from pipeline logic	2024-10-11 20:09:09 -07:00
Hansung Kim	f7f23e0c05	tensor: Doc update	2024-10-11 18:00:36 -07:00
Hansung Kim	42b9d23f83	tensor: Write release logic for hgmma. Upon completion of an op, tensor_core_hopper sends a "ghost" commit signal down the pipeline with the `wb` and `tensor` bits set in commit_if. The scoreboard receives this signal via writeback_if and resets the inuse_tensor status bit back to zero, which unblocks the HGMMA_WAIT instruction.	2024-10-11 17:58:44 -07:00
Hansung Kim	408a9b5d2a	tensor: Write stall logic for hgmma_wait. The HGMMA_WAIT instruction stalls at issue when inuse_tensor is set, which is done by the previous HGMMA insn. Currently inuse_tensor is never set back to zero.	2024-10-11 17:18:01 -07:00
Hansung Kim	72f9dedce3	tensor: Disable micro-ops for hopper. Have a uarch FSM handle the stepping mechanism entirely.	2024-10-11 15:59:31 -07:00
Hansung Kim	100d69ef21	Doc update on accumulator regs	2024-10-11 15:47:58 -07:00
Hansung Kim	d9ad4809ec	Add 'tensor' bit to commit_if and writeback_if. For use in the asynchronous tensor instruction. When 1'b1, sets/unsets the inuse_tensor status bit in the scoreboard to signal kickoff/completion of the asynchronous tensor op.	2024-10-11 15:42:25 -07:00
Hansung Kim	58c9761829	Revert decode change for hopper. Share the same insn as non-hopper TC.	2024-10-09 21:53:04 -07:00
Hansung Kim	7ab14445f0	tensor: Test many-commit per execute with an FSM. The trick is to set commit_if.data.eop to 0, since the commit module only signals instruction completion to VX_schedule if the eop bit is 1. Otherwise it underflows the pending_instr buffer. The same eop trick works for VX_scoreboard, where it works around the invalid rd writeback error.	2024-10-07 21:29:44 -07:00
Hansung Kim	e8ca4677df	Remove old code for pending_instr underflow fix	2024-10-07 20:21:35 -07:00
Hansung Kim	4cac1adf7d	Add dummy code for decoupled Hopper tensor core. Define EXT_T_HOPPER that, when EXT_T_ENABLE is defined, distinguishes whether to instantiate core-coupled Volta-style or decoupled Hopper-style Tensor Core.	2024-10-07 17:10:59 -07:00
Richard Yan	8bf7f39f04	add tensor core memory interface	2024-10-07 02:56:38 -07:00
Hansung Kim	da54162241	tensor: Add FP16 parameter and expose to VX_core	2024-09-10 15:32:17 -07:00
Hansung Kim	a968bdd69b	tensor: Fix HALF_PRECISION to 1	2024-09-08 01:43:21 -07:00
Richard Yan	3f8c28c7d6	sync rf, x0 fix	2024-09-05 16:49:05 -07:00
Hansung Kim	2b1a9b7c16	tensor: Rename & docs	2024-08-23 16:21:45 -07:00
Hansung Kim	45f6ae5aad	tensor: Doc comments	2024-08-20 14:46:40 -07:00
Hansung Kim	20faf87b80	tensor: Rename halves_buf to reduce confusion	2024-08-19 16:42:02 -07:00
Hansung Kim	789d873e19	Disable reduce_unit for timing optimization. Currently the critical path @1GHz is found at the accumulators inside reduce_unit.	2024-08-16 15:28:56 -07:00
Hansung Kim	715539b2c3	Guard trace printf in mem_scheduler for synthesis	2024-08-15 06:09:39 -07:00
Hansung Kim	119c52004e	Enable LSU dedup in VX_platform.vh	2024-08-15 13:39:43 -07:00
Hansung Kim	1410b39143	Disable trace during the very start of simulation	2024-08-13 16:01:29 -07:00
Hansung Kim	d39e24643d	tensor: Parameterize fedp for fp16/fp32	2024-08-12 20:01:56 -07:00
Hansung Kim	15e93e01d8	tensor: Split packed fp16 and wire correctly to DPU	2024-08-07 11:16:38 -07:00
Hansung Kim	d4d18c2823	tensor: spurious assert, doc, remove unused param	2024-07-29 16:06:55 -07:00
Hansung Kim	4e0dcdadac	tensor: Share B operand buffer between threadgroups. The two threadgroups use the same B fragment, so there is no need to store it twice in the operand buffer. To do this, pull the operand buffer out of the threadgroups to the octet level.	2024-07-27 20:42:08 -07:00
Hansung Kim	7ad3f64528	tensor: Remove old ready_reg DPI code	2024-07-27 17:36:02 -07:00
Hansung Kim	01f6024a76	tensor: Split flops into structural module to get separate area/power numbers in hierarchical	2024-07-26 16:26:48 -07:00
Hansung Kim	7f43bab0aa	tensor: Parameterize result buffer depth	2024-07-25 16:31:45 -07:00
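
Several of the messages above (69cbbdd89b, 42b9d23f83, 3ebeb43568) describe how the scoreboard tracks asynchronous HGMMA ops: an issue marks the tensor core busy, a "ghost" writeback with the `wb` and `tensor` bits set releases it, and HGMMA_WAIT stays blocked until every earlier HGMMA has written back. The following is a minimal behavioral sketch of that idea in Python, not the RTL. The class name, method names, and the `max_inflight` capacity are hypothetical and chosen only for illustration; the real logic lives in the repository's scoreboard and may differ in detail.

```python
class TensorScoreboard:
    """Toy single-warp model of in-flight HGMMA tracking (hypothetical names)."""

    def __init__(self, max_inflight=4):
        # Assumed counter capacity; the actual width is not stated in the log.
        self.max_inflight = max_inflight
        self.inflight = 0

    def can_issue_hgmma(self):
        # Back-to-back HGMMAs may pass as long as the counter has room.
        return self.inflight < self.max_inflight

    def can_issue_wait(self):
        # HGMMA_WAIT unblocks only when all earlier HGMMAs have written back.
        return self.inflight == 0

    def issue_hgmma(self):
        # Overflow check: refuse to count past the assumed capacity.
        if self.inflight >= self.max_inflight:
            raise OverflowError("inflight counter overflow")
        self.inflight += 1

    def writeback(self, wb, tensor):
        # Only a "ghost" writeback (both wb and tensor set) releases an op.
        if wb and tensor:
            # Underflow check: a release with nothing in flight is a bug.
            if self.inflight == 0:
                raise ArithmeticError("inflight counter underflow")
            self.inflight -= 1


sb = TensorScoreboard(max_inflight=2)
sb.issue_hgmma()
sb.issue_hgmma()                      # back-to-back issue
blocked_after_issue = not sb.can_issue_wait()
sb.writeback(wb=True, tensor=True)    # first op completes
still_blocked = not sb.can_issue_wait()
sb.writeback(wb=True, tensor=False)   # ordinary writeback, ignored here
sb.writeback(wb=True, tensor=True)    # second op completes
unblocked = sb.can_issue_wait()
```

In this sketch a counter replaces the single inuse_tensor bit of the earlier commits, which is what lets more than one HGMMA be outstanding while a WAIT is still held back correctly.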

2469 Commits