vortex

Author	SHA1	Message	Date
Hansung Kim	2743d32bd2	tensor: Handle wid queue backpressure in dpu	2024-05-30 15:25:00 -07:00
Hansung Kim	2e2decc8b6	Shrink size of D_half latch	2024-05-30 12:46:45 -07:00
Hansung Kim	73a2f5781e	Do two-cycle compute with 1 FEDP per lane	2024-05-30 12:41:41 -07:00
Hansung Kim	35273b3d74	Set correct dpu hmma latency	2024-05-29 17:14:54 -07:00
Hansung Kim	5ed6041e33	tensor: Properly stall dpu upon commit backpressure & better-reasoned queue depths	2024-05-29 17:05:53 -07:00
Hansung Kim	f5a9ca5bf3	tensor: Enqueue both insts in pair to issue queue. Otherwise the first-in-pair instructions can run ahead, latching their inputs for the next pair before the second-in-pair insts finish compute on the current one. Might introduce more frontend stalls, need more experimenting.	2024-05-29 14:47:25 -07:00
Hansung Kim	e9df173745	tensor: Use chisel-generated dpu module	2024-05-29 13:34:25 -07:00
Hansung Kim	c03a5b070c	tensor: Issue queue for dpu to improve utilization	2024-05-27 18:25:10 -07:00
Hansung Kim	28f6cd59b5	tensor: Improve commit efficiency by decoupling dpu with fifo	2024-05-26 22:00:25 -07:00
Hansung Kim	864265bda5	tensor: Fix consecutive commits to write to same warp ... by splitting the pending_uops queue across warps.	2024-05-25 20:04:31 -07:00
Hansung Kim	5a95eba1f5	tensor: Clear c_*_tile before compute. This didn't really cause any problem, but just to be sure.	2024-05-25 19:54:44 -07:00
Hansung Kim	8775458a8f	Stage half-operands per warp. An easy solution to handle multiple concurrent warp operations by staging half-operands in their own per-warp register. This might increase area requirement by quite a bit. TODO: Commit is not being handled correctly yet.	2024-05-25 19:09:56 -07:00
Hansung Kim	45d86b26a2	tensor: Add counter for dpu operations	2024-05-16 22:15:01 -07:00
Hansung Kim	5034d8d14b	tensor: Add buffer to hide 2cyc commit latency. Since operand and commit throughput are the same (2 cycles), it is unnecessary to stall the dpu during the multi-cycle commit. This enables the dpu to operate at full throughput of 1 operand every 2 cycles.	2024-05-16 20:09:08 -07:00
Hansung Kim	317695a8d0	Add perf counters on LSU resp valid tmasks	2024-05-16 15:34:54 -07:00
Hansung Kim	89e7d65926	tensor: Add ready signal to enforce 1 warp occupancy. Currently disabled as the timing behavior is already ~accurate.	2024-05-16 15:34:54 -07:00
Hansung Kim	1a1094b2bb	tensor: Add dispatch unit to narrow to BLOCK_SIZE=1	2024-05-16 15:34:54 -07:00
Hansung Kim	9f9ec10960	tensor: Enable scaling NUM_THREADS by octets. TODO: lane-to-octet mapping is arbitrary atm.	2024-05-16 15:34:50 -07:00
Richard Yan	d624b3e50a	store fencing, large smem, fix tensor core for firesim	2024-05-15 21:45:48 -07:00
Richard Yan	0dd5335851	fix merge error once again	2024-05-08 11:31:43 -07:00
Richard Yan	16dfae7d3f	Merge branch 'rtl' of https://github.com/hansungk/vortex-private into rtl	2024-05-08 11:28:39 -07:00
Richard Yan	629279977e	fix merge error	2024-05-08 11:28:36 -07:00
Hansung Kim	be748b109a	Fix faulty merge on syn-only flags	2024-05-07 18:37:25 -07:00
Hansung Kim	f71e705d53	Revert to old LSUQ_SIZE	2024-05-07 16:23:32 -07:00
Richard Yan	4aad161739	Merge branch 'rtl' of https://github.com/hansungk/vortex-private into rtl	2024-05-07 14:00:31 -07:00
Richard Yan	37616f3334	firesim modifications	2024-05-07 13:59:25 -07:00
Richard Yan	c9a3eaad79	accelerator cisc	2024-05-07 13:58:32 -07:00
Richard Yan	14d1552f08	potential deadlock	2024-05-07 13:56:51 -07:00
Richard Yan	1e5dff52c1	shrink queue sizes	2024-05-07 13:54:23 -07:00
Hansung Kim	868bbdb15e	tensor: more doc	2024-05-07 13:54:10 -07:00
Richard Yan	b70df8cbc9	proper srams	2024-05-07 13:52:07 -07:00
Hansung Kim	9c1d797250	tensor: add missing }	2024-05-05 18:36:15 -07:00
Hansung Kim	fb626ee21c	tensor: doc	2024-05-05 18:35:52 -07:00
Hansung Kim	9ea291eea2	Merge remote-tracking branch 'origin/tensor_core' into rtl	2024-05-05 17:03:57 -07:00
joshua	5bd25985c6	i kinda forgot most of changes	2024-05-04 23:01:47 -07:00
Hansung Kim	1c7acab160	tensor: Fix lint errors	2024-05-03 15:43:02 -07:00
Hansung Kim	5a0ee98a61	Remove duplicate port connection	2024-05-03 15:07:24 -07:00
Hansung Kim	bc45c40231	tensor: Rename half.hpp -> half.h. addResource() thinks it's a Verilog source file if it ends in .hpp, for some reason.	2024-05-02 16:17:20 -07:00
Hansung Kim	c4b94e4f2c	Wrap hardcoded configs with SYNTHESIS	2024-05-02 16:17:04 -07:00
Hansung Kim	c4d71bc3d6	tensor: Fix multiple driver error on VCS	2024-05-01 21:40:48 -07:00
Hansung Kim	7fc5b6a374	tensor: Fix elaboration error on VCS	2024-05-01 21:40:45 -07:00
Hansung Kim	675e8ea130	Merge branch 'tensor_core' into rtl	2024-05-01 16:18:14 -07:00
Hansung Kim	9a688a05b1	Add (unconnected) FPU perf counters mainly for debugging	2024-04-29 15:20:55 -07:00
Hansung Kim	100fbbc048	Increase FPUQ_SIZE. This should at least be FMA_LATENCY to not bottleneck things.	2024-04-29 15:19:48 -07:00
Richard Yan	85213d2876	synthesizable design	2024-04-17 18:05:51 -07:00
Richard Yan	17fd29c114	Merge branch 'rtl' of https://github.com/hansungk/vortex-private into rtl	2024-04-16 23:03:04 -07:00
Richard Yan	8de5470da4	round robin warp scheduling	2024-04-16 23:03:00 -07:00
Hansung Kim	217bc189da	ifdef-guard VX_operand* to enable including both in Chisel	2024-04-15 22:06:47 -07:00
Hansung Kim	4752b86858	Limit NUM_SFU_LANES to 4. Simulation seems to not like SFU_LANES=8; dial back for now.	2024-04-15 21:48:59 -07:00
Hansung Kim	978b1fe2d0	Add operands stage with duplicated RF for rs1/2/3	2024-04-15 16:45:59 -07:00

1187 Commits