Commit Graph

1187 Commits

Author SHA1 Message Date
Hansung Kim
2743d32bd2 tensor: Handle wid queue backpressure in dpu 2024-05-30 15:25:00 -07:00
Hansung Kim
2e2decc8b6 Shrink size of D_half latch 2024-05-30 12:46:45 -07:00
Hansung Kim
73a2f5781e Do two-cycle compute with 1 FEDP per lane 2024-05-30 12:41:41 -07:00
Hansung Kim
35273b3d74 Set correct dpu hmma latency 2024-05-29 17:14:54 -07:00
Hansung Kim
5ed6041e33 tensor: Properly stall dpu upon commit backpressure
& better-reasoned queue depths
2024-05-29 17:05:53 -07:00
Hansung Kim
f5a9ca5bf3 tensor: Enqueue both insts in pair to issue queue
Otherwise the first-in-pair instructions can run ahead, latching their
inputs for the next pair before the second-in-pair insts finish compute
on the current one.  Might introduce more frontend stalls, need more
experimenting
2024-05-29 14:47:25 -07:00
Hansung Kim
e9df173745 tensor: Use chisel-generated dpu module 2024-05-29 13:34:25 -07:00
Hansung Kim
c03a5b070c tensor: Issue queue for dpu to improve utilization 2024-05-27 18:25:10 -07:00
Hansung Kim
28f6cd59b5 tensor: Improve commit efficiency by decoupling dpu with fifo 2024-05-26 22:00:25 -07:00
Hansung Kim
864265bda5 tensor: Fix consecutive commits to write to same warp
... by splitting the pending_uops queue across warps.
2024-05-25 20:04:31 -07:00
Hansung Kim
5a95eba1f5 tensor: Clear c_*_tile before compute
This didn't really cause any problem, but just to be sure.
2024-05-25 19:54:44 -07:00
Hansung Kim
8775458a8f Stage half-operands per warp
An easy solution to handle multiple concurrent warp operations by
staging half-operands in their own per-warp register.  This might
increase area requirement by quite a bit.

TODO: Commit is not being handled correctly yet
2024-05-25 19:09:56 -07:00
Hansung Kim
45d86b26a2 tensor: Add counter for dpu operations 2024-05-16 22:15:01 -07:00
Hansung Kim
5034d8d14b tensor: Add buffer to hide 2cyc commit latency
Since operand and commit throughput are the same (2 cycles), it is
unnecessary to stall the dpu during the multi-cycle commit.
This enables the dpu to operate at full throughput of 1 operand every 2
cycles.
2024-05-16 20:09:08 -07:00
Hansung Kim
317695a8d0 Add perf counters on LSU resp valid tmasks 2024-05-16 15:34:54 -07:00
Hansung Kim
89e7d65926 tensor: Add ready signal to enforce 1 warp occupancy
Currently disabled as the timing behavior is already ~accurate
2024-05-16 15:34:54 -07:00
Hansung Kim
1a1094b2bb tensor: Add dispatch unit to narrow to BLOCK_SIZE=1 2024-05-16 15:34:54 -07:00
Hansung Kim
9f9ec10960 tensor: Enable scaling NUM_THREADS by octets
todo: lane-to-octet mapping is arbitrary atm
2024-05-16 15:34:50 -07:00
Richard Yan
d624b3e50a store fencing, large smem, fix tensor core for firesim 2024-05-15 21:45:48 -07:00
Richard Yan
0dd5335851 fix merge error once again 2024-05-08 11:31:43 -07:00
Richard Yan
16dfae7d3f Merge branch 'rtl' of https://github.com/hansungk/vortex-private into rtl 2024-05-08 11:28:39 -07:00
Richard Yan
629279977e fix merge error 2024-05-08 11:28:36 -07:00
Hansung Kim
be748b109a Fix faulty merge on syn-only flags 2024-05-07 18:37:25 -07:00
Hansung Kim
f71e705d53 Revert to old LSUQ_SIZE 2024-05-07 16:23:32 -07:00
Richard Yan
4aad161739 Merge branch 'rtl' of https://github.com/hansungk/vortex-private into rtl 2024-05-07 14:00:31 -07:00
Richard Yan
37616f3334 firesim modifications 2024-05-07 13:59:25 -07:00
Richard Yan
c9a3eaad79 accelerator cisc 2024-05-07 13:58:32 -07:00
Richard Yan
14d1552f08 potential deadlock 2024-05-07 13:56:51 -07:00
Richard Yan
1e5dff52c1 shrink queue sizes 2024-05-07 13:54:23 -07:00
Hansung Kim
868bbdb15e tensor: more doc 2024-05-07 13:54:10 -07:00
Richard Yan
b70df8cbc9 proper srams 2024-05-07 13:52:07 -07:00
Hansung Kim
9c1d797250 tensor: add missing } 2024-05-05 18:36:15 -07:00
Hansung Kim
fb626ee21c tensor: doc 2024-05-05 18:35:52 -07:00
Hansung Kim
9ea291eea2 Merge remote-tracking branch 'origin/tensor_core' into rtl 2024-05-05 17:03:57 -07:00
joshua
5bd25985c6 i kinda forgot most of changes 2024-05-04 23:01:47 -07:00
Hansung Kim
1c7acab160 tensor: Fix lint errors 2024-05-03 15:43:02 -07:00
Hansung Kim
5a0ee98a61 Remove duplicate port connection 2024-05-03 15:07:24 -07:00
Hansung Kim
bc45c40231 tensor: Rename half.hpp -> half.h
addResource() thinks it's a Verilog source file if it ends in .hpp, for
some reason.
2024-05-02 16:17:20 -07:00
Hansung Kim
c4b94e4f2c Wrap hardcoded configs with SYNTHESIS 2024-05-02 16:17:04 -07:00
Hansung Kim
c4d71bc3d6 tensor: Fix multiple driver error on VCS 2024-05-01 21:40:48 -07:00
Hansung Kim
7fc5b6a374 tensor: Fix elaboration error on VCS 2024-05-01 21:40:45 -07:00
Hansung Kim
675e8ea130 Merge branch 'tensor_core' into rtl 2024-05-01 16:18:14 -07:00
Hansung Kim
9a688a05b1 Add (unconnected) FPU perf counters
mainly for debugging
2024-04-29 15:20:55 -07:00
Hansung Kim
100fbbc048 Increase FPUQ_SIZE
This should at least be FMA_LATENCY to not bottleneck things.
2024-04-29 15:19:48 -07:00
Richard Yan
85213d2876 synthesizable design 2024-04-17 18:05:51 -07:00
Richard Yan
17fd29c114 Merge branch 'rtl' of https://github.com/hansungk/vortex-private into rtl 2024-04-16 23:03:04 -07:00
Richard Yan
8de5470da4 round robin warp scheduling 2024-04-16 23:03:00 -07:00
Hansung Kim
217bc189da ifdef-guard VX_operand* to enable including both in Chisel 2024-04-15 22:06:47 -07:00
Hansung Kim
4752b86858 Limit NUM_SFU_LANES to 4
Simulation seems to not like SFU_LANES=8; dial back for now
2024-04-15 21:48:59 -07:00
Hansung Kim
978b1fe2d0 Add operands stage with duplicated RF for rs1/2/3 2024-04-15 16:45:59 -07:00