Hansung Kim
f3afd4a6f9
Hardcode NUM_THREADS/.. only when SYNTHESIS
...
They are also set in VX_config.vh, and the duplication is confusing.
2024-07-23 15:15:16 -07:00
Richard Yan
ed247e21bb
Merge branch 'rtl' of https://github.com/hansungk/vortex-private into rtl
2024-07-20 23:37:58 -07:00
Richard Yan
7d422cc9b0
pre-submission changes
2024-07-20 23:33:56 -07:00
Hansung Kim
14b811f334
Update doc
2024-07-19 16:39:05 -07:00
Hansung Kim
4b093e3ff7
tensor: Mark PARTIAL_BW on power impact
2024-06-26 14:25:26 -07:00
Hansung Kim
9a6fe79bd3
VX_operands_dup: Add counter for RF read/write accesses
2024-06-22 16:35:23 -07:00
Hansung Kim
fb973a51b6
core_wrapper: Only terminate when core 0 is finished; more slack time
2024-06-22 16:34:42 -07:00
Hansung Kim
46fe1897bf
VX_platform.vh: Undefine FIRESIM by default
2024-06-22 16:34:08 -07:00
Hansung Kim
d4f6f8a257
Set NUM_ALU_BLOCKS=2, NUM_FPU_BLOCKS=1
2024-06-22 16:33:42 -07:00
Hansung Kim
a9b75dd492
Set default to 4cores/8barriers in VX_config.{h,vh}
2024-06-12 20:51:15 -07:00
Hansung Kim
86deaa8e07
Give some slack time for other cores to finish
2024-06-12 09:47:21 -07:00
Richard Yan
1833e8a176
Merge branch 'rtl' of https://github.com/hansungk/vortex-private into rtl
2024-06-12 02:17:01 -07:00
Richard Yan
7947df8a6c
config change, move ucode
2024-06-12 02:15:08 -07:00
Hansung Kim
5218292b6f
core_wrapper: Use finished and !reset to determine termination
2024-06-11 16:28:05 -07:00
Hansung Kim
de10d5a957
Don't print from mem_scheduler in reset
2024-06-09 22:44:33 -07:00
Hansung Kim
5d5e4a468c
Merge remote-tracking branch 'refs/remotes/origin/rtl' into rtl
2024-06-09 15:58:32 -07:00
Richard Yan
a47389fc0e
Merge branch 'rtl' of https://github.com/hansungk/vortex-private into rtl
2024-06-09 15:15:31 -07:00
Richard Yan
67a13410fd
gate level sim changes
2024-06-09 15:15:01 -07:00
Hansung Kim
1bacbb839f
Add GPR_DUPLICATED to synthesis in VX_platform.vh
2024-06-09 14:00:34 -07:00
Hansung Kim
874a3bf194
Doc changes
2024-06-09 13:41:00 -07:00
Hansung Kim
12f8722dd5
Shush display
2024-06-03 13:04:09 -07:00
Hansung Kim
9caafb2d8a
tensor: Decode rd of macro-op to designate additional accumulator
...
This is useful when you want to have the tensor core output to multiple
accumulator registers, e.g. when doing outer product within the RF.
2024-05-31 19:17:56 -07:00
Hansung Kim
0ebbb8e223
tensor: Fix perf counter; comment out dpi
2024-05-31 00:32:32 -07:00
Hansung Kim
73293061ea
tensor: Enlarge metadata queue
2024-05-30 23:21:23 -07:00
Hansung Kim
52bb827a46
Handle BLOCK_SIZE != 1 in dispatch_unit
...
Also change the ALU and FPU units to use it.
2024-05-30 23:20:21 -07:00
Hansung Kim
a02773eb92
Add more efficient dispatch_unit
...
Instead of having a single candidate to be considered for dispatch
(designated by 'batch_idx' counter), add a dispatch_unit variant that
considers all `ISSUE_WIDTH dispatch signals and picks a valid one in a
round-robin manner.
This increases core utilization significantly due to better overlapping
of smem/tensor ops.
2024-05-30 21:55:42 -07:00
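The round-robin selection described in commit a02773eb92 can be illustrated with a small behavioral model. This is only a sketch under assumptions, not the RTL from the repository: the class name `RoundRobinPicker`, the `next_idx` pointer, and the policy of advancing the pointer to one past the granted slot are all hypothetical choices made for illustration.

```python
from typing import Optional, Sequence


class RoundRobinPicker:
    """Behavioral model (hypothetical) of a round-robin pick over valid bits."""

    def __init__(self, width: int) -> None:
        if width <= 0:
            raise ValueError("width must be positive")
        self.width = width      # stands in for ISSUE_WIDTH
        self.next_idx = 0       # slot that has highest priority on the next pick

    def pick(self, valid: Sequence[bool]) -> Optional[int]:
        """Return the index of the granted slot, or None if no slot is valid."""
        if len(valid) != self.width:
            raise ValueError("valid must have one entry per slot")
        # Scan all slots starting at the priority pointer, wrapping around.
        for offset in range(self.width):
            idx = (self.next_idx + offset) % self.width
            if valid[idx]:
                # Assumed policy: the slot after the winner gets priority next.
                self.next_idx = (idx + 1) % self.width
                return idx
        return None


picker = RoundRobinPicker(4)
valid = [False, True, False, True]
picks = [picker.pick(valid) for _ in range(3)]
print(picks)
```

Unlike a single candidate chosen by a free-running counter, this scan never spends a pick on an invalid slot while another slot is ready, which is the utilization benefit the commit message points to.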
Hansung Kim
574cc0e5f0
tensor: Document configuring queue depths
2024-05-30 18:33:15 -07:00
Hansung Kim
83f9f6d84f
tensor: Fix sync for dpu warp queue as well
2024-05-30 18:22:36 -07:00
Hansung Kim
0a032ab400
tensor: Fix out-of-sync enqueue to dpu and metadata queue
2024-05-30 18:03:04 -07:00
Hansung Kim
97f37b1c75
tensor: Add commit stall injection for debugging
2024-05-30 18:00:26 -07:00
Hansung Kim
06e0f901ff
tensor: Handle backpressure from metadata queue
2024-05-30 17:34:49 -07:00
Hansung Kim
dfb2276657
tensor: Remove redundant issue queue outside pdu
2024-05-30 17:29:59 -07:00
Hansung Kim
2743d32bd2
tensor: Handle wid queue backpressure in dpu
2024-05-30 15:25:00 -07:00
Hansung Kim
2e2decc8b6
Shrink size of D_half latch
2024-05-30 12:46:45 -07:00
Hansung Kim
73a2f5781e
Do two-cycle compute with 1 FEDP per lane
2024-05-30 12:41:41 -07:00
Hansung Kim
35273b3d74
Set correct dpu hmma latency
2024-05-29 17:14:54 -07:00
Hansung Kim
5ed6041e33
tensor: Properly stall dpu upon commit backpressure
...
& better-reasoned queue depths
2024-05-29 17:05:53 -07:00
Hansung Kim
f5a9ca5bf3
tensor: Enqueue both insts in pair to issue queue
...
Otherwise the first-in-pair instructions can run ahead, latching their
inputs for the next pair before the second-in-pair insts finish compute
on the current one. This might introduce more frontend stalls; more
experimentation is needed.
2024-05-29 14:47:25 -07:00
Hansung Kim
e9df173745
tensor: Use chisel-generated dpu module
2024-05-29 13:34:25 -07:00
Hansung Kim
c03a5b070c
tensor: Issue queue for dpu to improve utilization
2024-05-27 18:25:10 -07:00
Hansung Kim
28f6cd59b5
tensor: Improve commit efficiency by decoupling dpu with fifo
2024-05-26 22:00:25 -07:00
Hansung Kim
864265bda5
tensor: Fix consecutive commits to write to same warp
...
... by splitting the pending_uops queue across warps.
2024-05-25 20:04:31 -07:00
Hansung Kim
5a95eba1f5
tensor: Clear c_*_tile before compute
...
This didn't really cause any problem, but just to be sure.
2024-05-25 19:54:44 -07:00
Hansung Kim
8775458a8f
Stage half-operands per warp
...
An easy solution to handle multiple concurrent warp operations by
staging half-operands in their own per-warp register. This might
increase area requirement by quite a bit.
TODO: Commit is not being handled correctly yet
2024-05-25 19:09:56 -07:00
Hansung Kim
45d86b26a2
tensor: Add counter for dpu operations
2024-05-16 22:15:01 -07:00
Hansung Kim
5034d8d14b
tensor: Add buffer to hide 2cyc commit latency
...
Since operand and commit throughput are the same (2 cycles), it is
unnecessary to stall the dpu during the multi-cycle commit.
This enables the dpu to operate at full throughput of 1 operand every 2
cycles.
2024-05-16 20:09:08 -07:00
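The throughput argument in commit 5034d8d14b can be checked with a rough timing model. This is a sketch under stated assumptions rather than a description of the actual hardware: the function `total_cycles`, the single-entry buffer, and the rule that the buffer slot is held until the commit finishes are hypothetical simplifications; only the 2-cycle operand and commit figures come from the commit message.

```python
def total_cycles(num_ops: int, buffered: bool,
                 compute_cycles: int = 2, commit_cycles: int = 2) -> int:
    """Cycles until the last commit finishes in a simplified dpu/commit model."""
    if not buffered:
        # Assumed baseline: the dpu stalls for the whole commit of each result.
        return num_ops * (compute_cycles + commit_cycles)

    dpu_free = 0      # cycle at which the dpu can start the next operation
    buffer_free = 0   # cycle at which the one-entry buffer can take a result
    for _ in range(num_ops):
        compute_end = dpu_free + compute_cycles
        # The result moves into the buffer once both it and the slot are ready.
        handoff = max(compute_end, buffer_free)
        dpu_free = handoff                      # dpu is released at handoff
        buffer_free = handoff + commit_cycles   # slot is held during the commit
    return buffer_free


unbuffered = total_cycles(8, buffered=False)
with_buffer = total_cycles(8, buffered=True)
print(unbuffered, with_buffer)
```

Because compute and commit take the same number of cycles in this model, the buffer slot frees up exactly when the next result is ready, so the buffered case settles at one operation every 2 cycles after the initial fill.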
Hansung Kim
317695a8d0
Add perf counters on LSU resp valid tmasks
2024-05-16 15:34:54 -07:00
Hansung Kim
89e7d65926
tensor: Add ready signal to enforce 1 warp occupancy
...
Currently disabled as the timing behavior is already ~accurate
2024-05-16 15:34:54 -07:00
Hansung Kim
1a1094b2bb
tensor: Add dispatch unit to narrow to BLOCK_SIZE=1
2024-05-16 15:34:54 -07:00
Hansung Kim
9f9ec10960
tensor: Enable scaling NUM_THREADS by octets
...
TODO: lane-to-octet mapping is arbitrary at the moment
2024-05-16 15:34:50 -07:00