Commit Graph

1141 Commits

Author SHA1 Message Date
Hansung Kim
d9ad4809ec Add 'tensor' bit to commit_if and writeback_if
For use in the asynchronous tensor instruction.  When 1'b1, sets/unsets
the inuse_tensor status bit in the scoreboard to signal
kickoff/completion of the asynchronous tensor op.
2024-10-11 15:42:25 -07:00
Hansung Kim
58c9761829 Revert decode change for hopper
Share the same insn as non-hopper TC.
2024-10-09 21:53:04 -07:00
Hansung Kim
7ab14445f0 tensor: Test many-commit per execute with an FSM
Trick is to set commit_if.data.eop to 0, since the commit module only
signals instruction completion to VX_schedule if the eop bit is 1.
Otherwise it underflows the pending_instr buffer.

The same eop trick works for VX_scoreboard, which works around the
invalid rd writeback error.
2024-10-07 21:29:44 -07:00
Hansung Kim
e8ca4677df Remove old code for pending_instr underflow fix 2024-10-07 20:21:35 -07:00
Hansung Kim
4cac1adf7d Add dummy code for decoupled Hopper tensor core
Define EXT_T_HOPPER that, when EXT_T_ENABLE is defined, distinguishes
whether to instantiate core-coupled Volta-style or decoupled
Hopper-style Tensor Core.
2024-10-07 17:10:59 -07:00
Hansung Kim
da54162241 tensor: Add FP16 parameter and expose to VX_core 2024-09-10 15:32:17 -07:00
Hansung Kim
a968bdd69b tensor: Fix HALF_PRECISION to 1 2024-09-08 01:43:21 -07:00
Richard Yan
3f8c28c7d6 sync rf, x0 fix 2024-09-05 16:49:05 -07:00
Hansung Kim
2b1a9b7c16 tensor: Rename & docs 2024-08-23 16:21:45 -07:00
Hansung Kim
45f6ae5aad tensor: Doc comments 2024-08-20 14:46:40 -07:00
Hansung Kim
20faf87b80 tensor: Rename halves_buf to reduce confusion 2024-08-19 16:42:02 -07:00
Hansung Kim
789d873e19 Disable reduce_unit for timing optimization
Currently the critical path @1GHz is found at the accumulators inside
reduce_unit.
2024-08-16 15:28:56 -07:00
Hansung Kim
715539b2c3 Guard trace printf in mem_scheduler for synthesis 2024-08-15 06:09:39 -07:00
Hansung Kim
119c52004e Enable LSU dedup in VX_platform.vh 2024-08-15 13:39:43 -07:00
Hansung Kim
1410b39143 Disable trace during the very start of simulation 2024-08-13 16:01:29 -07:00
Hansung Kim
d39e24643d tensor: Parameterize fedp for fp16/fp32 2024-08-12 20:01:56 -07:00
Hansung Kim
15e93e01d8 tensor: Split packed fp16 and wire correctly to DPU 2024-08-07 11:16:38 -07:00
Hansung Kim
d4d18c2823 tensor: spurious assert, doc, remove unused param 2024-07-29 16:06:55 -07:00
Hansung Kim
4e0dcdadac tensor: Share B operand buffer between threadgroups
The two threadgroups use the same B fragment, so there is no need to
store it twice in the operand buffer.  To do this, pull the operand
buffer out of the threadgroups to the octet level.
2024-07-27 20:42:08 -07:00
Hansung Kim
7ad3f64528 tensor: Remove old ready_reg DPI code 2024-07-27 17:36:02 -07:00
Hansung Kim
01f6024a76 tensor: Split flops into structural module
to get separate area/power numbers in hierarchical
2024-07-26 16:26:48 -07:00
Hansung Kim
7f43bab0aa tensor: Parameterize result buffer depth 2024-07-25 16:31:45 -07:00
Hansung Kim
f3afd4a6f9 Hardcode NUM_THREADS/.. only when SYNTHESIS
They are also set in VX_config.vh, and the duplication is confusing.
2024-07-23 15:15:16 -07:00
Richard Yan
ed247e21bb Merge branch 'rtl' of https://github.com/hansungk/vortex-private into rtl 2024-07-20 23:37:58 -07:00
Richard Yan
7d422cc9b0 pre-submission changes 2024-07-20 23:33:56 -07:00
Hansung Kim
14b811f334 Update doc 2024-07-19 16:39:05 -07:00
Hansung Kim
4b093e3ff7 tensor: Mark PARTIAL_BW on power impact 2024-06-26 14:25:26 -07:00
Hansung Kim
9a6fe79bd3 VX_operands_dup: Add counter for RF read/write accesses 2024-06-22 16:35:23 -07:00
Hansung Kim
fb973a51b6 core_wrapper: Only terminate when core 0 is finished; more slack time 2024-06-22 16:34:42 -07:00
Hansung Kim
46fe1897bf VX_platform.vh: Undefine FIRESIM by default 2024-06-22 16:34:08 -07:00
Hansung Kim
d4f6f8a257 Set NUM_ALU_BLOCKS=2, NUM_FPU_BLOCKS=1 2024-06-22 16:33:42 -07:00
Hansung Kim
a9b75dd492 Set default to 4cores/8barriers in VX_config.{h,vh} 2024-06-12 20:51:15 -07:00
Hansung Kim
86deaa8e07 Give some slack time for other cores to finish 2024-06-12 09:47:21 -07:00
Richard Yan
1833e8a176 Merge branch 'rtl' of https://github.com/hansungk/vortex-private into rtl 2024-06-12 02:17:01 -07:00
Richard Yan
7947df8a6c config change, move ucode 2024-06-12 02:15:08 -07:00
Hansung Kim
5218292b6f core_wrapper: Use finished and !reset to determine termination 2024-06-11 16:28:05 -07:00
Hansung Kim
de10d5a957 Don't print from mem_scheduler in reset 2024-06-09 22:44:33 -07:00
Hansung Kim
5d5e4a468c Merge remote-tracking branch 'refs/remotes/origin/rtl' into rtl 2024-06-09 15:58:32 -07:00
Richard Yan
a47389fc0e Merge branch 'rtl' of https://github.com/hansungk/vortex-private into rtl 2024-06-09 15:15:31 -07:00
Richard Yan
67a13410fd gate level sim changes 2024-06-09 15:15:01 -07:00
Hansung Kim
1bacbb839f Add GPR_DUPLICATED to synthesis in VX_platform.vh 2024-06-09 14:00:34 -07:00
Hansung Kim
874a3bf194 Doc changes 2024-06-09 13:41:00 -07:00
Hansung Kim
12f8722dd5 Shush display 2024-06-03 13:04:09 -07:00
Hansung Kim
9caafb2d8a tensor: Decode rd of macro-op to designate additional accumulator
This is useful when you want to have the tensor core output to multiple
accumulator registers, e.g. when doing outer product within the RF.
2024-05-31 19:17:56 -07:00
Hansung Kim
0ebbb8e223 tensor: Fix perf counter; comment out dpi 2024-05-31 00:32:32 -07:00
Hansung Kim
73293061ea tensor: Enlarge metadata queue 2024-05-30 23:21:23 -07:00
Hansung Kim
52bb827a46 Handle BLOCK_SIZE != 1 in dispatch_unit
+ change ALU and FPU unit to use it as well
2024-05-30 23:20:21 -07:00
Hansung Kim
a02773eb92 Add more efficient dispatch_unit
Instead of having a single candidate to be considered for dispatch
(designated by 'batch_idx' counter), add a dispatch_unit variant that
considers all `ISSUE_WIDTH dispatch signals and picks a valid one in a
round-robin manner.

This increases core utilization significantly due to better overlapping
of smem/tensor ops.
2024-05-30 21:55:42 -07:00
Hansung Kim
574cc0e5f0 tensor: Document configuring queue depths 2024-05-30 18:33:15 -07:00
Hansung Kim
83f9f6d84f tensor: Fix sync for dpu warp queue as well 2024-05-30 18:22:36 -07:00