Hansung Kim
9ea291eea2
Merge remote-tracking branch 'origin/tensor_core' into rtl
2024-05-05 17:03:57 -07:00
joshua
5bd25985c6
i kinda forgot most of changes
2024-05-04 23:01:47 -07:00
Hansung Kim
1c7acab160
tensor: Fix lint errors
2024-05-03 15:43:02 -07:00
Hansung Kim
5a0ee98a61
Remove duplicate port connection
2024-05-03 15:07:24 -07:00
Hansung Kim
bc45c40231
tensor: Rename half.hpp -> half.h
addResource() thinks it's a Verilog source file if it ends in .hpp, for
some reason.
2024-05-02 16:17:20 -07:00
Hansung Kim
c4b94e4f2c
Wrap hardcoded configs with SYNTHESIS
2024-05-02 16:17:04 -07:00
Hansung Kim
c4d71bc3d6
tensor: Fix multiple driver error on VCS
2024-05-01 21:40:48 -07:00
Hansung Kim
7fc5b6a374
tensor: Fix elaboration error on VCS
2024-05-01 21:40:45 -07:00
Hansung Kim
675e8ea130
Merge branch 'tensor_core' into rtl
2024-05-01 16:18:14 -07:00
Hansung Kim
9a688a05b1
Add (unconnected) FPU perf counters
mainly for debugging
2024-04-29 15:20:55 -07:00
Hansung Kim
100fbbc048
Increase FPUQ_SIZE
This should at least be FMA_LATENCY to not bottleneck things.
2024-04-29 15:19:48 -07:00
Richard Yan
85213d2876
synthesizable design
2024-04-17 18:05:51 -07:00
Richard Yan
17fd29c114
Merge branch 'rtl' of https://github.com/hansungk/vortex-private into rtl
2024-04-16 23:03:04 -07:00
Richard Yan
8de5470da4
round robin warp scheduling
2024-04-16 23:03:00 -07:00
Hansung Kim
217bc189da
ifdef-guard VX_operand* to enable including both in Chisel
2024-04-15 22:06:47 -07:00
Hansung Kim
4752b86858
Limit NUM_SFU_LANES to 4
Simulation seems to not like SFU_LANES=8; dial back for now
2024-04-15 21:48:59 -07:00
Hansung Kim
978b1fe2d0
Add operands stage with duplicated RF for rs1/2/3
2024-04-15 16:45:59 -07:00
Hansung Kim
87b966a5fa
Add perf counter for stall by any operand hazard
2024-04-15 01:01:26 -07:00
Hansung Kim
7ae54bd280
Remove unused IO in core_wrapper
2024-04-13 17:13:39 -07:00
Richard Yan
d3e0f18fd5
Merge branch 'rtl' of https://github.com/hansungk/vortex-private into rtl
2024-04-09 19:55:11 -07:00
Richard Yan
41a79a03a4
parametrize memory interface in core wrapper and update config.vh
2024-04-09 19:55:06 -07:00
Hansung Kim
6c632200d5
Divide by per-breakdown cycle for avg stall cycles
2024-04-03 15:29:51 -07:00
Hansung Kim
62c7d1f4cf
Report any fire cycles from scoreboard as well
2024-03-29 12:23:15 -07:00
Hansung Kim
50263a5f7d
Rename sched_barrier_stalls -> perf_sched_barrier_idles
Sched stall by barrier is really idle because it causes !scheduler_if.valid,
which is counted as part of sched_idle.
2024-03-28 22:45:12 -07:00
joshua
d8f9359fae
test case update
2024-03-28 13:04:02 -07:00
joshua
08d7721e11
annoying swizzling problems
2024-03-28 03:00:15 -07:00
joshua
e16584ddd9
bleh still not work
2024-03-27 00:26:04 -07:00
Hansung Kim
dd90736382
Reformat perfcount report
2024-03-23 01:07:46 -07:00
Hansung Kim
3e6a9a6104
Expose scoreboard fires to perf interface
2024-03-23 01:06:40 -07:00
Hansung Kim
d99295793c
Periodically report perf counter; reformat operand/FU stalls
2024-03-23 00:02:02 -07:00
Hansung Kim
83e151a189
Add valid / fire / cycles-issued perf counters to dispatch
2024-03-23 00:01:15 -07:00
Hansung Kim
573be030c8
Add issue-stall-by-operand-hazard perf counters
Do the same reduce by + instead of OR fix for scoreboard counters.
2024-03-23 00:00:08 -07:00
Hansung Kim
dda67da84c
Add issue-stall-by-unit-busy perf counters
Add per-issue-width counters instead of using reduce "OR" and causing
undercounting.
2024-03-21 18:11:12 -07:00
Hansung Kim
3718a57937
Docs
2024-03-21 15:44:50 -07:00
joshua
b254281295
initial tcore impl
2024-03-21 01:29:38 -07:00
Hansung Kim
9438862389
Add perf counter for barrier schedule stalls
2024-03-20 15:29:28 -07:00
joshua
f9b4509936
initial tensor core
2024-03-20 02:46:00 -07:00
joshua
978dd3bdfe
seemingly working fp32 implementation
2024-03-19 17:56:59 -07:00
Hansung Kim
7014ae24da
Prettier perf count reports
2024-03-19 15:25:46 -07:00
Hansung Kim
b25deb8a2e
Fix assignment for perf counters
2024-03-19 14:06:44 -07:00
Hansung Kim
df4b21507e
Customize global barrier response logic for clusters
2024-03-18 14:30:32 -07:00
Hansung Kim
2525df9c5f
Use GBAR_CLUSTER_ENABLE to guard cluster-specific modification
2024-03-17 18:24:04 -07:00
Hansung Kim
7f8abe99ff
Fix wrong multicore parametrization in wrapper
2024-03-17 18:23:09 -07:00
Hansung Kim
40e2888733
Connect core gbar signals in wrapper
2024-03-17 14:09:43 -07:00
Hansung Kim
28f54bde7f
Merge remote-tracking branch 'sungwoong/master' into rtl
2024-03-14 09:15:59 -07:00
Hansung Kim
bd67ff3439
Fix creating bogus mem reqs when commit is stalled
When commit stage is stalled, LSU ready is deasserted for mem writes
since stores commit immediately; however, the same was not applied to
valid, creating duplicate memory write requests. Fix by guarding both
ready and valid properly.
2024-03-13 20:43:27 -07:00
Hansung Kim
8317a3fbe5
Fix fence by disallowing x-initialization instead of all-0 mask
Setting mem_req_mask to all-zero triggers an assertion error in
mem_scheduler. Instead, disallow initialize-by-x in instruction decode
which is the source of x-propagation. Since this seems to only happen
in VCS, define-gate it accordingly.
This reverts commit a15f4fd483.
2024-03-07 17:39:18 -08:00
Hansung Kim
010c4675ce
Fix undeclared mem_perf_if
2024-03-07 15:00:43 -08:00
Hansung Kim
b63333a4ec
Merge remote-tracking branch 'upstream/master' into vortex2
2024-03-07 14:45:48 -08:00
joshua
beb3dce46d
integer reduction unit
2024-03-06 01:39:17 -08:00