updated documentation

2024-01-03 10:23:38 -08:00
parent bd18b03cc3
commit f2e8317412
7 changed files with 53 additions and 110 deletions
--- a/docs/microarchitecture.md
+++ b/docs/microarchitecture.md
@@ -24,71 +24,57 @@ Vortex uses the SIMT (Single Instruction, Multiple Threads) execution model with
  - Control the number of warps to activate during execution
  - `WSPAWN` *count, addr*: activate count warps and jump to addr location
 - **Control-Flow Divergence**
-  - Control threads to activate when a branch diverges
-    - `SPLIT` *predicate*: apply 'taken' predicate thread mask adn save 'not-taken' into IPDOM stack
-    - `JOIN`: restore 'not-taken' thread mask
+  - Control thread activation when a branch diverges
+    - `SPLIT` *taken, predicate*: apply predicate thread mask and save current state into IPDOM stack
+    - `JOIN`: pop IPDOM stack to restore thread mask
+    - `PRED` *predicate, restore_mask*: thread predicate instruction
 - **Warp Synchronization**
  - `BAR` *id, count*: stall warps entering barrier *id* until count is reached

 ### Vortex Pipeline/Datapath

-![Image of Vortex Microarchitecture](./assets/img/vortex_microarchitecture_v2.png)
+![Image of Vortex Microarchitecture](./assets/img/vortex_microarchitecture.png)

-Vortex has a 5-stage pipeline: FI | ID | Issue | EX | WB.
+Vortex has a 6-stage pipeline:
+
+- **Schedule**
+  - Warp Scheduler
+    - Schedule the next PC into the pipeline
+    - Track stalled and active warps
+  - IPDOM Stack
+    - Save split/join states for divergent threads
+  - Inflight Tracker
+    - Track in-flight instructions

 - **Fetch**
-  - Warp Scheduler
-    - Track stalled & active warps, resolve branches and barriers, maintain split/join IPDOM stack
-  - Instruction Cache
-    - Retrieve instruction from cache, issue I-cache requests/responses
+  - Retrieve instructions from memory
+  - Handle I-cache requests/responses
 - **Decode**
-  - Decode fetched instructions, notify warp scheduler when the following instructions are decoded:
-    - Branch, tmc, split/join, wspawn
-  - Precompute used_regs mask (needed for Issue stage)
+  - Decode fetched instructions
+  - Notify warp scheduler on control instructions
 - **Issue**
-  - Scheduling
-    - In-order issue (operands/execute unit ready), out-of-order commit
  - IBuffer
-    - Store fetched instructions, separate queues per-warp, selects next warp through round-robin scheduling
+    - Store decoded instructions in separate per-warp queues
  - Scoreboard
    - Track in-use registers
-  - GPRs (General-Purpose Registers) stage
-    - Fetch issued instruction operands and send operands to execute unit
+    - Check register use for decoded instructions
+  - Operands Collector
+    - Fetch the operands for issued instructions from the register file
 - **Execute**
  - ALU Unit
-    - Single-cycle operations (+,-,>>,<<,&,|,^), Branch instructions (Share ALU resources)
-  - MULDIV Unit
-    - Multiplier - done in 2 cycles
-    - Divider - division and remainder, done in 32 cycles
-      - Implements serial alogrithm (Stalls the pipeline)
+    - Handle arithmetic and branch operations
  - FPU Unit
-    - Multi-cycle operations, uses `FPnew` Library on ASIC, uses hard DSPs on FPGA
-  - CSR Unit
-    - Store constant status registers - device caps, FPU status flags, performance counters
-    - Handle external CSR requests (requests from host CPU)
+    - Handle floating-point operations
  - LSU Unit
-    - Handle load/store operations, issue D-cache requests, handle D-cache responses
-    - Commit load responses - saves storage, Scoreboard tracks completion
-  - GPGPU Unit
-    - Handle GPGPU instructions
-      - TMC, WSPAWN, SPLIT, BAR
-    - JOIN is handled by Warp Scheduler (upon SPLIT response)
+    - Handle load/store operations
+  - SFU Unit
+    - Handle warp control operations
+    - Handle Control and Status Register (CSR) operations
 - **Commit**
-  - Commit
-    - Update CSR flags, update performance counters
-  - Writeback
-    - Write result back to GPRs, notify Scoreboard (release in-use register), select candidate instruction (ALU unit has highest priority)
- **Clustering**
-  - Group mulitple cores into clusters (optionally share L2 cache)
-  - Group multiple clusters (optionally share L3 cache)
-  - Configurable at build time
-  - Default configuration:
-    - #Clusters = 1
-    - #Cores = 4
-    - #Warps = 4
-    - #Threads = 4
- **FPGA AFU Interface**
-  - Manage CPU-GPU comunication
-    - Query devices caps, load kernel instructions and resource buffers, start kernel execution, read destination buffers
-  - Local Memory - GPU access to local DRAM
-  - Reserved I/O addresses - redirect to host CPU, console output
+  - Write results back to the register file and update the Scoreboard
+
+### Vortex Clustering Architecture
+- Sockets
+  - Group multiple cores sharing an L1 cache
+- Clusters
+  - Group multiple sockets sharing an L2 cache
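
The `SPLIT`/`JOIN` bullets under Control-Flow Divergence describe saving state to the IPDOM stack and popping it to restore the thread mask. The following is a minimal Python sketch of that idea, not the Vortex RTL: the `Warp` class, the `split`/`join` helpers, the two-entry push scheme, and the `resume_pc` parameter are all hypothetical names and assumptions chosen for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Warp:
    tmask: int                      # active-thread bitmask, one bit per thread
    pc: int = 0
    ipdom: list = field(default_factory=list)  # stack of (mask, resume_pc or None)

def split(warp, pred_mask, resume_pc):
    """Apply the predicate mask and save the current state (assumed scheme)."""
    taken = warp.tmask & pred_mask
    not_taken = warp.tmask & ~pred_mask
    # Entry that restores the pre-split mask at reconvergence.
    warp.ipdom.append((warp.tmask, None))
    if taken == 0 or not_taken == 0:
        return  # uniform branch: nothing diverges, mask stays as-is
    # Entry for the threads that still have to run the other path.
    warp.ipdom.append((not_taken, resume_pc))
    warp.tmask = taken

def join(warp):
    """Pop the IPDOM stack to restore a thread mask (and a PC, if one was saved)."""
    mask, resume_pc = warp.ipdom.pop()
    warp.tmask = mask
    if resume_pc is not None:
        warp.pc = resume_pc

warp = Warp(tmask=0b1111)
split(warp, pred_mask=0b0101, resume_pc=8)  # threads 0 and 2 take the branch
join(warp)                                  # switch to threads 1 and 3 at pc 8
join(warp)                                  # reconverge with all four threads
```

In this sketch a divergent `SPLIT` is matched by two `JOIN`s: the first switches to the deferred threads, the second restores the full mask. A uniform `SPLIT` pushes a single entry so that `JOIN` stays balanced.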