From e3e5c178ffa5d48726ac00dacd297c8ddcf69f4a Mon Sep 17 00:00:00 2001
From: Malik Aki Burton
Date: Fri, 23 Apr 2021 18:31:52 -0400
Subject: [PATCH] Added codebase and microarch guides and updated vortex and simulation guides slightly

---
 doc/Codebase.md          | 35 +++++++++++++++
 doc/Microarchitecture.md | 94 ++++++++++++++++++++++++++++++++++++++++
 doc/Simulation.md        |  7 ++-
 doc/Vortex.md            |  2 +-
 4 files changed, 133 insertions(+), 5 deletions(-)
 create mode 100644 doc/Codebase.md
 create mode 100644 doc/Microarchitecture.md

diff --git a/doc/Codebase.md b/doc/Codebase.md
new file mode 100644
index 00000000..3d018855
--- /dev/null
+++ b/doc/Codebase.md
@@ -0,0 +1,35 @@
+# Vortex Codebase
+
+The directory/file layout of the Vortex codebase is as follows:
+
+- `benchmark`: contains OpenCL, RISC-V, and vector tests
+  - `opencl`: contains basic kernel operation tests (e.g., vector add, transpose, dot product)
+  - `riscv`: contains official RISC-V tests which are pre-compiled into binaries
+  - `vector`: tests for vector instructions (not yet implemented)
+- `ci`: contains tests to be run during continuous integration (Travis CI)
+  - driver, opencl, riscv_isa, and runtime tests
+- `driver`: contains driver software implementation (software that is run on the host to communicate with the Vortex processor)
+  - `opae`: contains code for driver that runs on FPGA
+  - `rtlsim`: contains code for driver that runs on local machine (driver built using Verilator, which converts RTL to a C++ binary)
+  - `simx`: contains code for driver that runs on local machine (vortex)
+  - `include`: contains vortex.h which has the Vortex API that is used by the drivers
+- `runtime`: contains software used inside kernel programs to expose GPGPU capabilities
+  - `include`: contains Vortex API needed for runtime
+  - `linker`: contains linker file for compiling kernels
+  - `src`: contains implementation of Vortex API (from include folder)
+  - `tests`: contains runtime tests
+    - `simple`: contains test for GPGPU functionality allowed in Vortex
+- `simx`: contains simX, the cycle-approximate simulator for Vortex
+- `miscs`: contains old code that is no longer used
+- `hw`:
+  - `unit_tests`: contains unit tests for RTL of cache and queue
+  - `syn`: contains all synthesis scripts (Quartus and Yosys)
+    - `quartus`: contains code to synthesize cache, core, pipeline, top, and vortex stand-alone
+  - `simulate`: contains RTL simulator (Verilator)
+    - `testbench.cpp`: runs either the riscv, runtime, or opencl tests
+  - `opae`: contains source code for the accelerator functional unit (AFU) and code which programs the FPGA
+  - `rtl`: contains RTL source code
+    - `cache`: contains cache subsystem code
+    - `fp_cores`: contains floating point unit code
+    - `interfaces`: contains code that handles communication for each of the units of the microarchitecture
+    - `libs`: contains general-purpose modules (e.g., buffers, encoders, arbiters, pipe registers)
\ No newline at end of file
diff --git a/doc/Microarchitecture.md b/doc/Microarchitecture.md
new file mode 100644
index 00000000..d8892e62
--- /dev/null
+++ b/doc/Microarchitecture.md
@@ -0,0 +1,94 @@
+# Vortex Microarchitecture
+
+### Vortex GPGPU Execution Model
+
+Vortex uses the SIMT (Single Instruction, Multiple Threads) execution model with a single warp issued per cycle.
+
+- **Threads**
+  - Smallest unit of computation
+  - Each thread has its own register file (32 int + 32 fp registers)
+  - Threads execute in parallel
+- **Warps**
+  - A logical cluster of threads
+  - Each thread in a warp executes the same instruction
+  - The PC is shared; a thread mask is maintained for Writeback
+  - Warp's execution is time-multiplexed at log steps
+    - Ex. warp 0 executes at cycle 0, warp 1 executes at cycle 1
+
+### Vortex RISC-V ISA Extension
+
+- **Thread Mask Control**
+  - Control the number of threads to activate during execution
+  - `TMC` *count*: activate count threads
+- **Warp Scheduling**
+  - Control the number of warps to activate during execution
+  - `WSPAWN` *count, addr*: activate count warps and jump to addr location
+- **Control-Flow Divergence**
+  - Control threads to activate when a branch diverges
+  - `SPLIT` *predicate*: apply 'taken' predicate thread mask and save 'not-taken' into IPDOM stack
+  - `JOIN`: restore 'not-taken' thread mask
+- **Warp Synchronization**
+  - `BAR` *id, count*: stall warps entering barrier *id* until count is reached
+
+### Vortex Pipeline/Datapath
+
+![Image of Vortex Microarchitecture](vortex_microarchitecture_v2.png)
+
+Vortex has a 5-stage pipeline: FI | ID | Issue | EX | WB.
+
+- **Fetch**
+  - Warp Scheduler
+    - Track stalled & active warps, resolve branches and barriers, maintain split/join IPDOM stack
+  - Instruction Cache
+    - Retrieve instruction from cache, issue I-cache requests/responses
+- **Decode**
+  - Decode fetched instructions, notify warp scheduler when the following instructions are decoded:
+    - Branch, tmc, split/join, wspawn
+  - Precompute used_regs mask (needed for Issue stage)
+- **Issue**
+  - Scheduling
+    - In-order issue (operands/execute unit ready), out-of-order commit
+  - IBuffer
+    - Store fetched instructions, separate queues per-warp, selects next warp through round-robin scheduling
+  - Scoreboard
+    - Track in-use registers
+  - GPRs (General-Purpose Registers) stage
+    - Fetch issued instruction operands and send operands to execute unit
+- **Execute**
+  - ALU Unit
+    - Single-cycle operations (+,-,>>,<<,&,|,^), Branch instructions (Share ALU resources)
+  - MULDIV Unit
+    - Multiplier - done in 2 cycles
+    - Divider - division and remainder, done in 32 cycles
+      - Implements serial algorithm (Stalls the pipeline)
+  - FPU Unit
+    - Multi-cycle operations, uses `FPnew` Library on ASIC, uses hard DSPs on FPGA
+  - CSR Unit
+    - Store constant status registers - device caps, FPU status flags, performance counters
+    - Handle external CSR requests (requests from host CPU)
+  - LSU Unit
+    - Handle load/store operations, issue D-cache requests, handle D-cache responses
+    - Commit load responses - saves storage, Scoreboard tracks completion
+  - GPGPU Unit
+    - Handle GPGPU instructions
+      - TMC, WSPAWN, SPLIT, BAR
+      - JOIN is handled by Warp Scheduler (upon SPLIT response)
+- **Commit**
+  - Commit
+    - Update CSR flags, update performance counters
+  - Writeback
+    - Write result back to GPRs, notify Scoreboard (release in-use register), select candidate instruction (ALU unit has highest priority)
+- **Clustering**
+  - Group multiple cores into clusters (optionally share L2 cache)
+  - Group multiple clusters (optionally share L3 cache)
+  - Configurable at build time
+  - Default configuration:
+    - #Clusters = 1
+    - #Cores = 4
+    - #Warps = 4
+    - #Threads = 4
+- **FPGA AFU Interface**
+  - Manage CPU-GPU communication
+    - Query device caps, load kernel instructions and resource buffers, start kernel execution, read destination buffers
+  - Local Memory - GPU access to local DRAM
+  - Reserved I/O addresses - redirect to host CPU, console output
\ No newline at end of file
diff --git a/doc/Simulation.md b/doc/Simulation.md
index b6861628..dfd042bb 100644
--- a/doc/Simulation.md
+++ b/doc/Simulation.md
@@ -24,10 +24,9 @@ Running tests under specific drivers (rtlsim,simx,fpga) is done using the script
 - *L3cache* - used to enable the shared l3cache among the Vortex clusters.
 - *Driver* - used to specify which driver to run the Vortex simulation (either rtlsim, vlsim, fpga, or simx).
 - *Debug* - used to enable debug mode for the Vortex simulation.
-- *Scope* -
-- *Perf* - is used to enable the detailed performance counters within the Vortex simulation.
-- *App* - is used to specify which test/benchmark to run in the Vortex simulation. The main choices are vecadd, sgemm, basic, demo, and dogfood. Other tests/benchmarks are located in the `/benchmarks/opencl` folder though not all of them work wit the current version of Vortex.
-- *Args* -
+- *Perf* - used to enable the detailed performance counters within the Vortex simulation.
+- *App* - used to specify which test/benchmark to run in the Vortex simulation. The main choices are vecadd, sgemm, basic, demo, and dogfood. Other tests/benchmarks are located in the `/benchmarks/opencl` folder, though not all of them work with the current version of Vortex.
+- *Args* - used to pass additional arguments to the application.
 
 Example use of command line arguments: Run the sgemm benchmark using the vlsim driver with a Vortex configuration of 1 cluster, 4 cores, 4 warps, and 4 threads.
 
diff --git a/doc/Vortex.md b/doc/Vortex.md
index 36846b30..16dad807 100644
--- a/doc/Vortex.md
+++ b/doc/Vortex.md
@@ -2,7 +2,7 @@
 
 ### Table of Contents
 
-- Vortex Architecture
+- Vortex Microarchitecture
 - Vortex Software
 - [Vortex Simulation](https://github.com/vortexgpgpu/vortex-dev/blob/master/doc/Simulation.md)
 - [FPGA](https://github.com/vortexgpgpu/vortex-dev/blob/master/doc/Flubber_FPGA_Startup_Guide.md)
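
Reviewer note on the `TMC`, `SPLIT`, and `JOIN` entries added in `doc/Microarchitecture.md`: the thread-mask behaviour they describe can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the Vortex RTL or simX code. The `Warp` class and its method names are hypothetical, `TMC` is assumed to activate the lowest-numbered threads, and `SPLIT` is assumed to save the pre-split mask alongside the 'not-taken' mask so that a second `JOIN` reconverges the warp. Program counters are left out entirely.

```python
class Warp:
    """Toy model of one warp's thread mask and IPDOM stack (hypothetical, not Vortex code)."""

    def __init__(self, num_threads=4):
        self.num_threads = num_threads
        self.tmask = [True] * num_threads  # active-thread mask
        self.ipdom = []                    # stack of saved thread masks

    def tmc(self, count):
        # TMC count: activate `count` threads.
        # Assumption: the lowest-numbered threads are the ones activated.
        self.tmask = [i < count for i in range(self.num_threads)]

    def split(self, predicate):
        # SPLIT predicate: apply the 'taken' mask and save 'not-taken' on the stack.
        taken = [a and p for a, p in zip(self.tmask, predicate)]
        not_taken = [a and not p for a, p in zip(self.tmask, predicate)]
        # Assumption: the pre-split mask is pushed first so a later JOIN can
        # restore full convergence after both paths have run.
        self.ipdom.append(list(self.tmask))
        self.ipdom.append(not_taken)
        self.tmask = taken

    def join(self):
        # JOIN: restore the most recently saved thread mask.
        self.tmask = self.ipdom.pop()


warp = Warp(num_threads=4)
warp.split([True, False, True, False])  # threads 0 and 2 take the branch
taken_mask = list(warp.tmask)
warp.join()                              # switch to the 'not-taken' threads
not_taken_mask = list(warp.tmask)
warp.join()                              # reconverge to the pre-split mask
reconverged_mask = list(warp.tmask)
```

Keeping the saved masks on a stack is what lets nested divergent branches unwind in the reverse order they were entered.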