@@ -36,12 +36,18 @@ Major parameters of interest include:
|
||||
|
||||
* Scratchpad and accumulator memory parameters (``sp_banks``, ``sp_capacity``, ``acc_capacity``): Determine the properties of the Gemmini scratchpad memory: overall capacity of the scratchpad or accumulators (in KiB), and the number of banks the scratchpad is divided into.
|
||||
|
||||
* Type parameters (``inputType``, ``outputType``, ``accType``): Determine the data-types flowing through different parts of a Gemmini accelerator. For example, ``inputType`` may be an 8-bit fixed-point number, while ``accType``, which determines the type of partial accumulations in a matrix multiplication, may be a 32-bit integer. ``outputType`` only determines the type of the data passed between two processing elements (PEs); for example, an 8-bit multiplication may produce a 16-bit result which must be shared between PEs in a systolic array.
|
||||
* Type parameters (``inputType``, ``outputType``, ``accType``): Determine the data-types flowing through different parts of a Gemmini accelerator. For example, ``inputType`` may be an 8-bit fixed-point number, while ``accType``, which determines the type of partial accumulations in a matrix multiplication, may be a 32-bit integer. ``outputType`` only determines the type of the data passed between two processing elements (PEs); for example, an 8-bit multiplication may produce a 16-bit result which must be shared between PEs in a systolic array. If your datatype is a floating-point number, then you might also want to change the ``pe_latency`` parameter, which specifies how many shift registers to add inside the PEs. This might be necessary if your datatype cannot complete a multiply-accumulate operation within a single cycle.
|
||||
|
||||
* Access-execute queue parameters (``ld_queue_length``, ``st_queue_length``, ``ex_queue_length``, ``rob_entries``): To implement access-execute decoupling, a Gemmini accelerator has a load instruction queue, a store instruction queue, and an execute instruction queue. The relative sizes of these queue determine the level of access-execute decoupling. Gemmini also implements a reorder buffer (ROB) - the number of entries in the ROB determines possible dependency management limitations.
|
||||
|
||||
* DMA parameters (``dma_maxbytes``, ``dma_buswidth``, ``mem_pipeline``): Gemmini implements a DMA to move data from main memory to the Gemmini scratchpad, and from the Gemmini accumulators to main memory. The size of these DMA transactions is determined by the DMA parameters. These DMA parameters are tightly coupled with Rocket Chip SoC system parameters: in particular ``dma_buswidth`` is associated with the ``SystemBusKey`` ``beatBytes`` parameter, and ``dma_maxbytes`` is associated with ``cacheblockbytes`` Rocket Chip parameters.
|
||||
|
||||
There are also optional features, which can be either enabled or left out of Gemmini at elaboration-time. For example:
|
||||
|
||||
Scaling during "move-in" operations (``mvin_scale_args``, ``mvin_scale_acc_args``): When data is being moved in from DRAM or main memory into Gemmini's local scratchpad memory, it can optionally be multiplied by a scaling factor. These parameters specify what the datatype of the scaling factor is, and how the scaling is actually done. If these are set to ``None``, then this optional feature will be disabled at elaboration time. If both the scratchpad inputs are accumulator inputs are to be scaled in the same say, then the ``mvin_scale_shared`` parameter can be set to ``true`` so that the multipliers and functional units are shared.
|
||||
|
||||
|
||||
|
||||
Gemmini Software
|
||||
------------------
|
||||
|
||||
@@ -51,9 +57,11 @@ The ISA includes configuration instructions, data movement instructions (from ma
|
||||
Since Gemmini instructions are not exposed through the GNU binutils assembler, several C macros are provided in order to construct the instruction encodings to call these instructions.
|
||||
|
||||
The Gemmini generator includes a C matrix multiplication library which wraps the calls to the custom Gemmini instructions.
|
||||
The ``software`` directory of the generator includes the aforementioned library and macros, as well as bare-metal tests, and some FireMarshal workloads to run the tests in a Linux environment. In particular, the matrix multiplication C library can be found in the ``software/gemmini-rocc-tests/include/gemmini.h`` file.
|
||||
The ``software`` directory of the generator (within the generator repository in ``generators/gemmini/software``) includes the aforementioned library and macros, as well as bare-metal tests, and some FireMarshal workloads to run the tests in a Linux environment. In particular, the matrix multiplication C library can be found in the ``generators/gemmini/software/gemmini-rocc-tests/include/gemmini.h`` file.
|
||||
|
||||
The Gemmini generator generates a C header file based on the generator parameters. This header files gets compiled together with the matrix multiplication library to tune library performance. The generated header file can be found under ``software/gemmini-rocc-tests/include/gemmini_params.h``
|
||||
The Gemmini generator generates a C header file based on the generator parameters. This header files gets compiled together with the matrix multiplication library to tune library performance. The generated header file can be found under ``generators/gemmini/software/gemmini-rocc-tests/include/gemmini_params.h``
|
||||
|
||||
Gemmini can also be used to run ONNX-specified neural-networks through a port of Microsoft's ONNX-Runtime framework. The port is included as the `onnxruntime-riscv <https://github.com/pranav-prakash/onnxruntime-riscv>`__ repository submoduled in the `software` directory. The port is under development, and usage documentation can be found `within its repository <https://github.com/pranav-prakash/onnxruntime-riscv/blob/systolic/systolic_runner/docs>`__.
|
||||
|
||||
Build and Run Gemmini Tests
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
Submodule generators/gemmini updated: d6f36d37d1...caaf781ec9
@@ -29,6 +29,7 @@ git config submodule.toolchains/qemu.update none
|
||||
|
||||
# Don't automatically initialize generators with big submodules (e.g. linux source)
|
||||
git config submodule.generators/sha3.update none
|
||||
git config submodule.generators/gemmini.update none
|
||||
|
||||
# Disable updates to the FireSim submodule until explicitly requested
|
||||
git config submodule.sims/firesim.update none
|
||||
@@ -51,11 +52,16 @@ git config --unset submodule.vlsi/hammer-synopsys-plugins.update
|
||||
git config --unset submodule.vlsi/hammer-mentor-plugins.update
|
||||
|
||||
git config --unset submodule.generators/sha3.update
|
||||
git config --unset submodule.generators/gemmini.update
|
||||
git config --unset submodule.software/firemarshal.update
|
||||
|
||||
# Non-recursive clone to exclude riscv-linux
|
||||
git submodule update --init generators/sha3
|
||||
|
||||
# Non-recursive clone to exclude gemmini-software
|
||||
git submodule update --init generators/gemmini
|
||||
git submodule update --init --recursive generators/gemmini/software/gemmini-rocc-tests
|
||||
|
||||
git config --unset submodule.sims/firesim.update
|
||||
# Minimal non-recursive clone to initialize sbt dependencies
|
||||
git submodule update --init sims/firesim
|
||||
|
||||
Submodule toolchains/esp-tools/riscv-isa-sim updated: 3c930b4031...13384cac1e
Reference in New Issue
Block a user