add more to docs | 1st spelling pass | more links | proper formatting
@@ -1,80 +1,69 @@
|
||||
|
||||
|
||||
Adding An Accelerator/Device
|
||||
===============================
|
||||
|
||||
Accelerators or custom IO devices can be added to your SoC in several ways:
|
||||
+ MMIO Peripheral (a.k.a TileLink-Attached Accelerator)
|
||||
+ Tightly-Coupled RoCC Accelerator
|
||||
|
||||
These approaches differ in the method of the communication between the processor and the custom block.
|
||||
* MMIO Peripheral (a.k.a. TileLink-Attached Accelerator)
|
||||
* Tightly-Coupled RoCC Accelerator
|
||||
|
||||
With the TileLink-Attached approach, the processor communicates with MMIO peripherals through memory-mapped registers.
|
||||
These approaches differ in how the processor communicates with the custom block.
|
||||
|
||||
In contrast, the processor communicates with a RoCC accelerators through a custom protocol and custom non-standard ISA instructions reserved in the RISC-V ISA encoding space. Each core can have up to four accelerators that are controlled by custom instructions and share resources with the CPU.
|
||||
With the TileLink-Attached approach, the processor communicates with MMIO peripherals through memory-mapped registers.
|
||||
|
||||
In contrast, the processor communicates with RoCC accelerators through a custom protocol and custom non-standard instructions reserved in the RISC-V ISA encoding space.
|
||||
Each core can have up to four accelerators that are controlled by custom instructions and share resources with the CPU.
|
||||
RoCC coprocessor instructions have the following form.
|
||||
|
||||
::
|
||||
.. code-block::
|
||||
customX rd, rs1, rs2, funct
|
||||
|
||||
The X will be a number 0-3, and determines the opcode of the instruction,
|
||||
which controls which accelerator an instruction will be routed to.
|
||||
The ``rd``, ``rs1``, and ``rs2`` fields are the register numbers of the destination
|
||||
register and two source registers. The ``funct`` field is a 7-bit integer that
|
||||
the accelerator can use to distinguish different instructions from each other.
|
||||
|
||||
Note that communication through a RoCC interfaces requires a custom software toolchain, whereas MMIO peripherals can use that standard toolchain with approriate driver support.
|
||||
The ``X`` is a number from 0 to 3 that determines the opcode of the instruction, which in turn controls which accelerator the instruction is routed to.
|
||||
The ``rd``, ``rs1``, and ``rs2`` fields are the register numbers of the destination register and two source registers.
|
||||
The ``funct`` field is a 7-bit integer that the accelerator can use to distinguish different instructions from each other.
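The field layout can be sketched in C as follows. This is a minimal illustration, assuming the usual RoCC R-type layout (``funct`` in bits 31:25, ``rs2`` in 24:20, ``rs1`` in 19:15, ``rd`` in 11:7) and the standard RISC-V custom-0 to custom-3 major opcodes. The ``xd``, ``xs1``, and ``xs2`` flag bits are not described above; they are taken from the RoCC instruction format and included here as an assumption. The helper name ``rocc_encode`` is hypothetical.

```c
#include <stdint.h>

/* Major opcodes of the four RISC-V custom slots (custom0..custom3). */
static const uint32_t ROCC_OPCODES[4] = {0x0B, 0x2B, 0x5B, 0x7B};

/*
 * Assemble a customX instruction word (hypothetical helper).
 * Assumed layout: funct[31:25] rs2[24:20] rs1[19:15]
 *                 xd[14] xs1[13] xs2[12] rd[11:7] opcode[6:0]
 * xd/xs1/xs2 flag whether rd/rs1/rs2 refer to integer registers
 * (an assumption, not stated in the text above).
 */
static uint32_t rocc_encode(unsigned x, unsigned funct,
                            unsigned rd, unsigned rs1, unsigned rs2,
                            unsigned xd, unsigned xs1, unsigned xs2)
{
    return ((uint32_t)(funct & 0x7F) << 25) |
           ((uint32_t)(rs2   & 0x1F) << 20) |
           ((uint32_t)(rs1   & 0x1F) << 15) |
           ((uint32_t)(xd    & 0x1)  << 14) |
           ((uint32_t)(xs1   & 0x1)  << 13) |
           ((uint32_t)(xs2   & 0x1)  << 12) |
           ((uint32_t)(rd    & 0x1F) << 7)  |
           ROCC_OPCODES[x & 0x3];
}
```

For example, ``rocc_encode(0, 3, 10, 11, 12, 1, 1, 1)`` builds a ``custom0`` instruction with ``funct`` 3 that reads registers 11 and 12 and writes register 10.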
|
||||
|
||||
Note that communication through a RoCC interface requires a custom software toolchain, whereas MMIO peripherals can use the standard toolchain with appropriate driver support.
|
||||
|
||||
Integrating into the Generator Build System
|
||||
-------------------------------------------
|
||||
|
||||
While developing, you want to include Chisel code in a submodule so that it
|
||||
can be shared by different projects. To add a submodule to the project
|
||||
template, make sure that your project is organized as follows.
|
||||
While developing, you want to include Chisel code in a submodule so that it can be shared by different projects.
|
||||
To add a submodule to the REBAR framework, make sure that your project is organized as follows.
|
||||
|
||||
.. code-block::
|
||||
yourproject/
|
||||
build.sbt
|
||||
src/main/scala/
|
||||
YourFile.scala
|
||||
|
||||
Put this in a git repository and make it accessible. Then add it as a submodule
|
||||
to under the following directory hierarchy: ``rebar/generators/yourproject``.
|
||||
Put this in a git repository and make it accessible.
|
||||
Then add it as a submodule under the following directory hierarchy: ``generators/yourproject``.
|
||||
|
||||
::
|
||||
.. code-block:: shell
|
||||
cd generators/
|
||||
git submodule add https://git-repository.com/yourproject.git
|
||||
|
||||
Then add `yourproject` to the ReBAR top-level build.sbt file.
|
||||
Then add ``yourproject`` to the REBAR top-level ``build.sbt`` file.
|
||||
|
||||
::
|
||||
.. code-block:: scala
|
||||
lazy val yourproject = project.settings(commonSettings).dependsOn(rocketchip)
|
||||
|
||||
|
||||
You can then import the classes defined in the submodule in a new project if
|
||||
you add it as a dependency. For instance, if you want to use this code in
|
||||
the `example` project, change the final line in build.sbt to the following.
|
||||
the ``example`` project, change the final line in build.sbt to the following.
|
||||
|
||||
::
|
||||
.. code-block:: scala
|
||||
lazy val example = (project in file(".")).settings(commonSettings).dependsOn(testchipip, yourproject)
|
||||
|
||||
|
||||
Finally, add `yourproject` to the `PACKAGES` variable in the `Makefrag`. This will allow make to detect
|
||||
that your source files have changed when building the verilog/firrtl files.
|
||||
|
||||
|
||||
Finally, add ``yourproject`` to the ``PACKAGES`` variable in the ``common.mk`` file in the REBAR top level.
|
||||
This will allow make to detect that your source files have changed when building the Verilog/FIRRTL files.
|
||||
|
||||
MMIO Peripheral
|
||||
------------------
|
||||
|
||||
The easiest way to create a TileLink peripheral is to use the
|
||||
TLRegisterRouter, which abstracts away the details of handling the TileLink
|
||||
protocol and provides a convenient interface for specifying memory-mapped
|
||||
registers. To create a RegisterRouter-based peripheral, you will need to
|
||||
specify a parameter case class for the configuration settings, a bundle trait
|
||||
with the extra top-level ports, and a module implementation containing the
|
||||
actual RTL.
|
||||
The easiest way to create a TileLink peripheral is to use the ``TLRegisterRouter``, which abstracts away the details of handling the TileLink protocol and provides a convenient interface for specifying memory-mapped registers.
|
||||
To create a RegisterRouter-based peripheral, you will need to specify a parameter case class for the configuration settings, a bundle trait with the extra top-level ports, and a module implementation containing the actual RTL.
|
||||
|
||||
::
|
||||
.. code-block:: scala
|
||||
case class PWMParams(address: BigInt, beatBytes: Int)
|
||||
|
||||
trait PWMTLBundle extends Bundle {
|
||||
@@ -103,16 +92,12 @@ actual RTL.
|
||||
}
|
||||
|
||||
|
||||
Once you have these classes, you can construct the final peripheral by
|
||||
extending the TLRegisterRouter and passing the proper arguments. The first
|
||||
set of arguments determines where the register router will be placed in the
|
||||
global address map and what information will be put in its device tree entry.
|
||||
The second set of arguments is the IO bundle constructor, which we create
|
||||
by extending TLRegBundle with our bundle trait. The final set of arguments
|
||||
is the module constructor, which we create by extends TLRegModule with our
|
||||
module trait.
|
||||
Once you have these classes, you can construct the final peripheral by extending the ``TLRegisterRouter`` and passing the proper arguments.
|
||||
The first set of arguments determines where the register router will be placed in the global address map and what information will be put in its device tree entry.
|
||||
The second set of arguments is the IO bundle constructor, which we create by extending ``TLRegBundle`` with our bundle trait.
|
||||
The final set of arguments is the module constructor, which we create by extending ``TLRegModule`` with our module trait.
|
||||
|
||||
::
|
||||
.. code-block:: scala
|
||||
class PWMTL(c: PWMParams)(implicit p: Parameters)
|
||||
extends TLRegisterRouter(
|
||||
c.address, "pwm", Seq("ucbbar,pwm"),
|
||||
@@ -120,20 +105,17 @@ module trait.
|
||||
new TLRegBundle(c, _) with PWMTLBundle)(
|
||||
new TLRegModule(c, _, _) with PWMTLModule)
|
||||
|
||||
The full module code can be found in ``generators/example/src/main/scala/PWM.scala``.
|
||||
|
||||
The full module code with comments can be found in src/main/scala/example/PWM.scala.
|
||||
After creating the module, we need to hook it up to our SoC.
|
||||
Rocket Chip accomplishes this using the cake pattern.
|
||||
This basically involves placing code inside traits.
|
||||
In the Rocket Chip cake, there are two kinds of traits: a ``LazyModule`` trait and a module implementation trait.
|
||||
|
||||
After creating the module, we need to hook it up to our SoC. Rocketchip
|
||||
accomplishes this using the [cake pattern](http://www.cakesolutions.net/teamblogs/2011/12/19/cake-pattern-in-depth).
|
||||
This basically involves placing code inside traits. In the RocketChip cake,
|
||||
there are two kinds of traits: a LazyModule trait and a module implementation
|
||||
trait.
|
||||
The ``LazyModule`` trait runs setup code that must execute before all the hardware gets elaborated.
|
||||
For a simple memory-mapped peripheral, this just involves connecting the peripheral's TileLink node to the MMIO crossbar.
|
||||
|
||||
The LazyModule trait runs setup code that must execute before all the hardware
|
||||
gets elaborated. For a simple memory-mapped peripheral, this just involves
|
||||
connecting the peripheral's TileLink node to the MMIO crossbar.
|
||||
|
||||
::
|
||||
.. code-block:: scala
|
||||
trait HasPeripheryPWM extends HasSystemNetworks {
|
||||
implicit val p: Parameters
|
||||
|
||||
@@ -147,17 +129,15 @@ connecting the peripheral's TileLink node to the MMIO crossbar.
|
||||
}
|
||||
|
||||
|
||||
Note that the PWMTL class we created from the register router is itself a
|
||||
LazyModule. Register routers have a TileLike node simply named "node", which
|
||||
we can hook up to the RocketChip peripheryBus. This will automatically add
|
||||
address map and device tree entries for the peripheral.
|
||||
Note that the ``PWMTL`` class we created from the register router is itself a ``LazyModule``.
|
||||
Register routers have a TileLink node simply named ``node``, which we can hook up to the Rocket Chip bus.
|
||||
This will automatically add address map and device tree entries for the peripheral.
|
||||
|
||||
The module implementation trait is where we instantiate our PWM module and
|
||||
connect it to the rest of the SoC. Since this module has an extra `pwmout`
|
||||
output, we declare that in this trait, using Chisel's multi-IO
|
||||
functionality. We then connect the PWMTL's pwmout to the pwmout we declared.
|
||||
The module implementation trait is where we instantiate our PWM module and connect it to the rest of the SoC.
|
||||
Since this module has an extra ``pwmout`` output, we declare that in this trait, using Chisel's multi-IO functionality.
|
||||
We then connect the ``PWMTL``'s ``pwmout`` to the ``pwmout`` we declared.
|
||||
|
||||
::
|
||||
.. code-block:: scala
|
||||
trait HasPeripheryPWMModuleImp extends LazyMultiIOModuleImp {
|
||||
implicit val p: Parameters
|
||||
val outer: HasPeripheryPWM
|
||||
@@ -167,11 +147,10 @@ functionality. We then connect the PWMTL's pwmout to the pwmout we declared.
|
||||
pwmout := outer.pwm.module.io.pwmout
|
||||
}
|
||||
|
||||
Now we want to mix our traits into the system as a whole.
|
||||
This code is from ``generators/example/src/main/scala/Top.scala``.
|
||||
|
||||
Now we want to mix our traits into the system as a whole. This code is from
|
||||
src/main/scala/example/Top.scala.
|
||||
|
||||
::
|
||||
.. code-block:: scala
|
||||
class ExampleTopWithPWM(q: Parameters) extends ExampleTop(q)
|
||||
with HasPeripheryPWM {
|
||||
override lazy val module = Module(
|
||||
@@ -182,19 +161,15 @@ src/main/scala/example/Top.scala.
|
||||
extends ExampleTopModule(l) with HasPeripheryPWMModuleImp
|
||||
|
||||
|
||||
Just as we need separate traits for LazyModule and module implementation, we
|
||||
need two classes to build the system. The ExampleTop classes already have the
|
||||
basic peripherals included for us, so we will just extend those.
|
||||
Just as we need separate traits for ``LazyModule`` and module implementation, we need two classes to build the system.
|
||||
The ``ExampleTop`` classes already have the basic peripherals included for us, so we will just extend those.
|
||||
|
||||
The ExampleTop class includes the pre-elaboration code and also a lazy val to
|
||||
produce the module implementation (hence LazyModule). The ExampleTopModule
|
||||
class is the actual RTL that gets synthesized.
|
||||
The ``ExampleTop`` class includes the pre-elaboration code and also a ``lazy val`` to produce the module implementation (hence ``LazyModule``).
|
||||
The ``ExampleTopModule`` class is the actual RTL that gets synthesized.
|
||||
|
||||
Finally, we need to add a configuration class in
|
||||
src/main/scala/example/Configs.scala that tells the TestHarness to instantiate
|
||||
ExampleTopWithPWM instead of the default ExampleTop.
|
||||
Finally, we need to add a configuration class in ``generators/example/src/main/scala/Configs.scala`` that tells the ``TestHarness`` to instantiate ``ExampleTopWithPWM`` instead of the default ``ExampleTop``.
|
||||
|
||||
::
|
||||
.. code-block:: scala
|
||||
class WithPWM extends Config((site, here, up) => {
|
||||
case BuildTop => (p: Parameters) =>
|
||||
Module(LazyModule(new ExampleTopWithPWM()(p)).module)
|
||||
@@ -203,9 +178,9 @@ ExampleTopWithPWM instead of the default ExampleTop.
|
||||
class PWMConfig extends Config(new WithPWM ++ new BaseExampleConfig)
|
||||
|
||||
|
||||
Now we can test that the PWM is working. The test program is in tests/pwm.c
|
||||
Now we can test that the PWM is working. The test program is in ``tests/pwm.c``.
|
||||
|
||||
::
|
||||
.. code-block:: c
|
||||
#define PWM_PERIOD 0x2000
|
||||
#define PWM_DUTY 0x2008
|
||||
#define PWM_ENABLE 0x2010
|
||||
@@ -230,29 +205,26 @@ Now we can test that the PWM is working. The test program is in tests/pwm.c
|
||||
}
|
||||
|
||||
|
||||
This just writes out to the registers we defined earlier. The base of the
|
||||
module's MMIO region is at 0x2000. This will be printed out in the address
|
||||
map portion when you generated the verilog code.
|
||||
This just writes out to the registers we defined earlier.
|
||||
The base of the module's MMIO region is at 0x2000.
|
||||
This address is printed in the address map portion of the output when you generate the Verilog code.
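The register accesses can be sketched as follows. This is a minimal illustration, not the contents of ``tests/pwm.c``: the offsets are derived from the ``0x2000``, ``0x2008``, and ``0x2010`` addresses shown above, while the 32-bit access width, the helper names, and the base-pointer parameter are assumptions. Passing the base as a pointer lets the same code run against an ordinary array on a host machine.

```c
#include <stdint.h>

/* Base of the PWM MMIO region, as reported in the address map. */
#define PWM_BASE        0x2000UL
/* Register offsets relative to PWM_BASE (0x2000, 0x2008, 0x2010). */
#define PWM_PERIOD_OFF  0x00
#define PWM_DUTY_OFF    0x08
#define PWM_ENABLE_OFF  0x10

/* Write a 32-bit value to a memory-mapped register.
 * The 32-bit width is an assumption for this sketch. */
static inline void reg_write32(volatile void *base, uintptr_t offset,
                               uint32_t value)
{
    *(volatile uint32_t *)((volatile uint8_t *)base + offset) = value;
}

/* Read a 32-bit value back from a memory-mapped register. */
static inline uint32_t reg_read32(volatile void *base, uintptr_t offset)
{
    return *(volatile uint32_t *)((volatile uint8_t *)base + offset);
}

/* Program period and duty, then set the enable register
 * (hypothetical helper). */
static void pwm_configure(volatile void *base, uint32_t period,
                          uint32_t duty)
{
    reg_write32(base, PWM_PERIOD_OFF, period);
    reg_write32(base, PWM_DUTY_OFF, duty);
    reg_write32(base, PWM_ENABLE_OFF, 1);
}
```

On the target, the call would look like ``pwm_configure((volatile void *)PWM_BASE, 20, 5)``, where the period and duty values are arbitrary examples.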
|
||||
|
||||
Compiling this program with make produces a `pwm.riscv` executable.
|
||||
Compiling this program with make produces a ``pwm.riscv`` executable.
|
||||
|
||||
Now with all of that done, we can go ahead and run our simulation.
|
||||
|
||||
::
|
||||
.. code-block:: shell
|
||||
cd verisim
|
||||
make CONFIG=PWMConfig
|
||||
./simulator-example-PWMConfig ../tests/pwm.riscv
|
||||
|
||||
|
||||
|
||||
|
||||
Adding a RoCC Accelerator
|
||||
----------------------------
|
||||
|
||||
RoCC accelerators are lazy modules that extend the LazyRoCC class.
|
||||
Their implementation should extends the LazyRoCCModule class.
|
||||
RoCC accelerators are lazy modules that extend the ``LazyRoCC`` class.
|
||||
Their implementation should extend the ``LazyRoCCModule`` class.
|
||||
|
||||
::
|
||||
.. code-block:: scala
|
||||
class CustomAccelerator(opcodes: OpcodeSet)
|
||||
(implicit p: Parameters) extends LazyRoCC(opcodes) {
|
||||
override lazy val module = new CustomAcceleratorModule(this)
|
||||
@@ -277,34 +249,30 @@ Their implementation should extends the LazyRoCCModule class.
|
||||
}
|
||||
|
||||
|
||||
The ``opcodes`` parameter for ``LazyRoCC`` is
|
||||
the set of custom opcodes that will map to this accelerator. More on this
|
||||
in the next subsection.
|
||||
The ``opcodes`` parameter for ``LazyRoCC`` is the set of custom opcodes that will map to this accelerator.
|
||||
More on this in the next subsection.
|
||||
|
||||
The ``LazyRoCC`` class contains two TLOutputNode instances, ``atlNode`` and ``tlNode``.
|
||||
The former connects into a tile-local arbiter along with the backside of the
|
||||
L1 instruction cache. The latter connects directly to the L1-L2 crossbar.
|
||||
The corresponding Tilelink ports in the module implementation's IO bundle
|
||||
are ``atl`` and ``tl``, respectively.
|
||||
The former connects into a tile-local arbiter along with the backside of the L1 instruction cache.
|
||||
The latter connects directly to the L1-L2 crossbar.
|
||||
The corresponding TileLink ports in the module implementation's IO bundle are ``atl`` and ``tl``, respectively.
|
||||
|
||||
The other interfaces available to the accelerator are ``mem``, which provides
|
||||
access to the L1 cache; ``ptw`` which provides access to the page-table walker;
|
||||
the ``busy`` signal, which indicates when the accelerator is still handling an
|
||||
instruction; and the ``interrupt`` signal, which can be used to interrupt the CPU.
|
||||
The other interfaces available to the accelerator are ``mem``, which provides access to the L1 cache;
|
||||
``ptw``, which provides access to the page-table walker;
|
||||
the ``busy`` signal, which indicates when the accelerator is still handling an instruction;
|
||||
and the ``interrupt`` signal, which can be used to interrupt the CPU.
|
||||
|
||||
Look at the examples in rocket-chip/src/main/scala/tile/LazyRocc.scala for
|
||||
detailed information on the different IOs.
|
||||
Look at the examples in ``generators/rocket-chip/src/main/scala/tile/LazyRocc.scala`` for detailed information on the different IOs.
|
||||
|
||||
### Adding RoCC accelerator to Config
|
||||
Adding RoCC accelerator to Config
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
RoCC accelerators can be added to a core by overriding the ``BuildRoCC`` parameter
|
||||
in the configuration. This takes a sequence of functions producing ``LazyRoCC``
|
||||
objects, one for each accelerator you wish to add.
|
||||
RoCC accelerators can be added to a core by overriding the ``BuildRoCC`` parameter in the configuration.
|
||||
This takes a sequence of functions producing ``LazyRoCC`` objects, one for each accelerator you wish to add.
|
||||
|
||||
For instance, if we wanted to add the previously defined accelerator and
|
||||
route custom0 and custom1 instructions to it, we could do the following.
|
||||
For instance, if we wanted to add the previously defined accelerator and route custom0 and custom1 instructions to it, we could do the following.
|
||||
|
||||
::
|
||||
.. code-block:: scala
|
||||
class WithCustomAccelerator extends Config((site, here, up) => {
|
||||
case BuildRoCC => Seq((p: Parameters) => LazyModule(
|
||||
new CustomAccelerator(OpcodeSet.custom0 | OpcodeSet.custom1)(p)))
|
||||
@@ -313,17 +281,13 @@ route custom0 and custom1 instructions to it, we could do the following.
|
||||
class CustomAcceleratorConfig extends Config(
|
||||
new WithCustomAccelerator ++ new DefaultExampleConfig)
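The effect of the opcode set can be modeled in software. The C sketch below is only an illustration of the routing rule, not the Rocket Chip implementation: it assumes the standard RISC-V custom-0 to custom-3 major opcodes, represents an opcode set as a bitmask, and uses hypothetical names throughout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bitmask model of an opcode set (hypothetical, for illustration). */
enum {
    SET_CUSTOM0 = 1u << 0,
    SET_CUSTOM1 = 1u << 1,
    SET_CUSTOM2 = 1u << 2,
    SET_CUSTOM3 = 1u << 3
};

/* Return X for a customX instruction word, or -1 if the major
 * opcode (bits 6:0) is not one of the four custom slots. */
static int rocc_custom_index(uint32_t inst)
{
    switch (inst & 0x7F) {
    case 0x0B: return 0;
    case 0x2B: return 1;
    case 0x5B: return 2;
    case 0x7B: return 3;
    default:   return -1;
    }
}

/* True if the instruction belongs to the accelerator that owns
 * the given opcode set. */
static bool rocc_set_matches(unsigned set_mask, uint32_t inst)
{
    int x = rocc_custom_index(inst);
    return x >= 0 && ((set_mask >> x) & 1u) != 0;
}
```

With ``SET_CUSTOM0 | SET_CUSTOM1`` as the mask, ``custom0`` and ``custom1`` instructions match, while ``custom2``, ``custom3``, and ordinary instructions do not.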
|
||||
|
||||
|
||||
|
||||
|
||||
Adding a DMA port
|
||||
-------------------
|
||||
|
||||
IO devices or accelerators (like a disk or network
|
||||
driver), we may want to have the device write directly to the coherent
|
||||
memory system instead. To add a device like that, you would do the following.
|
||||
For some IO devices or accelerators (like a disk or network driver), we may want to have the device write directly to the coherent memory system instead.
|
||||
To add a device like that, you would do the following.
|
||||
|
||||
::
|
||||
.. code-block:: scala
|
||||
class DMADevice(implicit p: Parameters) extends LazyModule {
|
||||
val node = TLClientNode(TLClientParameters(
|
||||
name = "dma-device", sourceId = IdRange(0, 1)))
|
||||
@@ -355,8 +319,6 @@ memory system instead. To add a device like that, you would do the following.
|
||||
|
||||
|
||||
The ``ExtBundle`` contains the off-chip signals from which the device receives its data.
|
||||
The DMADevice also has a Tilelink client port that we connect into the L1-L2
|
||||
crossbar through the front-side buffer (fsb). The sourceId variable given in
|
||||
the TLClientNode instantiation determines the range of ids that can be used
|
||||
in acquire messages from this device. Since we specified [0, 1) as our range,
|
||||
only the ID 0 can be used.
|
||||
The ``DMADevice`` also has a TileLink client port that we connect into the L1-L2 crossbar through the front-side buffer (fsb).
|
||||
The ``sourceId`` argument given in the ``TLClientNode`` instantiation determines the range of IDs that can be used in acquire messages from this device.
|
||||
Since we specified [0, 1) as our range, only the ID 0 can be used.
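The half-open range semantics can be spelled out with a small C model. The type and function names are hypothetical and exist only to illustrate the ``[start, end)`` convention; they are not part of Rocket Chip.

```c
#include <stdbool.h>

/* Half-open ID range [start, end): start is included, end is not.
 * Hypothetical model used only to illustrate the convention. */
typedef struct {
    unsigned start;
    unsigned end;
} id_range_t;

/* True if id lies inside the range. */
static bool id_range_contains(id_range_t r, unsigned id)
{
    return id >= r.start && id < r.end;
}

/* Number of distinct IDs the range admits. */
static unsigned id_range_size(id_range_t r)
{
    return r.end - r.start;
}
```

For the range ``[0, 1)`` the size is 1 and only ID 0 is contained; a wider range such as ``[0, 4)`` would admit IDs 0 through 3.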
|
||||
|
||||