From 319d2fedf745f52211ae71a306d22a3ee6348f56 Mon Sep 17 00:00:00 2001
From: alonamid
Date: Sat, 16 Mar 2019 00:15:02 -0700
Subject: [PATCH] more docs

---
 .../Adding-An-Accelerator-Tutorial.rst | 362 ++++++++++++++++++
 docs/ReBAR-Basics/rebar-generator-mixins.rst | 135 +++++++
 docs/Simulation/Commercial-Simulators.rst | 33 ++
 docs/Simulation/FPGA-Based-Simulators.rst | 13 +
 docs/Simulation/Open-Source-Simulators.rst | 33 ++
 5 files changed, 576 insertions(+)
 create mode 100644 docs/ReBAR-Basics/Adding-An-Accelerator-Tutorial.rst
 create mode 100644 docs/ReBAR-Basics/rebar-generator-mixins.rst
 create mode 100644 docs/Simulation/Commercial-Simulators.rst
 create mode 100644 docs/Simulation/FPGA-Based-Simulators.rst
 create mode 100644 docs/Simulation/Open-Source-Simulators.rst

diff --git a/docs/ReBAR-Basics/Adding-An-Accelerator-Tutorial.rst b/docs/ReBAR-Basics/Adding-An-Accelerator-Tutorial.rst
new file mode 100644
index 00000000..1df6b0d5
--- /dev/null
+++ b/docs/ReBAR-Basics/Adding-An-Accelerator-Tutorial.rst
@@ -0,0 +1,362 @@

Adding An Accelerator/Device
===============================

Accelerators or custom IO devices can be added to your SoC in several ways:

+ MMIO Peripheral (a.k.a. TileLink-Attached Accelerator)
+ Tightly-Coupled RoCC Accelerator

These approaches differ in how the processor communicates with the custom block.

With the TileLink-attached approach, the processor communicates with MMIO peripherals through memory-mapped registers.

In contrast, the processor communicates with a RoCC accelerator through a custom protocol and custom non-standard instructions whose opcodes are reserved in the RISC-V ISA encoding space. Each core can have up to four accelerators that are controlled by custom instructions and share resources with the CPU.

RoCC coprocessor instructions have the following form.

::

    customX rd, rs1, rs2, funct

Here X is a number from 0 to 3 that selects the opcode of the instruction,
which in turn determines the accelerator the instruction is routed to.
The ``rd``, ``rs1``, and ``rs2`` fields are the register numbers of the destination
register and the two source registers. The ``funct`` field is a 7-bit integer that
the accelerator can use to distinguish different instructions from each other.

Note that communicating through a RoCC interface requires a custom software toolchain, whereas MMIO peripherals can use the standard toolchain with appropriate driver support.


Integrating into the Generator Build System
-------------------------------------------

While developing, you will want to keep your Chisel code in a submodule so that it
can be shared by different projects. To add a submodule to the project
template, make sure that your project is organized as follows.

::

    yourproject/
        build.sbt
        src/main/scala/
            YourFile.scala

Put this in a git repository and make it accessible. Then add it as a submodule
under ``rebar/generators/yourproject``. From the top level of the ``rebar`` repository, run:

::

    git submodule add https://git-repository.com/yourproject.git generators/yourproject

Then add ``yourproject`` to the ReBAR top-level ``build.sbt`` file.

::

    lazy val yourproject = (project in file("generators/yourproject"))
      .settings(commonSettings)
      .dependsOn(rocketchip)

You can then import the classes defined in the submodule in another project if
you add it as a dependency. For instance, if you want to use this code in
the ``example`` project, change the final line in ``build.sbt`` to the following.

::

    lazy val example = (project in file(".")).settings(commonSettings).dependsOn(testchipip, yourproject)

Finally, add ``yourproject`` to the ``PACKAGES`` variable in the ``Makefrag``. This allows make to detect
that your source files have changed when building the Verilog/FIRRTL files.


MMIO Peripheral
------------------

The easiest way to create a TileLink peripheral is to use the
``TLRegisterRouter``, which abstracts away the details of handling the TileLink
protocol and provides a convenient interface for specifying memory-mapped
registers. To create a RegisterRouter-based peripheral, you need to
specify a parameter case class for the configuration settings, a bundle trait
with the extra top-level ports, and a module implementation containing the
actual RTL.

::

    import chisel3._
    import freechips.rocketchip.config.Parameters
    import freechips.rocketchip.regmapper.{HasRegMap, RegField}
    import freechips.rocketchip.tilelink._

    case class PWMParams(address: BigInt, beatBytes: Int)

    trait PWMTLBundle extends Bundle {
      val pwmout = Output(Bool())
    }

    trait PWMTLModule extends HasRegMap {
      val io: PWMTLBundle
      implicit val p: Parameters
      def params: PWMParams

      // Each register is one bus beat (beatBytes bytes) wide
      val w = params.beatBytes * 8
      val period = Reg(UInt(w.W))
      val duty = Reg(UInt(w.W))
      val enable = RegInit(false.B)

      // ... Use the registers to drive io.pwmout ...

      // regmap offsets are byte offsets, so consecutive w-bit
      // registers are placed beatBytes bytes apart
      regmap(
        0 * params.beatBytes -> Seq(
          RegField(w, period)),
        1 * params.beatBytes -> Seq(
          RegField(w, duty)),
        2 * params.beatBytes -> Seq(
          RegField(1, enable)))
    }

Once you have these classes, you can construct the final peripheral by
extending ``TLRegisterRouter`` and passing the proper arguments. The first
set of arguments determines where the register router will be placed in the
global address map and what information will be put in its device tree entry.
The second set of arguments is the IO bundle constructor, which we create
by extending ``TLRegBundle`` with our bundle trait. The final set of arguments
is the module constructor, which we create by extending ``TLRegModule`` with our
module trait.

::

    class PWMTL(c: PWMParams)(implicit p: Parameters)
      extends TLRegisterRouter(
        c.address, "pwm", Seq("ucbbar,pwm"),
        beatBytes = c.beatBytes)(
          new TLRegBundle(c, _) with PWMTLBundle)(
          new TLRegModule(c, _, _) with PWMTLModule)

The full module code with comments can be found in ``src/main/scala/example/PWM.scala``.

After creating the module, we need to hook it up to our SoC.
Rocket Chip accomplishes this using the `cake pattern <http://www.cakesolutions.net/teamblogs/2011/12/19/cake-pattern-in-depth>`__,
which basically involves placing code inside traits. In the Rocket Chip cake,
there are two kinds of traits: a ``LazyModule`` trait and a module implementation
trait.

The ``LazyModule`` trait runs setup code that must execute before all the hardware
gets elaborated. For a simple memory-mapped peripheral, this just involves
connecting the peripheral's TileLink node to the MMIO crossbar.

::

    trait HasPeripheryPWM extends HasSystemNetworks {
      implicit val p: Parameters

      private val address = 0x2000

      val pwm = LazyModule(new PWMTL(
        PWMParams(address, peripheryBusConfig.beatBytes))(p))

      pwm.node := TLFragmenter(
        peripheryBusConfig.beatBytes, cacheBlockBytes)(peripheryBus.node)
    }

Note that the ``PWMTL`` class we created from the register router is itself a
``LazyModule``. Register routers have a TileLink node simply named ``node``, which
we can hook up to the Rocket Chip ``peripheryBus``. This automatically adds
address map and device tree entries for the peripheral.

The module implementation trait is where we instantiate our PWM module and
connect it to the rest of the SoC. Since this module has an extra ``pwmout``
output, we declare that in this trait, using Chisel's multi-IO
functionality. We then connect the ``pwmout`` of ``PWMTL`` to the ``pwmout`` we declared.

::

    trait HasPeripheryPWMModuleImp extends LazyMultiIOModuleImp {
      implicit val p: Parameters
      val outer: HasPeripheryPWM

      val pwmout = IO(Output(Bool()))

      pwmout := outer.pwm.module.io.pwmout
    }

Now we want to mix our traits into the system as a whole. This code is from
``src/main/scala/example/Top.scala``.

::

    class ExampleTopWithPWM(implicit p: Parameters) extends ExampleTop
        with HasPeripheryPWM {
      override lazy val module = new ExampleTopWithPWMModule(this)
    }

    class ExampleTopWithPWMModule(l: ExampleTopWithPWM)
      extends ExampleTopModule(l) with HasPeripheryPWMModuleImp

Just as we need separate traits for the ``LazyModule`` and the module implementation, we
need two classes to build the system. The ``ExampleTop`` classes already have the
basic peripherals included for us, so we just extend those.

The ``ExampleTop`` class includes the pre-elaboration code and also a lazy val to
produce the module implementation (hence ``LazyModule``). The ``ExampleTopModule``
class is the actual RTL that gets synthesized.

Finally, we need to add a configuration class in
``src/main/scala/example/Configs.scala`` that tells the ``TestHarness`` to instantiate
``ExampleTopWithPWM`` instead of the default ``ExampleTop``.

::

    class WithPWM extends Config((site, here, up) => {
      case BuildTop => (p: Parameters) =>
        Module(LazyModule(new ExampleTopWithPWM()(p)).module)
    })

    class PWMConfig extends Config(new WithPWM ++ new BaseExampleConfig)

Now we can test that the PWM is working. The test program is in ``tests/pwm.c``.

::

    #define PWM_PERIOD 0x2000
    #define PWM_DUTY 0x2008
    #define PWM_ENABLE 0x2010

    /* Store a 64-bit value to a memory-mapped register */
    static inline void write_reg(unsigned long addr, unsigned long data)
    {
        volatile unsigned long *ptr = (volatile unsigned long *) addr;
        *ptr = data;
    }

    /* Load a 64-bit value from a memory-mapped register */
    static inline unsigned long read_reg(unsigned long addr)
    {
        volatile unsigned long *ptr = (volatile unsigned long *) addr;
        return *ptr;
    }

    int main(void)
    {
        write_reg(PWM_PERIOD, 20);
        write_reg(PWM_DUTY, 5);
        write_reg(PWM_ENABLE, 1);
        return 0;
    }

This program simply writes to the registers we defined earlier. The base of the
module's MMIO region is at 0x2000, and the registers are spaced 8 bytes apart. The base
address is printed in the address map portion of the output when you generate the Verilog code.
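As a quick sanity check on what ``pwm.c`` asks the hardware to do, the following C sketch models the PWM output cycle by cycle on a host machine. It is not the RTL: it assumes a common PWM design in which a free-running counter wraps after ``period`` cycles and ``pwmout`` is high while ``enable`` is set and the counter is below ``duty``. The names ``pwm_model``, ``pwm_step``, and ``pwm_count_high`` are hypothetical; compare against ``src/main/scala/example/PWM.scala`` for the actual behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cycle-level model of the PWM peripheral (assumed design:
 * counter wraps at period - 1, output high while counter < duty). */
typedef struct {
    uint64_t period;  /* value written to PWM_PERIOD (cycles per period) */
    uint64_t duty;    /* value written to PWM_DUTY (high cycles per period) */
    bool enable;      /* value written to PWM_ENABLE; output low when false */
    uint64_t counter; /* internal counter register, starts at 0 */
} pwm_model;

/* Return pwmout for the current cycle, then advance the counter
 * the way the clock edge would. */
static bool pwm_step(pwm_model *m)
{
    assert(m->period > 0);
    bool out = m->enable && m->counter < m->duty;
    if (m->counter >= m->period - 1)
        m->counter = 0;
    else
        m->counter++;
    return out;
}

/* Count how many of the next `cycles` cycles drive pwmout high. */
static uint64_t pwm_count_high(pwm_model *m, uint64_t cycles)
{
    uint64_t high = 0;
    for (uint64_t i = 0; i < cycles; i++)
        high += pwm_step(m);
    return high;
}
```

With the values that ``pwm.c`` writes (period 20, duty 5, enable 1), ``pwm_count_high`` reports 25 high cycles out of 100, a 25% duty cycle; with ``enable`` left at 0 it reports none.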

Compiling ``tests/pwm.c`` with ``make`` produces a ``pwm.riscv`` executable.

Now with all of that done, we can go ahead and run our simulation.

::

    cd verisim
    make CONFIG=PWMConfig
    ./simulator-example-PWMConfig ../tests/pwm.riscv


Adding a RoCC Accelerator
----------------------------

RoCC accelerators are lazy modules that extend the ``LazyRoCC`` class.
Their implementation should extend the ``LazyRoCCModuleImp`` class.

::

    class CustomAccelerator(opcodes: OpcodeSet)
        (implicit p: Parameters) extends LazyRoCC(opcodes) {
      override lazy val module = new CustomAcceleratorModule(this)
    }

    class CustomAcceleratorModule(outer: CustomAccelerator)
        extends LazyRoCCModuleImp(outer) {
      val cmd = Queue(io.cmd)
      // The parts of the command are as follows
      // inst - the parts of the instruction itself
      //   opcode
      //   rd - destination register number
      //   rs1 - first source register number
      //   rs2 - second source register number
      //   funct
      //   xd - is the destination register being used?
      //   xs1 - is the first source register being used?
      //   xs2 - is the second source register being used?
      // rs1 - the value of source register 1
      // rs2 - the value of source register 2
      ...
    }

The ``opcodes`` parameter for ``LazyRoCC`` is
the set of custom opcodes that will map to this accelerator. More on this
in the next subsection.

The ``LazyRoCC`` class contains two ``TLOutputNode`` instances, ``atlNode`` and ``tlNode``.
The former connects into a tile-local arbiter along with the backside of the
L1 instruction cache. The latter connects directly to the L1-L2 crossbar.
The corresponding TileLink ports in the module implementation's IO bundle
are ``atl`` and ``tl``, respectively.
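The ``inst`` fields listed in the ``CustomAcceleratorModule`` comments all come from a single 32-bit instruction word. As a hedged illustration of that layout, the C sketch below packs and unpacks a ``customX`` instruction. It assumes the Rocket Chip RoCC convention in which the R-type ``funct3`` bits carry ``xd`` (bit 14), ``xs1`` (bit 13), and ``xs2`` (bit 12), and it uses the custom-0 through custom-3 major opcodes (0x0B, 0x2B, 0x5B, 0x7B) reserved in the RISC-V base opcode map. The names ``rocc_inst``, ``rocc_encode``, and ``rocc_decode`` are hypothetical and not part of any ReBAR or Rocket Chip API.

```c
#include <assert.h>
#include <stdint.h>

/* Major opcodes reserved for custom extensions (custom-0 .. custom-3). */
static const uint32_t rocc_custom_opcode[4] = { 0x0B, 0x2B, 0x5B, 0x7B };

/* Hypothetical view of a RoCC instruction's fields. */
typedef struct {
    unsigned custom;        /* X in customX, 0-3 (4 = not a custom opcode) */
    unsigned rd, rs1, rs2;  /* register numbers, 0-31 */
    unsigned funct;         /* 7-bit function code */
    unsigned xd, xs1, xs2;  /* 1 if the corresponding register is used */
} rocc_inst;

/* Pack the fields into an R-type word: funct | rs2 | rs1 | xd xs1 xs2 | rd | opcode */
static uint32_t rocc_encode(rocc_inst i)
{
    assert(i.custom < 4 && i.rd < 32 && i.rs1 < 32 && i.rs2 < 32);
    assert(i.funct < 128 && i.xd < 2 && i.xs1 < 2 && i.xs2 < 2);
    return ((uint32_t)i.funct << 25) | ((uint32_t)i.rs2 << 20) |
           ((uint32_t)i.rs1 << 15) | ((uint32_t)i.xd << 14) |
           ((uint32_t)i.xs1 << 13) | ((uint32_t)i.xs2 << 12) |
           ((uint32_t)i.rd << 7) | rocc_custom_opcode[i.custom];
}

/* Unpack a 32-bit word back into its fields. */
static rocc_inst rocc_decode(uint32_t word)
{
    rocc_inst i;
    uint32_t opcode = word & 0x7F;
    i.custom = 4;
    for (unsigned x = 0; x < 4; x++)
        if (rocc_custom_opcode[x] == opcode)
            i.custom = x;
    i.rd = (word >> 7) & 0x1F;
    i.xs2 = (word >> 12) & 1;
    i.xs1 = (word >> 13) & 1;
    i.xd = (word >> 14) & 1;
    i.rs1 = (word >> 15) & 0x1F;
    i.rs2 = (word >> 20) & 0x1F;
    i.funct = (word >> 25) & 0x7F;
    return i;
}
```

For example, ``custom0`` with ``rd`` = x10, ``rs1`` = x11, ``rs2`` = x12, ``funct`` = 0, and all three ``x`` bits set encodes to 0x00C5F50B, and decoding that word recovers the same fields. Clearing ``xd`` marks the destination register as unused, matching the ``xd`` comment in the accelerator code.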

The other interfaces available to the accelerator are ``mem``, which provides
access to the L1 cache; ``ptw``, which provides access to the page-table walker;
the ``busy`` signal, which indicates when the accelerator is still handling an
instruction; and the ``interrupt`` signal, which can be used to interrupt the CPU.

Look at the examples in ``rocket-chip/src/main/scala/tile/LazyRoCC.scala`` for
detailed information on the different IOs.

Adding a RoCC Accelerator to a Config
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

RoCC accelerators can be added to a core by overriding the ``BuildRoCC`` parameter
in the configuration. This takes a sequence of functions producing ``LazyRoCC``
objects, one for each accelerator you wish to add.

For instance, if we wanted to add the previously defined accelerator and
route ``custom0`` and ``custom1`` instructions to it, we could do the following.

::

    class WithCustomAccelerator extends Config((site, here, up) => {
      case BuildRoCC => Seq((p: Parameters) => LazyModule(
        new CustomAccelerator(OpcodeSet.custom0 | OpcodeSet.custom1)(p)))
    })

    class CustomAcceleratorConfig extends Config(
      new WithCustomAccelerator ++ new DefaultExampleConfig)


Adding a DMA port
-------------------

For IO devices or accelerators (like a disk or network
driver), we may want the device to write directly to the coherent
memory system instead of having the processor move the data through MMIO registers.
To add a device like that, you would do the following.

::

    class DMADevice(implicit p: Parameters) extends LazyModule {
      val node = TLClientNode(TLClientParameters(
        name = "dma-device", sourceId = IdRange(0, 1)))

      lazy val module = new DMADeviceModule(this)
    }

    class DMADeviceModule(outer: DMADevice) extends LazyModuleImp(outer) {
      val io = IO(new Bundle {
        val mem = outer.node.bundleOut
        val ext = new ExtBundle
      })

      // ... rest of the code ...
    }

    trait HasPeripheryDMA extends HasSystemNetworks {
      implicit val p: Parameters

      val dma = LazyModule(new DMADevice)

      fsb.node := dma.node
    }

    trait HasPeripheryDMAModuleImp extends LazyMultiIOModuleImp {
      val outer: HasPeripheryDMA

      val ext = IO(new ExtBundle)
      ext <> outer.dma.module.io.ext
    }

The ``ExtBundle`` contains the off-chip signals that the device gets its data from.
The ``DMADevice`` also has a TileLink client port that we connect into the L1-L2
crossbar through the front-side buffer (``fsb``). The ``sourceId`` variable given in
the ``TLClientNode`` instantiation determines the range of IDs that can be used
in acquire messages from this device. Since we specified [0, 1) as our range,
only the ID 0 can be used.
diff --git a/docs/ReBAR-Basics/rebar-generator-mixins.rst b/docs/ReBAR-Basics/rebar-generator-mixins.rst
new file mode 100644
index 00000000..e950ffda
--- /dev/null
+++ b/docs/ReBAR-Basics/rebar-generator-mixins.rst
@@ -0,0 +1,135 @@

SoC Generator Config Mix-ins
==============================

Rocket Chip
-----------------------

+ System-on-Chip

  - HasTiles
  - HasClockDomainCrossing
  - HasResetVectorWire
  - HasNoiseMakerIO

+ Basic Core

  - HasRocketTiles
  - HasRocketCoreParameters
  - HasCoreIO

+ Branch Prediction

  - HasBtbParameters

+ Additional Compute

  - HasFPUCtrlSigs
  - HasFPUParameters
  - HasLazyRoCC
  - HasFpuOpt

+ Memory System

  - HasRegMap
  - HasCoreMemOp
  - HasHellaCache
  - HasL1ICacheParameters
  - HasICacheFrontendModule
  - HasAXI4ControlRegMap
  - HasTLControlRegMap
  - HasTLBusParams
  - HasTLXbarPhy

+ Interrupts

  - HasInterruptSources
  - HasExtInterrupts
  - HasAsyncExtInterrupts
  - HasSyncExtInterrupts

+ Periphery

  - HasPeripheryDebug
  - HasPeripheryBootROM
  - HasBuiltInDeviceParams


BOOM
-----------------------

+ Basic Core

  - HasBoomTiles
  - HasBoomCoreParameters
  - HasBoomCoreIO
  - HasBoomUOP
  - HasRegisterFileIO

+ Branch Prediction

  - HasGShareParameters
  - HasBoomBTBParameters

+ Memory System

  - HasL1ICacheBankedParameters
  - HasBoomICacheFrontend
  - HasBoomHellaCache


SiFive Blocks
-----------------------

+ Peripherals

  - HasPeripheryGPIO
  - HasPeripheryI2C
  - HasPeripheryMockAON
  - HasPeripheryPWM
  - HasPeripherySPI
  - HasSPIProtocol
  - HasSPIEndian
  - HasSPILength
  - HasSPICSMode
  - HasPeripherySPIFlash
  - HasPeripheryUART


testchipip
-----------------------

+ Peripherals

  - HasPeripheryBlockDevice
  - HasPeripherySerial
  - HasNoDebug


Icenet
-----------------------

+ Periphery Network Interface Controller

  - HasPeripheryIceNIC


AWL
-----------------------

+ IO

  - HasEncoding8b10b
  - HasTLBidirectionalPacketizer
  - HasTLController
  - HasGenericTransceiverSubsystem

+ Debug/Testing

  - HasBertDebug
  - HasPatternMemDebug
  - HasBitStufferDebug4Modes
  - HasBitReversalDebug
diff --git a/docs/Simulation/Commercial-Simulators.rst b/docs/Simulation/Commercial-Simulators.rst
new file mode 100644
index 00000000..e107d703
--- /dev/null
+++ b/docs/Simulation/Commercial-Simulators.rst
@@ -0,0 +1,33 @@
Commercial Simulators
==============================

The ReBAR framework currently supports only the VCS commercial simulator.

VCS
-----------------------

VCS is a commercial RTL simulator developed by Synopsys. It requires commercial licenses.
The ReBAR framework can compile and execute simulations using VCS. VCS simulations generally compile
faster than Verilator simulations.

To run a simulation using VCS, perform the following steps:

Make sure that the VCS simulator is on your ``PATH``.

To compile the example design, run ``make`` in the ``sims/vsim`` directory.
This elaborates the ``DefaultExampleConfig`` in the example project.

An executable called ``simulator-example-DefaultExampleConfig`` will be produced.
This executable is a simulator that has been compiled based on the design that was built.
You can then use this executable to run any compatible RV64 code. For instance,
to run one of the riscv-tools assembly tests:

::

    ./simulator-example-DefaultExampleConfig $RISCV/riscv64-unknown-elf/share/riscv-tests/isa/rv64ui-p-simple

If you later create your own project, you can use make variables to
build an alternate configuration.

::

    make PROJECT=yourproject CONFIG=YourConfig
    ./simulator-yourproject-YourConfig ...

If you would like to extract waveforms from the simulation, run ``make debug`` instead of just ``make``. This generates a VPD file (a proprietary waveform format used by Synopsys) that can be loaded into VPD-capable waveform viewers. If you have Synopsys licenses, we recommend using the DVE waveform viewer.
diff --git a/docs/Simulation/FPGA-Based-Simulators.rst b/docs/Simulation/FPGA-Based-Simulators.rst
new file mode 100644
index 00000000..fbacafa4
--- /dev/null
+++ b/docs/Simulation/FPGA-Based-Simulators.rst
@@ -0,0 +1,13 @@
FPGA-Based Simulators
==============================

FireSim
-----------------------

FireSim is an open-source, cycle-accurate, FPGA-accelerated full-system hardware simulation platform that runs on cloud FPGAs (Amazon EC2 F1).
FireSim allows RTL-level simulation at orders-of-magnitude faster speeds than software RTL simulators. FireSim also provides additional device models to allow full-system simulation, including memory models and network models.

FireSim currently supports running only on Amazon EC2 F1 FPGA-enabled virtual instances on the public cloud. To simulate your ReBAR design using FireSim, follow these steps:

Follow the initial EC2 setup instructions as detailed in the FireSim documentation. Then clone your full ReBAR repository onto your Amazon EC2 FireSim manager instance.

Enter the ``sims/FireSim`` directory, and follow the FireSim instructions for running a simulation.
diff --git a/docs/Simulation/Open-Source-Simulators.rst b/docs/Simulation/Open-Source-Simulators.rst
new file mode 100644
index 00000000..2b8e1f4e
--- /dev/null
+++ b/docs/Simulation/Open-Source-Simulators.rst
@@ -0,0 +1,33 @@
Open Source Simulators
==============================

Verilator
-----------------------

Verilator is an open-source, LGPL-licensed simulator maintained by Veripool.
The ReBAR framework can download, build, and execute simulations using Verilator.

To run a simulation using Verilator, perform the following steps:

To compile the example design, run ``make`` in the ``sims/verisim`` directory.
This elaborates the ``DefaultExampleConfig`` in the example project.

An executable called ``simulator-example-DefaultExampleConfig`` will be produced.
This executable is a simulator that has been compiled based on the design that was built.
You can then use this executable to run any compatible RV64 code. For instance,
to run one of the riscv-tools assembly tests:

::

    ./simulator-example-DefaultExampleConfig $RISCV/riscv64-unknown-elf/share/riscv-tests/isa/rv64ui-p-simple

If you later create your own project, you can use make variables to
build an alternate configuration.

::

    make PROJECT=yourproject CONFIG=YourConfig
    ./simulator-yourproject-YourConfig ...

If you would like to extract waveforms from the simulation, run ``make debug`` instead of just ``make``. This generates a VCD file (VCD is a standard waveform file format) that can be loaded into any common waveform viewer. GTKWave is an open-source, VCD-capable waveform viewer.