Barstools and chip docs (#446)
* WIP; Add some information on building a chip [skip ci]
* [docs] [ci skip] re-org chip/barstools docs

  Also add macrocompiler flag descriptions
* [docs] Add MDF description [ci-skip]
* [docs] Address some of Abe and Harrison's feedback [ci skip]
* [docs] Update hierarchy/transform section [ci skip]
* [docs] Address all feedback except cost function [ci skip]
* [docs] Describe macro compiler cost function [ci skip]

Co-authored-by: John Wright <johnwright@eecs.berkeley.edu>
@@ -3,3 +3,133 @@ Barstools
Barstools is a collection of useful FIRRTL transformations and compilers to help the build process.
Included in the tools are a MacroCompiler (used to map Chisel memory constructs to vendor SRAMs), FIRRTL transforms (to separate harness and top-level SoC files), and more.

Mapping technology SRAMs (MacroCompiler)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If you are planning to build a real chip, it is likely that you will use some amount of static random-access memory, or SRAM.
SRAM macros offer superior storage density over flip-flop arrays at the cost of restricting the number of read or write transactions that can happen in a cycle.
Unlike in Verilog, these types of sequential memory elements are first-class primitives in Chisel and FIRRTL (``SeqMem`` elements).
This allows Chisel designs to contain abstract instantiations of sequential memory elements without knowing the underlying implementation or process technology.

Modern CAD tools typically cannot synthesize SRAMs from a high-level RTL description.
This, unfortunately, requires the designer to include the SRAM instantiation in the source RTL, which removes the design's process portability.
In Verilog-entry designs, it is possible to create a layer of abstraction that allows a new process technology to implement a specific sequential memory block in a wrapper module.
However, this method can be fragile and laborious.

The FIRRTL compiler contains a transformation called ``ReplSeqMem`` that replaces the ``SeqMem`` primitives.
It converts all ``SeqMem`` instances above a size threshold into external module references.
An external module reference is a FIRRTL construct that enables a design to reference a module without describing its contents, only its inputs and outputs.
FIRRTL writes a list of the unique SRAM configurations to a ``.conf`` file, which is later used to map technology SRAMs.
Without this transform, FIRRTL will map all ``SeqMem`` s to flip-flop arrays with equivalent behavior, which may lead to a design that is difficult to route.

The ``.conf`` file is consumed by a tool called MacroCompiler, which is part of the :ref:`Barstools` Scala package.
MacroCompiler is also passed an ``.mdf`` file that describes the available list of technology SRAMs or the capabilities of the SRAM compiler, if one is provided by the foundry.
Typically a foundry SRAM compiler will be able to generate collateral for a set of different SRAMs based on some requirements on size, aspect ratio, etc. (see :ref:`SRAM MDF Fields`).
Using a user-customizable cost function, MacroCompiler will select the SRAMs that are the best fit for each memory configuration in the ``.conf`` file.
This may include over-provisioning (e.g. using a 64x1024 SRAM for a requested 60x1024, if the latter is not available) or arraying.
Arraying can be done in both width and depth, as well as to solve masking constraints.
For example, a 128x2048 array could be composed of four 64x1024 arrays, with two macros in parallel to create two 128x1024 virtual SRAMs which are combinationally muxed to add depth.
If this macro requires byte-granularity write masking, but no technology SRAMs support masking, then the tool may choose to use thirty-two 8x1024 arrays in a similar configuration.
For information on writing ``.mdf`` files, look at `MDF on github <https://github.com/ucb-bar/plsi-mdf>`__ and a brief description in the :ref:`SRAM MDF Fields` section.
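The arraying arithmetic in the examples above can be sketched in a few lines of Python. This is only an illustration of the width-by-depth tiling, not MacroCompiler's actual algorithm; the function name ``array_macros`` and its parameters are hypothetical.

```python
import math

def array_macros(req_width, req_depth, lib_width, lib_depth):
    """Tile a requested memory with copies of one macro size.

    Returns (columns, rows, wasted_bits): columns are macros placed in
    parallel to add width, rows are banks muxed together to add depth.
    """
    columns = math.ceil(req_width / lib_width)
    rows = math.ceil(req_depth / lib_depth)
    provided_bits = columns * lib_width * rows * lib_depth
    wasted_bits = provided_bits - req_width * req_depth
    return columns, rows, wasted_bits

# 128x2048 request built from 64x1024 macros: 2 columns x 2 rows, nothing wasted.
print(array_macros(128, 2048, 64, 1024))
# Same request with byte-wide macros to get byte-granularity masking.
print(array_macros(128, 2048, 8, 1024))
# Over-provisioning: a 60x1024 request on a 64x1024 macro wastes 4 bits per row.
print(array_macros(60, 1024, 64, 1024))
```

The wasted-bit count is the kind of quantity a cost function can trade off against the number of macros instantiated.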

The output of MacroCompiler is a Verilog file containing modules that wrap the technology SRAMs using the interface names specified in the ``.conf`` file.
If the technology supports an SRAM compiler, then MacroCompiler will also emit HammerIR that can be passed to Hammer to run the compiler itself and generate design collateral.
Documentation for SRAM compilers is forthcoming.

MacroCompiler Options
+++++++++++++++++++++
MacroCompiler accepts many command-line parameters which affect how it maps ``SeqMem`` s to technology-specific macros.
The highest-level option, ``--mode``, specifies in general how MacroCompiler should map the input ``SeqMem`` s to technology macros.
The ``strict`` value forces MacroCompiler to map all memories to technology macros and to raise an error if it is unable to do so.
The ``synflops`` value forces MacroCompiler to map all memories to flip-flops.
The ``compileandsynflops`` value instructs MacroCompiler to use the technology compiler to determine the sizes of the technology macros used, but then to create mock versions of these macros with flip-flops.
The ``fallbacksynflops`` value causes MacroCompiler to compile all possible memories to technology macros and, when it is unable to do so, to implement the remaining memories with flip-flops.
The final and default value, ``compileavailable``, instructs MacroCompiler to compile all memories to the technology macros and to do nothing for memories it is unable to map.
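As a rough sketch of how these values fit into an invocation, the following Python helper assembles an argument list from the flags named in this section (the file-related flags are described in the paragraphs that follow). The helper name ``macrocompiler_args`` and the file names are hypothetical, and how the tool itself is launched is deliberately left out.

```python
import shlex

# The five --mode values listed in the text above.
VALID_MODES = {
    "strict",
    "synflops",
    "compileandsynflops",
    "fallbacksynflops",
    "compileavailable",
}

def macrocompiler_args(mode, conf, library, verilog,
                       use_compiler=False, hammer_ir=None):
    """Build a MacroCompiler argument list (illustrative helper, not part of Barstools)."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode}")
    args = ["--mode", mode,
            "--macro-conf", conf,
            "--library", library,
            "--verilog", verilog]
    if use_compiler:
        args.append("--use-compiler")
        # --hammer-ir is only meaningful when compilers may be used.
        if hammer_ir is not None:
            args += ["--hammer-ir", hammer_ir]
    return args

# Hypothetical file names, for illustration only.
print(shlex.join(macrocompiler_args("strict", "top.mems.conf", "srams.mdf", "top.mems.v")))
```

Rejecting unknown modes before launching the tool gives an earlier and clearer failure than a late error from the build.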

Most of the rest of the options are used to control where different inputs and outputs are expected and produced.
The ``--macro-conf`` option names the file that contains the set of input ``SeqMem`` configurations to map, in the ``.conf`` format described above.
The ``--macro-mdf`` option also describes the input ``SeqMem`` s but is instead in the ``.mdf`` format.
The ``--library`` option is an ``.mdf`` description of the available technology macros that can be mapped to.
This file could be a list of fixed-size memories (often referred to as a cache of macros), a description of what size memories could be made available through some technology-specific process (usually an SRAM compiler), or a mix of both.
The ``--use-compiler`` option tells MacroCompiler that it is allowed to use any compilers listed in the ``--library`` specification.
If this option is not set, MacroCompiler will only map to macros directly listed in the ``--library`` specification.
The ``--verilog`` option specifies where MacroCompiler will write the Verilog containing the new technology-mapped memories.
The ``--firrtl`` option similarly specifies where MacroCompiler will write the FIRRTL that will be used to generate this Verilog.
This flag is optional, and no FIRRTL will be emitted if it is not specified.
The ``--hammer-ir`` option specifies where MacroCompiler will write the details of which macros need to be generated from a technology compiler.
This option is not needed if ``--use-compiler`` is not specified.
This file can then be passed to Hammer to have it run the technology compiler, producing the associated macro collateral.

The ``--cost-func`` option allows the user to specify a different cost function for the mapping task.
Because the mapping of memories is a multi-dimensional space spanning performance, power, and area, the cost function setting of MacroCompiler allows the user to tune the mapping to their preference.
The default option is a reasonable heuristic that attempts to minimize the number of technology macros instantiated per ``SeqMem`` without wasting too many memory bits.
There are two ways to add additional cost functions.
First, you can simply write another one in Scala and call ``registerCostMetric``, which then enables you to pass its name to this command-line flag.
Second, there is a pre-defined ``ExternalMetric`` which will execute a program (passed in as a path) with the MDF description of the memory being compiled and the memory being proposed as a mapping.
The program should print a floating-point number, which is the cost for this mapping; if no number is printed, MacroCompiler will assume this is an illegal mapping.
The ``--cost-param`` option allows the user to specify parameters to pass to the cost function if the cost function supports that.

The ``--force-synflops [mem]`` option allows the user to override any heuristics in MacroCompiler and force it to map the given memory to flip-flops.
Likewise, the ``--force-compile [mem]`` option allows the user to force MacroCompiler to map the given ``mem`` to a technology macro.
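To make the external cost program more concrete, here is a minimal Python sketch of the scoring logic such a program might contain. It is not MacroCompiler's default metric. The dictionary keys ``width`` and ``depth`` and the way the two MDF descriptions reach the program are assumptions; only the contract of printing a number for a legal mapping and nothing for an illegal one comes from the text above.

```python
import math

def mapping_cost(mem, lib):
    """Cost of implementing `mem` with copies of `lib`, or None if illegal.

    `mem` and `lib` are dicts with assumed keys "width" and "depth".
    """
    if lib["width"] <= 0 or lib["depth"] <= 0:
        return None
    columns = math.ceil(mem["width"] / lib["width"])
    rows = math.ceil(mem["depth"] / lib["depth"])
    macros = columns * rows
    requested_bits = mem["width"] * mem["depth"]
    wasted_bits = macros * lib["width"] * lib["depth"] - requested_bits
    # One unit per macro, plus the fraction of the request that is wasted.
    return macros + wasted_bits / requested_bits

def report(mem, lib):
    """Text to print: a number for a legal mapping, nothing otherwise."""
    cost = mapping_cost(mem, lib)
    return "" if cost is None else f"{cost:.4f}"

print(report({"width": 60, "depth": 1024}, {"width": 64, "depth": 1024}))
```

Counting macros first and wasted bits second mirrors the spirit of the default heuristic described above; a power- or timing-aware metric would weight the terms differently.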

SRAM MDF Fields
+++++++++++++++

Technology SRAM macros described in MDF can be defined at three levels of detail.
A single instance can be defined with the ``SRAMMacro`` format.
A group of instances that share the number and type of ports but vary in width and depth can be defined with the ``SRAMGroup`` format.
A set of groups of SRAMs that can be generated together from a single source like a compiler can be defined with the ``SRAMCompiler`` format.

At the most concrete level, the ``SRAMMacro`` defines a particular instance of an SRAM.
That includes its functional attributes such as its width, depth, and number of access ports.
These ports can be read, write, or read and write ports, and an instance can have any number of them.
In order to correctly map these functional ports to the physical instance, each port is described in a list of sub-structures in the parent instance's structure.
Each port is only required to have an address and data field, but can have many other optional fields.
These optional fields include a clock, write enable, read enable, chip enable, and mask.
The mask field can have a different granularity than the data field, e.g. it could be a bit mask or a byte mask.
Each field must also specify its polarity, whether it is active high or active low.

In addition to these functional descriptions of the SRAM, there are also other fields that specify physical/implementation characteristics.
These include the threshold voltage, the mux factor, as well as a list of extra non-functional ports.

The next level of detail, an ``SRAMGroup``, includes a range of depths and widths, as well as a set of threshold voltages.
A range has a lower bound, upper bound, and a step size.
The least concrete level, an ``SRAMCompiler``, is simply a set of ``SRAMGroup`` s.
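The relationship between a group's ranges and the concrete instances it stands for can be sketched as follows. This is a hedged illustration: the tuple layout, the dictionary keys, the threshold-voltage names, and the choice of an inclusive upper bound are all assumptions, not the MDF schema.

```python
from itertools import product

def expand_range(lower, upper, step):
    """Values from lower to upper (assumed inclusive) in increments of step."""
    if step <= 0 or lower > upper:
        raise ValueError("range needs step > 0 and lower <= upper")
    return list(range(lower, upper + 1, step))

def group_instances(depth_range, width_range, vts):
    """Enumerate every (depth, width, vt) combination a group describes."""
    depths = expand_range(*depth_range)
    widths = expand_range(*width_range)
    return [{"depth": d, "width": w, "vt": vt}
            for d, w, vt in product(depths, widths, vts)]

# Hypothetical group: depths 512..1024 step 256, widths 32..64 step 32, two VTs.
instances = group_instances((512, 1024, 256), (32, 64, 32), ["svt", "lvt"])
print(len(instances))
```

Even a small group expands to many candidate macros, which is why a compiler description is more compact than a list of individual instances.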

Separating the Top module from the TestHarness module
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Unlike the FireSim and software simulation flows, a VLSI flow needs to separate the test harness and the chip (a.k.a. DUT) into separate files.
This is necessary to facilitate post-synthesis and post-place-and-route simulation, as the module names in the RTL and gate-level Verilog files would collide.
Simulations run after the design goes through a VLSI flow will use the Verilog netlist generated from the flow and will need an untouched test harness to drive it.
Separating these components into separate files makes this straightforward.
Without the separation, the file that included the test harness would also redefine the DUT, which is often disallowed in simulation tools.
To do this, there is a FIRRTL ``App`` in :ref:`Barstools` called ``GenerateTopAndHarness``, which runs the appropriate transforms to elaborate the modules separately.
This also renames modules in the test harness so that any modules that are instantiated in both the test harness and the chip are uniquified.

.. Note:: For VLSI projects, this ``App`` is run instead of the normal FIRRTL ``App`` to elaborate Verilog.
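The renaming step can be pictured with a small Python sketch. It only illustrates the idea of uniquifying shared module names; the function ``uniquify_harness``, the ``_in_harness`` suffix, and the module names are hypothetical and do not reflect the naming scheme ``GenerateTopAndHarness`` actually uses.

```python
def uniquify_harness(top_modules, harness_modules, suffix="_in_harness"):
    """Map each harness module name to one that cannot collide with the top."""
    top = set(top_modules)
    taken = top | set(harness_modules)
    renames = {}
    for name in harness_modules:
        if name in top:
            new_name = name + suffix
            # Keep extending the name until it is unused everywhere.
            while new_name in taken:
                new_name += "_"
            taken.add(new_name)
            renames[name] = new_name
        else:
            renames[name] = name
    return renames

# "Queue" appears in both the chip and the harness, so the harness copy is renamed.
print(uniquify_harness({"ChipTop", "Queue"}, ["TestHarness", "Queue"]))
```

Only the harness side is renamed, so the chip's module names stay identical between the RTL and the netlist produced by the VLSI flow.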

Macro Description Format
~~~~~~~~~~~~~~~~~~~~~~~~

The SRAM technology macros and IO cells are described in a JSON format called Macro Description Format (MDF).
MDF is specialized for each type of macro it supports.
The specialization is defined in their respective sections.

Mapping technology IO cells
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Like technology SRAMs, IO cells are almost always included in digital ASIC designs to allow pin configurability, increase the voltage level of the IO signal, and provide ESD protection.
Unlike SRAMs, there is no corresponding primitive in Chisel or FIRRTL.
However, this problem can be solved similarly to ``SeqMems`` by leveraging the strong typing available in these Scala-based tools.
We are actively developing a FIRRTL transform that will automatically configure, map, and connect technology IO cells.
Stay tuned for more information!

In the meantime, it is recommended that you instantiate the IO cells in your Chisel design.
This, unfortunately, breaks the process-agnostic RTL abstraction, so it is recommended that inclusion of these cells be configurable using the ``rocket-chip`` parameterization system.
The simplest way to do this is to have a config fragment that, when included, instantiates the IO cells and connects them in the test harness.
When simulating chip-specific designs, it is important to include the IO cells.
The IO cell behavioral models will often assert if they are connected incorrectly, which is a useful runtime check.
They also keep the IO interface at the chip and test harness boundary (see :ref:`Separating the Top module from the TestHarness module`) consistent after synthesis and place-and-route, which allows the RTL simulation test harness to be reused.
@@ -1,6 +1,54 @@
.. _build-a-chip:

Building A Chip
===============

In this section, we will discuss many of the ASIC-specific transforms and methodologies within Chipyard.
For the full documentation on how to use the VLSI tool flow, see the `Hammer Documentation <https://hammer-vlsi.readthedocs.io/>`__.

Transforming the RTL
--------------------

Building a chip requires specializing the generic Verilog emitted by FIRRTL to adhere to the constraints imposed by the technology used for fabrication.
This includes mapping Chisel memories to available technology macros such as SRAMs and mapping the inputs and outputs of your chip to technology IO cells (see :ref:`Barstools`).
In addition to these required transformations, it may also be beneficial to transform the RTL to make hierarchical physical design easier.
This often includes modifying the logical hierarchy to match the physical hierarchy through grouping components together or flattening components into a single larger module.

Modifying the logical hierarchy
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Building a large or complex chip often requires using hierarchical design to place and route sections of the chip separately.
In addition, the design as written in Chipyard may not have a hierarchy that matches the physical hierarchy that would work best in the place-and-route tool.
In order to reorganize the design to have its logical hierarchy match its physical hierarchy, there are several FIRRTL transformations that can be run.
These include grouping, which pulls several modules into a larger one, and flattening, which dissolves a module's boundary, leaving its components in its containing module.
These transformations can be applied repeatedly to different parts of the design to arrange it as the physical designer sees fit.
More details on how to use these transformations to reorganize the design hierarchy are forthcoming.
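Until those details are available, the effect of the two operations can be illustrated on a toy hierarchy. The Python sketch below is not the FIRRTL transform API; it models a design as a dictionary from a module name to the names of its children, assumes each module is instantiated once, and uses hypothetical module names.

```python
def flatten(hierarchy, parent, child):
    """Dissolve `child`: its children become direct children of `parent`."""
    result = {name: list(kids) for name, kids in hierarchy.items()}
    idx = result[parent].index(child)
    # Replace the child with whatever it contained (nothing for a leaf).
    result[parent][idx:idx + 1] = result.pop(child, [])
    return result

def group(hierarchy, parent, members, new_name):
    """Pull `members` of `parent` into a new intermediate module `new_name`."""
    result = {name: list(kids) for name, kids in hierarchy.items()}
    result[parent] = [k for k in result[parent] if k not in members] + [new_name]
    result[new_name] = list(members)
    return result

design = {"Top": ["Core", "Uncore"], "Core": ["ALU", "FPU"]}
print(flatten(design, "Top", "Core"))
print(group(design, "Top", ["Core", "Uncore"], "Tile"))
```

Because both functions return a new hierarchy, they can be chained, which matches the idea of applying the transformations repeatedly to different parts of the design.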

Creating a floorplan
--------------------

An ASIC floorplan is a specification that the place-and-route tools will follow when placing instances in the design.
This includes the top-level chip dimensions, placement of SRAM macros, placement of custom (analog) circuits, IO cell placement, bump or wirebond pad placement, blockages, hierarchical boundaries, and pin placement.

Much of the design effort that goes into building a chip involves developing optimal floorplans for the instance of the design that is being manufactured.
Often this is a highly manual and iterative process which consumes much of the physical designer's time.
This cost becomes increasingly apparent as the parameterization space grows rapidly when using tools like Chisel: cycle times are hampered by the human labor that is required to floorplan each instance of the design.
The Hammer team is actively developing methods of improving the agility of floorplanning for generator-based designs, like those that use Chisel.
The libraries we are developing will emit Hammer IR that can be passed directly to the Hammer tool without the need for human intervention.
Stay tuned for more information.

In the meantime, see the `Hammer Documentation <https://hammer-vlsi.readthedocs.io/>`__ for information on the Hammer IR floorplan API.
It is possible to write this IR directly, or to generate it using simple Python scripts.
While we certainly look forward to having a more featureful toolkit, we have built many chips to date in this way.
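A script of the kind mentioned above might look like the following minimal sketch, which places a row of SRAM macros and writes the result as JSON. The key ``vlsi.inputs.placement_constraints`` and the field names are assumptions based on Hammer's placement-constraint conventions and should be checked against the Hammer documentation; the instance paths and dimensions are made up.

```python
import json

def sram_row(prefix, count, x0, y0, width, height, gap):
    """Place `count` identical macros left to right, separated by `gap`."""
    return [{
        "path": f"{prefix}{i}",          # hypothetical instance path
        "type": "hardmacro",
        "x": x0 + i * (width + gap),
        "y": y0,
        "width": width,
        "height": height,
        "orientation": "r0",
    } for i in range(count)]

def floorplan(top, chip_width, chip_height, macros):
    """Wrap a top-level outline and macro placements in one IR dictionary."""
    toplevel = {
        "path": top,
        "type": "toplevel",
        "x": 0, "y": 0,
        "width": chip_width,
        "height": chip_height,
        "margins": {"left": 0, "right": 0, "top": 0, "bottom": 0},
    }
    # Assumed key name; verify against the Hammer documentation.
    return {"vlsi.inputs.placement_constraints": [toplevel] + macros}

macros = sram_row("ChipTop/mem_", 3, 10.0, 20.0, 50.0, 30.0, 5.0)
print(json.dumps(floorplan("ChipTop", 500, 400, macros), indent=2))
```

Generating placements from a loop keeps the floorplan in step with the design parameters, which is the main motivation for scripting it.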

Running the VLSI flow
---------------------

For the full documentation on how to use the VLSI tool flow, see the `Hammer Documentation <https://hammer-vlsi.readthedocs.io/>`__.
For an example of how to use the VLSI flow in the context of Chipyard, see :ref:`ASAP7 Tutorial`.

.. Note:: Please refer to the other sections in VLSI for tools/flows on how to build a chip. This section will be filled in ASAP.