diff --git a/docs/Tools/Barstools.rst b/docs/Tools/Barstools.rst index 81435130..34562227 100644 --- a/docs/Tools/Barstools.rst +++ b/docs/Tools/Barstools.rst @@ -3,3 +3,133 @@ Barstools Barstools is a collection of useful FIRRTL transformations and compilers to help the build process. Included in the tools are a MacroCompiler (used to map Chisel memory constructs to vendor SRAMs), FIRRTL transforms (to separate harness and top-level SoC files), and more. + +Mapping technology SRAMs (MacroCompiler) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +If you are planning on building a real chip, it is likely that you will plan on using some amount of static random access memory, or SRAM. +SRAM macros offer superior storage density over flip-flop arrays at the cost of restricting the number of read or write transactions that can happen in a cycle. +Unlike in Verilog, these types of sequential memory elements are first-class primitives in Chisel and FIRRTL (``SeqMem`` elements). +This allows Chisel designs to contain abstract instantiations of sequential memory elements without knowing the underlying implementation or process technology. + +Modern CAD tools typically cannot synthesize SRAMs from a high-level RTL description. +This, unfortunately, requires the designer to include the SRAM instantiation in the source RTL, which removes its process portability. +In Verilog-entry designs, it is possible to create a layer of abstraction that allows a new process technology to implement a specific sequential memory block in a wrapper module. +However, this method can be fragile and laborious. + +The FIRRTL compiler contains a transformation to replace the ``SeqMem`` primitives called ``ReplSeqMem``. +This simply converts all ``SeqMem`` instances above a size threshold into external module references. +An external module reference is a FIRRTL construct that enables a design to reference a module without describing its contents, only its inputs and outputs. +A list of unique SRAM configurations is output to a ``.conf`` file by FIRRTL, which is used to map technology SRAMs. +Without this transform, FIRRTL will map all ``SeqMem`` s to flip-flop arrays with equivalent behavior, which may lead to a design that is difficult to route. + +The ``.conf`` file is consumed by a tool called MacroCompiler, which is part of the :ref:`Barstools` scala package. +MacroCompiler is also passed an ``.mdf`` file that describes the available list of technology SRAMs or the capabilities of the SRAM compiler, if one is provided by the foundry. +Typically a foundry SRAM compiler will be able to generate a set of different SRAMs collateral based on some requirements on size, aspect ratio, etc. (see :ref:`SRAM MDF Fields`). +Using a user-customizable cost function, MacroCompiler will select the SRAMs that are the best fit for each dimensionality in the ``.conf`` file. +This may include over provisioning (e.g. using a 64x1024 SRAM for a requested 60x1024, if the latter is not available) or arraying. +Arraying can be done in both width and depth, as well as to solve masking constraints. +For example, a 128x2048 array could be composed of four 64x1024 arrays, with two macros in parallel to create two 128x1024 virtual SRAMs which are combinationally muxed to add depth. +If this macro requires byte-granularity write masking, but no technology SRAMs support masking, then the tool may choose to use thirty-two 8x1024 arrays in a similar configuration. +For information on writing ``.mdf`` files, look at `MDF on github `__ and a brief description in :ref:`SRAM MDF Fields` section. + +The output of MacroCompiler is a Verilog file containing modules that wrap the technology SRAMs into the specified interface names from the ``.conf``. +If the technology supports an SRAM compiler, then MacroCompiler will also emit HammerIR that can be passed to Hammer to run the compiler itself and generate design collateral. +Documentation for SRAM compilers is forthcoming. + +MacroCompiler Options ++++++++++++++++++++++ +MacroCompiler accepts many command-line parameters which affect how it maps ``SeqMem`` s to technology specific macros. +This highest level option ``--mode`` specifies in general how MacroCompiler should map the input ``SeqMem`` s to technology macros. +The ``strict`` value forces MacroCompiler to map all memories to technology macros and error if it is unable to do so. +The ``synflops`` value forces MacroCompiler to map all memories to flip flops. +The ``compileandsynflops`` value instructs MacroCompiler to use the technology compiler to determine sizes of technology macros used but to then create mock versions of these macros with flip flops. +The ``fallbacksynflops`` value causes MacroCompiler to compile all possible memories to technology macros but when unable to do so to use flip flops to implement the remaining memories. +The final and default value, ``compileavailable``, instructs MacroCompiler to compile all memories to the technology macros and do nothing if it is unable to map them. + +Most of the rest of the options are used to control where different inputs and outputs are expected and produced. +The option ``--macro-conf`` is the file that contains the set of input ``SeqMem`` configurations to map in the ``.conf`` format described above. +The option ``--macro-mdf`` also describes the input ``SeqMem`` s but is instead in the ``.mdf`` format. +The option ``--library`` is an ``.mdf`` description of the available technology macros that can be mapped to. +This file could be a list of fixed size memories often referred to as a cache of macros, or a description of what size memories could be made available through some technology specific process (usually an SRAM compiler), or a mix of both. +The option ``--use-compiler`` instructs MacroCompiler that it is allowed to use any compilers listed in the ``--library`` specification. +If this option is not set MacroCompiler will only map to macros directly listed in the ``--library`` specification. +The ``--verilog`` option specifies where MacroCompiler will write the verilog containing the new technology mapped memories. +The ``--firrtl`` option similarly specifies where MacroCompiler will write the FIRRTL that will be used to generate this verilog. +This option is optional and no FIRRTL will be emitted if it is not specified. +The ``--hammer-ir`` option specifies where MacroCompiler will write the details of which macros need to be generated from a technology compiler. +This option is not needed if ``--use-compiler`` is not specified. +This file can then be passed to HAMMER to have it run the technology compiler producing the associated macro collateral. +The ``--cost-func`` option allows the user to specify a different cost function for the mapping task. +Because the mapping of memories is a multi-dimensional space spanning performance, power, and area, the cost function setting of MacroCompiler allows the user to tune the mapping to their preference. +The default option is a reasonable heuristic that attempts to minimize the number of technology macros instantiated per ``SeqMem`` without wasting too many memory bits. +There are two ways to add additional cost functions. +First, you can simply write another one in scala and call `registerCostMetric` which then enables you to pass its name to this command-line flag. +Second, there is a pre-defined `ExternalMetric` which will execute a program (passed in as a path) with the MDF description of the memory being compiled and the memory being proposed as a mapping. +The program should print a floating point number which is the cost for this mapping, if no number is printed MacroCompiler will assume this is an illegal mapping. +The ``--cost-param`` option allows the user to specify parameters to pass to the cost function if the cost function supports that. +The ``--force-synflops [mem]`` options allows the user to override any heuristics in MacroCompiler and force it to map the given memory to flip-flops. +Likewise, the ``--force-compile [mem]`` option allows the user to force MacroCompiler to map the given ``mem`` to a technology macro. + +SRAM MDF Fields ++++++++++++++++ + +Technology SRAM macros described in MDF can be defined at three levels of detail. +A single instance can be defined with the `SRAMMacro` format. +A group of instances that share the number and type of ports but vary in width and depth can be defined with the `SRAMGroup` format. +A set of groups of SRAMs that can be generated together from a single source like a compiler can be defined with the `SRAMCompiler` format. + +At the most concrete level the `SRAMMAcro` defines a particular instance of an SRAM. +That includes its functional attributes such as its width, depth, and number of access ports. +These ports can be read, write, or read and write ports, and the instance can have any number. +In order to correctly map to these functional ports to the physical instance each port is described in a list of sub-structures, in the parent instance's structure. +Each port is only required to have an address and data field, but can have many other optional fields. +These optional fields include a clock, write enable, read enable, chip enable, mask. +The mask field can have a different granularity than the data field, e.g. it could be a bit mask or a byte mask. +Each field must also specify its polarity, whether it is active high or active low. + +In addition to these functional descriptions of the SRAM there are also other fields that specify physical/implementation characteristics. +These include the threshold voltage, the mux factor, as well as a list of extra non-functional ports. + +The next level of detail, an `SRAMGroup` includes a range of depths and widths, as well as a set of threshold voltages. +A range has a lower bound, upper bound, and a step size. +The least concrete level, an `SRAMCompiler` is simply a set of `SRAMGroups`. + +Separating the Top module from the TestHarness module +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Unlike the FireSim and Software simulation flows, a VLSI flow needs to separate the test harness and the chip (a.k.a. DUT) into separate files. +This is necessary to facilitate post-synthesis and post-place-and-route simulation, as the module names in the RTL and gate-level verilog files would collide. +Simulations after you the design goes through a VLSI flow will use the verilog netlist generated from the flow and will need an untouched test harness to drive it. +Separating these components into separate files makes this straightforward. +Without the separation the file that included the test harness would also redefine the DUT which is often disallowed in simulation tools. +To do this, there is a FIRRTL ``App`` in :ref:`Barstools` called ``GenerateTopAndHarness``, which runs the appropriate transforms to elaborate the modules separately. +This also renames modules in the test harness so that any modules that are instantiated in both the test harness and the chip are uniquified. + +.. Note:: For VLSI projects, this ``App`` is run instead of the normal FIRRTL ``App`` to elaborate Verilog. + +Macro Description Format +~~~~~~~~~~~~~~~~~~~~~~~~ + +The SRAM technology macros and IO cells are described in a json format called Macro Description Format (MDF). +MDF is specialized for each type of macro it supports. +The specialization is defined in their respective sections. + + + +Mapping technology IO cells +~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Like technology SRAMs, IO cells are almost always included in digital ASIC designs to allow pin configurability, increase the voltage level of the IO signal, and provide ESD protection. +Unlike SRAMs, there is no corresponding primitive in Chisel or FIRRTL. +However, this problem can be solved similarly to ``SeqMems`` by leveraging the strong typing available in these scala-based tools. +We are actively developing a FIRRTL transform that will automatically configure, map, and connect technology IO cells. +Stay tuned for more information! + +In the meantime, it is recommended that you instantiate the IO cells in your Chisel design. +This, unfortunately, breaks the process-agnostic RTL abstraction, so it is recommended that inclusion of these cells be configurable using the ``rocket-chip`` parameterization system. +The simplest way to do this is to have a config fragment that when included updates instantiates the IO cells and connects them in the test harness. +When simulating chip-specific designs, it is important to include the IO cells. +The IO cell behavioral models will often assert if they are connected incorrectly, which is a useful runtime check. +They also keep the IO interface at the chip and test harness boundary (see :ref:`Separating the top module from the test harness`) consistent after synthesis and place-and-route, +which allows the RTL simulation test harness to be reused. diff --git a/docs/VLSI/Building-A-Chip.rst b/docs/VLSI/Building-A-Chip.rst index 6f7cd4b1..b684a07b 100644 --- a/docs/VLSI/Building-A-Chip.rst +++ b/docs/VLSI/Building-A-Chip.rst @@ -1,6 +1,54 @@ .. _build-a-chip: Building A Chip -============================== +=============== + +In this section, we will discuss many of the ASIC-specific transforms and methodologies within Chipyard. +For the full documentation on how to use the VLSI tool flow, see the `Hammer Documentation `__. + +Transforming the RTL +-------------------- + +Building a chip requires specializing the generic verilog emitted by FIRRTL to adhere to the constraints imposed by the technology used for fabrication. +This includes mapping Chisel memories to available technology macros such as SRAMs, mapping the input and output of your chip to connect to technology IO cells, see :ref:`Barstools`. +In addition to these required transformations, it may also be beneficial to transform the RTL to make it more amenable to hierarchical physical design easier. +This often includes modifying the logical hierarchy to match the physical hierarchy through grouping components together or flattening components into a single larger module. + + +Modifying the logical hierarchy +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Building a large or complex chip often requires using hierarchical design to place and route sections of the chip separately. +In addition, the design as written in Chipyard may not have a hierarchy that matches the physical hierarchy that would work best in the place and route tool. +In order to reorganize the design to have its logical hierarchy match its physical hierarchy there are several FIRRTL transformations that can be run. +These include grouping, which pull several modules into a larger one, and flattening, which dissolves a modules boundary leaving its components in its containing module. +These transformations can be applied repeatedly to different parts of the design to arrange it as the physical designer sees fit. +More details on how to use these transformations to reorganize the design hierarchy are forthcoming. + + +Creating a floorplan +-------------------- + +An ASIC floorplan is a specification that the place-and-route tools will follow when placing instances in the design. +This includes the top-level chip dimensions, placement of SRAM macros, placement of custom (analog) circuits, IO cell placement, bump or wirebond pad placement, blockages, hierarchical boundaries, and pin placement. + +Much of the design effort that goes into building a chip involves developing optimal floorplans for the instance of the design that is being manufactured. +Often this is a highly manual and iterative process which consumes much of the physical designer's time. +This cost becomes increasingly apparent as the parameterization space grows rapidly when using tools like Chisel- cycle times are hampered by the human labor +that is required to floorplan each instance of the design. +The Hammer team is actively developing methods of improving the agility of floorplanning for generator-based designs, like those that use Chisel. +The libraries we are developing will emit Hammer IR that can be passed directly to the Hammer tool without the need for human intervention. +Stay tuned for more information. + +In the meantime, see the `Hammer Documentation `__ for information on the Hammer IR floorplan API. +It is possible to write this IR directly, or to generate it using simple python scripts. +While we certainly look forward to having a more featureful toolkit, we have built many chips to date in this way. + + +Running the VLSI flow +--------------------- + +For the full documentation on how to use the VLSI tool flow, see the `Hammer Documentation `__. +For an example of how to use the VLSI in the context of Chipyard, see :ref:`ASAP7 Tutorial`. + -.. Note:: Please refer to the other sections in VLSI for tools/flows on how to build a chip. This section will be filled in ASAP.