updated documentation
@@ -2,69 +2,26 @@
The Vortex Cache Sub-system has the following main properties:

- High-bandwidth with bank parallelism
- Snoop protocol to flush data for CPU access
- Generic design: Dcache, Icache, Shared Memory, L2 cache, L3 cache
- High-bandwidth transfer with multi-bank parallelism
- Non-blocking pipelined architecture with local MSHR
- Configurable design: Dcache, Icache, L2 cache, L3 cache

### Cache Hierarchy

- Cache can be configured to be any level in the hierarchy
- Caches communicate via snooping
- Cache flush from AFU is passed down the hierarchy

### Cache Microarchitecture

The Vortex cache consists of multiple parallel banks and is built from the following modules:

- **Bank request dispatch crossbar**: assigns a bank to each incoming request and resolves collisions using stalls.
- **Bank response merge crossbar**: merges the results from the banks and forwards them to the core response port.
- **Memory request multiplexer**: arbitrates among the banks' memory requests.
- **Memory response demultiplexer**: forwards each memory response to the corresponding bank.
- **Flush Unit**: performs tag memory initialization.
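
The dispatch behavior described above (one bank per request, collisions resolved by stalling) can be illustrated with a small behavioral model. This is only a sketch in Python, not the RTL: the function names, the 64-byte line size, the four banks, and the choice of bank bits (low-order bits of the line address) are all assumptions made for the example.

```python
def select_bank(addr: int, line_size: int, num_banks: int) -> int:
    """Pick a bank from the line address (assumed: low-order line-address bits)."""
    return (addr // line_size) % num_banks


def dispatch(requests, line_size=64, num_banks=4):
    """Model one dispatch cycle: grant at most one request per bank.

    `requests` is a list of (thread_id, byte_address) pairs.
    Returns (granted, stalled), where `granted` maps bank index -> request
    and `stalled` lists the requests that must retry in a later cycle.
    """
    granted = {}
    stalled = []
    for req in requests:
        _tid, addr = req
        bank = select_bank(addr, line_size, num_banks)
        if bank in granted:
            stalled.append(req)  # bank collision: this request is stalled
        else:
            granted[bank] = req
    return granted, stalled


# Threads 0 and 2 map to bank 0, threads 1 and 3 map to bank 1.
granted, stalled = dispatch([(0, 0x000), (1, 0x040), (2, 0x100), (3, 0x044)])
print(granted)
print(stalled)
```

In this model the first request to reach a bank wins and later ones are stalled. A real crossbar may use a different arbitration policy, or may combine requests to the same line; the sketch does not model either.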
### VX_cache.v (Top Module)

VX_cache.v is the top module of the cache Verilog code, located in the `/hw/rtl/cache` directory.

Incoming requests entering the cache are sent to a dispatch crossbar that selects the corresponding bank for each request, resolving bank collisions with stalls. The output of each bank is merged back into the outgoing response port via a merge crossbar. Each bank integrates a non-blocking pipeline with a local Miss Status Holding Register (MSHR) so that outstanding misses do not block subsequent requests. The bank pipeline consists of the following stages:

- **Schedule**: Selects the next request to enter the pipeline from the incoming core request, the memory fill, or the MSHR entry, with priority given to the MSHR entry.
- **Tag Access**: Single-port read/write access to the tag store.
- **Data Access**: Single-port read/write access to the data store.
- **Response Handling**: Sends the response back to the core.
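
The Schedule stage can be pictured as a fixed-priority selector over three sources. The Python sketch below is a behavioral illustration only. The text states that the MSHR entry has priority; the relative order of memory fills and core requests, along with the function and queue names, is an assumption made for the example.

```python
from collections import deque


def schedule(mshr_replay: deque, mem_fill: deque, core_req: deque):
    """Return (source, request) for the next pipeline slot, or None if idle.

    Priority (highest first): MSHR entry, then memory fill, then core request.
    Only the MSHR-first rule comes from the text; the rest is assumed.
    """
    for source, queue in (("mshr", mshr_replay),
                          ("fill", mem_fill),
                          ("core", core_req)):
        if queue:
            return source, queue.popleft()
    return None


mshr = deque(["replay A"])
fill = deque(["fill B"])
core = deque(["load C", "load D"])

order = []
while (pick := schedule(mshr, fill, core)) is not None:
    order.append(pick)
print(order)
```

Draining the three queues yields the MSHR replay first, then the fill, then the two core requests in arrival order.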

Features and submodules of the top module:

- Configurable (cache size, number of banks, bank line size, etc.)
- I/O signals
  - Core Request
  - Core Rsp
  - DRAM Req
  - DRAM Rsp
  - Snoop Req
  - Snoop Rsp
  - Snoop Forwarding Out
  - Snoop Forwarding In
- Bank Select
  - Assigns valid and ready signals for each bank
- Snoop Forwarder
- DRAM Request Arbiter
  - Prepares cache response for communication with DRAM
- Snoop Response Arbiter
  - Sends snoop response
- Core Response Merge
  - The cache accesses one line at a time. As a result, the parts of a request may not all come back in the same response. This module tries to recombine the responses by thread ID.

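
As a rough illustration of the Core Response Merge idea, the Python sketch below gathers the per-bank responses of one cycle into a single response indexed by thread ID. The data layout (a valid mask plus a data vector) and the function name are assumptions made for the example, not the module's actual interface.

```python
def merge_responses(bank_responses, num_threads):
    """Combine per-bank responses into one core response indexed by thread ID.

    `bank_responses` is a list of (thread_id, data) pairs produced in one cycle.
    Returns (valid, data): two lists of length `num_threads`, where valid[i]
    tells whether thread i received data in this response.
    """
    valid = [False] * num_threads
    data = [None] * num_threads
    for tid, value in bank_responses:
        valid[tid] = True
        data[tid] = value
    return valid, data


# Two banks answer in the same cycle, for threads 2 and 0.
valid, data = merge_responses([(2, 0xAA), (0, 0x11)], num_threads=4)
print(valid)
print(data)
```

Threads whose lines have not been served yet stay invalid in this response and are picked up by a later one.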
### VX_cache_bank.v

VX_cache_bank.v is the Verilog code that implements the cache bank functionality; it is located in the `/hw/rtl/cache` directory.

- Allows for high throughput
- Each bank contains queues to hold requests to the cache
- I/O signals
  - Core Request
  - Core Response
  - DRAM Fill Requests
  - DRAM Fill Response
  - DRAM WB Requests
  - Snp Request
  - Snp Response
- Request Priority: DRAM fill, miss reserve, core request, snoop request
- Snoop Request Queue
- DRAM Fill Queue
- Core Req Arbiter
  - Requests to be processed by the bank
- Tag Data Store
  - Registers for valid, dirty, dirtyb, tag, and data
  - Length of registers determined by lines in the bank
- Tag Data Access
  - I/O: stall, snoop info, force request miss
  - Writes to cache or sends read response; hit or miss determined here
  - A missed request goes to the miss reserve if it is not a snoop request or DRAM fill
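
The hit/miss decision made in Tag Data Access, and the rule that only core requests go to the miss reserve, can be sketched as follows. This Python model assumes a direct-mapped bank and invents the `Line` record, the `tag_access` function, and the request kinds `"core"`, `"snoop"`, and `"fill"` for illustration. The actual organization of the tag store may differ.

```python
from dataclasses import dataclass


@dataclass
class Line:
    valid: bool = False
    dirty: bool = False
    tag: int = 0


def tag_access(lines, addr, line_size, kind):
    """Classify one request as 'hit', 'miss_reserve', or 'miss'.

    `lines` is the tag store of one bank (assumed direct-mapped);
    `kind` is 'core', 'snoop', or 'fill'.
    """
    line_addr = addr // line_size
    index = line_addr % len(lines)   # which line of the bank
    tag = line_addr // len(lines)    # remaining upper address bits
    entry = lines[index]
    if entry.valid and entry.tag == tag:
        return "hit"
    # Only a missed core request is parked in the miss reserve;
    # snoop requests and DRAM fills are not.
    return "miss_reserve" if kind == "core" else "miss"


lines = [Line() for _ in range(4)]
lines[1] = Line(valid=True, tag=3)

print(tag_access(lines, 13 * 64, 64, "core"))   # line 13 -> index 1, tag 3: hit
print(tag_access(lines, 1 * 64, 64, "core"))    # index 1, tag 0: miss_reserve
print(tag_access(lines, 1 * 64, 64, "snoop"))   # same line, snoop: miss
```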

Deadlocks inside the cache can occur when the MSHR is full and a new request is already in the pipeline. They can also occur when the memory request queue is full and a memory response arrives. The cache mitigates MSHR deadlocks by asserting an early-full signal before a new request is issued, and similarly mitigates memory deadlocks by ensuring that its memory request queue never fills up.
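
One way to picture the early-full mechanism is to count the requests already in the pipeline against the MSHR capacity, so that a request is issued only if it could still allocate an entry on a miss. The Python sketch below follows that idea. The class, its counters, and the conservative "entries plus in-flight" rule are assumptions made for the example; the text does not specify how the signal is actually computed in the RTL.

```python
class Mshr:
    """Toy MSHR occupancy model with an early-full check (hypothetical)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = 0     # misses currently held in the MSHR
        self.in_flight = 0   # requests in the pipeline, hit/miss not yet known

    def early_full(self) -> bool:
        """True if issuing one more request could overflow the MSHR."""
        return self.entries + self.in_flight >= self.capacity

    def issue(self) -> bool:
        """Try to issue a request; refuse (stall) when early-full is set."""
        if self.early_full():
            return False
        self.in_flight += 1
        return True

    def resolve(self, miss: bool) -> None:
        """A request reaches tag access; a miss allocates an MSHR entry."""
        self.in_flight -= 1
        if miss:
            self.entries += 1
            assert self.entries <= self.capacity  # guaranteed by early_full

    def fill(self) -> None:
        """A memory fill completes and releases one entry."""
        self.entries -= 1


mshr = Mshr(capacity=2)
print(mshr.issue(), mshr.issue(), mshr.issue())  # third issue is refused
mshr.resolve(miss=True)
mshr.resolve(miss=True)
print(mshr.early_full())                         # MSHR is now actually full
mshr.fill()
print(mshr.issue())                              # space again after the fill
```

Because the check happens before issue, a request that is already in the pipeline always finds a free entry if it misses, which is the property the early-full signal is meant to provide.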