AN 763: Intel® Arria® 10 SoC Device Design Guidelines

ID 683192
Date 5/17/2022
Public

2.2.4.4. Example 4: FPGA Writing Cache Coherent Data to HPS

In this example, the HPS MPU requires access to data that originates in the FPGA. The most efficient mechanism for sharing small blocks of data with the MPU is to have logic in the FPGA perform cacheable writes to the HPS. Keep these writes to relatively small blocks: large block writes cause the L2 cache to thrash, so the cache ends up writing to SDRAM for the majority of the transfer. For large buffer transfers, it is more appropriate to have the FPGA write data directly to the FPGA-to-SDRAM ports, as shown in Example 2.

GUIDELINE: Perform full accesses targeting the FPGA-to-HPS bridge.

For the transaction to be cacheable, the FPGA master must write to the FPGA-to-HPS bridge and at a minimum set the bufferable, cacheable and write-allocate bits of the AWCACHE signal. If you use Avalon-MM masters to access cacheable data, you must provide logic to force the AWCACHE signal to the appropriate values. An example of forcing Avalon-MM transactions to be cacheable can be found in the FPGA-to-HPS Bridge design example.
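As a rough illustration of the AWCACHE requirement above, the following sketch computes the minimum value a master would need to drive. It assumes the AXI3 AxCACHE bit layout (bit 0 bufferable, bit 1 cacheable, bit 2 read-allocate, bit 3 write-allocate). The constant and function names are hypothetical and do not come from the design example.

```python
# Assumed AXI3 AxCACHE bit positions; names are illustrative only.
AXCACHE_BUFFERABLE = 1 << 0
AXCACHE_CACHEABLE = 1 << 1
AXCACHE_READ_ALLOCATE = 1 << 2
AXCACHE_WRITE_ALLOCATE = 1 << 3

# Minimum AWCACHE value for a cacheable write, per the guideline:
# bufferable, cacheable, and write-allocate all set.
MIN_CACHEABLE_AWCACHE = (
    AXCACHE_BUFFERABLE | AXCACHE_CACHEABLE | AXCACHE_WRITE_ALLOCATE
)


def is_cacheable_write(awcache: int) -> bool:
    """Return True if AWCACHE has at least the three required bits set."""
    return (awcache & MIN_CACHEABLE_AWCACHE) == MIN_CACHEABLE_AWCACHE


if __name__ == "__main__":
    print(f"Minimum AWCACHE: 0b{MIN_CACHEABLE_AWCACHE:04b}")
```

Under these assumptions the minimum value is 0b1011. Logic that forces Avalon-MM transactions to be cacheable would tie AWCACHE to a value that passes this check.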

Figure 6. FPGA Writing Cache Coherent Data

For abbreviations, refer to the figure in Overview of HPS Memory-Mapped Interfaces.

GUIDELINE: Perform cacheable accesses aligned to 32 bytes targeting the FPGA-to-HPS bridge.

The ACP slave of the HPS is optimized for transactions that are the same size as a cache line (32 bytes). You should therefore align the data to 32-byte boundaries and ensure that, after data width adaptation, the burst into the 64-bit ACP slave is four beats long. For example, if the FPGA-to-HPS bridge is set up for 128-bit transactions, align the data to 32-byte boundaries and perform full 128-bit accesses with a burst length of 2.
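The burst-length arithmetic can be sketched as follows: one 32-byte cache line divided by the number of bytes per beat on the bridge. The function names are hypothetical, and the set of supported widths (32, 64, and 128 bits) is taken from the bridge widths mentioned in this example.

```python
CACHE_LINE_BYTES = 32  # cache line size stated in the guideline


def cache_line_burst_length(bridge_width_bits: int) -> int:
    """Beats needed on the FPGA-to-HPS bridge to move one full cache line."""
    if bridge_width_bits not in (32, 64, 128):
        raise ValueError("bridge width must be 32, 64, or 128 bits")
    bytes_per_beat = bridge_width_bits // 8
    return CACHE_LINE_BYTES // bytes_per_beat


def is_cache_line_aligned(address: int) -> bool:
    """Return True if the address sits on a 32-byte boundary."""
    return address % CACHE_LINE_BYTES == 0


if __name__ == "__main__":
    for width in (32, 64, 128):
        print(width, "bits ->", cache_line_burst_length(width), "beats")
```

A 128-bit bridge yields a burst length of 2, matching the example in the text, and a 64-bit bridge yields the four-beat burst that the ACP slave expects.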

GUIDELINE: When L2 ECC is enabled, ensure that cacheable accesses to the FPGA-to-HPS bridge are aligned to 8-byte boundaries.

If you enable error checking and correction (ECC) in the L2 cache, you must also ensure that each 8-byte group of data is completely written. The L2 cache performs ECC operations on 64-bit boundaries, so when performing cacheable accesses you must always align the access to 8-byte boundaries and write to all eight byte lanes at once. Failing to follow these rules results in double-bit errors, which cannot be recovered.
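A minimal sketch of the alignment rule, assuming a contiguous write described by a start address and a byte count with every byte lane enabled. The function name is hypothetical. It only checks that the write covers whole 8-byte ECC groups.

```python
ECC_GROUP_BYTES = 8  # the L2 cache performs ECC on 64-bit boundaries


def covers_whole_ecc_groups(address: int, length_bytes: int) -> bool:
    """Return True if a contiguous write starts on an 8-byte boundary
    and spans a whole number of 8-byte groups."""
    if length_bytes <= 0:
        return False
    starts_aligned = address % ECC_GROUP_BYTES == 0
    ends_aligned = length_bytes % ECC_GROUP_BYTES == 0
    return starts_aligned and ends_aligned


if __name__ == "__main__":
    print(covers_whole_ecc_groups(0x2000, 32))  # full cache line
    print(covers_whole_ecc_groups(0x2004, 8))   # straddles two groups
```

A write that fails this check would leave at least one 8-byte group partially written, which is the condition the guideline warns against.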

Regardless of whether ECC is enabled or disabled, 32-byte cache transactions result in the best performance. Refer to "GUIDELINE: Access 32 bytes per cacheable transaction." in Example 3 for more information about 32-byte cache transactions.

GUIDELINE: When L2 ECC is enabled, ensure that cacheable accesses to the FPGA-to-HPS bridge have groups of eight write strobes enabled.

  • For 32-bit FPGA-to-HPS accesses, burst length must be 2, 4, 8, or 16 with all write byte strobes enabled.
  • For 64-bit FPGA-to-HPS accesses, all write byte strobes must be enabled.
  • For 128-bit FPGA-to-HPS accesses, the upper eight or lower eight (or both) write byte strobes must be enabled.
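The three write-strobe rules above can be expressed as a single check. This is a sketch under stated assumptions: the function name and arguments are hypothetical, the strobe value is treated as a bit mask with one bit per byte lane, and the 128-bit rule is interpreted as requiring each 8-lane half to be either fully enabled or fully disabled, with at least one half enabled.

```python
def ecc_write_strobes_ok(width_bits: int, wstrb: int, burst_length: int = 1) -> bool:
    """Check one write against the L2 ECC write-strobe guideline.

    width_bits:   FPGA-to-HPS bridge data width (32, 64, or 128).
    wstrb:        byte-strobe mask applied to every beat (assumption).
    burst_length: number of beats in the burst.
    """
    if width_bits == 32:
        # An even number of 4-byte beats is needed to fill 8-byte groups.
        return wstrb == 0xF and burst_length in (2, 4, 8, 16)
    if width_bits == 64:
        return wstrb == 0xFF
    if width_bits == 128:
        if wstrb >> 16:
            return False  # more than 16 strobe bits is not a valid mask
        lower = wstrb & 0xFF
        upper = (wstrb >> 8) & 0xFF
        halves_whole = lower in (0x00, 0xFF) and upper in (0x00, 0xFF)
        return halves_whole and (lower | upper) != 0
    raise ValueError("bridge width must be 32, 64, or 128 bits")


if __name__ == "__main__":
    print(ecc_write_strobes_ok(128, 0xFF00))  # upper eight lanes only
    print(ecc_write_strobes_ok(128, 0x0FF0))  # partial groups
```

A check like this could serve as a simulation-time assertion on the master's write channel. It does not replace the address alignment requirement described earlier.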