Wrap the GCD Core with an AXI4-Lite Interface


In part 1, we ran the RTL simulation of the GCD core. In this part, we create a wrapper module (axi_gcd_performance.v) for the GCD core so that the core can interact with the host CPU (ARM Cortex-A9). We also add two counters: the first, inside axi_gcd_performance.v, benchmarks the hardware GCD core, and the second, inside axi_performance_counter.v, benchmarks the software GCD implementation.

The system diagram for this tutorial is shown in the following figure.

You can get the full source code of this part from here.


The following code shows the implementation of the counter in Verilog. It is a simple counter that counts up when en=1 and holds its value when en=0. Setting clr=1 clears the counter back to zero. Because the counter is 64 bits wide, it counts from 0 to 2^64 - 1.
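To make the en/clr behaviour concrete, here is a small cycle-level C model of the counter. It is a sketch rather than a translation of counter.v: the name counter_tick is hypothetical, and the model assumes clr takes priority over en, which you should confirm against the Verilog.

```c
#include <assert.h>
#include <stdint.h>

/* Cycle-level C model of counter.v (a sketch, not the Verilog itself).
 * Assumption: clr takes priority over en; check counter.v for the real order. */
typedef struct {
    uint64_t q; /* 64-bit count value */
} counter_t;

/* Apply one rising clock edge with the given en/clr inputs. */
void counter_tick(counter_t *c, int en, int clr)
{
    if (clr)
        c->q = 0;   /* clear back to zero */
    else if (en)
        c->q += 1;  /* unsigned arithmetic wraps 2^64-1 -> 0 */
    /* en = 0 and clr = 0: hold the current value */
}
```

Calling counter_tick(&c, 1, 0) five times leaves c.q at 5, a further call with en=0 holds it there, and incrementing from UINT64_MAX wraps back to 0, just as a 64-bit register would.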

The following figure shows the timing diagram of the counter.

AXI4-Lite Wrapper

Now that we have gcd_core.v and counter.v, the next step is to create the AXI4-Lite wrappers. We are going to create two wrappers: the first (axi_gcd_performance.v) wraps gcd_core.v and counter.v, and the second (axi_performance_counter.v) wraps counter.v alone.

The block diagram of axi_gcd_performance.v is shown in the following figure. The counter inside axi_gcd_performance.v starts when a GCD operation is initiated and stops when the operation is done, so its final value is the number of clock cycles the hardware core spent on that operation.
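One way to picture how the first wrapper turns "start" and "done" into a cycle count is the C model below. Everything in it is an assumption for illustration: the busy flag, the signal names, and the choice to count the cycle on which done is seen. The real axi_gcd_performance.v may gate its counter differently, which would shift the result by a cycle.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of start/done gating a performance counter:
 * START clears the count and sets busy, the core's done output clears
 * busy, and the counter is enabled only while busy is set.
 * The names start, done, and busy are illustrative assumptions. */
typedef struct {
    int busy;      /* 1 while a GCD operation is in progress */
    uint64_t cnt;  /* cycles counted for the current/last operation */
} perf_t;

/* Apply one rising clock edge. */
void perf_tick(perf_t *p, int start, int done)
{
    if (start) {            /* new operation: clear count, begin timing */
        p->cnt = 0;
        p->busy = 1;
    } else if (p->busy) {
        p->cnt += 1;        /* count this busy cycle */
        if (done)
            p->busy = 0;    /* stop counting; cnt keeps its value */
    }
}
```

With START on one edge, four busy cycles, and done on the fifth, cnt ends at 5 and then holds until the next START.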

The following Verilog code is for the first wrapper.

AXI4-Lite Protocol

AXI4 (Advanced eXtensible Interface version 4) is part of the ARM AMBA bus architecture. It is a multi-master, multi-slave communication protocol designed mainly for on-chip communication. Three types of AXI4 interface are commonly used in Xilinx designs: AXI4 (AXI4-Full), AXI4-Lite, and AXI4-Stream. AXI4 and AXI4-Lite are memory-mapped interfaces, while AXI4-Stream is a streaming interface. AXI4-Lite is a subset of AXI4 without burst support: every transaction transfers a single data word, which is all we need to access a handful of control registers. In this tutorial, we use the AXI4-Lite interface.

AXI4-Lite has five channels as follows:

  • Read Address channel (AR) (s_axi_ar*)
  • Read Data channel (R) (s_axi_r*)
  • Write Address channel (AW) (s_axi_aw*)
  • Write Data channel (W) (s_axi_w*)
  • Write Response channel (B) (s_axi_b*)

Lines 13-33 of the Verilog code declare all of the AXI4-Lite signals.
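All five channels share the same VALID/READY handshake: the sender drives VALID, the receiver drives READY, and a transfer takes place on a rising clock edge where both are high. The short C sketch below models only that rule; the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* VALID/READY handshake shared by the AR, R, AW, W, and B channels. */
typedef struct {
    bool valid; /* driven by the source of the channel */
    bool ready; /* driven by the destination of the channel */
} axi_channel_t;

/* True if a transfer completes on this clock edge. */
bool axi_handshake(const axi_channel_t *ch)
{
    return ch->valid && ch->ready;
}

/* Count completed transfers over a sequence of sampled clock edges. */
int axi_count_transfers(const axi_channel_t *edges, int n)
{
    int transfers = 0;
    for (int i = 0; i < n; i++)
        if (axi_handshake(&edges[i]))
            transfers++;
    return transfers;
}
```

The AXI specification also requires a source that has asserted VALID to keep it asserted until the handshake completes; this model does not check that rule.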

Register Map

The host CPU (ARM Cortex-A9) interacts with the GCD core through memory-mapped access. From the C program's point of view, the program must know the memory-mapped address (base address) of the GCD core. The base address is then declared as a pointer, so the program can interact with the GCD core by writing values to, or reading values from, that pointer.
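As a minimal sketch of this pointer-based access, the helpers below read and write a 32-bit register at a byte offset from a base pointer. The names reg_read and reg_write are hypothetical. On the board, base would hold the peripheral's base address; here it can point at an ordinary array so the code runs anywhere. The volatile qualifier keeps the compiler from caching or eliminating the accesses.

```c
#include <assert.h>
#include <stdint.h>

/* Write a 32-bit register located byte_offset bytes from base. */
static inline void reg_write(volatile uint32_t *base, uint32_t byte_offset,
                             uint32_t value)
{
    assert(byte_offset % 4 == 0);   /* registers are 32-bit aligned */
    base[byte_offset / 4] = value;  /* byte offset -> word index */
}

/* Read a 32-bit register located byte_offset bytes from base. */
static inline uint32_t reg_read(volatile uint32_t *base, uint32_t byte_offset)
{
    assert(byte_offset % 4 == 0);
    return base[byte_offset / 4];
}
```

For example, reg_write(base, 0x04, 42) stores 42 in the second 32-bit word after base, which is why the register map lists both a word offset and a byte offset for each register.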

The registers, offset addresses, and fields are:

  1. (offset 0, 0x00): status register
    • bit 1 = READY (R), bit 0 = START (R/W)
  2. (offset 1, 0x04): register a
    • bit 31~0 = A[31:0] (R/W)
  3. (offset 2, 0x08): register b
    • bit 31~0 = B[31:0] (R/W)
  4. (offset 3, 0x0C): output r
    • bit 31~0 = R[31:0] (R)
  5. (offset 4, 0x10): performance counter value
    • bit 31~0 = CNT[31:0] (R)
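Using the register map above, a polling driver for the hardware GCD might look like the following C sketch. The function and constant names are hypothetical. The sketch assumes READY reads 0 while the core is busy and 1 once R and CNT are valid, and that software does not need to clear START afterwards; check both against the RTL.

```c
#include <assert.h>
#include <stdint.h>

/* Word indexes of the axi_gcd_performance registers (byte offset / 4). */
enum {
    GCD_STATUS = 0, /* 0x00: bit 1 READY (R), bit 0 START (R/W) */
    GCD_A      = 1, /* 0x04: operand A */
    GCD_B      = 2, /* 0x08: operand B */
    GCD_R      = 3, /* 0x0C: result R */
    GCD_CNT    = 4  /* 0x10: performance counter CNT[31:0] */
};

#define GCD_START_BIT (1u << 0)
#define GCD_READY_BIT (1u << 1)

/* Load both operands, then set START. */
void gcd_hw_start(volatile uint32_t *gcd, uint32_t a, uint32_t b)
{
    gcd[GCD_A] = a;
    gcd[GCD_B] = b;
    gcd[GCD_STATUS] = GCD_START_BIT;
}

/* Nonzero once the core reports READY. */
int gcd_hw_ready(volatile uint32_t *gcd)
{
    return (gcd[GCD_STATUS] & GCD_READY_BIT) != 0;
}

/* Busy-wait for READY, then return R and (optionally) the cycle count. */
uint32_t gcd_hw_wait(volatile uint32_t *gcd, uint32_t *cycles)
{
    while (!gcd_hw_ready(gcd))
        ;                       /* poll the status register */
    if (cycles)
        *cycles = gcd[GCD_CNT];
    return gcd[GCD_R];
}
```

A call sequence would be gcd_hw_start(gcd, 48, 18) followed by gcd_hw_wait(gcd, &cycles), where gcd points at the peripheral's base address.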

The memory-mapped registers are implemented in lines 221-253.
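The address decode those lines perform can be summarised by a behavioural C model. This is an illustration, not the Verilog: it assumes the register is selected by byte address >> 2, that writes to the read-only fields (READY, R, CNT) are ignored, and that unmapped addresses read as 0. The names gcd_regs_t, regs_write, and regs_read are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t start;  /* START bit, written by the CPU */
    uint32_t a, b;   /* operands, written by the CPU */
    uint32_t ready;  /* READY, driven by the GCD core */
    uint32_t r;      /* result, driven by the GCD core */
    uint64_t cnt;    /* counter value, driven by counter.v */
} gcd_regs_t;

/* CPU write: only START, A, and B are writable. */
void regs_write(gcd_regs_t *m, uint32_t addr, uint32_t data)
{
    switch (addr >> 2) {
    case 0: m->start = data & 1u; break; /* only bit 0 is writable */
    case 1: m->a = data; break;
    case 2: m->b = data; break;
    default: break;                      /* R and CNT are read-only */
    }
}

/* CPU read: assemble each register from its fields. */
uint32_t regs_read(const gcd_regs_t *m, uint32_t addr)
{
    switch (addr >> 2) {
    case 0: return (m->ready << 1) | m->start;
    case 1: return m->a;
    case 2: return m->b;
    case 3: return m->r;
    case 4: return (uint32_t)m->cnt;     /* CNT[31:0] only */
    default: return 0;
    }
}
```

Note that offset 0x10 exposes only the low 32 bits of the 64-bit counter, so a count above 2^32 - 1 cycles would appear truncated.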

Read and Write Controllers

The read and write controllers are finite state machines (FSMs) that implement the AXI4-Lite protocol. They interact with the CPU via the AXI4 interconnect, and through them the CPU can write data to, or read data from, the memory-mapped registers.

The following figure shows the timing and FSM diagrams of the AXI4-Lite write controller. The code for this implementation is in lines 85-132.
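Because the FSM figure is not reproduced here, the C model below shows one simplified write controller. The two state names and the choice to raise AWREADY and WREADY only when both AWVALID and WVALID are present are assumptions, not necessarily what lines 85-132 do; all identifiers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { WR_IDLE, WR_RESP } wr_state_t;

typedef struct {
    wr_state_t state;
    uint32_t regs[5];           /* register file, word-indexed */
} wr_ctrl_t;

typedef struct {                /* master-driven inputs on this edge */
    bool awvalid, wvalid, bready;
    uint32_t awaddr, wdata;
} wr_in_t;

/* AWREADY = WREADY: high only in IDLE when both VALIDs are present,
 * so the address and data handshakes complete on the same edge. */
bool wr_aw_w_ready(const wr_ctrl_t *c, const wr_in_t *in)
{
    return c->state == WR_IDLE && in->awvalid && in->wvalid;
}

/* BVALID is held for the whole RESP state. */
bool wr_bvalid(const wr_ctrl_t *c) { return c->state == WR_RESP; }

/* Advance one rising clock edge. */
void wr_tick(wr_ctrl_t *c, const wr_in_t *in)
{
    switch (c->state) {
    case WR_IDLE:
        if (wr_aw_w_ready(c, in)) {            /* AW and W handshakes */
            uint32_t idx = in->awaddr >> 2;
            if (idx < 5)
                c->regs[idx] = in->wdata;      /* no read-only filtering */
            c->state = WR_RESP;
        }
        break;
    case WR_RESP:
        if (in->bready)                        /* B handshake completes */
            c->state = WR_IDLE;
        break;
    }
}
```

Waiting for both VALIDs before raising READY is permitted by AXI (a slave may wait for VALID before asserting READY) and keeps the FSM to two states. The model omits the BRESP code, which an AXI4-Lite slave also drives during the response.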

The following figure shows the timing and FSM diagrams of the AXI4-Lite read controller. The code for this implementation is in lines 135-188.
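A matching C sketch of the read side follows, again with assumed states rather than those in lines 135-188: ARREADY is high in IDLE, the addressed register is latched into RDATA when ARVALID arrives, and RVALID is held until RREADY. The identifiers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { RD_IDLE, RD_DATA } rd_state_t;

typedef struct {
    rd_state_t state;
    uint32_t rdata;            /* value presented on RDATA */
    const uint32_t *regs;      /* register file, word-indexed */
    uint32_t nregs;
} rd_ctrl_t;

bool rd_arready(const rd_ctrl_t *c) { return c->state == RD_IDLE; }
bool rd_rvalid(const rd_ctrl_t *c)  { return c->state == RD_DATA; }

/* Advance one rising clock edge. */
void rd_tick(rd_ctrl_t *c, bool arvalid, uint32_t araddr, bool rready)
{
    switch (c->state) {
    case RD_IDLE:
        if (arvalid) {                         /* AR handshake completes */
            uint32_t idx = araddr >> 2;
            c->rdata = (idx < c->nregs) ? c->regs[idx] : 0;
            c->state = RD_DATA;
        }
        break;
    case RD_DATA:
        if (rready)                            /* R handshake completes */
            c->state = RD_IDLE;
        break;
    }
}
```

Latching RDATA on the address handshake means the value the CPU receives is the register contents at that edge, even if the master delays RREADY.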

In lines 192-216, we instantiate the GCD core and the counter.

The second wrapper is for the software GCD performance counter (axi_performance_counter.v). The following figure shows the block diagram of the second wrapper.

The following Verilog code is for the second wrapper.

Register Map

The registers, offset addresses, and fields of the second wrapper are:

  1. (offset 0, 0x00): control register
    • bit 1 = STOP (R/W), bit 0 = START (R/W)
  2. (offset 1, 0x04): counter value low
    • bit 31~0 = Q[31:0] (R)
  3. (offset 2, 0x08): counter value high
    • bit 31~0 = Q[63:32] (R)

The control register is implemented in lines 192-215.
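To benchmark the software GCD with this peripheral, a C program can set START, run the software GCD, set STOP, and read the two 32-bit halves of the count. The sketch below assumes START begins counting and STOP halts it without clearing. gcd_sw is a plain Euclidean GCD standing in for the tutorial's software implementation, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Word indexes of the axi_performance_counter registers. */
enum { PC_CTRL = 0, PC_LOW = 1, PC_HIGH = 2 };
#define PC_START_BIT (1u << 0)
#define PC_STOP_BIT  (1u << 1)

/* Software GCD being benchmarked (Euclid's algorithm). */
uint32_t gcd_sw(uint32_t a, uint32_t b)
{
    while (b != 0) {
        uint32_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Combine the two 32-bit halves into the 64-bit count Q[63:0]. */
uint64_t pc_read(volatile uint32_t *pc)
{
    uint32_t hi, lo;
    do {                    /* re-read if Q[31:0] carried into Q[63:32] */
        hi = pc[PC_HIGH];
        lo = pc[PC_LOW];
    } while (hi != pc[PC_HIGH]);
    return ((uint64_t)hi << 32) | lo;
}

/* Time one software GCD call: START, run, STOP, read the count. */
uint32_t gcd_sw_timed(volatile uint32_t *pc, uint32_t a, uint32_t b,
                      uint64_t *cycles)
{
    pc[PC_CTRL] = PC_START_BIT;
    uint32_t r = gcd_sw(a, b);
    pc[PC_CTRL] = PC_STOP_BIT;
    *cycles = pc_read(pc);
    return r;
}
```

The high word is read twice so that a carry between the two reads is detected. Keep in mind that a compiler may move the side-effect-free gcd_sw computation across the volatile writes; inspecting the generated assembly confirms that the timed region contains the work you intend to measure.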


In this tutorial, we have built AXI4-Lite wrapper modules that wrap the GCD core and the counters. These wrapper modules enable the CPU (ARM Cortex-A9) to interact with the GCD core and the counters through the AXI4-Lite protocol. The CPU sees them as memory-mapped peripherals, which a C program accesses through pointers.

Next: Create a Testbench for Simulating the AXI4-Lite Interface