### Introduction
This guide walks through how to benchmark an application using the Cycle Counter API provided in the SDK. Under the hood, this API uses either the Performance Monitoring Unit (PMU) of the R5F core or the Data Watchpoint and Trace (DWT) unit of the M4F core to return precise cycle counts.
In this exercise, we will benchmark the execution time of a 1024-point Complex Fast Fourier Transform (CFFT) on the R5F core. We will start from the Empty example project and add the CFFT and the Cycle Counter to it.
### Step 1: Import the Empty Project
We will be using the Empty project for the R5F0-0 core as the starting point.
#### a. Launch CCS Desktop
#### b. In CCS, go to *View → Resource Explorer* to open Resource Explorer
#### c. In Resource Explorer, navigate to *MCU+ SDK → Examples → Development Tools → {Board} → empty → r5fss0-0_nortos → empty*
#### d. Click "Import" to import the project into your workspace
### Step 2: Add the CFFT code
We will be using the CFFT functions from the Arm CMSIS DSP library, found in the SDK at `${MCU_PLUS_SDK_PATH}/source/cmsis/DSP`.
#### a. First, right-click on the project and rename it to `CFFT_Benchmark`
Renaming the project allows you to re-import the empty project to the workspace in the future if needed.
#### b. Rename `empty.c` to `cfft.c` and `empty_main()` to `cfft_main()`
#### c. `#include` the header files for the CFFT
Add the following `#include` directives to your `cfft.c` file:
```c
#include "arm_math.h"
#include "arm_const_structs.h"
```
#### d. Add the search paths to the header files
In the project's properties window, go to *Build → Arm Compiler → Include Options* and add the following paths:
`${MCU_PLUS_SDK_PATH}/source/cmsis/DSP/Include`
`${MCU_PLUS_SDK_PATH}/source/cmsis/Core/Include`
#### e. Initialize the CFFT input data
```c
float32_t cfftInData[2048]; // CFFT input array
uint16_t i;
/* initialize FFT complex array */
for (i = 0; i < 2048; i += 2)
{
    cfftInData[i] = arm_sin_f32((float) i * 7.5);
    cfftInData[i + 1] = 0;
}
```
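The input buffer has 2048 elements because `arm_cfft_f32()` expects 1024 complex samples in interleaved form, with the real part at each even index and the imaginary part at the following odd index. The host-side sketch below illustrates that layout only; `pack_real_as_complex` is a hypothetical helper, not part of CMSIS DSP or the SDK.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of CMSIS DSP or the SDK): copy n real
 * samples into an interleaved complex buffer holding 2 * n floats. */
void pack_real_as_complex(const float *re, float *out, size_t n)
{
    assert(re != NULL && out != NULL);

    for (size_t k = 0; k < n; k++)
    {
        out[2 * k]     = re[k]; /* real part at even index */
        out[2 * k + 1] = 0.0f;  /* imaginary part at odd index */
    }
}
```

In this lab the loop in step e fills the buffer directly, so a helper like this is not needed; it only makes the even/odd indexing explicit.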
#### f. Add a call to the CFFT function, which is the function we will be benchmarking
```c
void cfft_main(void *args)
{
.
.
.
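/* Process the data through the CFFT/CIFFT module.
 * ifftFlag = 0 selects the forward transform;
 * bitReverseFlag = 1 enables bit reversal of the output. */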
arm_cfft_f32(&arm_cfft_sR_f32_len1024, cfftInData, 0, 1);
```
### Step 3: Add the Cycle Counter
Now we need to add the Cycle Counter to benchmark the execution time of the CFFT.
#### a. `#include` the Cycle Counter header file
```c
#include <kernel/dpl/CycleCounterP.h>
```
#### b. Reset the Cycle Counter
`CycleCounterP_reset()` will enable and reset the Cycle Counter.
```c
void cfft_main(void *args)
{
.
.
.
CycleCounterP_reset();
```
#### c. Calculate the Cycle Counter overhead
Calculate the overhead of getting the Cycle Counter count so that we can subtract it from the benchmark measurement.
```c
void cfft_main(void *args)
{
.
.
.
uint32_t start, end, overhead;
/* Calculate overhead */
CycleCounterP_reset();
start = CycleCounterP_getCount32();
end = CycleCounterP_getCount32();
overhead = end - start;
DebugP_log("Overhead: %d Cycles\r\n", overhead);
```
#### d. Benchmark the CFFT function
Wrap the CFFT function in `CycleCounterP_getCount32()` calls to measure its execution time.
```c
.
.
.
CycleCounterP_reset();
start = CycleCounterP_getCount32();
/* Process the data through the CFFT/CIFFT module */
arm_cfft_f32(&arm_cfft_sR_f32_len1024, cfftInData, 0, 1);
end = CycleCounterP_getCount32();
DebugP_log("Start: %d\r\n", start);
DebugP_log("End: %d\r\n", end);
DebugP_log("Total: %d Cycles = %d microseconds @ 800MHz\r\n",
end - start - overhead, (end - start - overhead) / 800);
```
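The division by 800 in the log statement converts cycles to microseconds for a core clocked at 800 MHz (800 cycles per microsecond). A minimal sketch of the same conversion as a reusable helper follows; `cycles_to_usec` and its `cpu_mhz` parameter are illustrative names, not SDK APIs, and the clock rate must be adjusted to match your device configuration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not an SDK API): convert a cycle count to whole
 * microseconds for a core clocked at cpu_mhz megahertz.
 * Integer division truncates, as in the log statement above. */
uint32_t cycles_to_usec(uint32_t cycles, uint32_t cpu_mhz)
{
    assert(cpu_mhz != 0U); /* guard against division by zero */
    return cycles / cpu_mhz;
}
```

With this helper, the second log argument could be written as `cycles_to_usec(end - start - overhead, 800U)`.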
#### e. Your `cfft.c` should now look like this
```c
#include <stdio.h>
#include <kernel/dpl/DebugP.h>
#include "ti_drivers_config.h"
#include "ti_drivers_open_close.h"
#include "ti_board_open_close.h"
#include "arm_math.h"
#include "arm_const_structs.h"
#include <kernel/dpl/CycleCounterP.h>
float32_t cfftInData[2048];

void cfft_main(void *args)
{
    /* Open drivers to open the UART driver for console */
    Drivers_open();
    Board_driversOpen();

    uint32_t start, end, overhead;
    uint16_t i;

    /* initialize FFT complex array */
    for (i = 0; i < 2048; i += 2)
    {
        cfftInData[i] = arm_sin_f32((float) i * 7.5);
        cfftInData[i + 1] = 0;
    }

    /* Calculate overhead */
    CycleCounterP_reset();
    start = CycleCounterP_getCount32();
    end = CycleCounterP_getCount32();
    overhead = end - start;
    DebugP_log("Overhead: %d Cycles\r\n", overhead);

    CycleCounterP_reset();
    start = CycleCounterP_getCount32();
    /* Process the data through the CFFT/CIFFT module */
    arm_cfft_f32(&arm_cfft_sR_f32_len1024, cfftInData, 0, 1);
    end = CycleCounterP_getCount32();

    DebugP_log("Start: %d\r\n", start);
    DebugP_log("End: %d\r\n", end);
    DebugP_log("Total: %d Cycles = %d microseconds @ 800MHz\r\n",
               end - start - overhead, (end - start - overhead) / 800);

    Board_driversClose();
    Drivers_close();
}
```
### Step 4: Build and Run the Benchmark Project
Now that we have created the benchmark application, let's build and run it.
#### a. Click **Debug** to build and load the application onto the R5F0-0 core

#### b. Click **Resume** (F8) to run the program. You should see the benchmark output in the CCS Console
```txt
Overhead: 9 Cycles
Start: 7
End: 77552
Total: 77536 Cycles = 96 microseconds @ 800MHz
```
### Cycle Counter Max Duration
`CycleCounterP_getCount32()` returns a 32-bit count, so the maximum value before the counter rolls over is `2^32 - 1`, or `0xFFFFFFFF`. For an R5F core running at 800 MHz, the maximum duration before the Cycle Counter rolls over is therefore `(2^32 - 1) / 800 MHz`, or about 5.37 seconds.
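The same arithmetic can be checked for other clock rates with a small host-side sketch. `rollover_seconds` is a hypothetical helper, not an SDK API, and it assumes the counter increments once per CPU clock cycle.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not an SDK API): seconds until a 32-bit cycle
 * counter rolls over, assuming one increment per CPU clock at cpu_hz. */
double rollover_seconds(uint32_t cpu_hz)
{
    assert(cpu_hz != 0U); /* guard against division by zero */
    return (double)UINT32_MAX / (double)cpu_hz;
}
```

For example, `rollover_seconds(800000000U)` evaluates `4294967295 / 800000000`, which is roughly 5.37.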
The application can add logic to handle overflow. For example, here is a snippet from the SDK User Guide that shows how to handle one overflow condition:
```c
uint32_t cycleCountBefore, cycleCountAfter, cpuCycles;
/* enable and reset CPU cycle counter */
CycleCounterP_reset();
cycleCountBefore = CycleCounterP_getCount32();
/* call functions to profile */
cycleCountAfter = CycleCounterP_getCount32();
/* Check for overflow and wrap around.
 *
 * This logic will only work for one overflow.
 * If multiple overflows happen during the profile period,
 * then CPU cycles count will be wrong.
 */
if (cycleCountAfter > cycleCountBefore)
{
    cpuCycles = cycleCountAfter - cycleCountBefore;
}
else
{
    cpuCycles = (0xFFFFFFFFU - cycleCountBefore) + cycleCountAfter;
}
```
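As an alternative to the explicit branch, unsigned arithmetic on `uint32_t` in C is defined modulo 2^32, so a plain subtraction also yields the elapsed count across a single rollover. The sketch below is not taken from the SDK; `elapsed_cycles` and `elapsed_cycles_selfcheck` are hypothetical names. Like the snippet above, it is only valid when at most one rollover occurs during the measurement.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not an SDK API): elapsed cycles between two
 * 32-bit counter readings. The subtraction wraps modulo 2^32, so the
 * result is correct across at most one rollover. */
uint32_t elapsed_cycles(uint32_t before, uint32_t after)
{
    return (uint32_t)(after - before);
}

/* Quick self-check: 0x10 cycles up to the wrap point plus 0x10 after it. */
void elapsed_cycles_selfcheck(void)
{
    assert(elapsed_cycles(0xFFFFFFF0U, 0x00000010U) == 0x20U);
    assert(elapsed_cycles(100U, 250U) == 150U);
}
```

Note that in the rollover case this counts the step from `0xFFFFFFFF` to `0` as one cycle, so its result is one greater than the value produced by the `else` branch above.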
If measurement of longer durations is needed, it is recommended to use the `ClockP_getTimeUsec()` API from the Clock module.
### Congratulations!
You have just learned how to use the Cycle Counter to benchmark an MCU+ SDK application. For another example that uses the Cycle Counter, take a look at the Benchmark Demo in the SDK at `examples/motor_control/benchmark_demo`.
### Knowledge Check
**1. True or False:** `CycleCounterP_reset()` should be called before using the Cycle Counter.
[quiz]
v True --> Correct!
x False --> Incorrect
[quiz]
**2. True or False:** The Cycle Counter driver automatically handles counter rollover.
[quiz]
x True --> Incorrect
v False --> Correct!
[quiz]
### Additional Reading
##### MCU+ SDK User Guide: [Cycle Counter](https://software-dl.ti.com/mcu-plus-sdk/esd/AM243X/latest/exports/docs/api_guide_am243x/KERNEL_DPL_CYCLE_COUNTER_PAGE.html)