Histogram Design Example Walkthrough

Developer Reference

Migrating OpenCL™ FPGA Designs to SYCL*

ID 767849

Date 5/08/2024

Design Overview

To help you understand how to migrate an OpenCL FPGA design to SYCL*, the following links show an OpenCL sample and its migration to SYCL:

The purpose of this design is to demonstrate important differences between OpenCL and SYCL for FPGA targets. The Histogram design implements a simple histogram function. The input data is a one-dimensional array of randomly generated integers with values between 0 and 99. You can choose the number of input values by passing in a command-line argument. The output is a histogram of this data using 10 bins. The histogram is calculated by the kernel function, which is offloaded to run on the FPGA. The resulting histogram is verified against a reference version calculated by the host.
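
The host-side reference check described above can be sketched in standard C++ as follows. This is a minimal sketch, not the code from `main.cpp` or `host.cpp`: it assumes the 10 bins are equal-width (values 0-9 in bin 0, 10-19 in bin 1, and so on), and the names `make_input`, `reference_histogram`, and `verify` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Assumptions: inputs are in [0, 99] and the 10 bins are equal-width.
constexpr int kNumBins = 10;
constexpr int kMaxValue = 99;
constexpr int kBinWidth = (kMaxValue + 1) / kNumBins;  // 10

// Generates n pseudo-random inputs in [0, kMaxValue].
std::vector<int> make_input(std::size_t n, unsigned seed = 42) {
  std::mt19937 gen(seed);
  std::uniform_int_distribution<int> dist(0, kMaxValue);
  std::vector<int> data(n);
  for (int &v : data) v = dist(gen);
  return data;
}

// Host reference histogram: the result the kernel output is checked against.
std::array<int, kNumBins> reference_histogram(const std::vector<int> &data) {
  std::array<int, kNumBins> bins{};  // zero-initialized counts
  for (int v : data) {
    assert(v >= 0 && v <= kMaxValue);  // input range check
    ++bins[v / kBinWidth];
  }
  return bins;
}

// Returns true if the histogram produced by the device matches the reference.
bool verify(const std::array<int, kNumBins> &device_bins,
            const std::vector<int> &data) {
  return device_bins == reference_histogram(data);
}
```

In the real design, `device_bins` would hold the counts read back from the FPGA after the kernel finishes.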

This simple design allows you to observe some similarities and differences between OpenCL and SYCL as listed in the following tables:

Source Code Organization, Compilation, and Execution

	OpenCL	SYCL
Organization of the Source Code	OpenCL programs consist of a C++ host program and kernel functions written in OpenCL C in `.cl` files. In the Histogram design example, these are the `host.cpp` and `histogram.cl` files.	SYCL programs can be single-source: you can write both the host code and the device code in SYCL within a single file. In the Histogram design example, this is the `main.cpp` file.
Compilation	You must compile the `histogram.cl` and `host.cpp` files separately, using the `device_fpga` and `host_fpga` targets in the Makefile, respectively.	You can compile a SYCL program to run on the FPGA device with a single command. You can see this `icpx` command in the `fpga` target of the Makefile.
Execution of the Program	Before running the host executable that the `host_fpga` target generates, you must program the FPGA device with the `.aocx` file that the `device_fpga` target generates. You can do this either with a command-line operation before running the executable or by adding code to the host program.	To run a SYCL program on the FPGA device, you simply run the executable that the `fpga` target of the Makefile produces. This executable contains the `.aocx` file, and running it programs the FPGA device with that `.aocx` file automatically.
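Emulation	Compile for emulation using the `-march=emulator` flag, as shown in the `device_emu` target of the Makefile.	Compile for emulation by adding the `-DFPGA_EMULATOR` flag to the `icpx` command, as shown in the `fpga_emu` target of the Makefile. This flag defines the `FPGA_EMULATOR` preprocessor macro, which the device-selection code in `main.cpp` uses to choose the emulator instead of the FPGA hardware.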
Optimization Report Generation	Generate the reports using the `device_report` target, where the `-rtl` flag stops compilation after the reports are generated.	Generate the reports using the `report` target of the Makefile, where the `-fsycl-link=early` flag stops compilation after the reports are generated.
Compilation for an FPGA Hardware Device	Use the `device_fpga` and `host_fpga` targets in the Makefile to compile the kernel and the host program, respectively, for an FPGA hardware device.	Use the `fpga` target in the Makefile to compile for an FPGA hardware device. Because SYCL programs can be single-source, a change to the host code may trigger a full recompilation of the kernel code, including the time-consuming generation of the FPGA bitstream by the Intel® Quartus® Prime software. To avoid this expensive and unnecessary recompilation, the `fpga` target passes the `-reuse-exe=main.fpga` flag, which makes `icpx` attempt to reuse the FPGA bitstream in the existing `main.fpga` executable if it can determine that the kernel code in `main.cpp` has not changed. Alternatively, use the Device Link method described in detail in the Separating Device and Host Code Compilation section of the Intel oneAPI DPC++/C++ Compiler Handbook for Intel FPGAs.
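
In the OpenCL flow, programming the FPGA from the host program (the second option in the Execution of the Program row) starts by loading the `.aocx` file into memory. The following standard C++ sketch shows only that step. The helper name `load_binary` is hypothetical, the file name `histogram.aocx` is an assumption, and the OpenCL calls that consume the bytes appear as comments because they need the OpenCL headers plus an existing context and device.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Reads a compiled FPGA image (such as the .aocx file produced by the
// device_fpga target) into memory. load_binary is a hypothetical helper.
std::vector<unsigned char> load_binary(const std::string &path) {
  std::ifstream file(path, std::ios::binary);
  if (!file) {
    throw std::runtime_error("cannot open " + path);
  }
  return std::vector<unsigned char>(std::istreambuf_iterator<char>(file),
                                    std::istreambuf_iterator<char>());
}

// In the OpenCL host, the bytes would then be handed to the runtime, which
// programs the FPGA (requires <CL/cl.h>, a cl_context, and a cl_device_id):
//
//   std::vector<unsigned char> image = load_binary("histogram.aocx");
//   const unsigned char *binary = image.data();
//   size_t length = image.size();
//   cl_int binary_status, status;
//   cl_program program = clCreateProgramWithBinary(
//       context, 1, &device, &length, &binary, &binary_status, &status);
//   status = clBuildProgram(program, 1, &device, "", nullptr, nullptr);
```

The SYCL executable produced by the `fpga` target makes this step unnecessary, because the bitstream is embedded in the executable.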

Host Code

Like OpenCL programs, SYCL programs have contexts, platforms, devices, queues, buffers, and kernels, as explained in the Modify Your Design chapter. However, as a comparison of the SYCL program's `main.cpp` file with the OpenCL program's `host.cpp` file shows, selecting a device and launching a kernel is much simpler in the SYCL version of the example design.

	OpenCL	SYCL
Selecting the Device	You select the FPGA hardware or emulator device by explicitly creating a context and a queue that runs on the selected device (see lines 84-94 of the `host.cpp` file).	The FPGA hardware or emulator device is selected using a `device_selector` (see lines 34-38 of the `main.cpp` file). You do not need to create a platform or context explicitly. A queue is created from the `device_selector` (see line 41), so kernels submitted to that queue run on the selected device.
Passing Data To and From the Kernel	You create buffers for input and output data using the `clCreateBuffer` function, passing the context, the buffer size, and other parameters (see line 107 of the `host.cpp` file).	Buffers for input and output data are created on lines 43-52 of the `main.cpp` file.
Accessing Buffers	Each kernel argument must be set explicitly to a buffer (or constant) using the `clSetKernelArg` function.	The kernel accesses the buffers through `accessor` objects. Lines 63-64 of the `main.cpp` file show how accessors are created from buffers. The kernel can then index these accessors much like arrays, for example, reading from the `in` accessor on line 79 and writing to the `bins` accessor on line 85.
Copying Kernel Output Data	You must explicitly copy the kernel's output data back to the host using the `clEnqueueReadBuffer` function (see line 130 of the `host.cpp` file).	If a host pointer is provided when a SYCL buffer is created, the SYCL runtime automatically copies the data back to that host memory when the buffer is destroyed. In this example, the output is copied back to the host array `bins_h` in this way.
Explicit Data Movement	Explicit data movement happens when you manually call the `clEnqueueWriteBuffer` and `clEnqueueReadBuffer` functions (see lines 107-132 of the `host.cpp` file).	The buffer and accessor approach simplifies SYCL programs because the SYCL runtime copies the data to and from the device for you. However, to achieve high performance in more complex FPGA designs, Intel recommends that you become familiar with explicit data movement. The SYCL Sample Code With Explicit Data Movement shows a third version of the histogram design that uses explicit data movement. In this version, memory is allocated on the FPGA device by explicitly calling the `malloc_device` function (see lines 44-52 of the `main.cpp` file), and the input data is copied from the host to this device memory using the `memcpy` function (see line 56). You can use the pointers returned by `malloc_device` directly in kernel code, but to reduce area overhead, `device_ptr` pointers are created on lines 63-64; these tell the compiler that all accesses through them are to device memory rather than host memory. You must copy the output data back to the host explicitly using the `memcpy` function (see line 93). You must also manage synchronization explicitly: wait for the input data to be copied before launching the kernel, and wait for the kernel to finish before copying the output data back to the host.
Error Handling	Each runtime API call reports whether the operation succeeded, either through its return value or through an error-code argument.	Runtime errors are reported by throwing exceptions. For example, the buffers are created within a try-catch block, so if buffer creation throws an exception, the exception is caught and an error message is displayed (see line 96 of the `main.cpp` file).
Resource Cleanup	You must explicitly release runtime objects, including the `cl_context` and buffers (see lines 142-143 of the `host.cpp` file).	You do not need to explicitly release runtime objects, such as buffers, that are created as local (automatic) variables. Each object's destructor releases its resources when the object goes out of scope.
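
When migrating OpenCL host code incrementally, one option is to turn status codes into exceptions so that the OpenCL host can use the same try-catch structure as the SYCL host. The sketch below assumes only that a status of 0 means success, as `CL_SUCCESS` does; `RuntimeError` and `check_status` are hypothetical names that are not part of either API.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical exception type that carries the failing call's status code.
class RuntimeError : public std::runtime_error {
 public:
  RuntimeError(const std::string &call, int code)
      : std::runtime_error(call + " failed with status " + std::to_string(code)),
        code_(code) {}
  int code() const noexcept { return code_; }

 private:
  int code_;
};

// Throws if an OpenCL-style status code signals failure.
// Assumption: 0 means success, as CL_SUCCESS does.
inline void check_status(int status, const char *call) {
  if (status != 0) {
    throw RuntimeError(call, status);
  }
}

// Example use in the OpenCL host (requires <CL/cl.h>):
//
//   try {
//     check_status(clSetKernelArg(kernel, 0, sizeof(cl_mem), &input_buf),
//                  "clSetKernelArg");
//   } catch (const RuntimeError &e) {
//     std::cerr << e.what() << '\n';
//   }
```

Every OpenCL call site still has to be wrapped by hand; in SYCL, the runtime throws on its own.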

Kernel Code

	OpenCL	SYCL
Body of the Kernel Function	The kernel source code is in a separate `.cl` file (see `histogram.cl`).	A kernel is either a lambda function or a functor (function object). The body of the kernel function is in the `main.cpp` file; line 61 shows the C++ lambda syntax. NOTE: The lambda capture `[=]` indicates that all variables are captured by copy, which is mandatory for kernel lambda functions.
Pragmas, Attributes, Directives, and Extensions	Most of the pragmas available in OpenCL kernel code have equivalent pragmas or attributes in SYCL, but some of the syntax differs. In the example code, the `restrict` keyword on the kernel arguments indicates that the input and output buffers do not overlap (see lines 3 and 4 of the `histogram.cl` file). The file also uses other pragmas and attributes, such as `#pragma unroll`, `#pragma ii 1`, and `__attribute__((register))`.	Some pragmas are the same in SYCL, while others use slightly different syntax. For example, to indicate that the input and output buffers do not overlap (or alias), the `[[intel::kernel_args_restrict]]` kernel attribute is placed on the lambda function (see line 61 of the `main.cpp` file). The file also uses other pragmas and attributes, such as `#pragma unroll`, `[[intel::initiation_interval(1)]]` (the counterpart of `#pragma ii 1`), and `[[intel::fpga_register]]` (the counterpart of `__attribute__((register))`). For a detailed list of all flags, pragmas, and attributes, refer to Flags, Attributes, Directives, and Extensions.
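
The kernel structure described above can be approximated in standard C++ so that its logic can be checked with any compiler. This is a sketch under stated assumptions, not the code in `histogram.cl` or `main.cpp`: it assumes the kernel accumulates counts into a small private array of 10 bins and then writes them out. The FPGA pragmas and attributes appear only as comments marking where they would go, and `histogram_kernel_body` and `make_kernel` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: 10 equal-width bins over the input range 0-99.
constexpr int kBins = 10;
constexpr int kBinWidth = 10;

// Hypothetical stand-in for the kernel body. The FPGA-specific pragmas and
// attributes are comments here, so this compiles with any C++ compiler.
void histogram_kernel_body(const int *in, int *bins, std::size_t n) {
  // SYCL: [[intel::fpga_register]] int local_bins[kBins];
  // OpenCL: int local_bins[kBins] __attribute__((register));
  int local_bins[kBins];

  // #pragma unroll (same spelling in OpenCL and SYCL)
  for (int b = 0; b < kBins; ++b) local_bins[b] = 0;

  // SYCL: [[intel::initiation_interval(1)]]  OpenCL: #pragma ii 1
  for (std::size_t i = 0; i < n; ++i) {
    assert(in[i] >= 0 && in[i] < kBins * kBinWidth);  // input range check
    local_bins[in[i] / kBinWidth]++;
  }

  // #pragma unroll
  for (int b = 0; b < kBins; ++b) bins[b] = local_bins[b];
}

// Kernel lambdas capture by copy ([=]). Here the copies are pointers, so the
// lambda still writes to the caller's bins array.
auto make_kernel(const int *in, int *bins, std::size_t n) {
  return [=]() { histogram_kernel_body(in, bins, n); };
}
```

In `main.cpp`, the corresponding lambda is submitted to the queue and captures accessors rather than raw pointers.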
