Developer Reference

Migrating OpenCL™ FPGA Designs to SYCL*

ID 767849
Date 5/08/2024
Public

Host Code Modification

This topic describes how to convert your OpenCL host program to SYCL*.

NOTE:

The converted SYCL code in the following examples is single-sourced, so device and host code are in the same source file. Wherever SYCL device code is present, the code sample displays <your device code goes here>. See Device Code Modification for more details.

Add Include Files

Add SYCL-specific include files to define all SYCL constructs. The following table lists various include files that define the host API and FPGA extensions:

Include Files
OpenCL SYCL
#include “CL/opencl.h”

or

#include “CL/cl.h”
#include <sycl/sycl.hpp>
#include <sycl/ext/intel/fpga_extensions.hpp>

Add the sycl Namespace

Add the sycl namespace as all SYCL constructs reside in this namespace. By adding this namespace, you can also avoid repeatedly writing sycl:: throughout your code.

#include <sycl/sycl.hpp>
#include <sycl/ext/intel/fpga_extensions.hpp>

using namespace sycl;

Forward Declare Kernel Names

Recognizable kernel names allow you to identify kernels in the reports. In OpenCL, kernels are C functions marked with the special __kernel keyword, and the kernel name is simply the function name. In SYCL, you define kernels using C++ lambdas or functors, and therefore, a different naming method is required, where you use a type (for example, a C++ class or struct) to name the kernel.

Kernel submission is explained later in this document, but for now, it suffices to forward declare the type used to name the kernels, as shown in the following snippet:

SYCL Example: Header Files, Namespaces, and Forward Declaring Kernel Names

#include <sycl/sycl.hpp>
#include <sycl/ext/intel/fpga_extensions.hpp>

using namespace sycl;

// Forward declare the kernel name (CopyKernel) in the global 
// scope to reduce name mangling. This is a best practice
// that makes it easier to identify the kernel in the FPGA
// optimization reports.
class CopyKernel;
NOTE:

The purpose of declaring this type in the global (file) scope is to minimize name mangling by the Intel® oneAPI DPC++/C++ Compiler, since the deeper the nesting of the declared type, the more the compiler mangles the name.

Queue Creation

Queues connect a host program to a single device. Host programs submit tasks to a device via the queue and can monitor the queue for completion.

The following table depicts how to create a device queue in OpenCL and SYCL:

Creating a Device Queue
OpenCL SYCL
platform = findPlatform("Intel(R) FPGA SDK for OpenCL(TM)");

clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 1, &device, &num_devices);

clCreateContext(0, num_devices, &device, &context_error_callback, NULL, &status);

cl_command_queue device_queue = clCreateCommandQueue (context, device_id,
properties, &errcode_ret);
queue device_queue((ext::intel::fpga_selector()));

SYCL Example: Create the Device Queue

#include <sycl/sycl.hpp>
#include <sycl/ext/intel/fpga_extensions.hpp>

using namespace sycl;

// Forward declare the kernel name in the global scope to reduce
// name mangling. This is an FPGA best practice that makes it
// easier to identify the kernel in the optimization reports.
class CopyKernel;

void main() {
  queue device_queue(ext::intel::fpga_selector());
  // …
};

Memory Allocation and Movement

The following table shows how to transform the code related to data allocation and movement from OpenCL to SYCL:

Memory Allocation and Movement
OpenCL SYCL
int in_data[N], out_data[N];

cl_mem in = clCreateBuffer(context, CL_MEM_READ_ONLY, sizeof(int) * N,
NULL, &status);

cl_mem out = clCreateBuffer(context, CL_MEM_WRITE_ONLY, sizeof(int) * N,
NULL, &status);
int in_data[N], out_data[N];

int *in = malloc_device<int>(N, device_queue);
int *out = malloc_device<int>(N, device_queue);
clEnqueueWriteBuffer(device_queue, in, CL_TRUE, 0, sizeof(int) * N,
in_data, 0, NULL, NULL);
device_queue.memcpy(in, in_data, N * sizeof(int)).wait();
clEnqueueReadBuffer(device_queue, out, CL_TRUE, 0, sizeof(int) * N,
out_data, 0, NULL, NULL);
device_queue.memcpy(out_data, out, N * sizeof(int)).wait();

The following SYCL code uses Unified Shared Memory (USM) device allocations to allocate memory on the device. The Advanced Modifications section discusses alternative SYCL methods for allocating and moving data. The CL_TRUE arguments to clEnqueueWriteBuffer and clEnqueueReadBuffer interfaces in OpenCL result in blocking memory copy operations. In SYCL, you can achieve this by waiting on the returned event from the memcpy function calls. See Advanced Modifications for more information about events.

SYCL Example: Allocate and Move Data

#include <sycl/sycl.hpp>
#include <sycl/ext/intel/fpga_extensions.hpp>

using namespace sycl;

// Forward declare the kernel name in the global scope to reduce
// name mangling. This is an FPGA best practice that makes it
// easier to identify the kernel in the optimization reports.
class CopyKernel;

void main() {
  // Create the device queue using the FPGA device selector.
  queue device_queue(ext::intel::fpga_selector());
    
  int in_data[N], out_data[N];

  // Allocate memory on the device.
  int *in = malloc_device<int>(N, device_queue);
  int *out = malloc_device<int>(N, device_queue);

  // Copy input data to the device and wait for it to finish.
  device_queue.memcpy(in, in_data, N * sizeof(int)).wait();

  // <submit kernel(s) to the device queue: this is discussed in the device code section>

  // Copy output data back from the device and wait for it to finish.
  device_queue.memcpy(out_data, out, N * sizeof(int)).wait();
};

Create Kernels and Submit Them to the Command Queue

The following table shows how you can submit both single task and NDRange kernels to the command queue in OpenCL and SYCL:

OpenCL SYCL
kernel = clCreateKernel(program, kernel_name, &status);

clEnqueueTask(device_queue,
              kernel,
              num_events,
              event_list,
              &my_event);
event my_event = device_queue.single_task<CopyKernel>([=] {
    // <your device code goes here>
});
kernel = clCreateKernel(program, kernel_name, &status);

clEnqueueNDRangeKernel(device_queue, 
                       kernel, 
                       num_dimensions,
                       global_work_offset,
                       global_work_size, 
                       local_work_size, 
                       num_events, 
                       event_list, 
                       &my_event);
event my_event = device_queue.parallel_for<CopyKernel>(nd_range<num_dimensions>(global_work_size, local_work_size, global_work_offset)), [=](id<num_dimensions> index) {
    // <your device code goes here>
});
NOTE:

The CopyKernel template parameter to the single_task and parallel_for functions are the forward declared kernel names discussed earlier.

The following table shows how kernel arguments are set in OpenCL and SYCL:

NOTE:

Kernel arguments are not set explicitly in SYCL but instead captured by copy in the lambda's captures (the = in [=]). The lambda automatically captures any variable you use in the kernel and converts it to a kernel argument. However, since the variables are captured by copy, they must be trivially copyable.

OpenCL SYCL
clSetKernelArg(kernel, 2, sizeof(cl_uint), (void*) &N);
Captured by the lambda's captures. For example, the integer N is captured by copy and can be used inside the kernel lambda in the following manner:
int N;
device_queue.single_task<CopyKernel>([=] {
  // <your device code goes here>
  // you can use ‘N’ here
});
NOTE:

When a kernel is enqueued to the device queue in both OpenCL and SYCL, an event returns (my_event). In OpenCL, an event returns by passing a reference to the function. In SYCL, an event is passed as a return value of the function.

The following table shows how the host code can wait for this event to finish and thus wait for the kernel to finish:

NOTE:

Examples show the basic use of events. See Advanced Modifications for additional details.

OpenCL SYCL
clWaitForEvents(1, &my_event);
my_event.wait()

After combining the kernel submission and event waiting code with the example from the previous section, you get the following code:

SYCL Example: Submitting a Single-task Kernel to the Device Queue

#include <sycl/sycl.hpp>
#include <sycl/ext/intel/fpga_extensions.hpp>

using namespace sycl;

// Forward declare the kernel name in the global scope to reduce
// name mangling. This is an FPGA best practice that makes it
// easier to identify the kernel in the optimization reports.
class CopyKernel;

void main() {
  // create the device queue using the FPGA device selector.
  queue device_queue(ext::intel::fpga_selector());
    
  // The input and output data on the host.
  int in_data[N], out_data[N];

  // Allocate memory on the device.
  int *in = malloc_device<int>(N, device_queue);
  int *out = malloc_device<int>(N, device_queue);

  // Copy input data to the device and wait for it to finish.
  device_queue.memcpy(in, in_data, N * sizeof(int)).wait();

  // Submit kernel to the device queue.
  event my_event = device_queue.single_task<CopyKernel>([=] {
    // <your device code goes here>
  });
  my_event.wait();  // wait on the kernel to finish

  // Copy output data back from the device and wait for it to finish.
  device_queue.memcpy(out_data, out, N * sizeof(int)).wait();
};

The host code transformation is now finished, and you have a complete SYCL host program. Next, you must transform the device code, which is covered in the Device Code Modification topic.