Host Code Modification

Developer Reference

Migrating OpenCL™ FPGA Designs to SYCL*

Download PDF

ID 767849

Date 5/08/2024

Version

Public

Visible to Intel only — GUID: GUID-A5D0F9A3-D3B3-4FF0-86D3-F0221538D9D6

View Details

Host Code Modification

This topic describes how to convert your OpenCL host program to SYCL*.

NOTE:

The converted SYCL code in the following examples is single-sourced, so device and host code are in the same source file. Wherever SYCL device code is present, the code sample displays <your device code goes here>. See Device Code Modification for more details.

Add Include Files

Add SYCL-specific include files to define all SYCL constructs. The following table lists various include files that define the host API and FPGA extensions:

Include Files
OpenCL	SYCL
`#include “CL/opencl.h”` or `#include “CL/cl.h”`	`#include <sycl/sycl.hpp> #include <sycl/ext/intel/fpga_extensions.hpp>`

Add the `sycl` Namespace

Add the sycl namespace as all SYCL constructs reside in this namespace. By adding this namespace, you can also avoid repeatedly writing sycl:: throughout your code.

#include <sycl/sycl.hpp>
#include <sycl/ext/intel/fpga_extensions.hpp>

using namespace sycl;

Forward Declare Kernel Names

Recognizable kernel names allow you to identify kernels in the reports. In OpenCL, kernels are C functions marked with the special __kernel keyword, and the kernel name is simply the function name. In SYCL, you define kernels using C++ lambdas or functors, and therefore, a different naming method is required, where you use a type (for example, a C++ class or struct) to name the kernel.

Kernel submission is explained later in this document, but for now, it suffices to forward declare the type used to name the kernels, as shown in the following snippet:

SYCL Example: Header Files, Namespaces, and Forward Declaring Kernel Names

#include <sycl/sycl.hpp>
#include <sycl/ext/intel/fpga_extensions.hpp>

using namespace sycl;

// Forward declare the kernel name (CopyKernel) in the global 
// scope to reduce name mangling. This is a best practice
// that makes it easier to identify the kernel in the FPGA
// optimization reports.
class CopyKernel;

NOTE:

The purpose of declaring this type in the global (file) scope is to minimize name mangling by the Intel® oneAPI DPC++/C++ Compiler, since the deeper the nesting of the declared type, the more the compiler mangles the name.

Queue Creation

Queues connect a host program to a single device. Host programs submit tasks to a device via the queue and can monitor the queue for completion.

The following table depicts how to create a device queue in OpenCL and SYCL:

OpenCL SYCL

Creating a Device Queue
OpenCL	SYCL
`platform = findPlatform("Intel(R) FPGA SDK for OpenCL(TM)"); clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 1, &device, &num_devices); clCreateContext(0, num_devices, &device, &context_error_callback, NULL, &status); cl_command_queue device_queue = clCreateCommandQueue (context, device_id, properties, &errcode_ret);`	`queue device_queue((ext::intel::fpga_selector()));`

platform = findPlatform("Intel(R) FPGA SDK for OpenCL(TM)");

clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 1, &device, &num_devices);

clCreateContext(0, num_devices, &device, &context_error_callback, NULL, &status);

cl_command_queue device_queue = clCreateCommandQueue (context, device_id,
properties, &errcode_ret);

queue device_queue((ext::intel::fpga_selector()));

SYCL Example: Create the Device Queue

#include <sycl/sycl.hpp>
#include <sycl/ext/intel/fpga_extensions.hpp>

using namespace sycl;

// Forward declare the kernel name in the global scope to reduce
// name mangling. This is an FPGA best practice that makes it
// easier to identify the kernel in the optimization reports.
class CopyKernel;

void main() {
  queue device_queue(ext::intel::fpga_selector());
  // …
};

Memory Allocation and Movement

The following table shows how to transform the code related to data allocation and movement from OpenCL to SYCL:

OpenCL SYCL

Memory Allocation and Movement
OpenCL	SYCL
`int in_data[N], out_data[N]; cl_mem in = clCreateBuffer(context, CL_MEM_READ_ONLY, sizeof(int) * N, NULL, &status); cl_mem out = clCreateBuffer(context, CL_MEM_WRITE_ONLY, sizeof(int) * N, NULL, &status);`	`int in_data[N], out_data[N]; int in = malloc_device<int>(N, device_queue); int out = malloc_device<int>(N, device_queue);`
`clEnqueueWriteBuffer(device_queue, in, CL_TRUE, 0, sizeof(int) * N, in_data, 0, NULL, NULL);`	`device_queue.memcpy(in, in_data, N * sizeof(int)).wait();`
`clEnqueueReadBuffer(device_queue, out, CL_TRUE, 0, sizeof(int) * N, out_data, 0, NULL, NULL);`	`device_queue.memcpy(out_data, out, N * sizeof(int)).wait();`

int in_data[N], out_data[N];

cl_mem in = clCreateBuffer(context, CL_MEM_READ_ONLY, sizeof(int) * N,
NULL, &status);

cl_mem out = clCreateBuffer(context, CL_MEM_WRITE_ONLY, sizeof(int) * N,
NULL, &status);

int in_data[N], out_data[N];

int *in = malloc_device<int>(N, device_queue);
int *out = malloc_device<int>(N, device_queue);

clEnqueueWriteBuffer(device_queue, in, CL_TRUE, 0, sizeof(int) * N,
in_data, 0, NULL, NULL);

device_queue.memcpy(in, in_data, N * sizeof(int)).wait();

clEnqueueReadBuffer(device_queue, out, CL_TRUE, 0, sizeof(int) * N,
out_data, 0, NULL, NULL);

device_queue.memcpy(out_data, out, N * sizeof(int)).wait();

The following SYCL code uses Unified Shared Memory (USM) device allocations to allocate memory on the device. The Advanced Modifications section discusses alternative SYCL methods for allocating and moving data. The CL_TRUE arguments to clEnqueueWriteBuffer and clEnqueueReadBuffer interfaces in OpenCL result in blocking memory copy operations. In SYCL, you can achieve this by waiting on the returned event from the memcpy function calls. See Advanced Modifications for more information about events.

SYCL Example: Allocate and Move Data

#include <sycl/sycl.hpp>
#include <sycl/ext/intel/fpga_extensions.hpp>

using namespace sycl;

// Forward declare the kernel name in the global scope to reduce
// name mangling. This is an FPGA best practice that makes it
// easier to identify the kernel in the optimization reports.
class CopyKernel;

void main() {
  // Create the device queue using the FPGA device selector.
  queue device_queue(ext::intel::fpga_selector());
    
  int in_data[N], out_data[N];

  // Allocate memory on the device.
  int *in = malloc_device<int>(N, device_queue);
  int *out = malloc_device<int>(N, device_queue);

  // Copy input data to the device and wait for it to finish.
  device_queue.memcpy(in, in_data, N * sizeof(int)).wait();

  // <submit kernel(s) to the device queue: this is discussed in the device code section>

  // Copy output data back from the device and wait for it to finish.
  device_queue.memcpy(out_data, out, N * sizeof(int)).wait();
};

Create Kernels and Submit Them to the Command Queue

The following table shows how you can submit both single task and NDRange kernels to the command queue in OpenCL and SYCL:

OpenCL SYCL

OpenCL	SYCL
`kernel = clCreateKernel(program, kernel_name, &status); clEnqueueTask(device_queue, kernel, num_events, event_list, &my_event);`	`event my_event = device_queue.single_task<CopyKernel>([=] { // <your device code goes here> });`
`kernel = clCreateKernel(program, kernel_name, &status); clEnqueueNDRangeKernel(device_queue, kernel, num_dimensions, global_work_offset, global_work_size, local_work_size, num_events, event_list, &my_event);`	`event my_event = device_queue.parallel_for<CopyKernel>(nd_range<num_dimensions>(global_work_size, local_work_size, global_work_offset)), [=](id<num_dimensions> index) { // <your device code goes here> });`

kernel = clCreateKernel(program, kernel_name, &status);

clEnqueueTask(device_queue,
              kernel,
              num_events,
              event_list,
              &my_event);

event my_event = device_queue.single_task<CopyKernel>([=] {
    // <your device code goes here>
});

kernel = clCreateKernel(program, kernel_name, &status);

clEnqueueNDRangeKernel(device_queue, 
                       kernel, 
                       num_dimensions,
                       global_work_offset,
                       global_work_size, 
                       local_work_size, 
                       num_events, 
                       event_list, 
                       &my_event);

event my_event = device_queue.parallel_for<CopyKernel>(nd_range<num_dimensions>(global_work_size, local_work_size, global_work_offset)), [=](id<num_dimensions> index) {
    // <your device code goes here>
});

NOTE:

The CopyKernel template parameter to the single_task and parallel_for functions are the forward declared kernel names discussed earlier.

The following table shows how kernel arguments are set in OpenCL and SYCL:

NOTE:

Kernel arguments are not set explicitly in SYCL but instead captured by copy in the lambda's captures (the = in [=]). The lambda automatically captures any variable you use in the kernel and converts it to a kernel argument. However, since the variables are captured by copy, they must be trivially copyable.

OpenCL	SYCL
`clSetKernelArg(kernel, 2, sizeof(cl_uint), (void*) &N);`	Captured by the lambda's captures. For example, the integer `N` is captured by copy and can be used inside the kernel lambda in the following manner: `int N; device_queue.single_task<CopyKernel>([=] { // <your device code goes here> // you can use ‘N’ here });`

NOTE:

When a kernel is enqueued to the device queue in both OpenCL and SYCL, an event returns (my_event). In OpenCL, an event returns by passing a reference to the function. In SYCL, an event is passed as a return value of the function.

The following table shows how the host code can wait for this event to finish and thus wait for the kernel to finish:

NOTE:

Examples show the basic use of events. See Advanced Modifications for additional details.

OpenCL	SYCL
`clWaitForEvents(1, &my_event);`	`my_event.wait()`

After combining the kernel submission and event waiting code with the example from the previous section, you get the following code:

SYCL Example: Submitting a Single-task Kernel to the Device Queue

#include <sycl/sycl.hpp>
#include <sycl/ext/intel/fpga_extensions.hpp>

using namespace sycl;

// Forward declare the kernel name in the global scope to reduce
// name mangling. This is an FPGA best practice that makes it
// easier to identify the kernel in the optimization reports.
class CopyKernel;

void main() {
  // create the device queue using the FPGA device selector.
  queue device_queue(ext::intel::fpga_selector());
    
  // The input and output data on the host.
  int in_data[N], out_data[N];

  // Allocate memory on the device.
  int *in = malloc_device<int>(N, device_queue);
  int *out = malloc_device<int>(N, device_queue);

  // Copy input data to the device and wait for it to finish.
  device_queue.memcpy(in, in_data, N * sizeof(int)).wait();

  // Submit kernel to the device queue.
  event my_event = device_queue.single_task<CopyKernel>([=] {
    // <your device code goes here>
  });
  my_event.wait();  // wait on the kernel to finish

  // Copy output data back from the device and wait for it to finish.
  device_queue.memcpy(out_data, out, N * sizeof(int)).wait();
};

The host code transformation is now finished, and you have a complete SYCL host program. Next, you must transform the device code, which is covered in the Device Code Modification topic.

Parent topic: Basic Modifications

Select Your Language

Using Intel.com Search

Quick Links

Recent Searches

Advanced Search

Only search in

Migrating OpenCL™ FPGA Designs to SYCL*

Host Code Modification

Add Include Files

Add the `sycl` Namespace

Forward Declare Kernel Names

Queue Creation

Memory Allocation and Movement

Create Kernels and Submit Them to the Command Queue

Using Intel.com Search

Quick Links

Recent Searches

Advanced Search

Only search in

Migrating OpenCL™ FPGA Designs to SYCL*

Host Code Modification

Add Include Files

Add the sycl Namespace

Forward Declare Kernel Names

Queue Creation

Memory Allocation and Movement

Create Kernels and Submit Them to the Command Queue

Add the `sycl` Namespace