Developer Reference

Migrating OpenCL™ FPGA Designs to SYCL*

ID 767849
Date 5/08/2024
Public

Host Code Modification

This topic describes some checks and best-known methods you should consider when converting your OpenCL host program to SYCL*.

Device Selection

Your design can target the Intel® FPGA Emulation Platform for OpenCL™ software (FPGA emulator) for functional testing before targeting the FPGA hardware. To target the emulator device, you must make small changes in your queue-creation host code. You can address this by using preprocessor macros. The following table depicts the method for selecting between the FPGA emulator and hardware by using the FPGA_EMULATOR macro:

Selecting Between the FPGA Emulator and Hardware

OpenCL:

#ifdef FPGA_EMULATOR
platform = findPlatform("Intel(R) FPGA Emulation Platform for OpenCL(TM)");
#else
platform = findPlatform("Intel(R) FPGA SDK for OpenCL(TM)");
#endif

clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 1, &device, &num_devices);

context = clCreateContext(0, num_devices, &device, &context_error_callback, NULL, &status);

cl_command_queue device_queue = clCreateCommandQueue(context, device,
                                                     properties, &errcode_ret);

SYCL:

#ifdef FPGA_EMULATOR
ext::intel::fpga_emulator_selector selector;
#else
ext::intel::fpga_selector selector;
#endif
queue q(selector);

With the above code in place, when compiling your OpenCL host code or your SYCL single-source file, add the -DFPGA_EMULATOR flag to your compile command to target the emulator. If you want to compile for the FPGA hardware target, add the -Xshardware flag. See FPGA Compilation Flags in the Intel® oneAPI DPC++/C++ Compiler Handbook for Intel FPGAs for more information.
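One way to keep the #ifdef in a single place is to fold the macro into a compile-time constant and branch on that constant in ordinary code. The following is a minimal sketch in standard C++, not code from the Intel samples: kUseEmulator and target_platform_name are hypothetical names introduced for illustration and are not part of the OpenCL or SYCL APIs. The platform strings are the ones used in the table above.

```cpp
#include <string>

// Hypothetical constant (assumption for this sketch): true only when the file
// is compiled with -DFPGA_EMULATOR, so the rest of the host code needs no #ifdef.
#ifdef FPGA_EMULATOR
constexpr bool kUseEmulator = true;
#else
constexpr bool kUseEmulator = false;
#endif

// Hypothetical helper: returns the name of the platform to look up.
// The default argument follows the compile-time selection above.
std::string target_platform_name(bool use_emulator = kUseEmulator) {
  return use_emulator ? "Intel(R) FPGA Emulation Platform for OpenCL(TM)"
                      : "Intel(R) FPGA SDK for OpenCL(TM)";
}
```

Taking the choice as a parameter with a macro-driven default also makes both branches easy to exercise from a test, regardless of how the file was compiled.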

Enable Queue Profiling

The following table shows how to enable queue profiling in both OpenCL and SYCL:

Enable Queue Profiling

OpenCL:

cl_command_queue device_queue = clCreateCommandQueue(context, device_id,
                                                     CL_QUEUE_PROFILING_ENABLE, &errcode_ret);

SYCL:

auto prop_list = property_list{property::queue::enable_profiling};
queue device_queue(ext::intel::fpga_selector{}, prop_list);

Querying the profiling information from queue events is discussed in the Events and Synchronization section.

NOTE:

To enable profiling during design compilation and add profiling counters to the SYCL kernel pipeline, include the -Xsprofile flag in your icpx command. For additional details, see the Intel® FPGA Dynamic Profiler for DPC++ section in the Intel® oneAPI DPC++/C++ Compiler Handbook for Intel FPGAs.

Error Handling

In OpenCL, most runtime API functions return an error code, and you must check that code to determine whether the call succeeded. In SYCL, runtime errors are reported as exceptions, which you catch either in an error handler passed to the queue or in a try-catch block.

SYCL Example: Device Queue Creation in SYCL with an Error Handler to Catch Exceptions

NOTE:

The code example is combined with the queue profiling code from the previous section.

auto handler = [](exception_list e_list) {
  for (auto& e : e_list) {
    try {
      std::rethrow_exception(e);
    } catch (exception& ex) {
      std::cout << "I have caught an exception!" << std::endl;
      std::cout << ex.what() << std::endl;
    }
  }
};

auto prop_list = property_list{property::queue::enable_profiling};
queue device_queue(ext::intel::fpga_selector{}, handler, prop_list);
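The loop inside the handler relies only on standard C++: each entry in the exception list is a std::exception_ptr that is rethrown and then caught. The sketch below reproduces that loop without the SYCL headers so it can be tried on any C++17 compiler. It is an illustration under stated assumptions, not the SYCL implementation: the std::vector stands in for the SYCL exception_list, and collect_messages is a hypothetical helper name.

```cpp
#include <exception>
#include <stdexcept>
#include <string>
#include <vector>

// Stand-in for the SYCL exception_list (assumption for this sketch):
// a plain sequence of non-null std::exception_ptr values.
using ExceptionList = std::vector<std::exception_ptr>;

// Hypothetical helper: rethrows every stored exception and returns the
// what() messages instead of printing them.
std::vector<std::string> collect_messages(const ExceptionList& e_list) {
  std::vector<std::string> messages;
  for (const std::exception_ptr& e_ptr : e_list) {
    try {
      std::rethrow_exception(e_ptr);
    } catch (const std::exception& ex) {
      messages.push_back(ex.what());
    } catch (...) {
      // Anything not derived from std::exception has no what() message.
      messages.push_back("unknown exception");
    }
  }
  return messages;
}
```

Returning the messages rather than printing them is a design choice for this sketch only; it makes the behavior easy to check. A real handler could log, count, or abort instead.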

The following table shows how you would handle errors for submitting a single-task kernel to the device queue in OpenCL and SYCL. In SYCL, you can either construct your device queue with an error handler (as depicted in the previous code snippet) or wrap the command in a try-catch block.

Error Handling

OpenCL:

cl_event my_event;
cl_int status = clEnqueueTask(device_queue, kernel, num_events, event_list, &my_event);

if (status != CL_SUCCESS) {
  // <handle error here>
}

SYCL (queue constructed with an error handler):

// no error handling code necessary since the error is caught by the handler
event my_event = device_queue.single_task<Kernel>([=] {
  // <your device code goes here>
});

SYCL (try-catch block):

try {
  event my_event = device_queue.single_task<Kernel>([=] {
    // <your device code goes here>
  });
} catch (exception& e) {
  std::cout << "I have caught an exception!" << std::endl;
  std::cout << e.what() << std::endl;
}

For brevity, the table above shows error handling only for submitting a single-task kernel to the queue. However, the code is very similar for other queue operations, such as memory allocations, memory transfer operations, submitting NDRange kernels, and so on.
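Because the try-catch code is nearly identical for every queue operation, one option is to place it in a single wrapper and pass each operation in as a callable. The following is a minimal sketch in standard C++; run_checked is a hypothetical name and is not part of the SYCL API. It catches std::exception, which also covers the SYCL exception class because that class derives from std::exception.

```cpp
#include <exception>
#include <iostream>
#include <stdexcept>
#include <utility>

// Hypothetical helper: runs any callable inside one try-catch block and
// reports whether it completed without throwing.
template <typename Operation>
bool run_checked(Operation&& op) {
  try {
    std::forward<Operation>(op)();
    return true;
  } catch (const std::exception& ex) {
    std::cerr << "I have caught an exception!\n" << ex.what() << '\n';
    return false;
  }
}
```

In SYCL host code, the callable would be a lambda containing the queue operation, for example a call to device_queue.single_task or device_queue.memcpy.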

Events and Synchronization

In both OpenCL and SYCL, synchronization allows your host program to synchronize with the asynchronous operations running on or interacting with the device. The most basic form of synchronization is to wait for all events in the device queue to finish. The following table depicts how to synchronize in OpenCL and SYCL:

Events and Synchronization

OpenCL:

clFinish(device_queue);

SYCL (either of the following):

device_queue.wait_and_throw();
device_queue.wait();

The wait_and_throw() method passes any unconsumed asynchronous exceptions to the queue's error handler, while the wait() method only blocks until the queued operations complete and does not invoke the handler.

In both OpenCL and SYCL, an event represents the status of an operation that the runtime executes. Events allow you to control the scheduling of queue operations explicitly and to query their progress status. The following table demonstrates how to capture the events of a few OpenCL and SYCL operations:

Capture Events

OpenCL:

cl_event my_event;
clEnqueueWriteBuffer(device_queue, in, CL_FALSE, 0, sizeof(int) * N, in_data, 0, NULL, &my_event);

SYCL:

event my_event = device_queue.memcpy(in, in_data, N * sizeof(int));

OpenCL:

cl_event my_event;
clEnqueueReadBuffer(device_queue, out, CL_FALSE, 0, sizeof(int) * N, out_data, 0, NULL, &my_event);

SYCL:

event my_event = device_queue.memcpy(out_data, out, N * sizeof(int));

OpenCL:

cl_event my_event;
clEnqueueTask(device_queue, kernel, num_events, event_list, &my_event);

SYCL:

event my_event = device_queue.single_task<Kernel>([=] {});

Events provide finer-grained synchronization than waiting for all outstanding queue operations to finish. The following table depicts how to wait on an individual event:

Wait on an Individual Event

OpenCL:

clWaitForEvents(1, &my_event);

SYCL:

my_event.wait();

Additionally, you can use events to create dependencies and control the scheduling of operations. For example, the following table depicts how you would enqueue a single-task kernel to start after the some_event event finishes:

Enqueue a Single-Task Kernel

OpenCL:

cl_event my_event;
clEnqueueTask(device_queue, kernel, 1, &some_event, &my_event);

SYCL:

event my_event = device_queue.submit([&](handler &h) {
  h.depends_on(some_event);
  h.single_task<Kernel>([=]() {});
});

Lastly, events are used to access profiling information for the operation they represent. If you enabled queue profiling, you can access profiling information using the event returned when the operation was enqueued. The following table depicts how to access the profiling information of an event in OpenCL and SYCL:

Access the Profiling Information of an Event

OpenCL:

cl_ulong submit = 0;
clGetEventProfilingInfo(my_event, CL_PROFILING_COMMAND_SUBMIT, sizeof(cl_ulong), &submit, NULL);

SYCL:

auto submit = my_event.get_profiling_info<info::event_profiling::command_submit>();

OpenCL:

cl_ulong start = 0;
clGetEventProfilingInfo(my_event, CL_PROFILING_COMMAND_START, sizeof(cl_ulong), &start, NULL);

SYCL:

auto start = my_event.get_profiling_info<info::event_profiling::command_start>();

OpenCL:

cl_ulong end = 0;
clGetEventProfilingInfo(my_event, CL_PROFILING_COMMAND_END, sizeof(cl_ulong), &end, NULL);

SYCL:

auto end = my_event.get_profiling_info<info::event_profiling::command_end>();
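The three queries return timestamps rather than durations. Both APIs report these timestamps in nanoseconds, so the time a command waited in the queue and the time it spent executing are obtained by subtraction. The following sketch shows that arithmetic in standard C++; ProfilingTimestamps, queue_delay_ms, and execution_ms are hypothetical names introduced for illustration, and the struct is assumed to be filled from the submit, start, and end values queried above.

```cpp
#include <cstdint>

// Hypothetical helper type: the three profiling timestamps, in nanoseconds.
struct ProfilingTimestamps {
  std::uint64_t submit_ns;
  std::uint64_t start_ns;
  std::uint64_t end_ns;
};

// Time between submission and the start of execution, in milliseconds.
double queue_delay_ms(const ProfilingTimestamps& t) {
  return static_cast<double>(t.start_ns - t.submit_ns) / 1.0e6;
}

// Time the command spent executing, in milliseconds.
double execution_ms(const ProfilingTimestamps& t) {
  return static_cast<double>(t.end_ns - t.start_ns) / 1.0e6;
}
```

The sketch assumes submit_ns <= start_ns <= end_ns; because the values are unsigned, a timestamp pair that violates this ordering would wrap around rather than produce a negative duration.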

SYCL Buffers and Accessors

Like OpenCL buffers, SYCL buffers are shared memory of one, two, or three dimensions that you can use in a kernel. Unlike OpenCL buffers, SYCL buffers must be accessed using SYCL accessors. Using these accessors, the SYCL runtime analyzes the accesses of the buffers and creates a dependency graph of the host and device operations. This allows the runtime to schedule data movement and kernel events automatically.

For example, the following table depicts how to enqueue two kernels, KernelA and KernelB, that operate sequentially on the same buffer, buf. Assume that the device queue is properly set up and the data in buf is already transferred to the device.

Enqueuing Two Kernels That Operate Sequentially on the Same Buffer

OpenCL:

cl_mem buf = clCreateBuffer(context, CL_MEM_READ_WRITE,
                            sizeof(int)*7, NULL, &status);

clSetKernelArg(KernelA, 0, sizeof(cl_mem), (void*) &buf);

cl_event kernel_a_event;
clEnqueueTask(device_queue, KernelA, 0, NULL, &kernel_a_event);

clSetKernelArg(KernelB, 0, sizeof(cl_mem), (void*) &buf);

cl_event kernel_b_event;
clEnqueueTask(device_queue, KernelB, 1,
              &kernel_a_event, &kernel_b_event);

SYCL:

buffer<int, 1> buf(7);

device_queue.submit([&](handler &h) {
  accessor buf_acc(buf, h, read_write);
  h.single_task<KernelA>([=]() { });
});

device_queue.submit([&](handler &h) {
  accessor buf_acc(buf, h, read_write);
  h.single_task<KernelB>([=]() { });
});

In the SYCL code, since both kernels can write to buf (via buf_acc), the runtime implicitly adds a dependency between the kernels, and KernelA runs and completes before KernelB starts. In OpenCL, you must add this dependency manually using the event_list argument of the clEnqueueTask function.
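The rule behind this implicit dependency can be stated compactly: two commands that access the same buffer must be ordered when at least one of them may write to it, whereas two read-only commands may overlap. The following is a simplified model of that rule in standard C++, not the runtime's actual implementation; the Access enum and the needs_ordering function are hypothetical names introduced for illustration.

```cpp
// Simplified access modes for this sketch (not the SYCL access_mode enum).
enum class Access { Read, Write, ReadWrite };

// Returns true when the access may modify the buffer.
constexpr bool writes(Access a) { return a != Access::Read; }

// Two commands on the same buffer need ordering when either one may write.
constexpr bool needs_ordering(Access first, Access second) {
  return writes(first) || writes(second);
}
```

Under this model, KernelA and KernelB both use read_write access, so needs_ordering returns true, which matches the behavior described above.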

One of the benefits of using SYCL buffers and accessors is that the runtime can automatically schedule both kernel and data movement operations. For example, the following code snippet shows a basic design that copies input data to the device, enqueues a kernel that reads from the input buffer, writes to an output buffer, and finally copies the output data back from the device:

int in_data[N], out_data[N];
{
  buffer<int, 1> in_buf(in_data, N);
  buffer<int, 1> out_buf(out_data, N);

  device_queue.submit([&](handler &h) {
    accessor in(in_buf, h, read_only);
    accessor out(out_buf, h, write_only, no_init);
    h.single_task<Kernel>([=]() { });
  }).wait();

  // CAUTION: The kernel has finished, but the data has not been copied back to out_data yet!
}

// out_buf is out of scope, so the contents have been copied back to out_data
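The write-back at the end of the scope follows the usual C++ pattern of doing work in a destructor. The toy class below models only that aspect in standard C++, so the timing can be observed without a device. It is not the SYCL buffer class and makes a simplifying assumption: it keeps a private working copy and copies it back to host memory when it is destroyed. ScopedCopy is a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Toy model of scope-based write-back (not the SYCL buffer class).
class ScopedCopy {
 public:
  // Takes a snapshot of count elements starting at host_data.
  ScopedCopy(int* host_data, std::size_t count)
      : host_data_(host_data), working_(host_data, host_data + count) {}

  // Copies the working data back to host memory when the scope ends.
  ~ScopedCopy() { std::copy(working_.begin(), working_.end(), host_data_); }

  ScopedCopy(const ScopedCopy&) = delete;
  ScopedCopy& operator=(const ScopedCopy&) = delete;

  // Accesses the working copy, not the host memory.
  int& operator[](std::size_t i) { return working_[i]; }

 private:
  int* host_data_;
  std::vector<int> working_;
};
```

Writing through a ScopedCopy leaves the host array unchanged until the object goes out of scope, which mirrors the caution in the snippet above: results are guaranteed to be visible in out_data only after out_buf is destroyed.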

For more details about buffer properties, accessors, dependency rules, constructors, and destructors, refer to the Buffers chapter in Data Parallel C++: Programming Accelerated Systems Using C++ and SYCL.

NOTE:

SYCL buffers have convenience constructors that accept std::array and std::vector objects as arguments and infer the type and size of the buffer, an example of which is shown in the following code snippet:

std::array<int, N> my_std_array;
std::vector<int> my_std_vector; 
{
  // expands to:
  // buffer<int, 1> my_std_array_buf(my_std_array.data(), my_std_array.size());
  buffer my_std_array_buf(my_std_array);

  // expands to:
  // buffer<int, 1> my_std_vector_buf(my_std_vector.data(), my_std_vector.size());
  buffer my_std_vector_buf(my_std_vector);

  // …
}