## Handling Errors and Debugging

## Learning Objectives

* Learn about how SYCL handles errors
* Learn about the difference between synchronous and asynchronous exceptions
* Learn how to handle exceptions and retrieve further information
* Learn about the host device and how to use it

#### SYCL exceptions

* In SYCL errors are handled by throwing exceptions.
* It is crucial that these errors are handled, otherwise your application could fail in unpredictable ways.
* In SYCL there are two kinds of error:
  * Synchronous errors (thrown in user thread).
  * Asynchronous errors (thrown by the SYCL scheduler).

#### Handling errors

int main() {
  queue q;  // default-constructed queue; "queue q();" would declare a function

  /* Synchronous code */

  q.submit([&](handler &cgh) {

    /* Synchronous code */

    cgh.parallel_for<add>(bufO.get_range(), [=](id<1> i) {

      /* Asynchronous code */

    });
  });
}

* Kernels run asynchronously on the device, and will throw asynchronous errors.
* Everything else runs synchronously on the host, and will throw synchronous errors.

#### SYCL exceptions

![SYCL](../common-revealjs/images/sycl-exceptions.png "SYCL")

#### Handling errors

class add;

int main() {
  queue q;  // default-constructed queue; "queue q();" would declare a function

  /* Synchronous code */ 

  q.submit([&](handler &cgh) {
    /* Synchronous code */

    cgh.single_task<add>([=]() {  // single_task kernels take no index argument
      /* Asynchronous code */
    });
  }).wait();
}

* Code on the device runs asynchronously.
* If errors are not handled, the application can fail:
  * SYCL 1.2.1 application will fail silently.
  * SYCL 2020 provides a default async handler that will call `std::terminate` when an asynchronous error is thrown.

class add;

int main() {
  std::vector<float> dA{ 7, 5, 16, 8 }, dB{ 8, 16, 5, 7 }, dO{ 0, 0, 0, 0 };
  try {
    queue gpuQueue(gpu_selector{});

    buffer bufA{dA};
    buffer bufB{dB};
    buffer bufO{dO};

    gpuQueue.submit([&](handler &cgh) {
      auto inA = accessor{bufA, cgh, read_only};
      auto inB = accessor{bufB, cgh, read_only};
      auto out = accessor{bufO, cgh, write_only};

      cgh.parallel_for<add>(bufO.get_range(), [=](id<1> i) {
        out[i] = inA[i] + inB[i];
      });
    }).wait();

  } catch (...) { /* handle errors */ }
}

* Synchronous errors are typically thrown by SYCL API functions.
* In order to handle all SYCL errors you must wrap everything in a try-catch block.

class add;

int main() {
  std::vector<float> dA{ 7, 5, 16, 8 }, dB{ 8, 16, 5, 7 }, dO{ 0, 0, 0, 0 };
  try{
    queue gpuQueue(gpu_selector{}, async_handler{});
    buffer bufA{dA};
    buffer bufB{dB};
    buffer bufO{dO};

    gpuQueue.submit([&](handler &cgh) {
      auto inA = accessor{bufA, cgh, read_only};
      auto inB = accessor{bufB, cgh, read_only};
      auto out = accessor{bufO, cgh, write_only};

      cgh.parallel_for<add>(bufO.get_range(), [=](id<1> i) {
        out[i] = inA[i] + inB[i];
      });
    }).wait();
  
    gpuQueue.throw_asynchronous();
  } catch (...) { /* handle errors */ }
}

* Asynchronous errors that may have occurred will be thrown after a command group has been submitted to a `queue`.
* To handle these errors you must provide an async handler when constructing the queue object.
* Then you must also call the `throw_asynchronous` or `wait_and_throw` member functions of the `queue` class.
* This will pass the exceptions to the async handler in the user thread so they can be thrown.
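
The deferred delivery described above can be modelled in plain standard C++, without a SYCL implementation. The sketch below is only an analogy: `MiniQueue`, `pending_errors` and `run_demo` are hypothetical names, not SYCL API, and the "device work" is an ordinary function call. It shows an error being parked at submission time and reaching the caller only once `throw_asynchronous` hands it to the handler.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a SYCL queue: NOT the SYCL API, just a model of
// how asynchronous errors are parked until the user asks for them.
class MiniQueue {
 public:
  using ErrorList = std::vector<std::exception_ptr>;
  using Handler = std::function<void(const ErrorList&)>;

  explicit MiniQueue(Handler handler) : handler_(std::move(handler)) {}

  // Runs the task and captures anything it throws instead of propagating it,
  // so the failure stays invisible to the caller of submit().
  void submit(const std::function<void()>& task) {
    try {
      task();
    } catch (...) {
      pending_.push_back(std::current_exception());
    }
  }

  // Hands all parked errors to the handler on the calling thread.
  void throw_asynchronous() {
    if (pending_.empty()) return;
    ErrorList errors;
    errors.swap(pending_);
    handler_(errors);
  }

  std::size_t pending_errors() const { return pending_.size(); }

 private:
  Handler handler_;
  ErrorList pending_;
};

// Submits one failing task and returns what the caller finally observes.
std::string run_demo() {
  MiniQueue q([](const MiniQueue::ErrorList& errors) {
    for (const auto& e : errors) std::rethrow_exception(e);
  });

  q.submit([] { throw std::runtime_error("kernel failed"); });
  // No exception has reached us yet: the error is only parked.
  assert(q.pending_errors() == 1);

  try {
    q.throw_asynchronous();
  } catch (const std::exception& e) {
    return e.what();
  }
  return "no error";
}
```

Calling `run_demo()` returns the message of the parked exception. If `throw_asynchronous` were never called, the error would remain in the queue unnoticed, which is the situation the bullets above warn about.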

class add;

int main() {
  std::vector<float> dA{ 7, 5, 16, 8 }, dB{ 8, 16, 5, 7 }, dO{ 0, 0, 0, 0 };
  try{
    queue gpuQueue(gpu_selector{}, [=](exception_list eL) { 
      for (auto e : eL) { std::rethrow_exception(e); }
    });

    buffer bufA{dA};
    buffer bufB{dB};
    buffer bufO{dO};

    gpuQueue.submit([&](handler &cgh) {
      auto inA = accessor{bufA, cgh, read_only};
      auto inB = accessor{bufB, cgh, read_only};
      auto out = accessor{bufO, cgh, write_only};

      cgh.parallel_for<add>(bufO.get_range(), [=](id<1> i) {
        out[i] = inA[i] + inB[i];
      });
    }).wait();

    gpuQueue.throw_asynchronous();
  } catch (...) { /* handle errors */ }
}

* The async handler is a C++ lambda or function object that takes an `exception_list` as its parameter
* The `exception_list` class is a wrapper around a list of `std::exception_ptr` objects which can be iterated over
* Each `std::exception_ptr` can be rethrown by passing it to `std::rethrow_exception`
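
A handler that calls `std::rethrow_exception` inside the loop leaves the loop on the first entry, so later entries in the list are never visited. The following sketch, in plain standard C++, shows one possible alternative that inspects every entry. `ExceptionList` is a hypothetical stand-in for `exception_list` (here simply a `std::vector<std::exception_ptr>`), and `describe`, `collect_messages` and `make_sample_errors` are illustrative names, not SYCL API.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for sycl::exception_list: an iterable collection of
// std::exception_ptr. This is plain standard C++, not the SYCL type.
using ExceptionList = std::vector<std::exception_ptr>;

// Rethrows one stored exception only to read its message, then swallows it.
std::string describe(const std::exception_ptr& e) {
  assert(e != nullptr);  // rethrowing a null exception_ptr is undefined
  try {
    std::rethrow_exception(e);
  } catch (const std::exception& ex) {
    return ex.what();
  } catch (...) {
    return "unknown exception type";
  }
}

// Visits every entry, unlike a handler that rethrows inside the loop and
// therefore leaves on the first one.
std::vector<std::string> collect_messages(const ExceptionList& errors) {
  std::vector<std::string> messages;
  for (const auto& e : errors) messages.push_back(describe(e));
  return messages;
}

// Builds a list holding two std exceptions and one non-std exception.
ExceptionList make_sample_errors() {
  ExceptionList errors;
  errors.push_back(std::make_exception_ptr(std::runtime_error("first")));
  errors.push_back(std::make_exception_ptr(std::out_of_range("second")));
  errors.push_back(std::make_exception_ptr(42));
  return errors;
}
```

Whether to rethrow the first error or to log all of them is a design choice: rethrowing keeps the surrounding try-catch in charge, while collecting keeps the full picture when several kernels failed.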

int main() {
  std::vector<float> dA{ 7, 5, 16, 8 }, dB{ 8, 16, 5, 7 }, dO{ 0, 0, 0, 0 };
  try {
    queue gpuQueue(gpu_selector{}, [=](exception_list eL) {
      for (auto e : eL) { std::rethrow_exception(e); }
    });

    ...

    gpuQueue.throw_asynchronous();
  } catch (const std::exception& e) {
    std::cout << "Exception caught: " << e.what()
     << std::endl;
  }
}

* Once rethrown and caught, a SYCL exception can provide information about the error
* The `what` member function will return a string with more details

int main() {
  std::vector<float> dA{ 7, 5, 16, 8 }, dB{ 8, 16, 5, 7 }, dO{ 0, 0, 0, 0 };
  try {
    queue gpuQueue(gpu_selector{}, [=](exception_list eL) {
      for (auto e : eL) { std::rethrow_exception(e); }
    });

    ...

    gpuQueue.throw_asynchronous();
  } catch (const sycl::exception& e) {
    std::cout << "Exception caught: " << e.what();
    std::cout << " With OpenCL error code: "
     << e.get_cl_code() << std::endl;
  }
}

* In SYCL 1.2.1, if the exception has an OpenCL error code associated with it this can be retrieved by calling the `get_cl_code` member function
* If there is no OpenCL error code this will return `CL_SUCCESS`
* SYCL 2020 provides the `error_category_for` templated free function that allows checking for the category of the exception depending on the backend used (e.g. `backend::opencl`), and `e.code().value()` will correspond to the backend error code.
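
In SYCL 2020, `code()` returns a `std::error_code`, so the "check the category first, then read the value" pattern can be illustrated with the C++ standard library alone. The sketch below does not use any SYCL types: it uses `std::generic_category()` and `std::system_category()` in place of a backend category, and `describe_code`, `belongs_to`, `sample_code` and `demo` are hypothetical helper names.

```cpp
#include <cassert>
#include <string>
#include <system_error>

// Formats an error code as "<category>:<value>". The numeric value is only
// meaningful together with its category, so both are reported.
std::string describe_code(const std::error_code& ec) {
  return std::string(ec.category().name()) + ":" + std::to_string(ec.value());
}

// Returns true only when the code belongs to the expected category; only
// then is it safe to interpret value() in that category's terms.
bool belongs_to(const std::error_code& ec, const std::error_category& cat) {
  return ec.category() == cat;
}

// Example code taken from the standard library's generic category.
std::error_code sample_code() {
  return std::make_error_code(std::errc::invalid_argument);
}

// Usage: the sample is recognised as generic and not as a system code.
void demo() {
  const std::error_code ec = sample_code();
  assert(belongs_to(ec, std::generic_category()));
  assert(!belongs_to(ec, std::system_category()));
}
```

The same shape applies when the category comes from a backend: compare the category first, and treat the integer as a backend error code only when the comparison succeeds.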

int main() {
  std::vector<float> dA{ 7, 5, 16, 8 }, dB{ 8, 16, 5, 7 }, dO{ 0, 0, 0, 0 };

  queue gpuQueue(gpu_selector{}, [=](exception_list eL) {
    for (auto e : eL) { std::rethrow_exception(e); }
  });
  context gpuContext = gpuQueue.get_context();

  try {
    ...
    gpuQueue.wait_and_throw();
  } catch (const sycl::exception& e) {
    if (e.has_context()) {
      if (e.get_context() == gpuContext) {
        /* handle error */
      }
    }
  }
}

* The `has_context` member function will tell you if there is a SYCL context associated with the error
* If that returns true then the `get_context` member function will return the associated SYCL context object

## Exception Types

* In SYCL 1.2.1 there are a number of different exception types that inherit from `std::exception`
  * E.g. `runtime_error`, `kernel_error`
* SYCL 2020 only has a single `sycl::exception` type which provides different error codes
  * E.g. `errc::runtime`, `errc::kernel`
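
The "one exception type, many error codes" design can be reproduced with the standard `<system_error>` facilities. The sketch below is an illustration in plain C++, not the SYCL implementation: `demo_errc`, `demo_category`, `demo_exception`, `classify` and `classify_thrown` are hypothetical names, and the enum values are made up. It shows a catch site branching on the error code instead of on the exception's dynamic type.

```cpp
#include <cassert>
#include <string>
#include <system_error>
#include <type_traits>

// Hypothetical error enum, loosely modelled on the idea behind sycl::errc.
// The names and values are illustrative only.
enum class demo_errc { success = 0, runtime = 1, kernel = 2 };

namespace std {
// Opts demo_errc into implicit conversion to std::error_code.
template <>
struct is_error_code_enum<demo_errc> : true_type {};
}  // namespace std

// Category that gives demo_errc values a name and a message.
class demo_category_impl : public std::error_category {
 public:
  const char* name() const noexcept override { return "demo"; }
  std::string message(int value) const override {
    switch (static_cast<demo_errc>(value)) {
      case demo_errc::success: return "success";
      case demo_errc::runtime: return "runtime error";
      case demo_errc::kernel:  return "kernel error";
    }
    return "unrecognised error";
  }
};

// Single shared category instance; categories are compared by address.
const std::error_category& demo_category() {
  static const demo_category_impl instance{};
  return instance;
}

// Found by argument-dependent lookup when a demo_errc becomes an error_code.
std::error_code make_error_code(demo_errc e) {
  return {static_cast<int>(e), demo_category()};
}

// One exception type for every failure; the code says what went wrong.
class demo_exception : public std::system_error {
 public:
  explicit demo_exception(demo_errc e)
      : std::system_error(make_error_code(e)) {}
};

// Dispatches on the error code rather than on the exception's dynamic type.
std::string classify(const std::system_error& e) {
  if (e.code() == demo_errc::kernel) return "kernel";
  if (e.code() == demo_errc::runtime) return "runtime";
  return "other";
}

// Throws a demo_exception with the given code and classifies it on catch.
std::string classify_thrown(demo_errc e) {
  try {
    throw demo_exception(e);
  } catch (const std::system_error& ex) {
    return classify(ex);
  }
}
```

With this layout a single catch clause is enough, and new error conditions can be added as enum values without introducing new exception classes.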

## Debugging SYCL Kernel Functions

* Every SYCL 1.2.1 implementation is required to provide a host device
* This device executes native C++ code but is guaranteed to emulate the SYCL execution and memory model
* This means you can debug a SYCL kernel function by switching to the host device and using a standard C++ debugger
  * For example gdb

* SYCL 2020 only guarantees that a device will always be available, and users can query the `host_debuggable` device aspect to check whether they can use the same functionality as the SYCL 1.2.1 host device

class add;

int main() {
  std::vector<float> dA{ 7, 5, 16, 8 }, dB{ 8, 16, 5, 7 }, dO{ 0, 0, 0, 0 };
  try{
    queue hostQueue(aspect_selector<aspect::host_debuggable>(), async_handler{});

    buffer bufA{dA};
    buffer bufB{dB};
    buffer bufO{dO};

    hostQueue.submit([&](handler &cgh) {
      auto inA = accessor{bufA, cgh, read_only};
      auto inB = accessor{bufB, cgh, read_only};
      auto out = accessor{bufO, cgh, write_only};

      cgh.parallel_for<add>(bufO.get_range(), [=](id<1> i) {
        out[i] = inA[i] + inB[i];
      });
    });
    hostQueue.wait_and_throw();
  } catch (...) { /* handle errors */ }
}

* Any SYCL application can be debugged on the host device by switching the queue for a host queue
* Replacing the device selector with the `aspect_selector` will ensure that the queue submits all work to the device with the requested aspects, in this case a host debuggable device
* In SYCL 1.2.1, `host_selector` would be used instead; it is deprecated in SYCL 2020

## Questions

#### Exercise

Code_Exercises/Handling_Errors/source

Add error handling to a SYCL application for both synchronous and asynchronous errors.