This repository provides materials that can be used for teaching SYCL. The materials are provided under the “Creative Commons Attribution Share Alike 4.0 International” license.
If you’re not familiar with SYCL, or would like some further resources for learning about it, below is a list of useful resources:
To use these materials, simply clone this repository including the required submodules:

```sh
git clone --recursive https://github.com/codeplaysoftware/syclacademy.git
```
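If you have already cloned the repository without the `--recursive` flag, the submodules can still be fetched afterwards; a minimal sketch using standard git commands:

```sh
# Fetch the configured submodules for an existing clone (run from the repository root)
cd syclacademy
git submodule update --init --recursive
```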
The lectures are written in reveal.js and can be found in `Lesson_Materials`, in the sub-directory for each topic. To view them, simply open the `index.html` file in your browser. Your browser will have a “Full Screen” mode that can be used to run the presentation; use the right and left cursor keys to move forward and backward in the presentation.
The exercises can be found in `Code_Exercises`, in the sub-directory for each topic. Each exercise has a markdown document instructing what to do in the exercise, a source file to start with and a solution file to provide an example implementation to compare against.
Contributions to the materials are gratefully received; they can be made by submitting a Pull Request with your changes. If you can, follow the instructions here to generate a PDF file for any lecture slides you change. Please limit the scope of each Pull Request so that it can be reviewed and merged in a timely manner.
Codeplay Software Ltd., Heidelberg University, Intel, Xilinx and University of Bristol.
Abertay University, Universidad de Concepcion, TU Dresden, University of Edinburgh, Federal University of Sao Carlos, University of Glasgow, Heriot Watt University, Universitat Innsbruck, Universidad de Málaga, University of Salerno and University of the West of Scotland.
The SYCL Academy curriculum is divided into a number of short lessons, each consisting of slides for presenting the material and a more detailed write-up, accompanied by a tutorial for getting hands-on experience with the subject matter.
Each of the lessons is designed to be a self-contained module in order to support both academic and training-style teaching environments.
A playlist of video content is also available. Note, however, that the slides and exercises may have changed since the videos were created, so they may not match exactly.
Lesson | Title | Slides | Exercise | Source | Solution | DPC++ | AdaptiveCpp |
---|---|---|---|---|---|---|---|
01 | What is SYCL | slides | exercise | source | solution | Yes | Yes |
02 | Enqueueing a Kernel | slides | exercise | source | solution | Yes | Yes |
03 | Managing Data | slides | exercise | source | solution | Yes | Yes |
04 | Handling Errors | slides | exercise | source | solution | Yes | Yes |
05 | Device Discovery | slides | exercise | source | solution | Yes | Yes |
06 | Data Parallelism | slides | exercise | source | solution | Yes | Yes |
07 | Introduction to USM | slides | exercise | source | solution | Yes | Yes |
08 | Using USM | slides | exercise | source | solution | Yes | Yes |
09 | Asynchronous Execution | slides | exercise | source | solution | Yes | Yes |
10 | Data and Dependencies | slides | exercise | source | solution | Yes | Yes |
11 | In Order Queue | slides | exercise | source | solution | Yes | Yes |
12 | Advanced Data Flow | slides | exercise | source | solution | Yes | Yes |
13 | Multiple Devices | slides | exercise | source | solution | Yes | Yes |
14 | ND Range Kernels | slides | exercise | source | solution | Yes | Yes |
15 | Image Convolution | slides | exercise | | solution | Yes | Yes |
16 | Coalesced Global Memory | slides | exercise | source | solution | Yes | Yes |
17 | Vectors | slides | exercise | source | solution | Yes | Yes |
18 | Local Memory Tiling | slides | exercise | source | solution | Yes | Yes |
19 | Further Optimisations | slides | exercise | source | solution | Yes | Yes |
20 | Matrix Transpose | slides | exercise | source | solution | Yes | Yes |
21 | More SYCL Features | slides | exercise | source | solution | Yes | Yes |
22 | Functors | slides | exercise | source | solution | Yes | Yes |
The exercises can be built for DPC++ and AdaptiveCpp.
Below is a list of the supported platforms and devices for each SYCL implementation. Please check this before deciding which SYCL implementation to use, and make sure to install the specified version to ensure that you can build all of the exercises.
Implementation | Supported Platforms | Supported Devices | Required Version |
---|---|---|---|
DPC++ | Intel DevCloud; Windows 10 with Visual Studio 2019 (64bit); Red Hat Enterprise Linux 8, CentOS 8; Ubuntu 18.04 LTS, 20.04 LTS (64bit). Refer to the System Requirements for more details | Intel CPU (OpenCL), Intel GPU (OpenCL), Intel FPGA (OpenCL), Nvidia GPU (CUDA)* | 2021.4 |
AdaptiveCpp | Any Linux | CPU (OpenMP), AMD GPU (ROCm)**, NVIDIA GPU (CUDA), Intel GPU (Level Zero), Intel CPU and GPU (OpenCL) | 23.10.0 from Nov 1, 2023 or newer |
* Supported in open source project only
** See here for the official list of GPUs supported by AMD for ROCm. We do not recommend using GPUs earlier than gfx9 (Vega 10 and Vega 20 chips).
First you’ll need to install your chosen SYCL implementation and any dependencies it requires.
To set up DPC++ follow the getting started instructions.
You can also use a Docker* image.
If you are using the Intel DevCloud then the latest version of DPC++ will already be installed and available in the path.
You will need an AdaptiveCpp (formerly hipSYCL) build from September 2021 or newer. Refer to the AdaptiveCpp installation instructions for details on how to install AdaptiveCpp.
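As a rough sketch only (the linked AdaptiveCpp installation instructions are the authoritative reference, and the install prefix below is an assumption), a from-source build typically follows the usual CMake pattern:

```sh
# Illustrative sketch of a from-source AdaptiveCpp build; see the official
# installation instructions for the required dependencies (LLVM, CUDA/ROCm, etc.).
git clone https://github.com/AdaptiveCpp/AdaptiveCpp.git
cd AdaptiveCpp
mkdir build && cd build
cmake .. -DCMAKE_INSTALL_PREFIX=/opt/adaptivecpp  # install prefix is an assumption
make -j"$(nproc)" install
```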
Before building the exercises, you’ll need to clone this repository; there are some additional dependencies configured as git submodules, so make sure to clone those as well. Then simply invoke CMake as follows:
```sh
mkdir build
cd build
cmake ../ -G<cmake_generator> -A<cmake_arch> -D<sycl_implementation>=ON
```
For `<cmake_generator>` / `<cmake_arch>` we recommend:

For `<sycl_implementation>` this can be one of:

* `SYCL_ACADEMY_USE_ADAPTIVECPP`
* `SYCL_ACADEMY_USE_DPCPP`
You can also specify the following additional options:

* `-DSYCL_ACADEMY_INSTALL_ROOT=<path_to_sycl_impl_install_root>` - for `<path_to_sycl_impl_install_root>` we recommend you specify the path to the root directory of your SYCL implementation installation, though this may not always be required.
* `-DSYCL_ACADEMY_ENABLE_SOLUTIONS=ON` - this will enable building the solutions for each exercise as well as the source files. This is disabled by default.
* `-DCMAKE_BUILD_TYPE=Release` - the build configuration for all exercises defaults to a debug build if this option is not specified.
* `-DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx` - the SYCL Academy CMake configuration uses the Intel oneAPI IntelSYCL CMake module package to assist its configuration. These command line arguments must be used to initiate this configuration correctly.
`-DSYCL_TRIPLE` can be used to specify a DPC++ compatible SYCL triple. Possible values include:

* `amdgcn-amd-amdhsa` - for AMD devices
* `nvptx64-nvidia-cuda` - for CUDA devices
* `spir64_gen` - for Intel GPUs
* `native_cpu` - for the native CPU SYCL device (dependent on DPC++ version)

`-DSYCL_ARCH` can also be used to specify a device architecture. This CMake option is necessary for AMD. Possible values include:

* `gfx90a` - for AMD MI200
* `sm_80` - for NVIDIA A100
* `pvc` - for Intel PVC

It may also be necessary to manually specify the install location of the CUDA or ROCm SDK if it is found in a non-standard location. The flags `-DROCM_DIR` and `-DCUDA_DIR` can be used to specify the install directories of the ROCm and CUDA SDKs, respectively.
```sh
cmake .. -GNinja -DSYCL_ACADEMY_USE_DPCPP=ON -DSYCL_ACADEMY_ENABLE_SOLUTIONS=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DSYCL_TRIPLE=amdgcn-amd-amdhsa -DSYCL_ARCH=gfx90a -DROCM_DIR=/opt/rocm/5.4.3
cmake .. -GNinja -DSYCL_ACADEMY_USE_DPCPP=ON -DSYCL_ACADEMY_ENABLE_SOLUTIONS=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DSYCL_TRIPLE=nvptx64-nvidia-cuda -DSYCL_ARCH=sm_61 -DCUDA_DIR=/usr/local/cuda/11.2/
```
Sufficiently new (>= 24.02.0) full installations of AdaptiveCpp do not require specifying compilation targets; in this case, targets may still be provided optionally.

For older AdaptiveCpp versions, CMake will require you to specify the compilation targets using `-DACPP_TARGETS=<target specification>`. `<target specification>` is a list of compilation flows to enable and devices to target; for example, `-DACPP_TARGETS="omp;generic"` compiles for CPUs using OpenMP and for GPUs using the generic single-pass compiler.
If your AdaptiveCpp installation does not force a compilation target to be provided, but it was built with the generic single-pass compiler disabled (it is enabled by default in all AdaptiveCpp installations built against LLVM >= 14), it is compiling for a default set of targets provided at installation time. If you cannot run the binary on the hardware of your choice, this default set may not be the right one for your hardware and you may have to specify the right targets explicitly.
Available compilation flows are:

* `omp` - OpenMP CPU backend
* `generic` - generic single-pass compiler. Generates a binary that runs on host CPUs and on AMD, NVIDIA and Intel GPUs using runtime compilation
* `cuda` - CUDA backend for NVIDIA GPUs. Requires specification of targets of the form sm_XY, e.g. sm_70 for Volta, sm_60 for Pascal. E.g.: `cuda:sm_70`
* `hip` - HIP backend for AMD GPUs. Requires specification of targets of the form gfxXYZ, e.g. gfx906 for Vega 20, gfx900 for Vega 10. E.g.: `hip:gfx906`

When in doubt, use `-DACPP_TARGETS=generic`, as it compiles the fastest, usually generates the fastest binaries, and produces portable binaries.
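For example, a hypothetical configuration of the exercises for AdaptiveCpp, compiling for CPUs via OpenMP and for GPUs via the generic compiler, might look like the following (the install root path is an assumption and may not be needed on your system):

```sh
# Illustrative only: configure the exercises for AdaptiveCpp with explicit targets
mkdir build && cd build
cmake ../ -GNinja -DSYCL_ACADEMY_USE_ADAPTIVECPP=ON \
  -DSYCL_ACADEMY_INSTALL_ROOT=/opt/adaptivecpp \
  -DACPP_TARGETS="omp;generic"
```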
Example usage when invoking CMake from the command line:

```sh
cmake .. "-GUnix Makefiles" -DSYCL_ACADEMY_USE_DPCPP=ON -DSYCL_ACADEMY_ENABLE_SOLUTIONS=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
```
First you have to ensure that your environment is configured to use DPC++ (note: if you are using the Intel DevCloud, you don’t need to do this step).
On Linux, simply source the `setvars.sh` script, which is available in `/opt/intel/oneapi` for sudo or root users and in `~/intel/oneapi/` when installed as a normal user.

For root or sudo installations:

```sh
source /opt/intel/oneapi/setvars.sh
```

For normal user installations:

```sh
source ~/intel/oneapi/setvars.sh
```
On Windows the script is located in `<dpc++_install_root>\setvars.bat`, where `<dpc++_install_root>` is wherever the oneAPI directory is installed.
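Once the environment is set up, you can optionally check which devices DPC++ can see. If your oneAPI installation provides the `sycl-ls` tool, listing the available SYCL backends and devices is a quick sanity check:

```sh
# List the SYCL platforms and devices visible to the DPC++ runtime (if sycl-ls is available)
sycl-ls
```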
Once that’s done you can invoke the DPC++ compiler as follows:

```sh
icpx -fsycl -o a.out source.cpp
```

Where `<syclacademy_root>` is the path to the root directory of where you cloned this repository. Note that on Windows you need to add the option `/EHsc` to avoid exception handling errors.
The CMake configuration can also be used to build the exercises; see the section Configuring using CMake above.
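For instance, after configuring as described above, the exercises can be built from the build directory with standard CMake commands (the individual exercise target names depend on the configuration, so check the generated targets, e.g. with `make help` when using the Unix Makefiles generator):

```sh
# Build everything that was configured (run from the build directory)
cmake --build .
```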
Once you have a working SYCL compiler, you are ready to start writing some SYCL code. To find the first exercise:

```sh
cd Code_Exercises/Compiling_with_SYCL/
```

Then read the README.md for further instructions.
Each exercise directory contains:

* `README.md`, which contains instructions on how to complete the exercise, as well as directions for compilation.
* `source.cpp`, a placeholder file where your code implementation should be written.
* `solution.cpp`, where a solution has been implemented in advance.

Once you have completed any given exercise, make sure to compare your implementation against the corresponding `solution.cpp`.
Hosted by tech.io, this SYCL Introduction tutorial introduces the concepts of SYCL. The website also provides the ability to compile and execute SYCL code from your web browser.
If you are using the Intel DevCloud, you can connect in one of two ways:

* via ssh (`ssh devcloud`)
* via Jupyter Notebook: select File -> New -> Terminal

You are now ready to start with the first lesson. Enjoy!
* If you are using **DevCloud via ssh**, run:
```sh
module load cmake
cd syclacademy
mkdir build
cd build
cmake ../ "-GUnix Makefiles" -DSYCL_ACADEMY_USE_DPCPP=ON -DSYCL_ACADEMY_ENABLE_SOLUTIONS=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx