## Managing Data

## Learning Objectives

* Learn about the USM and buffer/accessor models for managing data.
* Learn how to allocate, transfer and free memory using USM.
* Learn how a buffer synchronizes data.
* Learn how to access data in a kernel function.

#### Memory Models

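* In SYCL there are two models for managing data:
  * The buffer/accessor model.
  * The USM (unified shared memory) model.
* The model you choose affects how you enqueue kernel functions. With buffers, accessors express data dependencies implicitly. With USM, you order work yourself, for example with events or `wait()`.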

#### CPU and GPU Memory

* A discrete GPU typically has its own memory, separate from CPU memory.
* For the GPU to use data that lives in CPU memory, the following actions must take place, either explicitly or implicitly:
  * Memory allocation on the GPU.
  * Data migration from the CPU to the allocation on the GPU.
  * Some computation on the GPU.
  * Migration of the result back to the CPU.

#### CPU and GPU Memory

* Memory transfers between CPU and GPU are often a performance bottleneck.
* We want to minimize these transfers whenever possible.

#### USM Allocation Types

* There are three kinds of USM allocation: host, device and shared.

![SYCL](../common-revealjs/images/Figure6-1bookUSMtypes.png "SYCL")

(Figure 6-1 from the book)

#### Using USM - Malloc Device

```cpp
// Allocate memory on device
T *device_ptr = sycl::malloc_device<T>(n, myQueue);

// Copy data to device, and wait so the copy has completed
// before any kernel that reads device_ptr is submitted
myQueue.memcpy(device_ptr, cpu_ptr, n * sizeof(T)).wait();

// ...
// Do some computation on device
// ...

// Copy data back to CPU
myQueue.memcpy(result_ptr, device_ptr, n * sizeof(T)).wait();

// Free allocated data
sycl::free(device_ptr, myQueue);
```

* USM allocations are not freed automatically. Call `sycl::free` once the memory is no longer needed to avoid memory leaks.

#### Using USM - Malloc Shared

```cpp
// Allocate shared memory
T *shared_ptr = sycl::malloc_shared<T>(n, myQueue);

// Shared memory can be accessed on host as well as device
for (std::size_t i = 0; i < n; ++i)
  shared_ptr[i] = i;

// ...
// Do some computation on device, and wait() for it to complete
// before accessing shared_ptr on the host again
// ...

// Free allocated data
sycl::free(shared_ptr, myQueue);
```

* Shared memory is accessible on both the host and the device.
* Shared allocations may be migrated between host and device implicitly, so the performance of shared memory accesses may be poor, depending on the platform.

#### SYCL Buffers & Accessors

* SYCL provides an API that takes care of allocations and `memcpy`s for you, and also tracks the data dependencies between kernel functions.

#### SYCL Buffers & Accessors

* The buffer/accessor model separates the storage and access of data.
  * A SYCL buffer manages data across the host and any number of devices.
  * A SYCL accessor requests access to data on the host or on a device for a specific SYCL kernel function.
* Accessors are also used to access data within a SYCL kernel function.
  * This means they are declared in the host code, then captured by and accessed within a SYCL kernel function.

#### SYCL Buffers & Accessors

* A SYCL buffer can be constructed with a pointer to host memory.
  * For the lifetime of the buffer, this memory is owned by the SYCL runtime.
* When a buffer object is constructed, it does not immediately allocate or copy to device memory.
  * This only happens once the SYCL runtime knows that the data needs to be accessed and where it needs to be accessed.

![Buffer Host Memory](../common-revealjs/images/buffer-hostmemory.png "Buffer Host Memory")

#### SYCL Buffers & Accessors

* Constructing an accessor specifies a request to access the data managed by the buffer.
* There is a range of different accessor types, which provide different ways to access data.

![Buffer Host Memory Accessor](../common-revealjs/images/buffer-hostmemory-accessor.png "Buffer Host Memory Accessor")

#### SYCL Buffers & Accessors

* When an accessor is constructed, it is associated with a command group via the handler object.
  * This connects three things: the buffer being accessed, the way in which it is being accessed, and the device the command group is being submitted to.

![Buffer Host Memory Accessor CG](../common-revealjs/images/buffer-hostmemory-accessor-cg.png "Buffer Host Memory Accessor CG")

#### SYCL Buffers & Accessors

* Once the SYCL scheduler selects the command group to be executed, it must first satisfy the group's data dependencies.
  * If necessary, this includes allocating memory on the device that accesses the data and copying the data there.
  * If the most recent copy of the data is already on the device, the runtime does not copy it again.

![Buffer Host Memory Accessor CG Device](../common-revealjs/images/buffer-hostmemory-accessor-cg-device.png "Buffer Host Memory Accessor CG Device")

#### SYCL Buffers & Accessors

* Data remains in device memory after kernels finish executing, until another accessor requests access on a different device or on the host.
* When the buffer object is destroyed, it waits for any outstanding work that is accessing the data to complete, then copies the data back to the original host memory.

![Buffer Destroyed](../common-revealjs/images/buffer-destroyed.png "Buffer Destroyed")

#### SYCL Buffers & Accessors

```cpp
T var = 42;

{
  // Create buffer pointing to var.
  auto buf = sycl::buffer{&var, sycl::range<1>{1}};

  // ...
  // Do some computation on device. Use accessors to access buffer
  // ...

} // var updated here

assert(var != 42);
```

* A buffer is associated with a type, a range and a dimensionality. The dimensionality must be 1, 2 or 3.
* The type and dimensionality can usually be inferred using class template argument deduction (CTAD).
* If a buffer is associated with an allocation in host memory, that host memory is only guaranteed to be updated once the buffer is destroyed, for example when it goes out of scope.

#### Accessor class

![Accessor Types](../common-revealjs/images/accessor-types.png "Accessor Types")

#### Accessor class

* There are many different ways to use the `accessor` class:
  * Accessing data on a device.
  * Accessing data immediately in the host application.
  * Allocating local memory.
* For now we are going to focus on accessing data on a device.

#### Constructing an accessor

```cpp
auto acc = sycl::accessor{bufA, cgh};
```

* There are many ways to construct an `accessor`.
* The `accessor` class supports CTAD, so it is not necessary to specify all of the template arguments.
* The most common way to construct an `accessor` is from a `buffer` and the `handler` associated with the command group function you are within.
  * The element type and dimensionality are inferred from the `buffer`.
  * The `access_mode` defaults to `access_mode::read_write`.

#### Specifying the access mode

```cpp
auto readAcc = sycl::accessor{bufA, cgh, sycl::read_only};
auto writeAcc = sycl::accessor{bufB, cgh, sycl::write_only};
```

* When constructing an `accessor`, you will likely also want to specify the `access_mode`.
* You can do this by passing one of the CTAD tags:
  * `read_only` results in `access_mode::read`.
  * `write_only` results in `access_mode::write`.

#### Specifying no initialization

```cpp
auto acc = sycl::accessor{buf, cgh, sycl::no_init};
```

* When constructing an `accessor`, you may also want to discard the original contents of a `buffer`.
* You can do this by passing the `no_init` property, which tells the runtime that it does not need to copy the existing data to the device.

#### Using Accessors

```cpp
T a = 42;
T b = 0;

{
  // Create one buffer per variable.
  auto bufA = sycl::buffer{&a, sycl::range<1>{1}};
  auto bufB = sycl::buffer{&b, sycl::range<1>{1}};

  q.submit([&](sycl::handler &cgh) {
    auto accA = sycl::accessor{bufA, cgh, sycl::read_only};
    // no_init: the original value of b is not needed on the device
    auto accB = sycl::accessor{bufB, cgh, sycl::write_only, sycl::no_init};

    cgh.single_task([=] { accB[0] = accA[0] + 1; }); // Do some work
  });

} // b updated here

assert(b == 43);
```

* Buffers and accessors take care of memory migration as well as dependency analysis.
* Dependencies are covered in more detail later.

#### operator[]

```cpp
gpuQueue.submit([&](sycl::handler &cgh) {
  auto inA = sycl::accessor{bufA, cgh, sycl::read_only};
  auto inB = sycl::accessor{bufB, cgh, sycl::read_only};
  auto out = sycl::accessor{bufO, cgh, sycl::write_only};
  // Kernel names are optional for lambdas in SYCL 2020
  cgh.single_task([=] {
    out[0] = inA[0] + inB[0];
  });
});
```

* As well as specifying data dependencies, an `accessor` can be used to access the data from within a kernel function.
* You can do this by calling `operator[]` on the `accessor`.
* A USM pointer is an ordinary C++ pointer, so it is indexed with a single integer offset. `operator[]` on an accessor, by contrast, can take either a multi-dimensional `sycl::id` or a `size_t`.

## Questions

#### Exercise

`Code_Exercises/Managing_Data/source`

Implement a SYCL application that adds two variables and returns the result using:

1. The USM memory model.
2. The buffer/accessor memory model.