## Managing Data
## Learning Objectives
* Learn about the USM and buffer/accessor models for managing data
* Learn how to allocate, transfer and free memory using USM
* Learn how a buffer synchronizes data
* Learn how to access data in a kernel function
#### Memory Models
* In SYCL there are two models for managing data:
  * The buffer/accessor model.
  * The USM (unified shared memory) model.
* Which model you choose can have an effect on how you enqueue kernel functions.
#### CPU and GPU Memory
* A GPU has its own memory, separate from CPU memory.
* In order for the GPU to use data held in CPU memory, the following actions must take place (either explicitly or implicitly):
  * Memory allocation on the GPU.
  * Data migration from the CPU to the allocation on the GPU.
  * Some computation on the GPU.
  * Migration of the result back to the CPU.
#### CPU and GPU Memory
* Memory transfers between CPU and GPU are a bottleneck.
* We want to minimize these transfers, when possible.
#### USM Allocation Types
* There are different ways USM memory can be allocated: host, device and shared.
![SYCL](../common-revealjs/images/Figure6-1bookUSMtypes.png "SYCL")
(from book)
#### Using USM - Malloc Device
```cpp
// Allocate memory on device
T *device_ptr = sycl::malloc_device<T>(n, myQueue);

// Copy data to device and wait, so later work never sees a partial copy
myQueue.memcpy(device_ptr, cpu_ptr, n * sizeof(T)).wait();

// ...
// Do some computation on device
// ...

// Copy data back to CPU
myQueue.memcpy(result_ptr, device_ptr, n * sizeof(T)).wait();

// Free allocated data
sycl::free(device_ptr, myQueue);
```
* It is important to free memory after it has been used to avoid memory leaks.
#### Using USM - Malloc Shared
```cpp
// Allocate shared memory
T *shared_ptr = sycl::malloc_shared<T>(n, myQueue);

// Shared memory can be accessed on host as well as device
for (auto i = 0; i < n; ++i)
  shared_ptr[i] = i;

// ...
// Do some computation on device
// ...

// Free allocated data
sycl::free(shared_ptr, myQueue);
```
* Shared memory is accessible on host and device.
* Performance of shared memory accesses may be poor depending on platform.
#### SYCL Buffers & Accessors
* SYCL also provides the buffer/accessor API, which takes care of allocations and `memcpy`s for you, as well as dependency analysis.
#### SYCL Buffers & Accessors
* The buffer/accessor model separates the storage and access of data
* A SYCL buffer manages data across the host and any number of devices
* A SYCL accessor requests access to data on the host or on a device for a specific SYCL kernel function
* Accessors are also used to access data within a SYCL kernel function
  * This means they are declared in the host code but captured by and then accessed within a SYCL kernel function
#### SYCL Buffers & Accessors
* A SYCL buffer can be constructed with a pointer to host memory
* For the lifetime of the buffer this memory is owned by the SYCL runtime
* When a buffer object is constructed it will not allocate or copy to device memory at first
* This will only happen once the SYCL runtime knows the data needs to be accessed and where it needs to be accessed
![Buffer Host Memory](../common-revealjs/images/buffer-hostmemory.png "Buffer Host Memory")
#### SYCL Buffers & Accessors
* Constructing an accessor specifies a request to access the data managed by the buffer
* There is a range of different types of accessor which provide different ways to access data
![Buffer Host Memory Accessor](../common-revealjs/images/buffer-hostmemory-accessor.png "Buffer Host Memory Accessor")
#### SYCL Buffers & Accessors
* When an accessor is constructed it is associated with a command group via the handler object
* This connects the buffer that is being accessed, the way in which it’s being accessed and the device that the command group is being submitted to
![Buffer Host Memory Accessor CG](../common-revealjs/images/buffer-hostmemory-accessor-cg.png "Buffer Host Memory Accessor CG")
#### SYCL Buffers & Accessors
* Once the SYCL scheduler selects the command group to be executed it must first satisfy its data dependencies
* If necessary, this includes allocating and copying the data to the device accessing that data
* If the most recent copy of the data is already on the device then the runtime will not copy again
![Buffer Host Memory Accessor CG Device](../common-revealjs/images/buffer-hostmemory-accessor-cg-device.png "Buffer Host Memory Accessor CG Device")
#### SYCL Buffers & Accessors
* Data will remain in device memory after kernels finish executing until another accessor requests access in a different device or on the host
* When the buffer object is destroyed it will wait for any outstanding work that is accessing the data to complete and then copy back to the original host memory
![Buffer Destroyed](../common-revealjs/images/buffer-destroyed.png "Buffer Destroyed")
#### SYCL Buffers & Accessors
```cpp
T var = 42;

{
  // Create buffer pointing to var.
  auto buf = sycl::buffer{&var, sycl::range<1>{1}};

  // ...
  // Do some computation on device. Use accessors to access buffer
  // ...

} // var updated here

assert(var != 42);
```
* A buffer is associated with a type, range and dimensionality. Dimensionality must be either 1, 2 or 3.
* Usually type and dimensionality can be inferred using CTAD.
* If a buffer is associated with some allocation in host memory, that host memory is only guaranteed to be updated once the buffer is destroyed, for example when it goes out of scope.
#### Accessor class
![Accessor Types](../common-revealjs/images/accessor-types.png "Accessor Types")
#### Accessor class
* There are many different ways to use the `accessor` class.
  * Accessing data on a device.
  * Accessing data immediately in the host application.
  * Allocating local memory.
* For now we are going to focus on accessing data on a device.
#### Constructing an accessor
```cpp
auto acc = sycl::accessor{bufA, cgh};
```
* There are many ways to construct an `accessor`.
* The `accessor` class supports CTAD so it's not necessary to specify all of the template arguments.
* The most common way to construct an `accessor` is from a `buffer` and a `handler` associated with the command group function you are within.
  * The element type and dimensionality are inferred from the `buffer`.
  * The `access_mode` is defaulted to `access_mode::read_write`.
#### Specifying the access mode
```cpp
auto readAcc = sycl::accessor{bufA, cgh, sycl::read_only};
auto writeAcc = sycl::accessor{bufB, cgh, sycl::write_only};
```
* When constructing an `accessor` you will likely also want to specify the `access_mode`.
* You can do this by passing one of the CTAD tags:
  * `read_only` will result in `access_mode::read`.
  * `write_only` will result in `access_mode::write`.
#### Specifying no initialization
```cpp
auto acc = sycl::accessor{buf, cgh, sycl::no_init};
```
* When constructing an `accessor` you may also want to discard the original data of a `buffer`.
* You can do this by passing the `no_init` property.
#### Using Accessors
```cpp
T var = 42;

{
  // Create buffers pointing to var.
  auto bufA = sycl::buffer{&var, sycl::range<1>{1}};
  auto bufB = sycl::buffer{&var, sycl::range<1>{1}};

  q.submit([&](sycl::handler &cgh) {
    // bufA is only read; the previous contents of bufB are discarded.
    auto accA = sycl::accessor{bufA, cgh, sycl::read_only};
    auto accB = sycl::accessor{bufB, cgh, sycl::no_init};

    cgh.single_task<mykernel>(...); // Do some work
  });

} // var updated here

assert(var != 42);
```
* Buffers and accessors take care of memory migration, as well as dependency analysis.
* More to come later on dependencies.
#### operator[]
```cpp
gpuQueue.submit([&](sycl::handler &cgh){
  auto inA = sycl::accessor{bufA, cgh, sycl::read_only};
  auto inB = sycl::accessor{bufB, cgh, sycl::read_only};
  auto out = sycl::accessor{bufO, cgh, sycl::write_only};
  cgh.single_task<mykernel>([=]{
    out[0] = inA[0] + inB[0];
  });
});
```
* As well as specifying data dependencies, an `accessor` can also be used to access the data from within a kernel function.
* You can do this by calling `operator[]` on the `accessor`.
* Subscripting a USM pointer takes a single linear index (for example a `size_t`), whereas `operator[]` for accessors can take a multi-dimensional `sycl::id` or a `size_t`.
## Questions
#### Exercise
Code_Exercises/Managing_Data/source
Implement a SYCL application that adds two variables and returns the result using:
1. The USM memory model
2. The buffer/accessor memory model