Tools for improving performance portability in heterogeneous environments

  1. Fernández Fabeiro, Jorge
Supervised by:
  1. Basilio B. Fraguela Co-director
  2. Diego Andrade Co-director

Defence university: Universidade da Coruña

Fecha de defensa: 13 July 2017

Committee:
  1. Bruno Hubert Raffin Chair
  2. María J. Martín Secretary
  3. Arturo González Escribano Committee member

Type: Thesis

Teseo: 490373 DIALNET lock_openRUC editor

Abstract

Parallel computing is currently partially dominated by the availability of heterogeneous devices. These devices differ from each other in aspects such as the instruction set they execute, the number and the type of computing devices that they offer or the structure of their memory systems. In the last years, langnages, libraries and extensions have appeared to allow to write a parallel code once aud run it in a wide variety of devices, OpenCL being the most widespread solution of this kind. However, functional portability does not imply performance portability. This way, one of the probletns that is still open in this field is to achieve automatic performance portability. That is, the ability to automatically tune a given code for any device where it will be execnted so that it ill obtain a good performance. This thesis develops three different solutions to tackle this problem. The three of them are based on typical source-to-sonrce optimizations for heterogeneous devices. Both the set of optimizations to apply and the way they are applied depend on different optimization parameters, whose values have to be tuned for each specific device. The first solution is OCLoptimizer, a source-to-source optimizer that can optimize annotated OpenCL kemels with the help of configuration files that guide the optimization process. The tool optimizes kernels for a specific device, and it is also able to automate the generation of functional host codes when only a single kernel is optimized. The two remaining solutions are built on top of the Heterogeneous Programming Library (HPL), a C++ framework that provides an easy and portable way to exploit heterogeneous computing systexns. The first of these solutions uses the run-time code generation capabilities of HPL to generate a self-optimizing version of a matrix multiplication that can optimize itself at run-time for an spedfic device. The last solutíon is the development of a built-in just-in-time optirnizer for HPL, that can optirnize, at run-tirne, a HPL code for an specific device. While the first two solutions use search processes to find the best values for the optimization parameters, this Iast alternative relies on heuristics bMed on general optirnization strategies.