YURI
TORRES DE LA SIERRA

ASSOCIATE PROFESSOR (PROFESOR CONTRATADO DOCTOR)

DIEGO RAFAEL
LLANOS FERRARIS

FULL PROFESSOR (CATEDRÁTICO DE UNIVERSIDAD)

Publications in collaboration with DIEGO RAFAEL LLANOS FERRARIS (19)

2024

Performance improvement of the triangular matrix product in commodity clusters
Journal of Supercomputing, Vol. 80, No. 11, pp. 16630-16653
The Role of Field-Programmable Gate Arrays in the Acceleration of Modern High-Performance Computing Workloads
Computer, Vol. 57, No. 7, pp. 66-76

2023

EPSILOD: efficient parallel skeleton for generic iterative stencil computations in distributed GPUs
Journal of Supercomputing, Vol. 79, No. 9, pp. 9409-9442
Mappings and patterns to improve the triangular matrix product on distributed systems
Proceedings - IEEE International Conference on Cluster Computing, ICCC
Supporting efficient overlapping of host-device operations for heterogeneous programming with CtrlEvents
Journal of Parallel and Distributed Computing, Vol. 179
UVaFTLE: Lagrangian finite time Lyapunov exponent extraction for fluid dynamic applications
Journal of Supercomputing, Vol. 79, No. 9, pp. 9635-9665

2021

Operators for Data Redistribution: Applications to the STL Library and RayTracing Algorithm
IEEE Access, Vol. 9, pp. 38557-38570

2019

Mecanismo de equilibrado de carga en sistemas heterogéneos
Avances en Arquitectura y Tecnología de Computadores: Actas de Jornadas SARTECO, Cáceres, 18 a 20 de septiembre de 2019 (Servicio de Publicaciones), pp. 294-300
Transferencias de datos asíncronas y transparentes en plataformas heterogéneas
Avances en Arquitectura y Tecnología de Computadores: Actas de Jornadas SARTECO, Cáceres, 18 a 20 de septiembre de 2019 (Servicio de Publicaciones), pp. 284-293

2015

Comprehensive Evaluation of a New GPU-based Approach to the Shortest Path Problem
International Journal of Parallel Programming, Vol. 43, No. 5, pp. 918-938
TuCCompi: A Multi-layer Model for Distributed Heterogeneous Computing with Tuning Capabilities
International Journal of Parallel Programming, Vol. 43, No. 5, pp. 939-960

2014

An extensible system for multilevel automatic data partition and mapping
IEEE Transactions on Parallel and Distributed Systems, Vol. 25, No. 5, pp. 1145-1154
Optimizing an APSP implementation for NVIDIA GPUs using kernel characterization criteria
Journal of Supercomputing, Vol. 70, No. 2, pp. 786-798
The All-Pair Shortest-Path Problem in Shared-Memory Heterogeneous Systems
High-Performance Computing on Complex Environments (Wiley Blackwell), pp. 283-299

2013

A new GPU-based approach to the Shortest Path problem
Proceedings of the 2013 International Conference on High Performance Computing and Simulation, HPCS 2013
UBench: Exposing the impact of CUDA block geometry in terms of performance
Journal of Supercomputing

2012

Encapsulated synchronization and load-balance in heterogeneous programming
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Using Fermi architecture knowledge to speed up CUDA and OpenCL programs
Proceedings of the 2012 10th IEEE International Symposium on Parallel and Distributed Processing with Applications, ISPA 2012

2011

Understanding the impact of CUDA tuning techniques for Fermi
Proceedings of the 2011 International Conference on High Performance Computing and Simulation, HPCS 2011