DIEGO RAFAEL LLANOS FERRARIS
FULL PROFESSOR (CATEDRÁTICO DE UNIVERSIDAD)
YURI TORRES DE LA SIERRA
ASSOCIATE PROFESSOR (PROFESOR CONTRATADO DOCTOR)
Publications co-authored with YURI TORRES DE LA SIERRA (19)
2024
-
Performance improvement of the triangular matrix product in commodity clusters
Journal of Supercomputing, Vol. 80, No. 11, pp. 16630-16653
-
The Role of Field-Programmable Gate Arrays in the Acceleration of Modern High-Performance Computing Workloads
Computer, Vol. 57, No. 7, pp. 66-76
2023
-
EPSILOD: efficient parallel skeleton for generic iterative stencil computations in distributed GPUs
Journal of Supercomputing, Vol. 79, No. 9, pp. 9409-9442
-
Mappings and patterns to improve the triangular matrix product on distributed systems
Proceedings - IEEE International Conference on Cluster Computing, ICCC
-
Supporting efficient overlapping of host-device operations for heterogeneous programming with CtrlEvents
Journal of Parallel and Distributed Computing, Vol. 179
-
UVaFTLE: Lagrangian finite time Lyapunov exponent extraction for fluid dynamic applications
Journal of Supercomputing, Vol. 79, No. 9, pp. 9635-9665
2021
-
Operators for Data Redistribution: Applications to the STL Library and RayTracing Algorithm
IEEE Access, Vol. 9, pp. 38557-38570
2019
-
Mecanismo de equilibrado de carga en sistemas heterogéneos
Avances en Arquitectura y Tecnología de Computadores: Actas de Jornadas SARTECO, Cáceres, 18 a 20 de septiembre de 2019 (Servicio de Publicaciones), pp. 294-300
-
Transferencias de datos asíncronas y transparentes en plataformas heterogéneas
Avances en Arquitectura y Tecnología de Computadores: Actas de Jornadas SARTECO, Cáceres, 18 a 20 de septiembre de 2019 (Servicio de Publicaciones), pp. 284-293
2015
-
Comprehensive Evaluation of a New GPU-based Approach to the Shortest Path Problem
International Journal of Parallel Programming, Vol. 43, No. 5, pp. 918-938
-
TuCCompi: A Multi-layer Model for Distributed Heterogeneous Computing with Tuning Capabilities
International Journal of Parallel Programming, Vol. 43, No. 5, pp. 939-960
2014
-
An extensible system for multilevel automatic data partition and mapping
IEEE Transactions on Parallel and Distributed Systems, Vol. 25, No. 5, pp. 1145-1154
-
Optimizing an APSP implementation for NVIDIA GPUs using kernel characterization criteria
Journal of Supercomputing, Vol. 70, No. 2, pp. 786-798
-
The All-Pair Shortest-Path Problem in Shared-Memory Heterogeneous Systems
High-Performance Computing on Complex Environments (Wiley Blackwell), pp. 283-299
2013
-
A new GPU-based approach to the Shortest Path problem
Proceedings of the 2013 International Conference on High Performance Computing and Simulation, HPCS 2013
-
UBench: Exposing the impact of CUDA block geometry in terms of performance
Journal of Supercomputing
2012
-
Encapsulated synchronization and load-balance in heterogeneous programming
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
-
Using Fermi architecture knowledge to speed up CUDA and OpenCL programs
Proceedings of the 2012 10th IEEE International Symposium on Parallel and Distributed Processing with Applications, ISPA 2012
2011
-
Understanding the impact of CUDA tuning techniques for Fermi
Proceedings of the 2011 International Conference on High Performance Computing and Simulation, HPCS 2011