CODAG: Characterizing and Optimizing Decompression Algorithms for GPUs
Data compression and decompression have become vital components of big-data
applications to manage the exponential growth in the amount of data collected
and stored. Furthermore, big-data applications have increasingly adopted GPUs
due to their high compute throughput and memory bandwidth. Prior works presume
that decompression is memory-bound: they dedicate most of the GPU's threads to
data movement and adopt complex software techniques to hide the memory latency
of reading compressed data and writing uncompressed data. This paper shows
that these techniques lead to poor GPU resource utilization, as most threads end
up waiting for the few decoding threads, exposing compute and synchronization
latencies.
Based on this observation, we propose CODAG, a novel and simple kernel
architecture for high throughput decompression on GPUs. CODAG eliminates the
use of specialized groups of threads, frees up compute resources to increase
the number of parallel decompression streams, and leverages the ample compute
activities and the GPU's hardware scheduler to tolerate synchronization,
compute, and memory latencies. Furthermore, CODAG provides a framework for
users to easily incorporate new decompression algorithms without being burdened
with implementing complex optimizations to hide memory latency. We validate our
proposed architecture with three different encoding techniques, RLE v1, RLE v2,
and Deflate, and a wide range of large datasets from different domains. We show
that CODAG provides 13.46x, 5.69x, and 1.18x speedups for RLE v1, RLE v2, and
Deflate, respectively, when compared to the state-of-the-art decompressors from
NVIDIA RAPIDS.
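The run-length encoding that CODAG is validated on can be illustrated with a minimal decoder. This is a CPU-side sketch of the encoding's semantics only, not CODAG's GPU kernel, and the (value, count) pair layout is an assumption for illustration rather than the RAPIDS RLE v1 on-disk format:

```python
def rle_decode(pairs):
    """Decode a run-length-encoded stream given as (value, count) pairs.

    Illustrative sketch: CODAG's actual design decodes many independent
    compressed streams in parallel on the GPU, relying on the hardware
    scheduler rather than specialized thread groups to hide latency.
    """
    out = []
    for value, count in pairs:
        out.extend([value] * count)  # expand each run back to full length
    return out

# A run of repeated symbols compresses to a few pairs:
print(rle_decode([("a", 3), ("b", 1), ("c", 2)]))
# ['a', 'a', 'a', 'b', 'c', 'c']
```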
Predictive runtime code scheduling for heterogeneous architectures
Heterogeneous architectures are currently widespread. With
the advent of easy-to-program general purpose GPUs, virtually every
recent desktop computer is a heterogeneous system. Combining the CPU
and the GPU brings great amounts of processing power. However, such
architectures are often used in a restricted way for domain-specific
applications like scientific applications and games, and they tend to be used
by a single application at a time. We envision future heterogeneous
computing systems where all their heterogeneous resources are continuously
utilized by different applications with versioned critical parts to be able
to better adapt their behavior and improve execution time, power
consumption, response time and other constraints at runtime. Under such a
model, adaptive scheduling becomes a critical component.
In this paper, we propose a novel predictive user-level scheduler based on
past performance history for heterogeneous systems. We developed several
scheduling policies and present a study of their impact on system
performance. We demonstrate that such a scheduler allows multiple
applications to fully utilize all available processing resources in CPU/GPU-
like systems and consistently achieve speedups ranging from 30% to 40%
compared to just using the GPU in single-application mode.
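The core idea of history-based predictive scheduling can be sketched as follows. The class, the policy (pick the device minimizing queued work plus the mean of past runtimes), and all names here are illustrative assumptions, not the paper's exact scheduler:

```python
from collections import defaultdict

class HistoryScheduler:
    """Sketch of a predictive user-level scheduler: it keeps per-(kernel,
    device) runtime history and sends each task to the device with the
    lowest predicted finish time (current backlog + predicted runtime)."""

    def __init__(self, devices):
        self.history = defaultdict(list)          # (kernel, device) -> runtimes
        self.backlog = {d: 0.0 for d in devices}  # queued work per device

    def predict(self, kernel, device, default=1.0):
        runs = self.history[(kernel, device)]
        return sum(runs) / len(runs) if runs else default

    def submit(self, kernel):
        device = min(self.backlog,
                     key=lambda d: self.backlog[d] + self.predict(kernel, d))
        self.backlog[device] += self.predict(kernel, device)
        return device

    def record(self, kernel, device, runtime):
        # Observed completion: update history, drain the device's backlog.
        self.history[(kernel, device)].append(runtime)
        self.backlog[device] = max(0.0, self.backlog[device] - runtime)

sched = HistoryScheduler(["cpu", "gpu"])
sched.record("matmul", "gpu", 0.2)  # past observation: fast on the GPU
sched.record("matmul", "cpu", 1.5)  # past observation: slow on the CPU
print(sched.submit("matmul"))       # gpu
```

Because the CPU remains available for kernels whose history favors it, several applications can keep all devices busy at once, which is the source of the reported multi-application speedups.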
CUDA-For-Clusters: A System for Efficient Execution of CUDA Kernels on Multi-Core Clusters
Rapid advancements in multi-core processor architectures along with low-cost, low-latency, high-bandwidth interconnects have made clusters of multi-core machines a common computing resource. Unfortunately, writing good parallel programs that efficiently utilize all the resources in such a cluster is still a major challenge. Programmers have to manually deal with low-level details that should ideally be the responsibility of an intelligent compiler or a runtime layer. Various programming languages have been proposed as a solution to this problem, but are yet to be adopted widely for performance-critical code, mainly due to relatively immature software frameworks and the effort involved in rewriting existing code in a new language. In this paper, we motivate and describe our initial study exploring CUDA as a programming language for a cluster of multi-cores. We develop CUDA-For-Clusters (CFC), a framework that transparently orchestrates execution of CUDA kernels on a cluster of multi-core machines. The well-structured nature of a CUDA kernel, the growing number of CUDA developers and benchmarks, and the stability of the CUDA software stack collectively make CUDA a good candidate programming language for a cluster. CFC uses a mixture of source-to-source compiler transformations, a work distribution runtime, and a light-weight software distributed shared memory to manage parallel executions. Initial results on several standard CUDA benchmark programs achieve speedups of up to 7.5X on a cluster with 8 nodes, opening up an interesting direction for further investigation.
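The work-distribution step CFC relies on can be sketched simply: because CUDA thread blocks are independent by construction, a kernel's grid can be partitioned into contiguous block ranges and each range executed on a different node. The even split below is an illustrative assumption, not CFC's actual runtime policy:

```python
def partition_grid(num_blocks, num_nodes):
    """Split a kernel's thread-block grid into contiguous per-node ranges.

    Toy sketch of cluster-level work distribution: each node executes the
    half-open block range [start, end) assigned to it, with remainder
    blocks spread over the first nodes.
    """
    base, extra = divmod(num_blocks, num_nodes)
    ranges, start = [], 0
    for node in range(num_nodes):
        count = base + (1 if node < extra else 0)
        ranges.append((start, start + count))
        start += count
    return ranges

print(partition_grid(10, 4))
# [(0, 3), (3, 6), (6, 8), (8, 10)]
```

In a real system the runtime must also keep globally shared data coherent across nodes, which is what CFC's light-weight software distributed shared memory handles.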
Assessment of two complementary influenza surveillance systems: Sentinel primary care influenza-like illness versus severe hospitalized laboratory-confirmed influenza using the moving epidemic method
Monitoring seasonal influenza epidemics is the cornerstone of epidemiological surveillance of acute respiratory virus infections worldwide. This work aims to compare two sentinel surveillance systems within the Daily Acute Respiratory Infection Information System of Catalonia (PIDIRAC), the primary care ILI and influenza-confirmed samples from primary care (PIDIRAC-ILI and PIDIRAC-FLU) and the severe hospitalized laboratory-confirmed influenza system (SHLCI), with regard to how they perform in forecasting epidemic onset and severity, allowing for healthcare preparedness. An epidemiological study was carried out over seven influenza seasons (2010-2017) in Catalonia, with data from influenza sentinel surveillance by primary care physicians reporting ILI, along with laboratory confirmation of influenza from systematic sampling of ILI cases, and 12 hospitals that provided data on severe hospitalized cases with laboratory-confirmed influenza (SHLCI-FLU). Epidemic thresholds for ILI and SHLCI-FLU (overall) as well as influenza A (SHLCI-FLUA) and influenza B (SHLCI-FLUB) incidence rates were assessed by the Moving Epidemic Method. Epidemic thresholds for primary care sentinel surveillance influenza-like illness (PIDIRAC-ILI) incidence rates ranged from 83.65 to 503.92 per 100,000 inhabitants. Paired incidence rate curves for SHLCI-FLU/PIDIRAC-ILI and SHLCI-FLUA/PIDIRAC-FLUA showed the best correlation indices (0.805 and 0.724, respectively). Assessing the delay in reaching the epidemic level, the PIDIRAC-ILI source forecast an average of 1.6 weeks ahead of the other paired sources. Differences were larger when SHLCI cases were paired to PIDIRAC-ILI and PIDIRAC-FLUB, although statistical significance was observed only for SHLCI-FLU/PIDIRAC-ILI (p-value, Wilcoxon test = 0.039).
The combined ILI and confirmed influenza data from primary care, along with the severe hospitalized laboratory-confirmed influenza data, from the PIDIRAC sentinel surveillance system provide timely and accurate syndromic and virological surveillance of influenza from the community level to hospitalization of severe cases.
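The threshold idea behind the Moving Epidemic Method can be sketched in a heavily simplified form. The real MEM first delimits each season's epidemic period and models the highest pre-epidemic values; the version below only pools the largest pre-epidemic weekly rates across seasons and returns the upper one-sided 95% confidence limit of their mean under a normal approximation. All parameters, names, and the approximation itself are illustrative assumptions, not the published method:

```python
import statistics

def mem_style_threshold(pre_epidemic_rates, n_top=30, z=1.645):
    """Simplified MEM-style epidemic threshold.

    Takes the n_top largest pre-epidemic weekly rates pooled across past
    seasons and returns the upper one-sided 95% confidence limit of their
    mean (normal approximation). Weekly rates above the returned value
    would flag epidemic onset.
    """
    top = sorted(pre_epidemic_rates, reverse=True)[:n_top]
    mean = statistics.mean(top)
    sem = statistics.stdev(top) / len(top) ** 0.5  # standard error of the mean
    return mean + z * sem

# Hypothetical pooled pre-epidemic ILI rates (per 100,000) from past seasons:
rates = [12.0, 15.5, 18.2, 22.7, 30.1, 41.9, 55.3, 60.8, 71.4, 83.6]
threshold = mem_style_threshold(rates, n_top=5)
print(round(threshold, 1))
```

A surveillance source whose weekly curve crosses such a threshold earlier, as PIDIRAC-ILI does here by about 1.6 weeks, gives hospitals correspondingly more lead time to prepare.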