25 research outputs found

    Algorithms for Preemptive Co-scheduling of Kernels on GPUs

    Modern GPUs allow concurrent kernel execution and preemption to improve hardware utilization and responsiveness. Currently, the decision on the simultaneous execution of kernels is made by the hardware, which can lead to unreasonable use of resources. In this work, we tackle the problem of co-scheduling for GPUs in high-competition scenarios. We propose a novel graph-based preemptive co-scheduling algorithm focused on reducing the number of preemptions. We show that the optimal preemptive makespan can be computed by solving a Linear Program in polynomial time. Based on this solution, we propose a graph-theoretical model and an algorithm to build preemptive schedules that minimize the number of preemptions. We show, however, that finding the minimal number of preemptions among all preemptive solutions of optimal makespan is an NP-hard problem. We performed experiments on real-world GPU applications, and our approach can achieve the optimal makespan while preempting only 6 to 9% of the tasks.

    Independent tasks on 2 resources with co-scheduling effects

    Concurrent kernel execution is a relatively new feature in modern GPUs, designed to improve hardware utilization and overall system throughput. However, the decision on the simultaneous execution of tasks is made by the hardware with a leftover policy, which assigns as many resources as possible to one task and then assigns the remaining resources to the next. This can lead to unreasonable use of resources. In this work, we tackle the problem of co-scheduling for GPUs with and without preemption, focusing on determining the kernels' submission order so as to reduce the number of preemptions and the kernels' makespan, respectively. We propose a graph-based theoretical model to build preemptive and non-preemptive schedules. We show that the optimal preemptive makespan can be computed by solving a Linear Program in polynomial time, and we propose an algorithm based on this solution which minimizes the number of preemptions. We also propose an algorithm that transforms a preemptive solution of optimal makespan into a non-preemptive solution with the smallest possible preemption overhead. We show, however, that finding the minimal number of preemptions among all preemptive solutions of optimal makespan is an NP-hard problem, and that computing the optimal non-preemptive schedule is also NP-hard. In addition, we study the non-preemptive problem directly, without first searching for a good preemptive solution, and present a Mixed Integer Linear Program formulation for it. We performed experiments on real-world GPU applications, and our approach can achieve the optimal makespan by preempting 6 to 9% of the tasks. Our non-preemptive approach, on the other hand, obtains makespans within 2.5% of the optimal preemptive schedules, while previous approaches exceed the preemptive makespan by 5 to 12%.
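    To illustrate the kind of Linear Program involved, here is a deliberately small toy model (my own sketch, not the paper's exact formulation): each of three tasks can run alone at one speed, or co-scheduled in a pair at a lower per-task speed; the decision variables are the durations of each alone/pair configuration, each task must complete its work, and the makespan is the sum of all configuration durations, which the LP minimizes.

    ```python
    import numpy as np
    from scipy.optimize import linprog

    # Toy instance: 3 tasks, each with 10 units of work.
    # Speed 2 when a task runs alone on the GPU; speed 1.5 per task when
    # two tasks are co-scheduled (combined throughput 3 > 2, so sharing pays off).
    work = 10.0
    s_alone, s_pair = 2.0, 1.5

    # Variables: durations [x1, x2, x3, x12, x13, x23] of each configuration.
    c = np.ones(6)  # makespan = total time spent across all configurations
    A_eq = np.array([
        [s_alone, 0, 0, s_pair, s_pair, 0],   # task 1 completes its work
        [0, s_alone, 0, s_pair, 0, s_pair],   # task 2
        [0, 0, s_alone, 0, s_pair, s_pair],   # task 3
    ])
    b_eq = np.full(3, work)

    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    print(f"optimal preemptive makespan: {res.fun:.4f}")  # 10.0 for this instance
    ```

    For this instance the optimum is 10: total work is 30 and the best achievable throughput is 3, reached by running all three pairs for 10/3 time units each, which beats the alone-only makespan of 15.
    
    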

    A Region-Growing Segmentation Algorithm for GPUs

    This article proposes a parallel region-growing image segmentation algorithm targeted at Graphics Processing Units (GPUs). The proposed algorithm derives from a sequential algorithm widely used by the Geographic Object-Based Image Analysis (GEOBIA) remote sensing community. Relative to the sequential version, this work proposes new attributes for characterizing the morphological heterogeneity of segments, whose computation can be performed more efficiently on GPUs. Two variants of the parallel algorithm, with different heuristics for selecting the adjacent segments to be merged at each iteration, are described. To exploit the potential of GPUs for the parallel execution of fine-grained threads, the proposed algorithm assigns one thread to each image pixel, which also contributes to a more uniform distribution of the computational load among the GPU processors. A detailed experimental analysis using a conventional GPU on four test images showed speedups above 8 relative to the sequential algorithm.
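    To make the region-growing idea concrete, here is a deliberately naive sequential sketch (my own illustration, not the paper's algorithm or its heterogeneity attributes): every pixel starts as its own segment, mirroring the one-thread-per-pixel mapping, and 4-adjacent segments are merged while the difference between their mean intensities stays below a threshold.

    ```python
    import numpy as np

    def region_growing(img, thr):
        # One segment per pixel initially (the GPU version assigns one thread per pixel).
        labels = np.arange(img.size).reshape(img.shape)
        h, w = img.shape
        changed = True
        while changed:          # repeat passes until no merge happens
            changed = False
            for y in range(h):
                for x in range(w):
                    for ny, nx in ((y, x + 1), (y + 1, x)):  # right and down neighbours
                        if ny < h and nx < w and labels[ny, nx] != labels[y, x]:
                            la, lb = labels[y, x], labels[ny, nx]
                            # merge if the segments' mean intensities are close enough
                            if abs(img[labels == la].mean() - img[labels == lb].mean()) < thr:
                                labels[labels == lb] = la
                                changed = True
        return labels

    img = np.array([[0, 0, 10, 10],
                    [0, 0, 10, 10]], dtype=float)
    seg = region_growing(img, thr=1.0)
    print(len(np.unique(seg)))  # 2 segments: the dark half and the bright half
    ```

    The quadratic rescans make this far too slow for real imagery; the point of the GPU variants described above is precisely to evaluate merge candidates for all pixels in parallel.
    
    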

    Abstract

    No full text
    Cardiovascular diseases are the leading cause of death and disability in the world. Non-invasive techniques are required to reduce the number of deaths and to improve patients' quality of life. These techniques usually rely on 3D visualization of MRI or CT data. In this work, we describe how improved volume rendering techniques, combined with graphics card programming, can provide interactive visualization of the heart's internal structures. Our main focus is to provide doctors with high-performance 3D images for evaluating a patient's heart anatomy and performance. Our idea is to take full advantage of the triangle-rendering hardware to provide interactive frame rates.

    Memory Efficient and Robust Software Implementation of the Raycast Algorithm

    In this paper we propose two novel software implementations of the ray-casting volume rendering algorithm for irregular grids, called ME-Raycast (Memory-Efficient Ray-Casting) and EME-Raycast (Enhanced Memory-Efficient Ray-Casting). Our algorithms improve on previous work by Bunyk et al. [1] in terms of complete handling of degenerate cases, memory consumption, and the types of cell allowed in the grid (tetrahedral and/or hexahedral). The use of a more compact and non-redundant data structure allowed us to achieve higher memory efficiency. Our results show consistent and significant gains in the memory usage of ME-Raycast and EME-Raycast compared to the implementation of Bunyk et al. Furthermore, our results also show that handling degenerate cases generates accurate images, correctly rendering all the pixels in the image, while the implementation of Bunyk et al. fails to render up to 38 pixels in the final image. When we compare our algorithms to another robust rendering algorithm, ZSweep [2], we obtain considerable performance gains and competitive memory consumption. We conclude that ME-Raycast and EME-Raycast are efficient ray-casting methods that allow in-core rendering of large datasets with no image errors.

    Electronic Auction with autonomous intelligent agents: finding opportunities by being there

    No full text
    The overwhelming number of options brought by the Internet's explosive growth raises new issues for users buying and/or selling goods with the net as the business medium. Goods and services can be exchanged, sold directly, or negotiated in auctions. In any of these situations, finding the required product at the right price is the big challenge for Internet users. Especially in e-auctions, timing and strategic actions are vital to a successful deal. In this paper, we propose a model for e-auctions based on intelligent agent technology. The use of agents makes it possible to better reflect what happens in real auctions. Agents act together with buyers, sellers, and auctioneers to assist them in obtaining the best deal, or at least in finding a Nash equilibrium point.

    Memory-Aware and Efficient Ray-Casting Algorithm

    No full text
    Ray-casting implementations require that the connectivity between the cells of the dataset be explicitly computed and kept in memory. This constitutes a huge obstacle to obtaining real-time rendering of very large models. In this paper, we address this problem by introducing a new implementation of the ray-casting algorithm for irregular datasets. Our implementation optimizes the memory usage of past implementations by exploiting ray coherence. The idea is to keep in main memory the information on the faces traversed by the rays cast through every pixel under the projection of a visible face. Our results show that exploiting pixel coherence considerably reduces memory usage, while keeping the performance of our algorithm competitive with the fastest previous ones.

    Performance Analysis and Optimization of the Vector-Kronecker Product Multiplication

    No full text
    The Kronecker product, also called the tensor product, is a fundamental matrix algebra operation, used to model complex systems with structured descriptions. This operation needs to be computed efficiently, since it is a critical kernel for iterative algorithms. In this work, we focus on the vector-Kronecker product operation, presenting an in-depth performance analysis of a previously proposed sequential algorithm and a parallel one. Based on this analysis, we propose three optimizations: changing the memory access pattern, reducing load imbalance, and manually vectorizing some portions of the code with Intel SSE4.2 intrinsics. The obtained results show better cache usage and load balance, thus improving performance, especially for larger matrices.
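    The abstract does not include the kernel itself, but the key trick behind efficient vector-Kronecker products is to never materialize A ⊗ B, using the standard identity (A ⊗ B) vec(X) = vec(B X Aᵀ) with column-major vec. The sketch below (a generic reformulation, not necessarily the paper's exact algorithm) shows the reshaping approach:

    ```python
    import numpy as np

    def kron_matvec(A, B, x):
        """Compute y = (A ⊗ B) x without forming the Kronecker product.

        Uses (A ⊗ B) vec(X) = vec(B X Aᵀ), with vec taken column-major,
        turning an O((mn)^2) matvec into two small matrix products.
        """
        m = A.shape[1]
        n = B.shape[1]
        X = x.reshape((n, m), order='F')   # inverse of column-major vec
        Y = B @ X @ A.T
        return Y.reshape(-1, order='F')    # column-major vec of the result

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 3))
    B = rng.standard_normal((4, 4))
    x = rng.standard_normal(12)

    # Agrees with the explicit (and much more expensive) Kronecker matvec.
    assert np.allclose(kron_matvec(A, B, x), np.kron(A, B) @ x)
    ```

    The two dense products touch memory in regular strides, which is exactly where access-pattern and vectorization optimizations like those described above pay off.
    
    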