Search CORE

7,516 research outputs found

Data-parallel intra decoding for block-based image and video coding on massively parallel architectures

Author: De Cock Jan
Hollemeersch Charles
Lambert Peter
Pieters Bart
Van de Walle Rik
Publication venue: 'Elsevier BV'
Publication date: 01/01/2012
Field of study

Exploiting graphic processing units parallelism to improve intelligent data acquisition system performance in JET's correlation reflectometer

Author: Arcas Castro Guillermo de
Barrera Lopez de Turiso Eduardo
Fonseca A.
López Navarro Juan Manuel
Murari Andrea
Nieto J.
Ruiz González Mariano
Vega Jesús
Publication venue: E.U.I.T. Telecomunicación (UPM)
Publication date: 01/04/2011
Field of study

The performance of intelligent data acquisition systems relies heavily on their processing capabilities and local bus bandwidth, especially in applications with high sample rates or high number of channels. This is the case of the self adaptive sampling rate data acquisition system installed as a pilot experiment in KG8B correlation reflectometer at JET. The system, which is based on the ITMS platform, continuously adapts the sample rate during the acquisition depending on the signal bandwidth. In order to do so it must transfer acquired data to a memory buffer in the host processor and run heavy computational algorithms for each data block. The processing capabilities of the host CPU and the bandwidth of the PXI bus limit the maximum sample rate that can be achieved, therefore limiting the maximum bandwidth of the phenomena that can be studied. Graphic processing units (GPU) are becoming an alternative for speeding up compute intensive kernels of scientific, imaging and simulation applications. However, integrating this technology into data acquisition systems is not a straight forward step, not to mention exploiting their parallelism efficiently. This paper discusses the use of GPUs with new high speed data bus interfaces to improve the performance of the self adaptive sampling rate data acquisition system installed on JET. Integration issues are discussed and performance evaluations are presente

Archivo Digital UPM

Recommended from our members

Preparing sparse solvers for exascale computing.

Author: Anzt Hartwig
Boman Erik
Curfman McInnes Lois
Falgout Rob
Ghysels Pieter
Heroux Michael
Li Xiaoye
Meier Yang Ulrike
Rajamanickam Sivasankaran
Rupp Karl
Smith Barry
Tran Mills Richard
Yamazaki Ichitaro
Publication venue: eScholarship, University of California
Publication date: 01/03/2020
Field of study

Sparse solvers provide essential functionality for a wide variety of scientific applications. Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi-physics and multi-scale simulations, especially as we target exascale platforms. This paper describes the challenges, strategies and progress of the US Department of Energy Exascale Computing project towards providing sparse solvers for exascale computing platforms. We address the demands of systems with thousands of high-performance node devices where exposing concurrency, hiding latency and creating alternative algorithms become essential. The efforts described here are works in progress, highlighting current success and upcoming challenges. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'

eScholarship - University of California

Dependency Parsing with Dilated Iterated Graph CNNs

Author: McCallum Andrew
Strubell Emma
Publication venue
Publication date: 01/01/2017
Field of study

Dependency parses are an effective way to inject linguistic knowledge into many downstream tasks, and many practitioners wish to efficiently parse sentences at scale. Recent advances in GPU hardware have enabled neural networks to achieve significant gains over the previous best models, these models still fail to leverage GPUs' capability for massive parallelism due to their requirement of sequential processing of the sentence. In response, we propose Dilated Iterated Graph Convolutional Neural Networks (DIG-CNNs) for graph-based dependency parsing, a graph convolutional architecture that allows for efficient end-to-end GPU parsing. In experiments on the English Penn TreeBank benchmark, we show that DIG-CNNs perform on par with some of the best neural network parsers.Comment: 2nd Workshop on Structured Prediction for Natural Language Processing (at EMNLP '17

arXiv.org e-Print Archive

Crossref

Efficient transfer entropy analysis of non-stationary neural time series

Author: Díaz-Pernas Francisco J.
Martínez-Zarzuela Mario
Vicente Raul
Wibral Michael
Wollstadt Patricia
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 28/07/2014
Field of study

Information theory allows us to investigate information processing in neural systems in terms of information transfer, storage and modification. Especially the measure of information transfer, transfer entropy, has seen a dramatic surge of interest in neuroscience. Estimating transfer entropy from two processes requires the observation of multiple realizations of these processes to estimate associated probability density functions. To obtain these observations, available estimators assume stationarity of processes to allow pooling of observations over time. This assumption however, is a major obstacle to the application of these estimators in neuroscience as observed processes are often non-stationary. As a solution, Gomez-Herrero and colleagues theoretically showed that the stationarity assumption may be avoided by estimating transfer entropy from an ensemble of realizations. Such an ensemble is often readily available in neuroscience experiments in the form of experimental trials. Thus, in this work we combine the ensemble method with a recently proposed transfer entropy estimator to make transfer entropy estimation applicable to non-stationary time series. We present an efficient implementation of the approach that deals with the increased computational demand of the ensemble method's practical application. In particular, we use a massively parallel implementation for a graphics processing unit to handle the computationally most heavy aspects of the ensemble method. We test the performance and robustness of our implementation on data from simulated stochastic processes and demonstrate the method's applicability to magnetoencephalographic data. While we mainly evaluate the proposed method for neuroscientific data, we expect it to be applicable in a variety of fields that are concerned with the analysis of information transfer in complex biological, social, and artificial systems.Comment: 27 pages, 7 figures, submitted to PLOS ON

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central

Hochschulschriftenserver - Universität Frankfurt am Main

Fast and accurate object detection in high resolution 4K and 8K video using GPUs

Author: Franchetti Franz
Růžička Vít
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 24/10/2018
Field of study

Machine learning has celebrated a lot of achievements on computer vision tasks such as object detection, but the traditionally used models work with relatively low resolution images. The resolution of recording devices is gradually increasing and there is a rising need for new methods of processing high resolution data. We propose an attention pipeline method which uses two staged evaluation of each image or video frame under rough and refined resolution to limit the total number of necessary evaluations. For both stages, we make use of the fast object detection model YOLO v2. We have implemented our model in code, which distributes the work across GPUs. We maintain high accuracy while reaching the average performance of 3-6 fps on 4K video and 2 fps on 8K video.Comment: 6 pages, 12 figures, Best Paper Finalist at IEEE High Performance Extreme Computing Conference (HPEC) 2018; copyright 2018 IEEE; (DOI will be filled when known

arXiv.org e-Print Archive

Crossref