Search CORE

6,572 research outputs found

Comprehensive analysis of high-performance computing methods for filtered back-projection

Author: Boulanger Pierre
Eliuk Steven
Mendl Christian B.
Noga Michelle
Publication venue: 'Universitat Autonoma de Barcelona'
Publication date: 01/01/2013
Field of study

This paper provides an extensive runtime, accuracy, and noise analysis of Computed To-mography (CT) reconstruction algorithms using various High-Performance Computing (HPC) frameworks such as: "conventional" multi-core, multi threaded CPUs, Compute Unified Device Architecture (CUDA), and DirectX or OpenGL graphics pipeline programming. The proposed algorithms exploit various built-in hardwired features of GPUs such as rasterization and texture filtering. We compare implementations of the Filtered Back-Projection (FBP) algorithm with fan-beam geometry for all frameworks. The accuracy of the reconstruction is validated using an ACR-accredited phantom, with the raw attenuation data acquired by a clinical CT scanner. Our analysis shows that a single GPU can run a FBP reconstruction 23 time faster than a 64-core multi-threaded CPU machine for an image of 1024 X 1024. Moreover, directly programming the graphics pipeline using DirectX or OpenGL can further increases the performance compared to a CUDA implementation

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Directory of Open Access Journals

Revistes Catalanes amb Accés Obert

Electronic Letters on Computer Vision and Image Analysis (ELCVIA - Universitat Autònoma de Barcelona)

Diposit Digital de Documents de la UAB

Recommended from our members

Bridging the gap between mobile CPU design and user satisfaction via crowdsourcing

Author: Halpern Matthew Franklin
Publication venue
Publication date: 14/10/2016
Field of study

This report aims to provide an understanding of how the mobile CPU designs have evolved and its influence on end-user satisfaction. To that end, a quantitative performance analysis is conducted across ten cutting-edge mobile CPU designs studied within top-selling off-the-shelf smartphones released over the past seven years. This analysis is then used to guide a large-scale user study spanning over 25,000 participants via crowdsourcing on the Amazon Mechanical Turk service. The user study asks participants to assess the responsiveness of interactive application use cases for a set of current-generation applications (e.g. Angry Birds and FaceBook) and next-generation applications (i.e. face recognition and augmented reality) relative to the performance capabilities of the devices studied. This framework allows us to quantitatively link how the mobile CPU designs studied impacted end-user satisfaction. The study results indicate that mobile CPU designs have exhibited signifiant performance improvements through aggressive core scaling techniques prevalent in desktop CPUs. Just as was observed in desktop CPU design, these same techniques have lead to excessive mobile CPU power consumption. However, from an end-user perspective this power consumption was not without success. Mobile CPUs have evolved to provide satisfactory experiences for the studied current- generation applications. The reason is that many of these applications rely heavily on single-threaded performance. Other, more recent applications, actually multi-thread user-critical parts of the applications, which also demonstrates that multi- core mobile CPUs are an important design consideration – contrary to conventional wisdom. However, looking ahead, the same mobile CPUs where not able to provide satisfactory experiences for many of the next-generation applications studied, questioning the sustainability of these power-hungry design techniques in future mobile CPU designs.Electrical and Computer Engineerin

Texas ScholarWorks

Comparative Analysis of Open Source Frameworks for Machine Learning with Use Case in Single-Threaded and Multi-Threaded Modes

Author: Alienin Oleg
Gordienko Yuri
Kochura Yuriy
Novotarskiy Michail
Rojbi Anis
Stirenko Sergii
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 07/06/2017
Field of study

arXiv.org e-Print Archive

Crossref

Performance Analysis of Open Source Machine Learning Frameworks for Various Parameters in Single-Threaded and Multi-Threaded Modes

Author: Alienin Oleg
Gordienko Yuri
Kochura Yuriy
Novotarskiy Michail
Stirenko Sergii
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 29/08/2017
Field of study

The basic features of some of the most versatile and popular open source frameworks for machine learning (TensorFlow, Deep Learning4j, and H2O) are considered and compared. Their comparative analysis was performed and conclusions were made as to the advantages and disadvantages of these platforms. The performance tests for the de facto standard MNIST data set were carried out on H2O framework for deep learning algorithms designed for CPU and GPU platforms for single-threaded and multithreaded modes of operation Also, we present the results of testing neural networks architectures on H2O platform for various activation functions, stopping metrics, and other parameters of machine learning algorithm. It was demonstrated for the use case of MNIST database of handwritten digits in single-threaded mode that blind selection of these parameters can hugely increase (by 2-3 orders) the runtime without the significant increase of precision. This result can have crucial influence for optimization of available and new machine learning methods, especially for image recognition problems.Comment: 15 pages, 11 figures, 4 tables; this paper summarizes the activities which were started recently and described shortly in the previous conference presentations arXiv:1706.02248 and arXiv:1707.04940; it is accepted for Springer book series "Advances in Intelligent Systems and Computing

arXiv.org e-Print Archive

Crossref

Performance Characterization of Multi-threaded Graph Processing Applications on Intel Many-Integrated-Core Architecture

Author: Chen Langshi
Jiang Lei
Qiu Judy
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 24/02/2019
Field of study

Intel Xeon Phi many-integrated-core (MIC) architectures usher in a new era of terascale integration. Among emerging killer applications, parallel graph processing has been a critical technique to analyze connected data. In this paper, we empirically evaluate various computing platforms including an Intel Xeon E5 CPU, a Nvidia Geforce GTX1070 GPU and an Xeon Phi 7210 processor codenamed Knights Landing (KNL) in the domain of parallel graph processing. We show that the KNL gains encouraging performance when processing graphs, so that it can become a promising solution to accelerating multi-threaded graph applications. We further characterize the impact of KNL architectural enhancements on the performance of a state-of-the art graph framework.We have four key observations: 1 Different graph applications require distinctive numbers of threads to reach the peak performance. For the same application, various datasets need even different numbers of threads to achieve the best performance. 2 Only a few graph applications benefit from the high bandwidth MCDRAM, while others favor the low latency DDR4 DRAM. 3 Vector processing units executing AVX512 SIMD instructions on KNLs are underutilized when running the state-of-the-art graph framework. 4 The sub-NUMA cache clustering mode offering the lowest local memory access latency hurts the performance of graph benchmarks that are lack of NUMA awareness. At last, We suggest future works including system auto-tuning tools and graph framework optimizations to fully exploit the potential of KNL for parallel graph processing.Comment: published as L. Jiang, L. Chen and J. Qiu, "Performance Characterization of Multi-threaded Graph Processing Applications on Many-Integrated-Core Architecture," 2018 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Belfast, United Kingdom, 2018, pp. 199-20

arXiv.org e-Print Archive

IUScholarWorks Open

Sam2bam: High-Performance Framework for NGS Data Preprocessing Tools

Author: Cheng Yinhe
Ogasawara Takeshi
Tzeng Tzy-Hwa Kathy
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 05/08/2016
Field of study

This paper introduces a high-throughput software tool framework called {\it sam2bam} that enables users to significantly speedup pre-processing for next-generation sequencing data. The sam2bam is especially efficient on single-node multi-core large-memory systems. It can reduce the runtime of data pre-processing in marking duplicate reads on a single node system by 156-186x compared with de facto standard tools. The sam2bam consists of parallel software components that can fully utilize the multiple processors, available memory, high-bandwidth of storage, and hardware compression accelerators if available. The sam2bam provides file format conversion between well-known genome file formats, from SAM to BAM, as a basic feature. Additional features such as analyzing, filtering, and converting the input data are provided by {\it plug-in} tools, e.g., duplicate marking, which can be attached to sam2bam at runtime. We demonstrated that sam2bam could significantly reduce the runtime of NGS data pre-processing from about two hours to about one minute for a whole-exome data set on a 16-core single-node system using up to 130 GB of memory. The sam2bam could reduce the runtime for whole-genome sequencing data from about 20 hours to about nine minutes on the same system using up to 711 GB of memory

arXiv.org e-Print Archive

Directory of Open Access Journals

PubMed Central

FigShare

Acceleration of a Full-scale Industrial CFD Application with OP2

Author: Bertolli C
Betts A
Giles MB
Kelly PHJ
Mudalige GR
Radford D
Reguly IZ
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 25/06/2015
Field of study

Spiral - Imperial College Digital Repository

Energy Efficiency of Software Transactional Memory in a Heterogeneous Architecture

Author: Asenjo-Plaza Rafael
Navarro Angeles
Plata-Gonzalez Oscar Guillermo
Ukidave Yash
Villegas Alejandro
Villegas Emilio
Publication venue
Publication date: 07/09/2016
Field of study

Hardware vendors make an important effort creating low-power CPUs that keep battery duration and durability above acceptable levels. In order to achieve this goal and provide good performance-energy for a wide variety of applications, ARM designed the big.LITTLE architecture. This heterogeneous multi-core architecture features two different types of cores: big cores oriented to performance and little cores, slower and aimed to save energy consumption. As all the cores have access to the same memory, multi-threaded applications must resort to some mutual exclusion mechanism to coordinate the access to shared data by the concurrent threads. Transactional Memory (TM) represents an optimistic approach for shared-memory synchronization. To take full advantage of the features offered by software TM, but also benefit from the characteristics of the heterogeneous big.LITTLE architectures, our focus is to propose TM solutions that take into account the power/performance requirements of the application and what it is offered by the architecture. In order to understand the current state-of-the-art and obtain useful information for future power-aware software TM solutions, we have performed an analysis of a popular TM library running on top of an ARM big.LITTLE processor. Experiments show, in general, better scalability for the LITTLE cores for most of the applications except for one, which requires the computing performance that the big cores offer.Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech

Repositorio Institucional Universidad de Málaga