Search CORE

1,203 research outputs found

Towards Real-Time Detection and Tracking of Spatio-Temporal Features: Blob-Filaments in Fusion Plasma

Author: Chang Cs
Choi Jong Y.
Churchill Michael
Klasky Scott
Sim Alex
Stathopoulos Andreas
Wu Kesheng
Wu Lingfei
Publication venue
Publication date: 01/07/2016
Field of study

A novel algorithm and implementation of real-time identification and tracking of blob-filaments in fusion reactor data is presented. Similar spatio-temporal features are important in many other applications, for example, ignition kernels in combustion and tumor cells in a medical image. This work presents an approach for extracting these features by dividing the overall task into three steps: local identification of feature cells, grouping feature cells into extended feature, and tracking movement of feature through overlapping in space. Through our extensive work in parallelization, we demonstrate that this approach can effectively make use of a large number of compute nodes to detect and track blob-filaments in real time in fusion plasma. On a set of 30GB fusion simulation data, we observed linear speedup on 1024 processes and completed blob detection in less than three milliseconds using Edison, a Cray XC30 system at NERSC.Comment: 14 pages, 40 figure

arXiv.org e-Print Archive

eScholarship - University of California

A compiler extension for parallelizing arrays automatically on the cell heterogeneous processor

Author: Cockshott W.
Gdura Y.
Publication venue
Publication date: 11/01/2012
Field of study

This paper describes the approaches taken to extend an array programming language compiler using a Virtual SIMD Machine (VSM) model for parallelizing array operations on Cell Broadband Engine heterogeneous machine. This development is part of ongoing work at the University of Glasgow for developing array compilers that are beneficial for applications in many areas such as graphics, multimedia, image processing and scientific computation. Our extended compiler, which is built upon the VSM interface, eases the parallelization processes by allowing automatic parallelisation without the need for any annotations or process directives. The preliminary results demonstrate significant improvement especially on data-intensive applications

Enlighten

Accelerating SIFT on Parallel Architectures

Author: Apon Amy
Cothren Jackson
Emeneker Wesley
Warn Seth
Publication venue: Clemson University Libraries
Publication date: 01/09/2009
Field of study

SIFT is a widely-used algorithm that extracts features from images; using it to extract information from hundreds of terabytes of aerial and satellite photographs requires parallelization in order to be feasible. We explore accelerating an existing serial SIFT implementation with OpenMP parallelization and GPU execution

Clemson University: TigerPrints

Real-Time Hand Shape Classification

Author: A. Erol
D. Huttenlocher
J. Nalepa
J. Nalepa
J. Wachs
M. Kawulok
M. Papiez
M.K. Hu
P. Phillips
S. Belongie
T. Grzejszczak
Y. Shen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014
Field of study

The problem of hand shape classification is challenging since a hand is characterized by a large number of degrees of freedom. Numerous shape descriptors have been proposed and applied over the years to estimate and classify hand poses in reasonable time. In this paper we discuss our parallel framework for real-time hand shape classification applicable in real-time applications. We show how the number of gallery images influences the classification accuracy and execution time of the parallel algorithm. We present the speedup and efficiency analyses that prove the efficacy of the parallel implementation. Noteworthy, different methods can be used at each step of our parallel framework. Here, we combine the shape contexts with the appearance-based techniques to enhance the robustness of the algorithm and to increase the classification score. An extensive experimental study proves the superiority of the proposed approach over existing state-of-the-art methods.Comment: 11 page

arXiv.org e-Print Archive

Crossref

Accelerating Pattern Matching in Neuromorphic Text Recognition System Using Intel Xeon Phi Coprocessor

Author: Ahmed Khadeer
Publication venue: SURFACE at Syracuse University
Publication date: 30/05/2014
Field of study

Neuromorphic computing systems refer to the computing architecture inspired by the working mechanism of human brains. The rapidly reducing cost and increasing performance of state-of-the-art computing hardware allows large-scale implementation of machine intelligence models with neuromorphic architectures and opens the opportunity for new applications. One such computing hardware is Intel Xeon Phi coprocessor, which delivers over a TeraFLOP of computing power with 61 integrated processing cores. How to efficiently harness such computing power to achieve real time decision and cognition is one of the key design considerations. This work presents an optimized implementation of Brain-State-in-a-Box (BSB) neural network model on the Xeon Phi coprocessor for pattern matching in the context of intelligent text recognition of noisy document images. From a scalability standpoint on a High Performance Computing (HPC) platform we show that efficient workload partitioning and resource management can double the performance of this many-core architecture for neuromorphic applications

Syracuse University Research Facility and Collaborative Environment

Task-based Augmented Contour Trees with Fibonacci Heaps

Author: Fortin P.
Gueunet Charles
Jomier J
Tierny J
Publication venue
Publication date: 01/01/2019
Field of study

This paper presents a new algorithm for the fast, shared memory, multi-core computation of augmented contour trees on triangulations. In contrast to most existing parallel algorithms our technique computes augmented trees, enabling the full extent of contour tree based applications including data segmentation. Our approach completely revisits the traditional, sequential contour tree algorithm to re-formulate all the steps of the computation as a set of independent local tasks. This includes a new computation procedure based on Fibonacci heaps for the join and split trees, two intermediate data structures used to compute the contour tree, whose constructions are efficiently carried out concurrently thanks to the dynamic scheduling of task parallelism. We also introduce a new parallel algorithm for the combination of these two trees into the output global contour tree. Overall, this results in superior time performance in practice, both in sequential and in parallel thanks to the OpenMP task runtime. We report performance numbers that compare our approach to reference sequential and multi-threaded implementations for the computation of augmented merge and contour trees. These experiments demonstrate the run-time efficiency of our approach and its scalability on common workstations. We demonstrate the utility of our approach in data segmentation applications

arXiv.org e-Print Archive

Hal-Diderot

Improving Utility of GPU in Accelerating Industrial Applications with User-centred Automatic Code Translation

Author: Anvari-Moghaddam Amjad
Codreanu Valeriu
Dong Feng
Liu Baoquan
Min Geyong
Roerdink Jos B. T. M.
Williams David
Yang Po
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/04/2018
Field of study

SMEs (Small and medium-sized enterprises), particularly those whose business is focused on developing innovative produces, are limited by a major bottleneck on the speed of computation in many applications. The recent developments in GPUs have been the marked increase in their versatility in many computational areas. But due to the lack of specialist GPU (Graphics processing units) programming skills, the explosion of GPU power has not been fully utilized in general SME applications by inexperienced users. Also, existing automatic CPU-to-GPU code translators are mainly designed for research purposes with poor user interface design and hard-to-use. Little attentions have been paid to the applicability, usability and learnability of these tools for normal users. In this paper, we present an online automated CPU-to-GPU source translation system, (GPSME) for inexperienced users to utilize GPU capability in accelerating general SME applications. This system designs and implements a directive programming model with new kernel generation scheme and memory management hierarchy to optimize its performance. A web-service based interface is designed for inexperienced users to easily and flexibly invoke the automatic resource translator. Our experiments with non-expert GPU users in 4 SMEs reflect that GPSME system can efficiently accelerate real-world applications with at least 4x and have a better applicability, usability and learnability than existing automatic CPU-to-GPU source translators

LJMU Research Online (Liverpool John Moores University)

Crossref

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

VBN

White Rose Research Online

University of Bedfordshire Repository

Dissertations of the University of Groningen