Parallel Image Processing Using a Pure Topological Framework
Image processing is a fundamental operation in many real-time applications, in which a great deal of parallelism can be extracted. Segmenting the image into different connected components is the best-known operation, but there are many others, such as extracting the region adjacency graph (RAG) of these regions or searching for feature points that are invariant to rotation, scale, brightness changes, etc. Most of these algorithms are based on tracing-type approaches or scan/raster methods. This necessarily implies a data dependence between the processing of one pixel and the previous one, which prevents a purely parallel approach. In terms of time complexity, this means that the linear order O(N) (N being the number of pixels) cannot be reduced. In this paper, we describe a novel approach based on building a purely topological framework, which allows fully parallel algorithms to be implemented. For the topological analysis, a first stage is computed in parallel for every pixel, capturing the local neighborhood conditions. These are then extended in a second parallel stage to the necessary global relations (e.g., to join all the pixels of a connected component). This combinatorial optimization process can be seen as compressing the whole image into just one pixel. Using this final representation, every region can be related to the rest, which yields a purely topological construction of other image operations. In addition, complex data structures can be avoided: all the processing can be done using matrices (with the same indexing as the original image) and element-wise operations. The time complexity of our topological approach for an m×n pixel image is close to O(log(m+n)), under the assumption that a processing element exists for each pixel. Results for a multicore processor show very good scalability until the memory bandwidth bottleneck is reached, both for bigger images and for highly optimized implementations. The inherent parallelism of our approach suggests that even better results will be obtained on other, less classical computing architectures.
Funding: Ministerio de Economía y Competitividad (Spain) TEC2012-37868-C04-02; AEI/FEDER (EU) MTM2016-81030-P; VPPI of the University of Seville.
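The two-stage idea (local neighbor relations computed per pixel, then propagated into global relations) can be illustrated with a minimal connected-component labeling sketch. This is not the authors' topological framework: it swaps in a standard technique, minimum-label propagation combined with pointer jumping, and assumes a binary image (1 = foreground) with 4-connectivity. Every pass builds new arrays from the previous ones only, indexed like the image, so each pixel could be updated by its own processing element. The function name `label_components` is hypothetical, and this simple variant does not guarantee the O(log(m+n)) bound stated above.

```python
def label_components(image):
    """Label 4-connected foreground pixels (value 1); background gets -1.

    Each component is labeled with the smallest flat index (r * w + c) of its pixels.
    """
    h, w = len(image), len(image[0])
    n = h * w
    # Stage 1 (local): every pixel independently records its same-valued 4-neighbors.
    neighbors = [[] for _ in range(n)]
    for p in range(n):
        r, c = divmod(p, w)
        if image[r][c] != 1:
            continue
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < h and 0 <= cc < w and image[rr][cc] == 1:
                neighbors[p].append(rr * w + cc)
    # Initially every foreground pixel is its own label.
    label = [p if image[p // w][p % w] == 1 else -1 for p in range(n)]
    # Stage 2 (global): synchronous rounds until no label changes.
    while True:
        # Hooking: take the minimum label over the pixel and its neighbors.
        hooked = []
        for p in range(n):
            if label[p] < 0:
                hooked.append(-1)
            else:
                hooked.append(min([label[p]] + [label[q] for q in neighbors[p]]))
        # Pointer jumping: follow each label one extra hop to shorten chains.
        jumped = [hooked[hooked[p]] if hooked[p] >= 0 else -1 for p in range(n)]
        if jumped == label:
            break
        label = jumped
    return [label[r * w:(r + 1) * w] for r in range(h)]
```

For example, `label_components([[1, 1, 0], [0, 1, 0], [1, 0, 1]])` gives one component labeled 0 and two isolated pixels labeled 6 and 8. Labels only ever decrease and always name a pixel of the same component, which is why the loop terminates with one label per component.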
GPU-Accelerated Computation of Vietoris-Rips Persistence Barcodes
The computation of Vietoris-Rips persistence barcodes is both execution-intensive and memory-intensive. In this paper, we study the computational structure of Vietoris-Rips persistence barcodes and identify several unique mathematical properties and algorithmic opportunities with connections to the GPU. Mathematically and empirically, we look into the properties of apparent pairs, which are independently identifiable persistence pairs comprising up to 99% of persistence pairs. We give theoretical upper and lower bounds on the apparent pair rate and model the average case. We also design massively parallel algorithms to take advantage of the very large number of simplices that can be processed independently of each other. Having identified these opportunities, we develop GPU-accelerated software for computing Vietoris-Rips persistence barcodes, called Ripser++. The software achieves up to a 30x speedup in total execution time over the original Ripser and also reduces CPU memory usage by up to 2.0x. We believe our GPU-acceleration efforts open a new chapter for the advancement of topological data analysis in the post-Moore's Law era.
Comment: 36 pages, 15 figures. To be published in Symposium on Computational Geometry 202
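Apparent pairs can be identified independently of one another, which is what makes them suited to massive parallelism. The Python sketch below finds dimension-1 apparent pairs (edge, triangle) of a Vietoris-Rips filtration using the usual definition: σ is the youngest facet of τ and τ is the oldest cofacet of σ. Assumptions: the input is a symmetric dissimilarity matrix, and ties in diameter are broken lexicographically by vertex tuple (Ripser uses its own combinatorial ordering, so the resulting set of apparent pairs can differ). The function names are hypothetical and this is not Ripser++'s code; it only shows that each triangle's test reads the matrix and nothing else, so one GPU thread per triangle is a natural mapping.

```python
from itertools import combinations


def filtration_key(simplex, dist):
    """Order simplices by diameter, breaking ties by vertex tuple (assumed rule)."""
    diameter = max(dist[a][b] for a, b in combinations(simplex, 2))
    return (diameter, simplex)


def apparent_pairs_dim1(dist):
    """Return (edge, triangle) apparent pairs of the Vietoris-Rips filtration of dist."""
    n = len(dist)

    def key(simplex):
        return filtration_key(simplex, dist)

    pairs = []
    # Each triangle is tested independently of all others.
    for tri in sorted(combinations(range(n), 3), key=key):
        # Youngest facet: the edge of the triangle that enters the filtration last.
        edge = max(combinations(tri, 2), key=key)
        # Every triangle containing that edge; the oldest enters the filtration first.
        cofacets = [tuple(sorted(edge + (v,))) for v in range(n) if v not in edge]
        if min(cofacets, key=key) == tri:
            pairs.append((edge, tri))
    return pairs
```

With `D = [[0, 1, 5, 4], [1, 0, 2, 6], [5, 2, 0, 3], [4, 6, 3, 0]]`, the sketch pairs edge (0, 2) with triangle (0, 1, 2) and edge (1, 3) with triangle (0, 1, 3); the other two triangles are not apparent because their youngest edge already has an older cofacet.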
High-Performance Modelling and Simulation for Big Data Applications
This open access book was prepared as a Final Publication of the COST Action IC1406 “High-Performance Modelling and Simulation for Big Data Applications (cHiPSet)” project. Long considered important pillars of the scientific method, Modelling and Simulation have evolved from traditional discrete numerical methods to complex data-intensive continuous analytical optimisations. Resolution, scale, and accuracy have become essential to predict and analyse natural and complex systems in science and engineering. As their level of abstraction rises to give better insight into the domain at hand, their representation becomes increasingly demanding of computational and data resources. High Performance Computing, on the other hand, typically entails the effective use of parallel and distributed processing units coupled with efficient storage, communication and visualisation systems to underpin complex data-intensive applications in distinct scientific and technical domains. A seamless interaction of High Performance Computing with Modelling and Simulation is therefore arguably required in order to store, compute, analyse, and visualise large data sets in science and engineering. Funded by the European Commission, cHiPSet has provided a dynamic trans-European forum for its members and distinguished guests to openly discuss novel perspectives and topics of interest for these two communities. This cHiPSet compendium presents a set of selected case studies related to healthcare, biological data, computational advertising, multimedia, finance, bioinformatics, and telecommunications.
Tuning the Computational Effort: An Adaptive Accuracy-aware Approach Across System Layers
This thesis introduces a novel methodology to realize accuracy-aware systems, which will help designers integrate accuracy awareness into their systems. It proposes an adaptive accuracy-aware approach across system layers that addresses current challenges in that domain by combining and tuning accuracy-aware methods on different system layers. To widen the scope of accuracy-aware computing, including approximate computing, to other domains, this thesis presents innovative accuracy-aware methods and techniques for different system layers.
The required tuning is integrated into a configuration layer that adjusts the available knobs of the accuracy-aware methods integrated into a system.
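The abstract does not say how the configuration layer searches the knob space. As a minimal sketch, assume each knob setting on each layer has an accuracy factor and a cost, that accuracies combine multiplicatively and costs additively, and that the layer picks the cheapest combination meeting an accuracy target. The knob names (`precision`, `loop_perforation`) and all numbers are hypothetical illustrations, and the exhaustive search is only practical for a handful of knobs.

```python
from itertools import product


def tune(knobs, min_accuracy):
    """Pick the cheapest knob configuration whose combined accuracy meets min_accuracy.

    knobs maps a knob name to a list of (setting, accuracy_factor, cost) tuples.
    Assumption: accuracy factors multiply and costs add across knobs.
    Returns (config, accuracy, cost), or None if no configuration is accurate enough.
    """
    names = list(knobs)
    best = None
    # Exhaustively enumerate one setting per knob.
    for choice in product(*(knobs[name] for name in names)):
        accuracy, cost = 1.0, 0.0
        for _, factor, setting_cost in choice:
            accuracy *= factor
            cost += setting_cost
        if accuracy >= min_accuracy and (best is None or cost < best[2]):
            config = {name: setting[0] for name, setting in zip(names, choice)}
            best = (config, accuracy, cost)
    return best


# Hypothetical knobs on two layers: arithmetic precision and loop perforation.
KNOBS = {
    "precision": [("fp32", 1.0, 10.0), ("fp16", 0.98, 6.0), ("int8", 0.93, 3.0)],
    "loop_perforation": [("none", 1.0, 8.0), ("skip_half", 0.95, 4.0)],
}
```

With a target of 0.9, `tune(KNOBS, 0.9)` selects `fp16` with `skip_half` at cost 10.0, because the cheaper `int8` plus `skip_half` combination falls below the target. A real configuration layer would more likely use measured or modeled accuracy and a smarter search.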
Proceedings, MSVSCC 2015
The Virginia Modeling, Analysis and Simulation Center (VMASC) of Old Dominion University hosted the 2015 Modeling, Simulation, & Visualization Student Capstone Conference on April 16th. The Capstone Conference features students from Modeling and Simulation undergraduate and graduate degree programs, as well as from other fields, at many colleges and universities. Students present their research to an audience of fellow students, faculty, judges, and other distinguished guests. These presentations give students the opportunity to share their innovative research with members of the M&S community from academic, industry, and government backgrounds. Also participating in the conference are faculty and judges who have volunteered their time to provide direct support to the students' research, facilitate the various conference tracks, serve as judges for each track, and provide overall assistance to the conference. 2015 marks the ninth year of the VMASC Capstone Conference for Modeling, Simulation and Visualization. This year our conference attracted a number of fine student-written papers and presentations, with a total of 51 research works presented. This year's conference had record attendance thanks to support from various departments at Old Dominion University, other local universities, and the United States Military Academy at West Point. We greatly appreciate all of the work and energy that went into this year's conference; it truly was a highly collaborative effort that resulted in a very successful symposium for the M&S community and all of those involved. Below you will find a brief summary of the best papers and best presentations, with some simple statistics on the overall conference contributions. This is followed by a table of contents, organized by conference track, with a copy of each included work.
Thank you again for your time and your contributions; this conference is designed to evolve continuously and adapt to better suit authors and M&S supporters.
Dr. Yuzhong Shen, Graduate Program Director, MSVE; Capstone Conference Chair
John Shull, Graduate Student, MSVE; Capstone Conference Student Chair
Accelerating Genomic Sequence Alignment using High Performance Reconfigurable Computers
Reconfigurable computing technology has progressed to a stage where it is now possible to achieve orders-of-magnitude performance and power efficiency gains over conventional computer architectures for a subset of high performance computing applications. In this thesis, we investigate the potential of reconfigurable computers to accelerate genomic sequence alignment, specifically for genome sequencing applications.
We present a highly optimized implementation of a parallel sequence alignment algorithm for the Berkeley Emulation Engine (BEE2) reconfigurable computer, allowing a single BEE2 to align hundreds of sequences simultaneously. For each reconfigurable processor (FPGA), we demonstrate a 61X speedup versus a state-of-the-art implementation on a modern conventional CPU core, and a 56X improvement in performance-per-Watt. We also show that our implementation is highly scalable, and we provide performance results from a cluster implementation using 32 FPGAs.
We conclude that reconfigurable computers provide an excellent platform on which to run sequence alignment, and that clusters of reconfigurable computers will be able to cope far more easily with the vast quantities of data produced by new ultra-high-throughput sequencers.
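The abstract does not name the alignment algorithm. Smith-Waterman local alignment is a common target for FPGA acceleration, so the sketch below assumes it, with hypothetical scores (match +2, mismatch -1, linear gap -1). It fills the dynamic-programming matrix one anti-diagonal at a time: every cell with the same i + j depends only on the two previous anti-diagonals, which is the wavefront parallelism that systolic-array hardware typically exploits. This is a sequential reference sketch, not the BEE2 implementation.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score of sequences a and b (linear gap model)."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    # Sweep anti-diagonals d = i + j; cells on one diagonal are mutually independent
    # and could be computed in parallel (e.g., one processing element per cell).
    for d in range(2, n + m + 1):
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                      # start a new local alignment
                H[i - 1][j - 1] + s,    # match or mismatch
                H[i - 1][j] + gap,      # gap in b
                H[i][j - 1] + gap,      # gap in a
            )
            best = max(best, H[i][j])
    return best
```

For instance, `smith_waterman("ACGT", "AGT")` scores 5: aligning `ACGT` against `A-GT` gives three matches (+6) and one gap (-1).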
High-Performance and Power-Aware Graph Processing on GPUs
Graphs are a common representation in many problem domains, including engineering, finance, medicine, and scientific applications. Many problems map to very large graphs, often involving millions of vertices. Even though very efficient sequential implementations of graph algorithms exist, they become impractical when applied to such very large real-world graphs. Graphics processing units (GPUs), on the other hand, have become widespread architectures because they provide massive parallelism at low cost. Parallel execution on GPUs may achieve speedups of up to three orders of magnitude over sequential counterparts. Nevertheless, porting (i.e., parallelizing) efficient and optimized sequential algorithms to such many-core architectures is a very challenging task. The task is made even harder because energy and power consumption are becoming constraints in addition to, or in some cases instead of, performance. This work aims at developing a platform that provides (I) a library of parallel, efficient, and tunable implementations of the most important graph algorithms for GPUs, and (II) an advanced profiling model to analyze both the performance and the power consumption of the algorithm implementations. The platform's goal is twofold. Through the library, it aims to reduce the development effort of the parallelization task through a primitive-based approach. Through the profiling framework, it aims to customize such primitives by considering both the architectural details and the target efficiency metric (i.e., performance or power).
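The abstract does not list the library's primitives. As an illustration of a primitive-based approach, the sketch below writes level-synchronous breadth-first search as repeated "advance" (expand every frontier vertex's adjacency list) and "filter" (keep unvisited vertices) steps over a graph in compressed sparse row (CSR) form; these step names are borrowed from common frontier-based GPU graph frameworks and are an assumption here, as is the function name `bfs_levels`. It runs sequentially in Python; on a GPU each frontier vertex or edge would map to a thread and the visited check would need an atomic update.

```python
def bfs_levels(row_offsets, col_indices, source):
    """Level-synchronous BFS on a CSR graph; returns hop distance per vertex (-1 if unreachable)."""
    n = len(row_offsets) - 1
    dist = [-1] * n
    dist[source] = 0
    frontier = [source]
    level = 0
    while frontier:
        level += 1
        next_frontier = []
        # Advance: every frontier vertex expands its own adjacency list independently.
        for u in frontier:
            for e in range(row_offsets[u], row_offsets[u + 1]):
                v = col_indices[e]
                # Filter: keep only vertices not yet visited (atomic on a GPU).
                if dist[v] == -1:
                    dist[v] = level
                    next_frontier.append(v)
        frontier = next_frontier
    return dist
```

For the directed edges 0→1, 0→2, 1→3, 2→3, 3→4 plus an isolated vertex 5, the CSR arrays are `row_offsets = [0, 2, 3, 4, 5, 5, 5]` and `col_indices = [1, 2, 3, 3, 4]`, and BFS from vertex 0 yields distances `[0, 1, 1, 2, 3, -1]`. Keeping advance and filter separate is what lets a library tune each primitive (for example, load balancing of the advance step) without touching the algorithm built on top of it.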
Methods and Tools for Using Reconfigurable Accelerators in Multi-Core Systems
Computing systems with multi-core processors are often extended with a reconfigurable accelerator such as an FPGA. Moving parts of an application into hardware is usually done by specialists. To enable users to program reconfigurable hardware themselves, my contribution is component-based programming and usage with automatic consideration of data locality. This makes it possible to benefit from the accelerators even for data-intensive applications.