334,905 research outputs found
Parallel programming in biomedical signal processing
Dissertação para obtenção do Grau de Mestre em
Engenharia BiomédicaPatients with neuromuscular and cardiorespiratory diseases need to be monitored continuously. This constant monitoring gives rise to huge amounts of multivariate data which need to be processed as soon as possible, so that their most relevant features can be extracted.
The field of parallel processing, an area from the computational sciences, comes naturally as a way to provide an answer to this problem. For the parallel processing to succeed it is necessary to adapt the pre-existing signal processing algorithms to the modern architectures of computer systems with several processing units.
In this work parallel processing techniques are applied to biosignals, connecting the
area of computer science to the biomedical domain. Several considerations are made
on how to design parallel algorithms for signal processing, following the data parallel paradigm. The emphasis is given to algorithm design, rather than the computing
systems that execute these algorithms. Nonetheless, shared memory systems and distributed memory systems are mentioned in the present work.
Two signal processing tools integrating some of the parallel programming concepts
mentioned throughout this work were developed. These tools allow a fast and efficient analysis of long-term biosignals. The two kinds of analysis are focused on heart rate variability and breath frequency, and aim to the processing of electrocardiograms and respiratory signals, respectively.
The proposed tools make use of the several processing units that most of the actual
computers include in their architecture, giving the clinician a fast tool without him having to set up a system specifically meant to run parallel programs
The Role of Distributed Computing in Big Data Science: Case Studies in Forensics and Bioinformatics
2014 - 2015The era of Big Data is leading the generation of large amounts of data,
which require storage and analysis capabilities that can be only ad-
dressed by distributed computing systems. To facilitate large-scale
distributed computing, many programming paradigms and frame-
works have been proposed, such as MapReduce and Apache Hadoop,
which transparently address some issues of distributed systems and
hide most of their technical details.
Hadoop is currently the most popular and mature framework sup-
porting the MapReduce paradigm, and it is widely used to store and
process Big Data using a cluster of computers. The solutions such
as Hadoop are attractive, since they simplify the transformation
of an application from non-parallel to the distributed one by means
of general utilities and without many skills. However, without any
algorithm engineering activity, some target applications are not alto-
gether fast and e cient, and they can su er from several problems
and drawbacks when are executed on a distributed system. In fact, a
distributed implementation is a necessary but not su cient condition
to obtain remarkable performance with respect to a non-parallel coun-
terpart. Therefore, it is required to assess how distributed solutions
are run on a Hadoop cluster, and/or how their performance can be
improved to reduce resources consumption and completion times.
In this dissertation, we will show how Hadoop-based implementations
can be enhanced by using carefully algorithm engineering activity,
tuning, pro ling and code improvements. It is also analyzed how to
achieve these goals by working on some critical points, such as: data
local computation, input split size, number and granularity of tasks,
cluster con guration, input/output representation, etc.
i
In particular, to address these issues, we choose some case studies
coming from two research areas where the amount of data is rapidly
increasing, namely, Digital Image Forensics and Bioinformatics. We
mainly describe full- edged implementations to show how to design,
engineer, improve and evaluate Hadoop-based solutions for Source
Camera Identi cation problem, i.e., recognizing the camera used for
taking a given digital image, adopting the algorithm by Fridrich et al.,
and for two of the main problems in Bioinformatics, i.e., alignment-
free sequence comparison and extraction of k-mer cumulative or local
statistics.
The results achieved by our improved implementations show that they
are substantially faster than the non-parallel counterparts, and re-
markably faster than the corresponding Hadoop-based naive imple-
mentations. In some cases, for example, our solution for k-mer statis-
tics is approximately 30Ă faster than our Hadoop-based naive im-
plementation, and about 40Ă faster than an analogous tool build on
Hadoop. In addition, our applications are also scalable, i.e., execution
times are (approximately) halved by doubling the computing units.
Indeed, algorithm engineering activities based on the implementation
of smart improvements and supported by careful pro ling and tun-
ing may lead to a much better experimental performance avoiding
potential problems.
We also highlight how the proposed solutions, tips, tricks and insights
can be used in other research areas and problems.
Although Hadoop simpli es some tasks of the distributed environ-
ments, we must thoroughly know it to achieve remarkable perfor-
mance. It is not enough to be an expert of the application domain
to build Hadop-based implementations, indeed, in order to achieve
good performance, an expert of distributed systems, algorithm engi-
neering, tuning, pro ling, etc. is also required. Therefore, the best
performance depend heavily on the cooperation degree between the
domain expert and the distributed algorithm engineer. [edited by Author]XIV n.s
Platform Dependent Verification: On Engineering Verification Tools for 21st Century
The paper overviews recent developments in platform-dependent explicit-state
LTL model checking.Comment: In Proceedings PDMC 2011, arXiv:1111.006
Deterministic voting in distributed systems using error-correcting codes
Distributed voting is an important problem in reliable computing. In an N Modular Redundant (NMR) system, the N computational modules execute identical tasks and they need to periodically vote on their current states. In this paper, we propose a deterministic majority voting algorithm for NMR systems. Our voting algorithm uses error-correcting codes to drastically reduce the average case communication complexity. In particular, we show that the efficiency of our voting algorithm can be improved by choosing the parameters of the error-correcting code to match the probability of the computational faults. For example, consider an NMR system with 31 modules, each with a state of m bits, where each module has an independent computational error probability of 10^-3. In, this NMR system, our algorithm can reduce the average case communication complexity to approximately 1.0825 m compared with the communication complexity of 31 m of the naive algorithm in which every module broadcasts its local result to all other modules. We have also implemented the voting algorithm over a network of workstations. The experimental performance results match well the theoretical predictions
A Multi-Core Solver for Parity Games
We describe a parallel algorithm for solving parity games,\ud
with applications in, e.g., modal mu-calculus model\ud
checking with arbitrary alternations, and (branching) bisimulation\ud
checking. The algorithm is based on Jurdzinski's Small Progress\ud
Measures. Actually, this is a class of algorithms, depending on\ud
a selection heuristics.\ud
\ud
Our algorithm operates lock-free, and mostly wait-free (except for\ud
infrequent termination detection), and thus allows maximum\ud
parallelism. Additionally, we conserve memory by avoiding storage\ud
of predecessor edges for the parity graph through strictly\ud
forward-looking heuristics.\ud
\ud
We evaluate our multi-core implementation's behaviour on parity games\ud
obtained from mu-calculus model checking problems for a set of\ud
communication protocols, randomly generated problem instances, and\ud
parametric problem instances from the literature.\ud
\u
CCL: a portable and tunable collective communication library for scalable parallel computers
A collective communication library for parallel computers includes frequently used operations such as broadcast, reduce, scatter, gather, concatenate, synchronize, and shift. Such a library provides users with a convenient programming interface, efficient communication operations, and the advantage of portability. A library of this nature, the Collective Communication Library (CCL), intended for the line of scalable parallel computer products by IBM, has been designed. CCL is part of the parallel application programming interface of the recently announced IBM 9076 Scalable POWERparallel System 1 (SP1). In this paper, we examine several issues related to the functionality, correctness, and performance of a portable collective communication library while focusing on three novel aspects in the design and implementation of CCL: 1) the introduction of process groups, 2) the definition of semantics that ensures correctness, and 3) the design of new and tunable algorithms based on a realistic point-to-point communication model
Parallel symbolic state-space exploration is difficult, but what is the alternative?
State-space exploration is an essential step in many modeling and analysis
problems. Its goal is to find the states reachable from the initial state of a
discrete-state model described. The state space can used to answer important
questions, e.g., "Is there a dead state?" and "Can N become negative?", or as a
starting point for sophisticated investigations expressed in temporal logic.
Unfortunately, the state space is often so large that ordinary explicit data
structures and sequential algorithms cannot cope, prompting the exploration of
(1) parallel approaches using multiple processors, from simple workstation
networks to shared-memory supercomputers, to satisfy large memory and runtime
requirements and (2) symbolic approaches using decision diagrams to encode the
large structured sets and relations manipulated during state-space generation.
Both approaches have merits and limitations. Parallel explicit state-space
generation is challenging, but almost linear speedup can be achieved; however,
the analysis is ultimately limited by the memory and processors available.
Symbolic methods are a heuristic that can efficiently encode many, but not all,
functions over a structured and exponentially large domain; here the pitfalls
are subtler: their performance varies widely depending on the class of decision
diagram chosen, the state variable order, and obscure algorithmic parameters.
As symbolic approaches are often much more efficient than explicit ones for
many practical models, we argue for the need to parallelize symbolic
state-space generation algorithms, so that we can realize the advantage of both
approaches. This is a challenging endeavor, as the most efficient symbolic
algorithm, Saturation, is inherently sequential. We conclude by discussing
challenges, efforts, and promising directions toward this goal
- âŠ