Performance analysis of sequence alignment applications
Advances in molecular biology have led to continued growth in the biological information generated by the scientific community. The area has also become a multi-disciplinary field, drawing on mathematics, biology, chemistry, and computer science, and it poses challenges from several points of view. For this reason, bioinformatic applications represent an increasingly important workload. However, even though the importance of this field is clear, common bioinformatic applications and their implications for micro-architectural design have not received enough attention from the computer architecture community. This paper presents a micro-architecture performance analysis of recognized bioinformatic applications for the comparison and alignment of biological sequences, including BLAST, FASTA, and some recognized parallel implementations of the Smith-Waterman algorithm that use the Altivec SIMD extension to speed up execution. We adopt a simulation-based methodology to perform a detailed workload characterization. We analyze architectural and micro-architectural aspects such as pipeline configurations, issue widths, functional unit mixes, and the memory hierarchy, and their implications for performance. We found that the memory subsystem is the component with the greatest impact on the performance of the BLAST heuristic, the branch predictor is responsible for the major performance loss in FASTA and SSEARCH34, and long dependency chains are the limiting factor in the SIMD implementations of Smith-Waterman. Peer Reviewed. Postprint (published version).
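For readers unfamiliar with the Smith-Waterman recurrence mentioned in this abstract, the following is a minimal scalar sketch in Python. It is not the Altivec code studied in the paper. The function name, the linear gap model, and the scoring values (`match`, `mismatch`, `gap`) are assumptions chosen for illustration. Each cell depends on its left, upper, and diagonal neighbours, and this kind of cell-to-cell dependency is one plausible reading of the "long dependency chains" the abstract refers to.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score of strings a and b.

    Illustrative sketch only: linear gap penalty, scores are assumed values.
    """
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Diagonal move: align a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Vertical and horizontal moves: open or extend a gap.
            up = prev[j] + gap
            left = curr[j - 1] + gap  # depends on the cell just computed
            # Local alignment: scores never drop below zero.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best
```

With the default scores, `smith_waterman_score("ACGT", "ACGT")` returns 8 (four matches at 2 each), and two sequences with no characters in common score 0.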
Low-Level Haskell Code: Measurements and Optimization Techniques
Haskell is a lazy functional language with a strong static type system and
excellent support for parallel programming. The language features of Haskell
make it easier to write correct and maintainable programs, but execution speed
often suffers from the high levels of abstraction. While much past research
focuses on high-level optimizations that take advantage of the functional
properties of Haskell, relatively little attention has been paid to the
optimization opportunities in the low-level imperative code generated during
translation to machine code. One problem with current low-level optimizations
is that their effectiveness is limited by the obscured control flow caused by
Haskell's high-level abstractions. My thesis is that trace-based optimization
techniques can be used to improve the effectiveness of low-level optimizations
for Haskell programs. I claim three unique contributions in this work.
The first contribution is to expose some properties of low-level Haskell codes
by looking at the mix of operations performed by the selected benchmark codes
and comparing them to the low-level codes coming from traditional programming
languages. The low-level measurements reveal that the control flow is obscured
by indirect jumps caused by the implementation of lazy evaluation,
higher-order functions, and the separately managed stacks used by Haskell
programs.
My second contribution is a study on the effectiveness of a dynamic binary
trace-based optimizer running on Haskell programs. My results show that while
viable program traces frequently occur in Haskell programs, the overhead
associated with maintaining the traces in a dynamic optimization system outweighs
the benefits we get from running the traces. To reduce the runtime overheads,
I explore a way to find traces in a separate profiling step.
My final contribution is to build and evaluate a static trace-based optimizer
for Haskell programs. The static optimizer uses profiling data to find traces
in a Haskell program and then restructures the code around the traces to
increase the scope available to the low-level optimizer. My results show that
we can successfully build traces in Haskell programs, and the optimized code
yields a speedup over existing low-level optimizers of up to 86%,
with an average speedup of 5% across 32 benchmarks.
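As a rough illustration of how profiling data can be turned into traces, the sketch below greedily follows the hottest outgoing edge from each unplaced block of a control-flow graph. It is a generic textbook-style heuristic, not the optimizer described in the thesis. The function name `build_traces`, the edge-count input format, and the block names in the example are all hypothetical.

```python
def build_traces(edge_counts):
    """Group basic blocks into traces using profiled edge counts.

    edge_counts maps (source_block, destination_block) to an execution count.
    Illustrative sketch only; the input format is an assumption.
    """
    successors, incoming, outgoing = {}, {}, {}
    for (src, dst), count in edge_counts.items():
        successors.setdefault(src, []).append((count, dst))
        outgoing[src] = outgoing.get(src, 0) + count
        incoming[dst] = incoming.get(dst, 0) + count

    # Estimate how often each block runs from its edge counts.
    blocks = set(incoming) | set(outgoing)
    heat = {b: max(incoming.get(b, 0), outgoing.get(b, 0)) for b in blocks}

    placed = set()
    traces = []
    # Seed traces from the hottest blocks first (ties broken by name).
    for seed in sorted(blocks, key=lambda b: (-heat[b], b)):
        if seed in placed:
            continue
        trace = [seed]
        placed.add(seed)
        block = seed
        while True:
            # Only successors not yet assigned to a trace are candidates.
            candidates = [(c, d) for c, d in successors.get(block, [])
                          if d not in placed]
            if not candidates:
                break
            _, block = max(candidates)  # follow the hottest edge
            trace.append(block)
            placed.add(block)
        traces.append(trace)
    return traces


# Hypothetical profile: the "thunk" path is taken far more often than "value".
profile = {
    ("entry", "eval"): 100,
    ("eval", "thunk"): 90,
    ("eval", "value"): 10,
    ("thunk", "ret"): 90,
    ("value", "ret"): 10,
}
traces = build_traces(profile)
```

For this hypothetical profile, the hot path `entry`, `eval`, `thunk`, `ret` ends up in one trace and the cold block `value` in another, which is the kind of layout that gives a low-level optimizer a longer straight-line region to work on.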
Performance of Runtime Optimization on BLAST
Optimization of a real-world application, BLAST, is used to demonstrate the limitations of static and profile-guided optimizations and to highlight the potential of runtime optimization systems. We analyze the performance profile of this application to determine performance bottlenecks and evaluate the effect of aggressive compiler optimizations on BLAST. We find that applying common optimizations (e.g., O3) can degrade performance. Profile-guided optimizations do not show much improvement across the board, as current implementations do not address critical performance bottlenecks in BLAST. In some cases, these optimizations lower performance significantly due to unexpected secondary effects of aggressive optimizations. We also apply runtime optimization to BLAST using the ADORE framework. ADORE speeds up some queries by as much as 58% using data cache prefetching. Branch mispredictions can also be significant for some input sets. Dynamic optimization techniques to improve branch prediction accuracy are described and examined for the application. We find that the primary limitation to applying runtime optimization to branch misprediction is the tight coupling between the data and the dependent branch. With better hardware support for influencing branch prediction, a runtime optimizer may deploy optimizations to reduce branch misprediction stalls.