5,490 research outputs found
String Matching with Multicore CPUs: Performing Better with the Aho-Corasick Algorithm
Multiple string matching is known as locating all the occurrences of a given
number of patterns in an arbitrary string. It is used in bio-computing
applications where the algorithms are commonly used for retrieval of
information such as sequence analysis and gene/protein identification.
Extremely large amount of data in the form of strings has to be processed in
such bio-computing applications. Therefore, improving the performance of
multiple string matching algorithms is always desirable. Multicore
architectures are capable of providing better performance by parallelizing the
multiple string matching algorithms. The Aho-Corasick algorithm is the one that
is commonly used in exact multiple string matching algorithms. The focus of
this paper is the acceleration of Aho-Corasick algorithm through a multicore
CPU based software implementation. Through our implementation and evaluation of
results, we prove that our method performs better compared to the state of the
art
Reducing Timing Interferences in Real-Time Applications Running on Multicore Architectures
We introduce a unified wcet analysis and scheduling framework for real-time applications deployed on multicore architectures. Our method does not follow a particular programming model, meaning that any piece of existing code (in particular legacy) can be re-used, and aims at reducing automatically the worst-case number of timing interferences between tasks. Our method is based on the notion of Time Interest Points (tips), which are instructions that can generate and/or suffer from timing interferences. We show how such points can be extracted from the binary code of applications and selected prior to performing the wcet analysis. We then represent real-time tasks as sequences of time intervals separated by tips, and schedule those tasks so that the overall makespan (including the potential timing penalties incurred by interferences) is minimized. This scheduling phase is performed using an Integer Linear Programming (ilp) solver. Preliminary results on state-of-the-art benchmarks show promising results and pave the way for future extensions of the model and optimizations
Second-generation PLINK: rising to the challenge of larger and richer datasets
PLINK 1 is a widely used open-source C/C++ toolset for genome-wide
association studies (GWAS) and research in population genetics. However, the
steady accumulation of data from imputation and whole-genome sequencing studies
has exposed a strong need for even faster and more scalable implementations of
key functions. In addition, GWAS and population-genetic data now frequently
contain probabilistic calls, phase information, and/or multiallelic variants,
none of which can be represented by PLINK 1's primary data format.
To address these issues, we are developing a second-generation codebase for
PLINK. The first major release from this codebase, PLINK 1.9, introduces
extensive use of bit-level parallelism, O(sqrt(n))-time/constant-space
Hardy-Weinberg equilibrium and Fisher's exact tests, and many other algorithmic
improvements. In combination, these changes accelerate most operations by 1-4
orders of magnitude, and allow the program to handle datasets too large to fit
in RAM. This will be followed by PLINK 2.0, which will introduce (a) a new data
format capable of efficiently representing probabilities, phase, and
multiallelic variants, and (b) extensions of many functions to account for the
new types of information.
The second-generation versions of PLINK will offer dramatic improvements in
performance and compatibility. For the first time, users without access to
high-end computing resources can perform several essential analyses of the
feature-rich and very large genetic datasets coming into use.Comment: 2 figures, 1 additional fil
Tiled QR factorization algorithms
This work revisits existing algorithms for the QR factorization of
rectangular matrices composed of p-by-q tiles, where p >= q. Within this
framework, we study the critical paths and performance of algorithms such as
Sameh and Kuck, Modi and Clarke, Greedy, and those found within PLASMA.
Although neither Modi and Clarke nor Greedy is optimal, both are shown to be
asymptotically optimal for all matrices of size p = q^2 f(q), where f is any
function such that \lim_{+\infty} f= 0. This novel and important complexity
result applies to all matrices where p and q are proportional, p = \lambda q,
with \lambda >= 1, thereby encompassing many important situations in practice
(least squares). We provide an extensive set of experiments that show the
superiority of the new algorithms for tall matrices
Tiramisu: A Polyhedral Compiler for Expressing Fast and Portable Code
This paper introduces Tiramisu, a polyhedral framework designed to generate
high performance code for multiple platforms including multicores, GPUs, and
distributed machines. Tiramisu introduces a scheduling language with novel
extensions to explicitly manage the complexities that arise when targeting
these systems. The framework is designed for the areas of image processing,
stencils, linear algebra and deep learning. Tiramisu has two main features: it
relies on a flexible representation based on the polyhedral model and it has a
rich scheduling language allowing fine-grained control of optimizations.
Tiramisu uses a four-level intermediate representation that allows full
separation between the algorithms, loop transformations, data layouts, and
communication. This separation simplifies targeting multiple hardware
architectures with the same algorithm. We evaluate Tiramisu by writing a set of
image processing, deep learning, and linear algebra benchmarks and compare them
with state-of-the-art compilers and hand-tuned libraries. We show that Tiramisu
matches or outperforms existing compilers and libraries on different hardware
architectures, including multicore CPUs, GPUs, and distributed machines.Comment: arXiv admin note: substantial text overlap with arXiv:1803.0041
- …