Analyzing large-scale DNA Sequences on Multi-core Architectures
Rapid analysis of DNA sequences is important for preventing the evolution of
viruses and bacteria at an early stage, for the early diagnosis of genetic
predispositions to certain diseases (e.g., cancer and cardiovascular diseases),
and for DNA forensics. However, real-world DNA sequences may comprise several
gigabytes, and DNA analysis demands adequate computational resources to finish
within a reasonable time. In this paper we present a scalable approach for
parallel DNA analysis that is based on finite automata and is suitable for
analyzing very large DNA segments. We evaluate our approach on real-world DNA
segments of mouse (2.7 GB), cat (2.4 GB), dog (2.4 GB), chicken (1 GB), human
(3.2 GB), and turkey (0.2 GB). Experimental results on a dual-socket
shared-memory system with 24 physical cores show speed-ups of up to 17.6x. Our
approach is up to 3x faster than a pattern-based parallel approach that uses the
RE2 library.
Comment: The 18th IEEE International Conference on Computational Science and
Engineering (CSE 2015), Porto, Portugal, 20-23 October 2015
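The abstract does not say how the automaton work is split across cores. One common way to parallelize finite-automaton matching, sketched below as an assumption rather than as the authors' method, is to cut the sequence into chunks, run each chunk from every possible start state, and then stitch the per-chunk state mappings together in order. The pattern, chunk count, and function names (`build_dfa`, `run_chunk`, `count_matches_parallel`) are hypothetical; threads keep the sketch short, although CPython threads will not give real CPU speed-ups for this pure-Python loop.

```python
from concurrent.futures import ThreadPoolExecutor

ALPHABET = "ACGT"


def build_dfa(pattern: str) -> list[dict[str, int]]:
    """KMP-style DFA: state q = length of the longest pattern prefix that is a
    suffix of the input read so far. Assumes the pattern uses only A, C, G, T."""
    m = len(pattern)
    dfa: list[dict[str, int]] = [{} for _ in range(m + 1)]
    for q in range(m + 1):
        for c in ALPHABET:
            s = pattern[:q] + c
            k = min(m, len(s))
            # Shrink k until pattern[:k] is a suffix of s (handles overlaps).
            while k > 0 and not s.endswith(pattern[:k]):
                k -= 1
            dfa[q][c] = k
    return dfa


def run_chunk(dfa: list[dict[str, int]], chunk: str) -> list[tuple[int, int]]:
    """For every possible start state, return (end state, matches in chunk)."""
    m = len(dfa) - 1
    table = []
    for start in range(m + 1):
        state, hits = start, 0
        for c in chunk:
            # Symbols outside ACGT (e.g. 'N') cannot be part of a match: reset.
            state = dfa[state].get(c, 0)
            if state == m:
                hits += 1
        table.append((state, hits))
    return table


def count_matches_parallel(sequence: str, pattern: str, n_chunks: int = 4) -> int:
    """Count (possibly overlapping) occurrences of pattern in sequence."""
    dfa = build_dfa(pattern)
    size = max(1, -(-len(sequence) // n_chunks))  # ceiling division
    chunks = [sequence[i:i + size] for i in range(0, len(sequence), size)]
    with ThreadPoolExecutor() as pool:
        tables = list(pool.map(lambda ch: run_chunk(dfa, ch), chunks))
    # Sequential stitching: the true start state of each chunk is the end
    # state of the previous one, so we just look it up in each table.
    state, total = 0, 0
    for table in tables:
        state, hits = table[state]
        total += hits
    return total
```

For example, `count_matches_parallel("GATTACAGATTACA", "GATTACA")` counts two occurrences. The extra work per chunk grows with the number of automaton states, which is why this scheme is usually paired with small automata or with tricks that prune unreachable start states.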
Multi-Omic Profiling of Melophlus Sponges Reveals Diverse Metabolomic and Microbiome Architectures that Are Non-overlapping with Ecological Neighbors.
Marine sponge holobionts, defined as filter-feeding sponge hosts together with their associated microbiomes, are prolific sources of natural products, and the inventory of natural products isolated from marine sponges is extensive. Here, using untargeted mass spectrometry, we demonstrate that sponges harbor a far greater diversity of low-abundance natural products that have evaded discovery. While these low-abundance natural products may not be feasible to isolate, insights into their chemical structures can be gleaned by careful curation of mass fragmentation spectra. Sponges are also among the most complex multi-organismal holobiont communities in the oceans. We overlay sponge metabolomes with their microbiome structures and detailed metagenomic characterization to discover candidate gene clusters that encode the production of sponge-derived natural products. The multi-omic profiling strategy for sponges that we describe here enables quantitative comparison of sponge metabolomes and microbiomes to address, among other questions, the ecological relevance of sponge natural products, and it supports the phylochemical assignment of previously undescribed sponge identities.
khmer: Working with Big Data in Bioinformatics
We introduce design and optimization considerations for the 'khmer' package.
Comment: Invited chapter for the forthcoming book on Performance of Open Source
Applications
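The abstract gives no internals, but khmer is known for memory-efficient probabilistic k-mer counting. The sketch below illustrates that general idea with a Count-Min-style counter; it is an assumption-labeled toy, not khmer's API or code. The class name, table sizes, hash choice (BLAKE2b), and canonical k-mer handling are all hypothetical choices made for illustration.

```python
import hashlib

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


class CountMinKmerCounter:
    """Hypothetical Count-Min-style k-mer counter (not khmer's interface).

    Each k-mer is hashed once, and the hash is reduced modulo several
    different table sizes. Counts can be overestimated by collisions but
    never underestimated, so the minimum over tables is the estimate.
    """

    def __init__(self, k: int, table_sizes=(10007, 10009, 10037)):
        self.k = k
        self.tables = [[0] * size for size in table_sizes]

    @staticmethod
    def _canonical(kmer: str) -> str:
        # Treat a k-mer and its reverse complement as the same k-mer.
        rc = kmer.translate(_COMPLEMENT)[::-1]
        return min(kmer, rc)

    @staticmethod
    def _hash(kmer: str) -> int:
        digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
        return int.from_bytes(digest, "little")

    def add(self, kmer: str) -> None:
        h = self._hash(self._canonical(kmer))
        for table in self.tables:
            table[h % len(table)] += 1

    def add_sequence(self, seq: str) -> None:
        # Slide a window of length k over the read.
        for i in range(len(seq) - self.k + 1):
            self.add(seq[i:i + self.k])

    def get(self, kmer: str) -> int:
        h = self._hash(self._canonical(kmer))
        return min(table[h % len(table)] for table in self.tables)
```

Memory is fixed by the table sizes rather than by the number of distinct k-mers, which is the property that makes this family of structures attractive for very large read sets; the trade-off is a false-positive (overcount) rate that grows as the tables fill.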
Detecting Repetitions and Periodicities in Proteins by Tiling the Structural Space
The notion of energy landscapes provides conceptual tools for understanding
the complexities of protein folding and function. Energy Landscape Theory
indicates that it is much easier to find sequences that satisfy the "Principle
of Minimal Frustration" when the folded structure is symmetric (Wolynes, P. G.
Symmetry and the Energy Landscapes of Biomolecules. Proc. Natl. Acad. Sci.
U.S.A. 1996, 93, 14249-14255). Similarly, repeats and structural mosaics may be
fundamentally related to landscapes with multiple embedded funnels. Here we
present analytical tools to detect and compare structural repetitions in
protein molecules. Through an exhaustive analysis of the distribution of
structural repeats using a robust metric, we define the portions of a protein
molecule that best describe the overall structure as a tessellation of basic
units. The patterns produced by such tessellations provide intuitive
representations of the repeating regions and of their association into
higher-order arrangements. We find that some protein architectures can be
described as nearly periodic, while in others clear separations between
repetitions exist. Since the method is independent of amino acid sequence
information, we can identify structural units that can be encoded by a variety
of distinct amino acid sequences.
Computing Platforms for Big Biological Data Analytics: Perspectives and Challenges.
The last decade has witnessed an explosion in the amount of available biological sequence data, due to the rapid progress of high-throughput sequencing projects. However, the volume of biological data is becoming so large that traditional data analysis platforms and methods can no longer perform analysis tasks in the life sciences quickly enough. As a result, both biologists and computer scientists face the challenge of extracting deep insight into biological function from big biological data, which in turn requires massive computational resources. High-performance computing (HPC) platforms are therefore needed, together with efficient and scalable algorithms that can take advantage of them. In this paper, we survey the state-of-the-art HPC platforms for big biological data analytics. We first list the characteristics of big biological data and of popular computing platforms. We then provide a taxonomy of biological data analysis applications and survey how they have been mapped onto various computing platforms. Next, we present a case study comparing the efficiency of different computing platforms on the classical biological sequence alignment problem. Finally, we discuss the open issues in big biological data analytics.
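As a concrete picture of the "classical biological sequence alignment problem" used in the case study, the sketch below computes a Smith-Waterman local alignment score with a linear gap penalty. The abstract does not state which alignment algorithm or scoring scheme the case study uses; the function name and the match, mismatch, and gap values are hypothetical.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b (linear gap penalty).

    Uses the standard dynamic-programming recurrence
        H[i][j] = max(0, H[i-1][j-1] + s(a_i, b_j),
                      H[i-1][j] + gap, H[i][j-1] + gap)
    while keeping only two rows in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap
            left = curr[j - 1] + gap
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best
```

For example, `smith_waterman_score("TTACGTTT", "GGACGTGG")` scores the shared `ACGT` block. The O(len(a) x len(b)) cost is what makes alignment a natural HPC benchmark: cells on the same anti-diagonal do not depend on each other, which is the parallelism that multicore and GPU implementations typically exploit.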
Digital Ecosystems: Ecosystem-Oriented Architectures
We view Digital Ecosystems as the digital counterparts of biological
ecosystems. Here, we are concerned with the creation of these Digital
Ecosystems, exploiting the self-organising properties of biological ecosystems
to evolve high-level software applications. To this end, we created the Digital
Ecosystem, a novel optimisation technique inspired by biological ecosystems,
in which optimisation works at two levels. The first is the migration of agents
distributed across a decentralised peer-to-peer network, operating continuously
in time; this process feeds the second, an evolutionary-computing optimisation
that operates locally on individual peers and aims to find solutions that
satisfy locally relevant constraints. We then evaluated the Digital Ecosystem
experimentally through simulations, using measures from theoretical ecology to
assess its likeness to biological ecosystems. These included its
responsiveness to requests for applications from the user base, as a measure of
ecological succession (ecosystem maturity). Overall, we have advanced the
understanding of Digital Ecosystems, creating Ecosystem-Oriented Architectures
in which the word ecosystem is more than just a metaphor.
Comment: 39 pages, 26 figures, journal
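The two-level optimisation described above can be pictured with a deliberately small toy: agents migrate between peers, and each peer then runs a local evolutionary step against its own constraint. Everything concrete here is hypothetical (bit-string agents, a bit-matching fitness standing in for "locally relevant constraints", random-destination migration, discrete rounds instead of continuous time); it illustrates the structure only, not the authors' Digital Ecosystem.

```python
import random

GENOME_LEN = 16  # hypothetical agent size


def fitness(agent, target):
    # Hypothetical local constraint: number of bits matching the peer's target.
    return sum(a == t for a, t in zip(agent, target))


def mutate(agent, rng, rate=1 / GENOME_LEN):
    return tuple(bit ^ 1 if rng.random() < rate else bit for bit in agent)


def local_evolution(population, target, rng, size):
    # Second level: one evolutionary step on a single peer.
    # Parents compete with offspring, so the best fitness never decreases.
    offspring = [mutate(a, rng) for a in population]
    pool = sorted(population + offspring,
                  key=lambda a: fitness(a, target), reverse=True)
    return pool[:size]


def migrate(peers, rng, n_migrants=1):
    # First level: each peer sends copies of its leading agents to a random
    # other peer (requires at least two peers).
    incoming = [[] for _ in peers]
    for i, peer in enumerate(peers):
        dest = rng.choice([j for j in range(len(peers)) if j != i])
        incoming[dest].extend(peer["population"][:n_migrants])
    for peer, extra in zip(peers, incoming):
        peer["population"] = peer["population"] + extra


def run(n_peers=4, pop_size=8, steps=30, seed=0):
    """Return the best local fitness on each peer after `steps` rounds."""
    rng = random.Random(seed)
    peers = [
        {
            "target": tuple(rng.randint(0, 1) for _ in range(GENOME_LEN)),
            "population": [tuple(rng.randint(0, 1) for _ in range(GENOME_LEN))
                           for _ in range(pop_size)],
        }
        for _ in range(n_peers)
    ]
    for _ in range(steps):
        migrate(peers, rng)  # migration feeds ...
        for peer in peers:   # ... local evolution on every peer
            peer["population"] = local_evolution(
                peer["population"], peer["target"], rng, pop_size)
    return [max(fitness(a, p["target"]) for a in p["population"]) for p in peers]
```

Migration lets an agent that evolved under one peer's constraints seed the search on another peer, while selection stays local; that split mirrors the two levels the abstract describes, under the stated simplifications.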