9,046 research outputs found

Analyzing large-scale DNA Sequences on Multi-core Architectures

Author: Memeti Suejb
Pllana Sabri
Publication date: 01/01/2015

Rapid analysis of DNA sequences is important in preventing the evolution of different viruses and bacteria during an early phase, early diagnosis of genetic predispositions to certain diseases (cancer, cardiovascular diseases), and in DNA forensics. However, real-world DNA sequences may comprise several Gigabytes and the process of DNA analysis demands adequate computational resources to be completed within a reasonable time. In this paper we present a scalable approach for parallel DNA analysis that is based on Finite Automata, and which is suitable for analyzing very large DNA segments. We evaluate our approach for real-world DNA segments of mouse (2.7GB), cat (2.4GB), dog (2.4GB), chicken (1GB), human (3.2GB) and turkey (0.2GB). Experimental results on a dual-socket shared-memory system with 24 physical cores show speed-ups of up to 17.6x. Our approach is up to 3x faster than a pattern-based parallel approach that uses the RE2 library. Comment: The 18th IEEE International Conference on Computational Science and Engineering (CSE 2015), Porto, Portugal, 20 - 23 October 201

arXiv.org e-Print Archive

Crossref

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Linnéuniversitetets forskningsdatabas

Multi-Omic Profiling of Melophlus Sponges Reveals Diverse Metabolomic and Microbiome Architectures that Are Non-overlapping with Ecological Neighbors.

Author: Agarwal Vinayak
Allen Eric E
Biggs Jason S
Garg Neha
Mohanty Ipsita
Podell Sheila
Publication venue: eScholarship, University of California
Publication date: 01/02/2020

Marine sponge holobionts, defined as filter-feeding sponge hosts together with their associated microbiomes, are prolific sources of natural products. The inventory of natural products that have been isolated from marine sponges is extensive. Here, using untargeted mass spectrometry, we demonstrate that sponges harbor a far greater diversity of low-abundance natural products that have evaded discovery. While these low-abundance natural products may not be feasible to isolate, insights into their chemical structures can be gleaned by careful curation of mass fragmentation spectra. Sponges are also some of the most complex, multi-organismal holobiont communities in the oceans. We overlay sponge metabolomes with their microbiome structures and detailed metagenomic characterization to discover candidate gene clusters that encode production of sponge-derived natural products. The multi-omic profiling strategy for sponges that we describe here enables quantitative comparison of sponge metabolomes and microbiomes to address, among other questions, the ecological relevance of sponge natural products and for the phylochemical assignment of previously undescribed sponge identities.

eScholarship - University of California

khmer: Working with Big Data in Bioinformatics

Author: Brown C. Titus
McDonald Eric
Publication date: 09/03/2013

We introduce design and optimization considerations for the 'khmer' package. Comment: Invited chapter for forthcoming book on Performance of Open Source Application

arXiv.org e-Print Archive

CiteSeerX

Detecting Repetitions and Periodicities in Proteins by Tiling the Structural Space

Author: Espada Rocío
Ferreiro Diego U.
Parra R. Gonzalo
Sippl Manfred J.
Sánchez Ignacio E.
Publication date: 01/01/2013

The notion of energy landscapes provides conceptual tools for understanding the complexities of protein folding and function. Energy Landscape Theory indicates that it is much easier to find sequences that satisfy the "Principle of Minimal Frustration" when the folded structure is symmetric (Wolynes, P. G. Symmetry and the Energy Landscapes of Biomolecules. Proc. Natl. Acad. Sci. U.S.A. 1996, 93, 14249-14255). Similarly, repeats and structural mosaics may be fundamentally related to landscapes with multiple embedded funnels. Here we present analytical tools to detect and compare structural repetitions in protein molecules. By an exhaustive analysis of the distribution of structural repeats using a robust metric we define those portions of a protein molecule that best describe the overall structure as a tessellation of basic units. The patterns produced by such tessellations provide intuitive representations of the repeating regions and their association towards higher order arrangements. We find that some protein architectures can be described as nearly periodic, while in others clear separations between repetitions exist. Since the method is independent of amino acid sequence information we can identify structural units that can be encoded by a variety of distinct amino acid sequences.

arXiv.org e-Print Archive

CiteSeerX

Computing Platforms for Big Biological Data Analytics: Perspectives and Challenges.

Author: Lan H
Liu W
Lu M
Tan G
Vasilakos AV
Yin Z
Publication venue: 'Elsevier BV'
Publication date: 23/08/2022

The last decade has witnessed an explosion in the amount of available biological sequence data, due to the rapid progress of high-throughput sequencing projects. However, the biological data amount is becoming so great that traditional data analysis platforms and methods can no longer meet the need to rapidly perform data analysis tasks in life sciences. As a result, both biologists and computer scientists are facing the challenge of gaining a profound insight into the deepest biological functions from big biological data. This in turn requires massive computational resources. Therefore, high performance computing (HPC) platforms are highly needed as well as efficient and scalable algorithms that can take advantage of these platforms. In this paper, we survey the state-of-the-art HPC platforms for big biological data analytics. We first list the characteristics of big biological data and popular computing platforms. Then we provide a taxonomy of different biological data analysis applications and a survey of the way they have been mapped onto various computing platforms. After that, we present a case study to compare the efficiency of different computing platforms for handling the classical biological sequence alignment problem. At last we discuss the open issues in big biological data analytics.

OPUS - University of Technology Sydney

Digital Ecosystems: Ecosystem-Oriented Architectures

Author: Briscoe Gerard
De Wilde Philippe
Sadedin Suzanne
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 10/08/2011

We view Digital Ecosystems to be the digital counterparts of biological ecosystems. Here, we are concerned with the creation of these Digital Ecosystems, exploiting the self-organising properties of biological ecosystems to evolve high-level software applications. Therefore, we created the Digital Ecosystem, a novel optimisation technique inspired by biological ecosystems, where the optimisation works at two levels: a first optimisation, migration of agents which are distributed in a decentralised peer-to-peer network, operating continuously in time; this process feeds a second optimisation based on evolutionary computing that operates locally on single peers and is aimed at finding solutions to satisfy locally relevant constraints. The Digital Ecosystem was then measured experimentally through simulations, with measures originating from theoretical ecology, evaluating its likeness to biological ecosystems. This included its responsiveness to requests for applications from the user base, as a measure of the ecological succession (ecosystem maturity). Overall, we have advanced the understanding of Digital Ecosystems, creating Ecosystem-Oriented Architectures where the word ecosystem is more than just a metaphor. Comment: 39 pages, 26 figures, journa

arXiv.org e-Print Archive

Heriot Watt Pure

Kent Academic Repository