Search CORE

153,882 research outputs found

A Two-Phase Dynamic Programming Algorithm Tool for DNA Sequences

Author: Fatumo S.
Oyelade O. J.
Publication venue
Publication date: 01/08/2013
Field of study

Sequence alignment has to do with the arrangement of DNA, RNA, and protein sequences to identify areas of similarity. Technic ally, it involves the arrangement of the primary sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences. Similarity may be a consequence of functional, s tructural, or evolutionary relationships between the sequences. If two sequences in an alignment share a common ancestor, mismatches can be interpreted as mutations, and gaps as insertions. Such information becomes of great use in vital areas such as the study of d iseases, genomics and generally in the biological sciences. Thus, sequence alignment presents not just an exciting field of study, but a field of great importance to mankind. In this light, we extensively studied about seventy (70) existing sequence alignment tools available to us. Most of these tools are not user friendly and cannot be used by biologists. The few tools that attempted both Local and Global algorithms are not ready available freely. We therefore implemented a sequence alignment tool (CU-Aligner) in an understandable, user-friendly and portable way, with click-of-a-button simplicity. This is done utilizing the Needleman-Wunsh and Smith-Waterman algorithms for global and local alignments, respectively which focuses primarily on DNA sequences. Our aligner is implemented in the Java language in both application and applet mode and has been efficient on all windows operating systems

Covenant University Repository

Developing and applying heterogeneous phylogenetic models with XRate

Author: A Heger
A Siepel
A Varadarajan
AJ Drummond
B Knudsen
B Knudsen
Christos A. Ouzounis
D Ayres
DB Searls
E Birney
G Lunter
GSC Slater
Ian Holmes
IM Meyer
J Felsenstein
J Goecks
J Watts
JS Pedersen
L Stein
M Garber
M Hasegawa
M Kimura
M Zuker
ME Skinner
N Saitou
O Penn
Oscar Westesson
PS Klosterman
RK Bradley
SR Eddy
TH Jukes
WJ Kent
Z Yang
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 16/02/2012
Field of study

Modeling sequence evolution on phylogenetic trees is a useful technique in computational biology. Especially powerful are models which take account of the heterogeneous nature of sequence evolution according to the "grammar" of the encoded gene features. However, beyond a modest level of model complexity, manual coding of models becomes prohibitively labor-intensive. We demonstrate, via a set of case studies, the new built-in model-prototyping capabilities of XRate (macros and Scheme extensions). These features allow rapid implementation of phylogenetic models which would have previously been far more labor-intensive. XRate's new capabilities for lineage-specific models, ancestral sequence reconstruction, and improved annotation output are also discussed. XRate's flexible model-specification capabilities and computational efficiency make it well-suited to developing and prototyping phylogenetic grammar models. XRate is available as part of the DART software package: http://biowiki.org/DART .Comment: 34 pages, 3 figures, glossary of XRate model terminolog

arXiv.org e-Print Archive

Crossref

PubMed Central

FigShare

Parametric Alignment of Drosophila Genomes

Author: Dewey Colin
Huggins Peter
Pachter Lior
Sturmfels Bernd
Woods Kevin
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2005
Field of study

The classic algorithms of Needleman--Wunsch and Smith--Waterman find a maximum a posteriori probability alignment for a pair hidden Markov model (PHMM). In order to process large genomes that have undergone complex genome rearrangements, almost all existing whole genome alignment methods apply fast heuristics to divide genomes into small pieces which are suitable for Needleman--Wunsch alignment. In these alignment methods, it is standard practice to fix the parameters and to produce a single alignment for subsequent analysis by biologists. Our main result is the construction of a whole genome parametric alignment of Drosophila melanogaster and Drosophila pseudoobscura. Parametric alignment resolves the issue of robustness to changes in parameters by finding all optimal alignments for all possible parameters in a PHMM. Our alignment draws on existing heuristics for dividing whole genomes into small pieces for alignment, and it relies on advances we have made in computing convex polytopes that allow us to parametrically align non-coding regions using biologically realistic models. We demonstrate the utility of our parametric alignment for biological inference by showing that cis-regulatory elements are more conserved between Drosophila melanogaster and Drosophila pseudoobscura than previously thought. We also show how whole genome parametric alignment can be used to quantitatively assess the dependence of branch length estimates on alignment parameters. The alignment polytopes, software, and supplementary material can be downloaded at http://bio.math.berkeley.edu/parametric/.Comment: 19 pages, 3 figure

arXiv.org e-Print Archive

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Caltech Authors

A two-base encoded DNA sequence alignment problem in computational biology

Author: Chang Jen-Mei
Drakes Chiaka
Huang Erya
Langat Deidrey
McGrath Joseph
Morabito Mark
Pacheco Jose
Rodriguez Nancy
Salazar Daniel
Vemuri Rao
Vu Man
Wadhar Hem
Wu Qin
Publication venue
Publication date: 01/01/2009
Field of study

The recent introduction of instruments capable of producing millions of DNA sequence reads in a single run is rapidly changing the landscape of genetics. The primary objective of the "sequence alignment" problem is to search for a new algorithm that facilitates the use of two-base encoded data for large-scale re-sequencing projects. This algorithm should be able to perform local sequence alignment as well as error detection and correction in a reliable and systematic manner, enabling the direct comparison of encoded DNA sequence reads to a candidate reference DNA sequence. We will first briefly review two well-known sequence alignment approaches and provide a rudimentary improvement for implementation on parallel systems. Then, we carefully examin a unique sequencing technique known as the SOLiDTM System that can be implemented, and follow by the results from the global and local sequence alignment. In this report, the team presents an explanation of the algorithms for color space sequence data from the high-throughput re-sequencing technology and a theoretical parallel approach to the dynamic programming method for global and local alignment. The combination of the di-base approach and dynamic programming provides a possible viewpoint for large-scale re-sequencing projects. We anticipate the use of distributed computing to be the next-generation engine for large-scale problems like such

The Parallelism Motifs of Genomic Data Analysis

Author: Awan Muaaz
Azad Ariful
Brock Benjamin
Buluc Aydin
Egan Rob
Ekanayake Saliya
Ellis Marquita
Georganas Evangelos
Guidi Giulia
Hofmeyr Steven
Oliker Leonid
Selvitopi Oguz
Teodoropol Cristina
Yelick Katherine
Publication venue: 'The Royal Society'
Publication date: 20/01/2020
Field of study

Genomic data sets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share this data with the research community, but some of these genomic data analysis problems require large scale computational platforms to meet both the memory and computational requirements. These applications differ from scientific simulations that dominate the workload on high end parallel systems today and place different requirements on programming support, software libraries, and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high performance genomics analysis, including alignment, profiling, clustering, and assembly for both single genomes and metagenomes. We identify some of the common computational patterns or motifs that help inform parallelization strategies and compare our motifs to some of the established lists, arguing that at least two key patterns, sorting and hashing, are missing

arXiv.org e-Print Archive

eScholarship - University of California

PhyloCSF: a comparative genomics method to distinguish protein-coding and non-coding regions

Author: Arvestad
Blanchette
Brent
Butler
Clark
Goldman
Guttman
Guttman
Holmes
I. Jungreis
Kellis
Lin
M. F. Lin
M. Kellis
Ota
Ozsolak
Stark
Whelan
Yang
Publication venue
Publication date: 17/08/2010
Field of study

As high-throughput transcriptome sequencing provides evidence for novel transcripts in many species, there is a renewed need for accurate methods to classify small genomic regions as protein-coding or non-coding. We present PhyloCSF, a novel comparative genomics method that analyzes a multi-species nucleotide sequence alignment to determine whether it is likely to represent a conserved protein-coding region, based on a formal statistical comparison of phylogenetic codon models. We show that PhyloCSF's classification performance in 12-species _Drosophila_ genome alignments exceeds all other methods we compared in a previous study, and we provide a software implementation for use by the community. We anticipate that this method will be widely applicable as the transcriptomes of many additional species, tissues, and subcellular compartments are sequenced, particularly in the context of ENCODE and modENCODE

PinAPL-Py: A comprehensive web-application for the analysis of CRISPR/Cas9 screens.

Author: Bath Tyler
Esko Jeffrey D
Harismendy Olivier
Kim Jihoon
Lewis Nathan E
Spahn Philipp N
Weiss Ryan J
Publication venue: eScholarship, University of California
Publication date: 01/11/2017
Field of study

Large-scale genetic screens using CRISPR/Cas9 technology have emerged as a major tool for functional genomics. With its increased popularity, experimental biologists frequently acquire large sequencing datasets for which they often do not have an easy analysis option. While a few bioinformatic tools have been developed for this purpose, their utility is still hindered either due to limited functionality or the requirement of bioinformatic expertise. To make sequencing data analysis of CRISPR/Cas9 screens more accessible to a wide range of scientists, we developed a Platform-independent Analysis of Pooled Screens using Python (PinAPL-Py), which is operated as an intuitive web-service. PinAPL-Py implements state-of-the-art tools and statistical models, assembled in a comprehensive workflow covering sequence quality control, automated sgRNA sequence extraction, alignment, sgRNA enrichment/depletion analysis and gene ranking. The workflow is set up to use a variety of popular sgRNA libraries as well as custom libraries that can be easily uploaded. Various analysis options are offered, suitable to analyze a large variety of CRISPR/Cas9 screening experiments. Analysis output includes ranked lists of sgRNAs and genes, and publication-ready plots. PinAPL-Py helps to advance genome-wide screening efforts by combining comprehensive functionality with user-friendly implementation. PinAPL-Py is freely accessible at http://pinapl-py.ucsd.edu with instructions and test datasets

Directory of Open Access Journals

eScholarship - University of California