A Two-Phase Dynamic Programming Algorithm Tool for DNA Sequences
Sequence alignment is the arrangement of DNA, RNA, or protein sequences to identify regions of similarity that may be a consequence of
functional, structural, or evolutionary relationships between the sequences. If two sequences in an alignment share a common ancestor,
mismatches can be interpreted as mutations, and gaps as insertions or deletions. Such information is of great use in areas such as the
study of diseases, genomics, and the biological sciences generally. Thus, sequence alignment is not just an exciting field of study but
one of great importance to mankind. In this light, we extensively studied about seventy (70) existing sequence alignment tools available
to us. Most of these tools are not user-friendly and cannot readily be used by biologists. The few tools that attempt both local and
global algorithms are not freely available. We therefore implemented a sequence alignment tool (CU-Aligner) in an understandable,
user-friendly and portable way, with click-of-a-button simplicity. It uses the Needleman-Wunsch and Smith-Waterman algorithms for global
and local alignment, respectively, and focuses primarily on DNA sequences. Our aligner is implemented in the Java language in both
application and applet mode and has run efficiently on all Windows operating systems.
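For readers unfamiliar with the two algorithms named in this abstract, the following is a minimal Python sketch of the shared dynamic-programming recurrence. It is not CU-Aligner's code (which the abstract says is written in Java). The function name `align_score`, the linear gap penalty, and the default scoring values are assumptions chosen for illustration.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Return the optimal alignment score of sequences a and b.

    local=False: Needleman-Wunsch (global alignment).
    local=True:  Smith-Waterman (local alignment).
    Scoring values are illustrative assumptions, not taken from the abstract.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment: leading gaps are penalised in the first row/column.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            cell = max(diag, up, left)
            if local:
                # Local alignment: a cell never drops below zero, and the
                # answer is the best cell anywhere in the table.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]
```

The only differences between the two modes are the initialisation of the first row and column, the floor at zero, and where the final score is read from.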
The Parallelism Motifs of Genomic Data Analysis
Genomic data sets are growing dramatically as the cost of sequencing
continues to decline and small sequencing devices become available. Enormous
community databases store and share this data with the research community, but
some of these genomic data analysis problems require large scale computational
platforms to meet both the memory and computational requirements. These
applications differ from scientific simulations that dominate the workload on
high end parallel systems today and place different requirements on programming
support, software libraries, and parallel architectural design. For example,
they involve irregular communication patterns such as asynchronous updates to
shared data structures. We consider several problems in high performance
genomics analysis, including alignment, profiling, clustering, and assembly for
both single genomes and metagenomes. We identify some of the common
computational patterns or motifs that help inform parallelization strategies
and compare our motifs to some of the established lists, arguing that at least
two key patterns, sorting and hashing, are missing.
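As a small illustration of why hashing and sorting recur in genomic analysis, the sketch below counts k-mers (length-k substrings) with a hash table and then ranks them by sorting. K-mer counting is an example chosen here, not one the abstract names, and the function names `count_kmers` and `top_kmers` are hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring across all reads using a hash table."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def top_kmers(counts, n):
    """Sort k-mers by descending count, breaking ties alphabetically."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
```

In a distributed setting, the hash-table updates would become the irregular, asynchronous updates to shared data structures that the abstract describes. This single-process version only shows the pattern.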
Structural Alignment of RNAs Using Profile-csHMMs and Its Application to RNA Homology Search: Overview and New Results
Systematic research on noncoding RNAs (ncRNAs) has revealed that many ncRNAs are actively involved in various biological networks. Therefore, in order to fully understand the mechanisms of these networks, it is crucial to understand the roles of ncRNAs. Unfortunately, the annotation of ncRNA genes that give rise to functional RNA molecules has begun only recently, and it is far from being complete. Considering the huge amount of genome sequence data, we need efficient computational methods for finding ncRNA genes. One effective way of finding ncRNA genes is to look for regions that are similar to known ncRNA genes. As many ncRNAs have well-conserved secondary structures, we need statistical models that can represent such structures for this purpose. In this paper, we propose a new method for representing RNA sequence profiles and finding structural alignment of RNAs based on profile context-sensitive hidden Markov models (profile-csHMMs). Unlike existing models, the proposed approach can handle any kind of RNA secondary structures, including pseudoknots. We show that profile-csHMMs can provide an effective framework for the computational analysis of RNAs and the identification of ncRNA genes
Parametric Alignment of Drosophila Genomes
The classic algorithms of Needleman--Wunsch and Smith--Waterman find a
maximum a posteriori probability alignment for a pair hidden Markov model
(PHMM). In order to process large genomes that have undergone complex genome
rearrangements, almost all existing whole genome alignment methods apply fast
heuristics to divide genomes into small pieces which are suitable for
Needleman--Wunsch alignment. In these alignment methods, it is standard
practice to fix the parameters and to produce a single alignment for subsequent
analysis by biologists.
Our main result is the construction of a whole genome parametric alignment of
Drosophila melanogaster and Drosophila pseudoobscura. Parametric alignment
resolves the issue of robustness to changes in parameters by finding all
optimal alignments for all possible parameters in a PHMM. Our alignment draws
on existing heuristics for dividing whole genomes into small pieces for
alignment, and it relies on advances we have made in computing convex polytopes
that allow us to parametrically align non-coding regions using biologically
realistic models. We demonstrate the utility of our parametric alignment for
biological inference by showing that cis-regulatory elements are more conserved
between Drosophila melanogaster and Drosophila pseudoobscura than previously
thought. We also show how whole genome parametric alignment can be used to
quantitatively assess the dependence of branch length estimates on alignment
parameters.
The alignment polytopes, software, and supplementary material can be
downloaded at http://bio.math.berkeley.edu/parametric/.
Comment: 19 pages, 3 figures
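To make the dependence on parameters concrete, here is a rough Python sketch that recomputes an optimal global alignment for several gap penalties and reports how its summary (matches, mismatches, gaps) changes. This is a brute-force sweep over a few chosen values, not the polytope-based parametric method the abstract describes, and it uses a simple match/mismatch/gap score rather than a pair HMM. The names `optimal_summary` and `sweep_gap` and all scoring values are assumptions.

```python
def optimal_summary(a, b, match, mismatch, gap):
    """Return (score, matches, mismatches, gaps) for one optimal global alignment."""
    # Each cell holds (score, matches, mismatches, gaps) of a best prefix alignment.
    prev = [(j * gap, 0, 0, j) for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [(i * gap, 0, 0, i)]
        for j in range(1, len(b) + 1):
            s, m, x, g = prev[j - 1]
            if a[i - 1] == b[j - 1]:
                diag = (s + match, m + 1, x, g)
            else:
                diag = (s + mismatch, m, x + 1, g)
            s, m, x, g = prev[j]
            up = (s + gap, m, x, g + 1)
            s, m, x, g = cur[j - 1]
            left = (s + gap, m, x, g + 1)
            # Compare on score only; ties are broken arbitrarily.
            cur.append(max(diag, up, left, key=lambda t: t[0]))
        prev = cur
    return prev[-1]


def sweep_gap(a, b, gaps, match=1, mismatch=-1):
    """Map each gap penalty to the (matches, mismatches, gaps) of an optimum."""
    return {g: optimal_summary(a, b, match, mismatch, g)[1:] for g in gaps}
```

For example, `sweep_gap("ACGTT", "CGTTA", [-2, -5])` yields a gapped alignment for the mild penalty and an ungapped one for the harsh penalty, which is the kind of parameter sensitivity that a full parametric alignment characterises exhaustively.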
Multiple Sequence Alignment Using External Sources Of Information
Multiple sequence alignment is an alignment of three or more protein
or nucleic acid sequences. Alignment has long been of great interest to
researchers because many scientific workflows depend on sequence
alignments. Thus, having a high-quality alignment is very important.
Much work has been done, and is still being carried out, to improve the
quality of alignments. Many approaches have been developed for
performing pairwise and multiple sequence alignments, yet most of them
rely on the sequences to be aligned as their only input. Recently, some
approaches have begun to incorporate additional sources of information
in the alignment process; this external data can come from user
knowledge or online databases. When integrated into the workflow of
alignment programs, it may add new constraints to the produced
alignment and improve its quality by making it biologically more
meaningful. In this thesis, I introduce new approaches for multiple
sequence alignment that use the alignment software DIALIGN along with
external information from databases, where useful information is
extracted and then integrated into the alignment process. By testing
these approaches on benchmark databases, I show that using additional
data during alignment produces better results than using DIALIGN alone,
without any external input other than the sequences to be aligned.