Pairwise alignment incorporating dipeptide covariation
Motivation: Standard algorithms for pairwise protein sequence alignment make
the simplifying assumption that amino acid substitutions at neighboring sites
are uncorrelated. This assumption allows implementation of fast algorithms for
pairwise sequence alignment, but it ignores information that could conceivably
increase the power of remote homolog detection. We examine the validity of this
assumption by constructing extended substitution matrices that encapsulate the
observed correlations between neighboring sites, by developing an efficient and
rigorous algorithm for pairwise protein sequence alignment that incorporates
these local substitution correlations, and by assessing the ability of this
algorithm to detect remote homologies. Results: Our analysis indicates that
local correlations between substitutions are not strong on average.
Furthermore, incorporating local substitution correlations into pairwise
alignment did not lead to a statistically significant improvement in remote
homology detection. Therefore, the standard assumption that individual residues
within protein sequences evolve independently of neighboring positions appears
to be an efficient and appropriate approximation.
Multiple sequence alignment based on set covers
We introduce a new heuristic for the multiple alignment of a set of
sequences. The heuristic is based on a set cover of the residue alphabet of the
sequences, and also on the determination of a significant set of blocks
comprising subsequences of the sequences to be aligned. These blocks are
obtained with the aid of a new data structure, called a suffix-set tree, which
is constructed from the input sequences with the guidance of the
residue-alphabet set cover and generalizes the well-known suffix tree of the
sequence set. We provide performance results on selected BAliBASE amino-acid
sequences and compare them with those yielded by some prominent approaches.
Optimally fast incremental Manhattan plane embedding and planar tight span construction
We describe a data structure, a rectangular complex, that can be used to
represent hyperconvex metric spaces that have the same topology (although not
necessarily the same distance function) as subsets of the plane. We show how to
use this data structure to construct the tight span of a metric space given as
an n x n distance matrix, when the tight span is homeomorphic to a subset of
the plane, in time O(n^2), and to add a single point to a planar tight span in
time O(n). As an application of this construction, we show how to test whether
a given finite metric space embeds isometrically into the Manhattan plane in
time O(n^2), and add a single point to the space and re-test whether it has
such an embedding in time O(n).
Comment: 39 pages, 15 figures
Higher accuracy protein Multiple Sequence Alignment by Stochastic Algorithm
Multiple Sequence Alignment gives insight into evolutionary, structural and functional relationships among proteins. Here, a novel Protein Alignment by Stochastic Algorithm (PASA) is developed. Evolutionary operators of a genetic algorithm, namely mutation and selection, are used to combine the output of the two most important sequence alignment programs and then to develop an optimized new algorithm. Efficiency of protein alignments is evaluated in terms of the Total Column score, which is equal to the number of correctly aligned columns between a test alignment and the reference alignment divided by the total number of columns in the reference alignment. The PASA optimizer achieves, on average, significantly better alignments than the well-known individual bioinformatics tools. PASA is statistically the most accurate protein alignment method today. It has potential applications in drug discovery processes in the biotechnology industry.
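The Total Column score defined in this abstract can be illustrated with a minimal Python sketch. It is not taken from the PASA paper: the function names `columns` and `total_column_score`, the use of `-` as the gap character, and the representation of an alignment as a list of equal-length strings are all assumptions made for illustration. A column counts as correctly aligned when the test alignment contains a column pairing exactly the same residues as the reference column.

```python
def columns(alignment):
    """Turn each alignment column into a tuple of per-sequence residue indices.

    A gap ('-', an assumed convention) is recorded as None, so two columns
    compare equal only if they align exactly the same residues.
    """
    counters = [0] * len(alignment)  # next residue index for each sequence
    result = []
    for column in zip(*alignment):
        key = []
        for row, symbol in enumerate(column):
            if symbol == "-":
                key.append(None)
            else:
                key.append(counters[row])
                counters[row] += 1
        result.append(tuple(key))
    return result


def total_column_score(test, reference):
    """Correctly aligned reference columns divided by all reference columns."""
    reference_columns = columns(reference)
    test_columns = set(columns(test))
    correct = sum(1 for col in reference_columns if col in test_columns)
    return correct / len(reference_columns)


reference = ["AC-GT", "ACTGT"]
test = ["ACG-T", "ACTGT"]
score = total_column_score(test, reference)
print(f"TC score: {score:.2f}")
```

In this toy case three of the five reference columns are reproduced by the test alignment. Benchmark suites sometimes restrict the count to annotated core regions of the reference; that refinement is left out of the sketch.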
The Binary Space Partitioning-Tree Process
The Mondrian process represents an elegant and powerful approach for space
partition modelling. However, as it restricts the partitions to be
axis-aligned, its modelling flexibility is limited. In this work, we propose a
self-consistent Binary Space Partitioning (BSP)-Tree process to generalize the
Mondrian process. The BSP-Tree process is an almost surely right continuous
Markov jump process that allows uniformly distributed oblique cuts in a
two-dimensional convex polygon. The BSP-Tree process can also be extended using
a non-uniform probability measure to generate direction differentiated cuts.
The process is also self-consistent, maintaining distributional invariance
under a restricted subdomain. We use Conditional-Sequential Monte Carlo for
inference using the tree structure as the high-dimensional variable. The
BSP-Tree process's performance on synthetic data partitioning and relational
modelling demonstrates clear inferential improvements over the standard
Mondrian process and other related methods.
Quantum Hamiltonian reduction of W-algebras and category O
W-algebras are a class of non-commutative algebras related to the classical
universal enveloping algebras. They can be defined as a subquotient of U(g)
related to a choice of nilpotent element e and compatible nilpotent subalgebra
m. The definition is a quantum analogue of the classical construction of
Hamiltonian reduction.
We define a quantum version of Hamiltonian reduction by stages and use it to
construct intermediate reductions between different W-algebras U(g,e) in type
A. This allows us to express the W-algebra U(g,e') as a subquotient of U(g,e)
for nilpotent elements e' covering e. It also produces a collection of
(U(g,e),U(g,e'))-bimodules analogous to the generalised Gel'fand-Graev modules
used in the classical definition of the W-algebra; these can be used to obtain
adjoint functors between the corresponding module categories.
The category of modules over a W-algebra has a full subcategory defined in a
parallel fashion to that of the Bernstein-Gel'fand-Gel'fand (BGG) category O;
this version of category O(e) for W-algebras is equivalent to an infinitesimal
block of O by an argument of Miličić and Soergel. We therefore
construct analogues of the translation functors between the different blocks of
O, in this case being functors between the categories O(e) for different
W-algebras U(g,e). This follows an argument of Losev, and realises the category
O(e') as equivalent to a full subcategory of the category O(e) where e' is
greater than e in the refinement ordering.
Comment: University of Toronto PhD thesis, defended July 2014, 57 pages
A Two-Phase Dynamic Programming Algorithm Tool for DNA Sequences
Sequence alignment is the arrangement of the primary sequences of DNA, RNA, or protein to identify regions of similarity that may be a
consequence of functional, structural, or evolutionary relationships between the sequences. If two sequences in an alignment share a
common ancestor, mismatches can be interpreted as mutations, and gaps as insertions or deletions. Such information is of great use in
vital areas such as the study of diseases, genomics, and the biological sciences generally. Thus, sequence alignment is not just an
exciting field of study, but a field of great importance to mankind. In this light, we extensively studied about seventy (70) existing
sequence alignment tools available to us. Most of these tools are not user-friendly and cannot be used by biologists. The few tools that
implement both local and global algorithms are not freely available. We therefore implemented a sequence alignment tool (CU-Aligner) in
an understandable, user-friendly and portable way, with click-of-a-button simplicity. The tool uses the Needleman-Wunsch and
Smith-Waterman algorithms for global and local alignment, respectively, and focuses primarily on DNA sequences. Our aligner is
implemented in the Java language in both application and applet modes and has run efficiently on all Windows operating systems.
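The global alignment step named in this abstract can be sketched as a short Needleman-Wunsch implementation. This is an illustrative Python sketch, not CU-Aligner's code (which the abstract says is written in Java); the function name `needleman_wunsch` and the scoring values (match +1, mismatch -1, linear gap penalty -1) are assumptions chosen only for the example.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of strings a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b). Scoring values are assumed defaults.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


best, top, bottom = needleman_wunsch("ACGT", "AGT")
print(best)
print(top)
print(bottom)
```

Smith-Waterman local alignment uses the same recurrence with two changes: every cell is clamped at zero, and the traceback starts from the highest-scoring cell and stops at the first zero rather than running corner to corner.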