HMMER web server: interactive sequence similarity searching
HMMER is a software suite for protein sequence similarity searches using probabilistic methods. Previously, HMMER was available mainly as a computationally intensive UNIX command-line tool, which restricted its use. Recent advances in the software, HMMER3, have resulted in a 100-fold speed gain relative to previous versions. It is now feasible to make efficient profile hidden Markov model (profile HMM) searches via the web. A HMMER web server (http://hmmer.janelia.org) has been designed and implemented such that most protein database searches return within a few seconds. Methods are available for searching either a single protein sequence, a multiple protein sequence alignment or a profile HMM against a target sequence database, and for searching a protein sequence against Pfam. The web server is designed to cater to users with a range of expertise and accepts batch uploading of multiple queries at once. All search methods are also available as RESTful web services, thereby allowing them to be readily integrated as remotely executed tasks in locally scripted workflows. We have focused on minimizing search times and on rapidly displaying tabular results, regardless of the number of matches found, and have developed graphical summaries of the search results to provide a quick, intuitive appraisal of them.
Exact Asymptotic Results for a Model of Sequence Alignment
Finding analytically the statistics of the longest common subsequence (LCS)
of a pair of random sequences drawn from c alphabets is a challenging problem
in computational evolutionary biology. We present exact asymptotic results for
the distribution of the LCS in a simpler, yet nontrivial, variant of the
original model called the Bernoulli matching (BM) model which reduces to the
original model in the large c limit. We show that in the BM model, for all c,
the distribution of the asymptotic length of the LCS, suitably scaled, is
identical to the Tracy-Widom distribution of the largest eigenvalue of a random
matrix whose entries are drawn from a Gaussian unitary ensemble. In particular,
in the large c limit, this provides an exact expression for the asymptotic
length distribution in the original LCS problem. Comment: 4 pages Revtex, 2 .eps figures included
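The relation between the original LCS problem and the Bernoulli matching model can be illustrated with a minimal Python sketch. It assumes the usual formulation in which the LCS length follows the recursion L(i,j) = max(L(i-1,j), L(i,j-1), L(i-1,j-1) + eta(i,j)), where eta(i,j) indicates a match between the i-th and j-th letters, and in which the BM model replaces these indicators by independent Bernoulli draws with probability 1/c. The function names and parameters are hypothetical and chosen for illustration only; nothing here reproduces the asymptotic analysis of the paper.

```python
import random


def lcs_length(a, b):
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def bm_length(n, m, c, rng):
    """Same recursion, with independent match indicators of probability 1/c.

    This is the assumed Bernoulli matching variant: the indicators are drawn
    independently instead of being derived from two actual sequences.
    """
    prev = [0] * (m + 1)
    for _ in range(n):
        curr = [0]
        for j in range(1, m + 1):
            match = 1 if rng.random() < 1.0 / c else 0
            curr.append(max(prev[j], curr[j - 1], prev[j - 1] + match))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    rng = random.Random(1)
    c, n = 4, 200
    a = [rng.randrange(c) for _ in range(n)]
    b = [rng.randrange(c) for _ in range(n)]
    print("LCS of two random sequences:", lcs_length(a, b))
    print("BM model sample:", bm_length(n, n, c, rng))
```

Both functions use O(m) memory by keeping only the previous row of the table, which is enough when only the length is needed.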
Non-local on-shell field redefinition for the SME
This work instigates a study of non-local field mappings within the Lorentz-
and CPT-violating Standard-Model Extension (SME). An example of such a mapping
is constructed explicitly, and the conditions for the existence of its inverse
are investigated. It is demonstrated that the associated field redefinition can
remove b-type Lorentz violation from free SME fermions in certain situations.
These results are employed to obtain explicit expressions for the corresponding
Lorentz-breaking momentum-space eigenspinors and their orthogonality relations. Comment: 12 pages, REVTeX
Efficient chaining of seeds in ordered trees
We consider here the problem of chaining seeds in ordered trees. Seeds are
mappings between two trees Q and T and a chain is a subset of non-overlapping
seeds that is consistent with respect to postfix order and ancestrality. This
problem is a natural extension of a similar problem for sequences, and has
applications in computational biology, such as mining a database of RNA
secondary structures. For the chaining problem with a set of m constant size
seeds, we describe an algorithm with complexity O(m^2 log(m)) in time and O(m^2)
in space.
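For orientation, the simpler sequence version of the chaining problem mentioned in the abstract can be sketched as a quadratic dynamic program. This is not the tree algorithm of the paper: it assumes seeds are pairs of half-open intervals with positive length on two sequences, that a chain must be strictly ordered and non-overlapping on both, and that the score of a chain is the total seed length on the query. The function name and the seed tuple layout are hypothetical.

```python
def best_chain_score(seeds):
    """Maximum total length of a chain of seeds on two sequences.

    Each seed is (q_start, q_end, t_start, t_end) with half-open intervals
    of positive length. Seed j may precede seed i in a chain only if it
    ends before i starts on both the query and the target.
    """
    # Sorting by end positions guarantees every valid predecessor of a seed
    # appears earlier in the list.
    ordered = sorted(seeds, key=lambda s: (s[1], s[3]))
    best = []  # best[i] = best score of a chain ending with ordered[i]
    for i, (qs, qe, ts, te) in enumerate(ordered):
        weight = qe - qs
        score = weight
        for j in range(i):
            _, pqe, _, pte = ordered[j]
            if pqe <= qs and pte <= ts:
                score = max(score, best[j] + weight)
        best.append(score)
    return max(best, default=0)


if __name__ == "__main__":
    seeds = [(0, 3, 0, 3), (2, 5, 2, 5), (4, 7, 4, 7)]
    print(best_chain_score(seeds))
```

The double loop takes O(m^2) time for m seeds; faster sequence chaining methods exist, but the quadratic form makes the consistency condition explicit.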
Bethe Ansatz in the Bernoulli Matching Model of Random Sequence Alignment
For the Bernoulli Matching model of the sequence alignment problem we apply the
Bethe ansatz technique via an exact mapping to the 5--vertex model on a square
lattice. Considering the terrace--like representation of the sequence alignment
problem, we reproduce by the Bethe ansatz the results for the averaged length
of the Longest Common Subsequence in the Bernoulli approximation. In addition, we
compute the average number of nucleation centers of the terraces. Comment: 14 pages, 5 figures (some points are clarified)
Generalized Buneman pruning for inferring the most parsimonious multi-state phylogeny
Accurate reconstruction of phylogenies remains a key challenge in
evolutionary biology. Most biologically plausible formulations of the problem
are formally NP-hard, with no known efficient solution. Standard practice
relies on fast heuristic methods that are empirically known to work very
well in general, but can yield results arbitrarily far from optimal. Practical
exact methods, which yield exponential worst-case running times but generally
much better times in practice, provide an important alternative. We report
progress in this direction by introducing a provably optimal method for the
weighted multi-state maximum parsimony phylogeny problem. The method is based
on generalizing the notion of the Buneman graph, a construction key to
efficient exact methods for binary sequences, so as to apply to sequences with
arbitrary finite numbers of states with arbitrary state transition weights. We
implement an integer linear programming (ILP) method for the multi-state
problem using this generalized Buneman graph and demonstrate that the resulting
method is able to solve data sets that are intractable by prior exact methods
in run times comparable with popular heuristics. Our work provides the first
method for provably optimal maximum parsimony phylogeny inference that is
practical for multi-state data sets of more than a few characters. Comment: 15 pages
Consistency analysis of a nonbirefringent Lorentz-violating planar model
In this work we analyze the physical consistency of a nonbirefringent
Lorentz-violating planar model via the analysis of the pole structure of its
Feynman propagators. The nonbirefringent planar model, obtained from the
dimensional reduction of the CPT-even gauge sector of the standard model
extension, is composed of a gauge field and a scalar field, and is affected by
Lorentz-violating (LIV) coefficients encoded in the symmetric tensor
. The propagator of the gauge field is explicitly evaluated
and expressed in terms of linear independent symmetric tensors, presenting only
one physical mode. The same holds for the scalar propagator. A consistency
analysis is performed based on the poles of the propagators. The isotropic
parity-even sector is stable, causal and unitary mode for .
On the other hand, the anisotropic sector is stable and unitary but in general
noncausal. Finally, it is shown that this planar model interacting with a
Higgs field supports compactlike vortex configurations. Comment: 11 pages, revtex style, final revised version
A unifying framework for seed sensitivity and its application to subset seeds
We propose a general approach to computing seed sensitivity that can be
applied to different definitions of seeds. It treats separately three
components of the seed sensitivity problem -- a set of target alignments, an
associated probability distribution, and a seed model -- that are specified by
distinct finite automata. The approach is then applied to a new concept of
subset seeds for which we propose an efficient automaton construction.
Experimental results confirm that sensitive subset seeds can be efficiently
designed using our approach, and can then be used in similarity search
producing better results than ordinary spaced seeds.
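To make the notion of seed sensitivity concrete, the following Python sketch computes it by brute force for ordinary spaced seeds. It does not implement the automaton-based approach or the subset seeds of the paper. It assumes an alignment is a string of "1" (match) and "0" (mismatch), that a seed is a pattern in which "1" requires a match and "0" is a don't-care position, and that alignment positions are independent matches with probability p. The function names are hypothetical, and the enumeration is only practical for short alignments.

```python
from itertools import product


def seed_hits(seed, alignment):
    """True if the spaced seed matches the alignment at some offset."""
    span = len(seed)
    for start in range(len(alignment) - span + 1):
        if all(alignment[start + k] == "1"
               for k, s in enumerate(seed) if s == "1"):
            return True
    return False


def sensitivity(seed, length, p):
    """Probability that the seed hits a random alignment of the given length.

    Enumerates all 2**length alignments under an i.i.d. Bernoulli(p) match
    model, so it is exponential in length (illustration only).
    """
    total = 0.0
    for bits in product("01", repeat=length):
        if seed_hits(seed, bits):
            ones = bits.count("1")
            total += p ** ones * (1 - p) ** (length - ones)
    return total


if __name__ == "__main__":
    for seed in ("111", "1101"):
        print(seed, round(sensitivity(seed, 12, 0.7), 4))
```

Both seeds in the usage example have three required match positions, so the comparison isolates the effect of the don't-care position.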
PLAST-ncRNA: Partition function Local Alignment Search Tool for non-coding RNA sequences
Alignment-based programs are valuable tools for finding potential homologs in genome sequences. Previously, it has been shown that partition function posterior probabilities attuned to local alignment achieve a high accuracy in identifying distantly similar non-coding RNA sequences that are hidden in a large genome. Here, we present an online implementation of that alignment algorithm based on such probabilities. Our server takes as input a query RNA sequence and a large genome sequence, and outputs a list of hits that are above a mean posterior probability threshold. The output is presented in a format suited to local alignment. It can also be viewed within the PLAST alignment viewer applet that provides a list of all hits found and highlights regions of high posterior probability within each local alignment. The server is freely available at http://plastrna.njit.edu
CCDB: a curated database of genes involved in cervix cancer
The Cervical Cancer gene DataBase (CCDB, http://crdd.osdd.net/raghava/ccdb) is a manually curated catalog of experimentally validated genes that are thought or known to be involved in the different stages of cervical carcinogenesis. Although a large population of women is presently affected by this malignancy, no database exists that catalogs information on genes associated with cervical cancer. Therefore, we have compiled 537 genes in CCDB that are linked with cervical cancer causation processes such as methylation, gene amplification, mutation, polymorphism and change in expression level, as evident from the published literature. Each record contains details related to the gene, such as architecture (exon–intron structure), location, function, sequences (mRNA/CDS/protein), ontology, interacting partners, homology to other eukaryotic genomes, structure and links to other public databases, thus augmenting CCDB with external data. Also, manually curated literature references have been provided to support the inclusion of each gene in the database and establish its association with cervix cancer. In addition, CCDB provides information on microRNAs altered in cervical cancer, as well as a search facility for querying, several browse options and an online tool for sequence similarity search, thereby providing researchers with easy access to the latest information on genes involved in cervix cancer.