Truncated Profile Hidden Markov Models
The profile hidden Markov model (HMM) is a powerful method for remote homolog database search. However, evaluating the score of each database sequence against a profile HMM is computationally demanding. The computation time required for score evaluation is proportional to the number of states in the profile HMM. This paper examines whether the number of states can be truncated without reducing the ability of the HMM to find proteins containing members of a protein domain family. A genetic algorithm (GA) is presented which finds a good truncation of the HMM states. The results of using truncation on searches of the yeast, E. coli, and pig genomes for several different protein domain families are shown.
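The claim that scoring time grows with the number of states can be illustrated with a minimal sketch. It assumes a generic left-to-right HMM with a toy two-letter alphabet, not a full profile HMM with match, insert, and delete states, and it does not reproduce the paper's genetic algorithm. The names `viterbi_score` and `toy_chain` are hypothetical helpers introduced only for this illustration.

```python
import math

NEG_INF = float("-inf")

def viterbi_score(seq, log_start, log_trans, log_emit):
    """Return (best-path log-probability, number of transition evaluations).

    log_trans[s] is a list of (next_state, log_prob) pairs, so the work per
    residue equals the total number of transitions in the model.
    """
    n_states = len(log_start)
    score = [log_start[s] + log_emit[s][seq[0]] for s in range(n_states)]
    work = 0
    for symbol in seq[1:]:
        nxt = [NEG_INF] * n_states
        for s in range(n_states):
            for t, lp in log_trans[s]:
                work += 1
                cand = score[s] + lp + log_emit[t][symbol]
                if cand > nxt[t]:
                    nxt[t] = cand
        score = nxt
    return max(score), work

def toy_chain(n_states, alphabet="AB"):
    """Toy model (an assumption): each state loops or advances with probability 0.5."""
    half = math.log(0.5)
    log_start = [0.0] + [NEG_INF] * (n_states - 1)
    log_trans = [[(s, half), (s + 1, half)] for s in range(n_states - 1)]
    log_trans.append([(n_states - 1, 0.0)])  # final state only loops
    uniform = math.log(1.0 / len(alphabet))
    log_emit = [{a: uniform for a in alphabet} for _ in range(n_states)]
    return log_start, log_trans, log_emit

for n in (4, 8):
    score, work = viterbi_score("ABBAB", *toy_chain(n))
    print(n, "states:", work, "transition evaluations")
```

In this toy model the sequence is the same in both runs, and only the state count changes the amount of work, which is the cost that truncation aims to reduce.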
Application of protein structure alignments to iterated hidden Markov model protocols for structure prediction.
Background: One of the most powerful methods for the prediction of protein structure from sequence information alone is the iterative construction of profile-type models. Because profiles are built from sequence alignments, the sequences included in the alignment and the method used to align them will be important to the sensitivity of the resulting profile. The inclusion of highly diverse sequences will presumably produce a more powerful profile, but distantly related sequences can be difficult to align accurately using only sequence information. Therefore, it would be expected that the use of protein structure alignments to improve the selection and alignment of diverse sequence homologs might yield improved profiles. However, the actual utility of such an approach has remained unclear.
Results: We explored several iterative protocols for the generation of profile hidden Markov models. These protocols were tailored to allow the inclusion of protein structure alignments in the process, and were used for large-scale creation and benchmarking of structure alignment-enhanced models. We found that models using structure alignments did not provide an overall improvement over sequence-only models for superfamily-level structure predictions. However, the results also revealed that the structure alignment-enhanced models were complementary to the sequence-only models, particularly at the edge of the "twilight zone". When the two sets of models were combined, they provided improved results over sequence-only models alone. In addition, we found that the beneficial effects of the structure alignment-enhanced models could not be realized if the structure-based alignments were replaced with sequence-based alignments. Our experiments with different iterative protocols for sequence-only models also suggested that simple protocol modifications were unable to yield equivalent improvements to those provided by the structure alignment-enhanced models. Finally, we found that models using structure alignments provided fold-level structure assignments that were superior to those produced by sequence-only models.
Conclusion: When attempting to predict the structure of remote homologs, we advocate a combined approach in which both traditional models and models incorporating structure alignments are used.
High-Rate Vector Quantization for the Neyman-Pearson Detection of Correlated Processes
This paper investigates the effect of quantization on the performance of the
Neyman-Pearson test. It is assumed that a sensing unit observes samples of a
correlated stationary ergodic multivariate process. Each sample is passed
through an N-point quantizer and transmitted to a decision device which
performs a binary hypothesis test. For any false alarm level, it is shown that
the miss probability of the Neyman-Pearson test converges to zero exponentially
as the number of samples tends to infinity, assuming that the observed process
satisfies certain mixing conditions. The main contribution of this paper is to
provide a compact closed-form expression of the error exponent in the high-rate
regime, i.e., when the number N of quantization levels tends to infinity,
generalizing previous results of Gupta and Hero to the case of non-independent
observations. If d represents the dimension of one sample, it is proved that
the error exponent converges at rate N^{2/d} to the one obtained in the absence
of quantization. As an application, relevant high-rate quantization strategies
which lead to a large error exponent are determined. Numerical results indicate
that the proposed quantization rule can yield better performance than existing
ones in terms of detection error. Comment: 47 pages, 7 figures, 1 table. To appear in the IEEE Transactions on
Information Theor
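To make the term "N-point quantizer" concrete, here is a minimal sketch of a uniform scalar quantizer. This is an assumption chosen for simplicity: the abstract concerns vector quantization of correlated processes and high-rate quantization strategies that this example does not reproduce. The function names and the interval bounds `lo` and `hi` are hypothetical.

```python
def quantize(x, n_levels, lo, hi):
    """Map x to one of n_levels cell indices on [lo, hi], clamping outside values."""
    if x <= lo:
        return 0
    if x >= hi:
        return n_levels - 1
    width = (hi - lo) / n_levels
    return min(int((x - lo) / width), n_levels - 1)

def reconstruct(index, n_levels, lo, hi):
    """Return the midpoint of the cell with the given index."""
    width = (hi - lo) / n_levels
    return lo + (index + 0.5) * width

samples = [-0.9, -0.2, 0.0, 0.4, 0.95]
for n in (4, 16, 64):
    worst = max(abs(x - reconstruct(quantize(x, n, -1.0, 1.0), n, -1.0, 1.0))
                for x in samples)
    print(n, "levels: worst reconstruction error", worst)
```

Within the interval, the reconstruction error of this uniform rule is at most half a cell width, so it shrinks as the number of levels grows.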
Learning a Hybrid Architecture for Sequence Regression and Annotation
When learning a hidden Markov model (HMM), sequential observations can
often be complemented by real-valued summary response variables generated from
the path of hidden states. Such settings arise in numerous domains, including
many applications in biology, like motif discovery and genome annotation.
In this paper, we present a flexible framework for jointly modeling both
latent sequence features and the functional mapping that relates the summary
response variables to the hidden state sequence. The algorithm is compatible
with a rich set of mapping functions. Results show that the availability of
additional continuous response variables can simultaneously improve the
annotation of the sequential observations and yield good prediction
performance in both synthetic data and real-world datasets. Comment: AAAI 201
Hierarchical Bayesian sparse image reconstruction with application to MRFM
This paper presents a hierarchical Bayesian model to reconstruct sparse
images when the observations are obtained from linear transformations and
corrupted by an additive white Gaussian noise. Our hierarchical Bayes model is
well suited to such naturally sparse image applications as it seamlessly
accounts for properties such as sparsity and positivity of the image via
appropriate Bayes priors. We propose a prior that is based on a weighted
mixture of a positive exponential distribution and a mass at zero. The prior
has hyperparameters that are tuned automatically by marginalization over the
hierarchical Bayesian model. To overcome the complexity of the posterior
distribution, a Gibbs sampling strategy is proposed. The Gibbs samples can be
used to estimate the image to be recovered, e.g. by maximizing the estimated
posterior distribution. In our fully Bayesian approach the posteriors of all
the parameters are available. Thus our algorithm provides more information than
other previously proposed sparse reconstruction methods that only give a point
estimate. The performance of our hierarchical Bayesian sparse reconstruction
method is illustrated on synthetic and real data collected from a tobacco virus
sample using a prototype MRFM instrument. Comment: v2: final version; IEEE Trans. Image Processing, 200
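The prior described above, a weighted mixture of a positive exponential distribution and a mass at zero, can be sketched by drawing samples from it. This is only an illustration of the prior, not of the paper's Gibbs sampler or hyperparameter estimation. The parameter names `weight` (probability of a nonzero pixel) and `scale` (mean of the exponential part) are assumptions.

```python
import random

def sample_pixel(weight, scale, rng):
    """Draw one pixel: 0 with probability 1 - weight, else Exponential with mean `scale`."""
    if rng.random() < weight:
        return rng.expovariate(1.0 / scale)
    return 0.0

def sample_image(n_pixels, weight, scale, seed=0):
    """Draw independent pixels from the mixture prior (a flat list, for simplicity)."""
    rng = random.Random(seed)
    return [sample_pixel(weight, scale, rng) for _ in range(n_pixels)]

image = sample_image(1000, weight=0.1, scale=2.0)
n_zero = sum(1 for v in image if v == 0.0)
print("fraction of exactly-zero pixels:", n_zero / len(image))
```

Every draw is non-negative and most are exactly zero when `weight` is small, which is how a prior of this form encodes both positivity and sparsity.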
Bayesian separation of spectral sources under non-negativity and full additivity constraints
This paper addresses the problem of separating spectral sources which are
linearly mixed with unknown proportions. The main difficulty of the problem is
to ensure the full additivity (sum-to-one) of the mixing coefficients and
non-negativity of sources and mixing coefficients. A Bayesian estimation
approach based on Gamma priors was recently proposed to handle the
non-negativity constraints in a linear mixture model. However, incorporating
the full additivity constraint requires further developments. This paper
studies a new hierarchical Bayesian model appropriate to the non-negativity and
sum-to-one constraints associated with the regressors and regression coefficients
of linear mixtures. The estimation of the unknown parameters of this model is
performed using samples generated by an appropriate Gibbs sampler. The
performance of the proposed algorithm is evaluated through simulations
conducted on synthetic mixture models. The proposed approach is also applied to
the processing of multicomponent chemical mixtures resulting from Raman
spectroscopy. Comment: v4: minor grammatical changes; Signal Processing, 200
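The two constraints can be made concrete with a small sketch of the linear mixture model. It assumes mixing coefficients drawn uniformly from the probability simplex (by normalizing independent exponential draws), which is an illustrative choice and not necessarily the prior used in the paper. The helper names and the toy spectra are hypothetical.

```python
import random

def sample_simplex(k, rng):
    """Uniform draw from the probability simplex via normalized exponentials."""
    draws = [rng.expovariate(1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]

def mix(sources, coeffs):
    """Linear mixture: sum over k of coeffs[k] * sources[k], channel by channel."""
    n_channels = len(sources[0])
    return [sum(c * s[i] for c, s in zip(coeffs, sources))
            for i in range(n_channels)]

rng = random.Random(0)
sources = [[1.0, 0.2, 0.0], [0.0, 0.5, 1.0]]  # toy non-negative spectra
coeffs = sample_simplex(len(sources), rng)
print("coefficients:", coeffs, "sum:", sum(coeffs))
print("mixed spectrum:", mix(sources, coeffs))
```

Because the coefficients are non-negative and sum to one, each mixed spectrum is a convex combination of the sources, which is the structure the separation problem has to respect.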
SATCHMO-JS: a webserver for simultaneous protein multiple sequence alignment and phylogenetic tree construction.
We present the jump-start simultaneous alignment and tree construction using hidden Markov models (SATCHMO-JS) web server for simultaneous estimation of protein multiple sequence alignments (MSAs) and phylogenetic trees. The server takes as input a set of sequences in FASTA format, and outputs a phylogenetic tree and MSA; these can be viewed online or downloaded from the website. SATCHMO-JS is an extension of the SATCHMO algorithm, and employs a divide-and-conquer strategy to jump-start SATCHMO at a higher point in the phylogenetic tree, reducing the computational complexity of the progressive all-versus-all HMM-HMM scoring and alignment. Results on a benchmark dataset of 983 structurally aligned pairs from the PREFAB benchmark dataset show that SATCHMO-JS provides a statistically significant improvement in alignment accuracy over MUSCLE, Multiple Alignment using Fast Fourier Transform (MAFFT), ClustalW and the original SATCHMO algorithm. The SATCHMO-JS webserver is available at http://phylogenomics.berkeley.edu/satchmo-js. The datasets used in these experiments are available for download at http://phylogenomics.berkeley.edu/satchmo-js/supplementary/
Conditioned stochastic particle systems and integrable quantum spin systems
We consider from a microscopic perspective large deviation properties of
several stochastic interacting particle systems, using their mapping to
integrable quantum spin systems. A brief review of recent work is given and
several new results are presented: (i) For the general disordered symmetric
exclusion process (SEP) on some finite lattice conditioned on no jumps into
some absorbing sublattice and with initial Bernoulli product measure with
density we prove that the probability of no absorption event
up to microscopic time can be expressed in terms of the generating function
for the particle number of a SEP with particle injection and empty initial
lattice. Specifically, for the symmetric simple exclusion process on conditioned on no jumps into the origin we obtain the explicit first and
second order expansion in of and also to first order in
the optimal microscopic density profile under this conditioning. For the
disordered ASEP on the finite torus conditioned on a very large current we show
that the effective dynamics that optimally realizes this rare event does not
depend on the disorder, except for the time scale. For annihilating and
coalescing random walkers we obtain the generating function of the number of
annihilated particles up to time , which turns out to exhibit some universal
features. Comment: 25 page