Improved contact prediction in proteins: Using pseudolikelihoods to infer Potts models
Spatially proximate amino acids in a protein tend to coevolve. A protein's
three-dimensional (3D) structure hence leaves an echo of correlations in the
evolutionary record. Reverse engineering 3D structures from such correlations
is an open problem in structural biology, pursued with increasing vigor as more
and more protein sequences continue to fill the data banks. Within this task
lies a statistical inference problem, rooted in the following: correlation
between two sites in a protein sequence can arise from firsthand interaction
but can also be network-propagated via intermediate sites; observed correlation
is not enough to guarantee proximity. To separate direct from indirect
interactions is an instance of the general problem of inverse statistical
mechanics, where the task is to learn model parameters (fields, couplings) from
observables (magnetizations, correlations, samples) in large systems. In the
context of protein sequences, the approach has been referred to as
direct-coupling analysis. Here we show that the pseudolikelihood method,
applied to 21-state Potts models describing the statistical properties of
families of evolutionarily related proteins, significantly outperforms existing
approaches to the direct-coupling analysis, the latter being based on standard
mean-field techniques. This improved performance also relies on a modified
score for the coupling strength. The results are verified using known crystal
structures of specific sequence instances of various protein families. Code
implementing the new method can be found at http://plmdca.csc.kth.se/.
Comment: 19 pages, 16 figures, published version
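As a rough illustration of the pseudolikelihood idea named in this abstract, the sketch below evaluates the negative log-pseudolikelihood of a 21-state Potts model: the full likelihood is replaced by a product of per-site conditional probabilities, each of which needs only a sum over the 21 states at one site. The function name, the array layout (`h[i, a]`, `J[i, j, a, b]`) and the integer coding of sequences are assumptions made for illustration; this is not the code distributed at the URL above, and it omits regularization, sequence reweighting and the coupling score mentioned in the abstract.

```python
import numpy as np

def neg_log_pseudolikelihood(seqs, h, J):
    """Average negative log-pseudolikelihood of integer-coded sequences.

    seqs: (B, N) int array with entries in 0..q-1 (hypothetical coding)
    h:    (N, q) fields
    J:    (N, N, q, q) couplings J[i, j, a, b]; J[i, i] is ignored
    """
    B, N = seqs.shape
    total = 0.0
    for i in range(N):
        # logits[b, a] = h_i(a) + sum over j != i of J_ij(a, s_j) for sequence b
        logits = np.tile(np.asarray(h[i], dtype=float), (B, 1))
        for j in range(N):
            if j == i:
                continue
            logits += J[i, j][:, seqs[:, j]].T
        # Numerically stable log of the per-site normalizer (sum over q states)
        m = logits.max(axis=1, keepdims=True)
        log_norm = m[:, 0] + np.log(np.exp(logits - m).sum(axis=1))
        # Subtract the log conditional probability of the observed state at site i
        total -= (logits[np.arange(B), seqs[:, i]] - log_norm).sum()
    return total / B
```

With all fields and couplings set to zero every conditional is uniform, so the value reduces to N times log(21), which is a convenient sanity check before handing the function to an optimizer.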
Inverse Statistical Physics of Protein Sequences: A Key Issues Review
In the course of evolution, proteins undergo important changes in their amino
acid sequences, while their three-dimensional folded structure and their
biological function remain remarkably conserved. Thanks to modern sequencing
techniques, sequence data accumulate at unprecedented pace. This provides large
sets of so-called homologous, i.e., evolutionarily related, protein sequences, to
which methods of inverse statistical physics can be applied. Using sequence
data as the basis for the inference of Boltzmann distributions from samples of
microscopic configurations or observables, it is possible to extract
information about evolutionary constraints and thus protein function and
structure. Here we give an overview of some biologically important questions,
and how statistical-mechanics inspired modeling approaches can help to answer
them. Finally, we discuss some open questions, which we expect to be addressed
over the next years.
Comment: 18 pages, 7 figures
Theoretical Interpretations and Applications of Radial Basis Function Networks
Medical applications have usually used Radial Basis Function Networks (RBFNs) simply as Artificial Neural Networks. However, RBFNs are Knowledge-Based Networks that can be interpreted in several ways: as Artificial Neural Networks, Regularization Networks, Support Vector Machines, Wavelet Networks, Fuzzy Controllers, Kernel Estimators, or Instance-Based Learners. A survey of these interpretations and of their corresponding learning algorithms is provided, as well as a brief survey of dynamic learning algorithms. The interpretations of RBFNs can suggest applications that are particularly interesting in medical domains.
Horseshoe-based Bayesian nonparametric estimation of effective population size trajectories
Phylodynamics is an area of population genetics that uses genetic sequence
data to estimate past population dynamics. Modern state-of-the-art Bayesian
nonparametric methods for recovering population size trajectories of unknown
form use either change-point models or Gaussian process priors. Change-point
models suffer from computational issues when the number of change-points is
unknown and needs to be estimated. Gaussian process-based methods lack local
adaptivity and cannot accurately recover trajectories that exhibit features
such as abrupt changes in trend or varying levels of smoothness. We propose a
novel, locally-adaptive approach to Bayesian nonparametric phylodynamic
inference that has the flexibility to accommodate a large class of functional
behaviors. Local adaptivity results from modeling the log-transformed effective
population size a priori as a horseshoe Markov random field, a recently
proposed statistical model that blends together the best properties of the
change-point and Gaussian process modeling paradigms. We use simulated data to
assess model performance, and find that our proposed method results in reduced
bias and increased precision when compared to contemporary methods. We also use
our models to reconstruct past changes in genetic diversity of human hepatitis
C virus in Egypt and to estimate population size changes of ancient and modern
steppe bison. These analyses show that our new method captures features of the
population size trajectories that were missed by the state-of-the-art methods.
Comment: 36 pages, including supplementary information
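To make the notion of local adaptivity in this abstract more concrete, here is a minimal sketch that draws one trajectory from a first-order random walk whose increments carry horseshoe-type scales: each increment gets its own half-Cauchy local scale, so most steps are tiny while a few can be abrupt. The function name, the choice of first-order differences and the fixed `global_scale` are assumptions for illustration only; the abstract does not specify these details, and this is a prior simulation, not the authors' inference procedure.

```python
import numpy as np

def sample_horseshoe_trajectory(n_points, global_scale=0.1, start=0.0, seed=0):
    """Draw one log-population-size trajectory from a horseshoe-type prior.

    Assumed model (illustrative): theta_k - theta_{k-1} ~ Normal(0, (tau * lambda_k)^2)
    with lambda_k ~ half-Cauchy(0, 1) and tau = global_scale held fixed.
    """
    rng = np.random.default_rng(seed)
    # Half-Cauchy local scales: absolute value of standard Cauchy draws
    local_scales = np.abs(rng.standard_cauchy(n_points - 1))
    # Heavy-tailed scales permit occasional large jumps between grid points
    increments = rng.normal(0.0, global_scale * local_scales)
    # Cumulative sum of increments gives the trajectory on the log scale
    return start + np.concatenate(([0.0], np.cumsum(increments)))
```

Plotting `np.exp(sample_horseshoe_trajectory(100))` for several seeds typically shows long flat stretches interrupted by sharp changes, which is the behavior a Gaussian increment prior with a single shared scale tends to smooth away.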
Forecasting Long-Term Government Bond Yields: An Application of Statistical and AI Models
This paper evaluates several artificial intelligence and classical algorithms on their ability to forecast the monthly yield of US 10-year Treasury bonds from a set of four economic indicators. Due to the complexity of the prediction problem, the task represents a challenging test for the algorithms under evaluation. At the same time, the study is of particular significance because of the important and paradigmatic role played by the US market in the world economy. Four data-driven artificial intelligence approaches are considered, namely, a manually built fuzzy logic model, a machine-learned fuzzy logic model, a self-organising map model and a multi-layer perceptron model. Their performance is compared with that of two classical approaches, namely, a statistical ARIMA model and an econometric error correction model. The algorithms are evaluated on a complete series of end-of-month US 10-year Treasury bond yields and economic indicators from 1986:1 to 2004:12. In terms of prediction accuracy and reliability of the modelling procedure, the best results are obtained by the three parametric regression algorithms, namely the econometric, the statistical and the multi-layer perceptron models. Due to the sparseness of the learning data samples, the manual and the automatic fuzzy logic approaches fail to follow with adequate precision the range of variations of the US 10-year Treasury bond yields. For similar reasons, the self-organising map model gives an unsatisfactory performance. Analysis of the results indicates that the econometric model has a slight edge over the statistical and the multi-layer perceptron models. This suggests that purely data-driven induction may not fully capture the complicated mechanisms ruling the changes in interest rates. Overall, the prediction accuracy of the best models is only marginally better than that of a basic one-step lag predictor. This result highlights the difficulty of the modelling task and, in general, the difficulty of building reliable predictors for financial markets.
Keywords: interest rates; forecasting; neural networks; fuzzy logic.
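The one-step lag predictor used as the baseline in this abstract can be sketched in a few lines: the forecast for each month is simply the previous month's observed value. The function names and the use of root-mean-square error as the accuracy measure are assumptions for illustration; the abstract does not state which error metric the study uses.

```python
import numpy as np

def lag1_forecast(series):
    """Return (predictions, targets) for a one-step lag (persistence) predictor.

    The prediction for time t is the observed value at time t - 1.
    """
    series = np.asarray(series, dtype=float)
    predictions = series[:-1]  # value at t - 1
    targets = series[1:]       # value actually observed at t
    return predictions, targets

def rmse(actual, predicted):
    """Root-mean-square error between two equally long sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))
```

Any candidate model can then be scored on the same targets with `rmse`, and a model that does not clearly beat the lag-1 score has learned little beyond the persistence of the series.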
Probabilistic methods in the analysis of protein interaction networks
Imperial Users only