
    Statistical properties of neutral evolution

    Neutral evolution is the simplest model of molecular evolution and is therefore the most amenable to a comprehensive theoretical investigation. In this paper, we characterize the statistical properties of neutral evolution of proteins under the requirement that the native state remains thermodynamically stable, and compare them to those of Kimura's model of neutral evolution. Our study is based on the Structurally Constrained Neutral (SCN) model, which we recently proposed. We show that, in the SCN model, the substitution rate decreases as longer time intervals are considered and fluctuates strongly from one branch of the evolutionary tree to another, leading to non-Poissonian statistics for the substitution process. These strong fluctuations also arise because neutral substitution rates for individual residues are strongly correlated for most residue pairs. Interestingly, structurally conserved residues, characterized by a substitution rate far below average, are also much less correlated with other residues and evolve in a much more regular way. Our results could improve methods aimed at distinguishing between neutral and adaptive substitutions, as well as methods for computing the expected number of substitutions that have occurred since the divergence of two protein sequences. Comment: 17 pages, 11 figures
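
One common way to quantify "non-Poissonian" substitution statistics is the index of dispersion (variance divided by mean) of substitution counts across branches, which is close to 1 for a Poisson process. The sketch below is only an illustration of that diagnostic: it does not implement the SCN model, and the gamma-distributed per-branch rate, the function names, and all parameter values are assumptions chosen for the example.

```python
import math
import random
import statistics


def poisson_sample(rate, rng):
    """Draw one Poisson variate with Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-rate)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def dispersion_index(counts):
    """Variance-to-mean ratio of the counts; close to 1 for Poisson data."""
    return statistics.pvariance(counts) / statistics.mean(counts)


def simulate_branch_counts(n_branches, mean_rate, rng, rate_shape=None):
    """Substitution counts per branch.

    With rate_shape=None every branch shares the same rate (plain Poisson).
    Otherwise each branch draws its own rate from a gamma distribution with
    the given shape and the same mean (an assumed toy model of rate
    fluctuations, not the SCN model).
    """
    counts = []
    for _ in range(n_branches):
        if rate_shape is None:
            rate = mean_rate
        else:
            rate = rng.gammavariate(rate_shape, mean_rate / rate_shape)
        counts.append(poisson_sample(rate, rng))
    return counts


if __name__ == "__main__":
    rng = random.Random(1)
    steady = simulate_branch_counts(2000, 5.0, rng)
    fluctuating = simulate_branch_counts(2000, 5.0, rng, rate_shape=2.0)
    print("constant rate:   ", round(dispersion_index(steady), 2))
    print("fluctuating rate:", round(dispersion_index(fluctuating), 2))
```

In this toy setting, branch-to-branch rate fluctuations push the index well above 1, which is the qualitative signature of overdispersion that the abstract describes.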

    Prediction of Atomization Energy Using Graph Kernel and Active Learning

    Data-driven prediction of molecular properties presents unique challenges to the design of machine learning methods concerning data structure/dimensionality, symmetry adaption, and confidence management. In this paper, we present a kernel-based pipeline that can learn and predict the atomization energy of molecules with high accuracy. The framework employs Gaussian process regression to perform predictions based on the similarity between molecules, which is computed using the marginalized graph kernel. To apply the marginalized graph kernel, a spatial adjacency rule is first employed to convert molecules into graphs whose vertices and edges are labeled by elements and interatomic distances, respectively. We then derive formulas for the efficient evaluation of the kernel. Specific functional components for the marginalized graph kernel are proposed, and the effects of the associated hyperparameters on accuracy and predictive confidence are examined. We show that the graph kernel is particularly suitable for predicting extensive properties because its convolutional structure coincides with that of the covariance formula between sums of random variables. Using an active learning procedure, we demonstrate that the proposed method can achieve a mean absolute error of 0.62 ± 0.01 kcal/mol using as few as 2000 training samples on the QM7 data set.
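
The remark about extensive properties can be illustrated with a short worked identity. As a sketch, assume that the energy of a molecule G is modeled as a sum of latent per-atom contributions, E(G) = \sum_{i \in G} \varepsilon_i. This decomposition and the symbols \varepsilon_i and \kappa are assumptions introduced here for illustration; they are not taken from the abstract. Bilinearity of covariance then gives a double sum over atom pairs:

```latex
\begin{align}
\operatorname{Cov}\bigl(E(G), E(G')\bigr)
  &= \operatorname{Cov}\Bigl(\sum_{i \in G} \varepsilon_i,\; \sum_{j \in G'} \varepsilon_j\Bigr) \\
  &= \sum_{i \in G} \sum_{j \in G'} \operatorname{Cov}(\varepsilon_i, \varepsilon_j).
\end{align}
```

A kernel of the form K(G, G') = \sum_{i \in G} \sum_{j \in G'} \kappa(i, j), in which a base kernel \kappa is summed over pairs of vertices, has the same structure, with \kappa(i, j) playing the role of \operatorname{Cov}(\varepsilon_i, \varepsilon_j). Under that reading, the kernel value grows with molecule size in the same way as the covariance of an extensive quantity.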

    Graph-Based Change-Point Detection

    We consider the testing and estimation of change-points (locations where the distribution abruptly changes) in a data sequence. A new approach, based on scan statistics utilizing graphs representing the similarity between observations, is proposed. The graph-based approach is non-parametric and can be applied to any data set as long as an informative similarity measure on the sample space can be defined. Accurate analytic approximations to the significance of graph-based scan statistics for both the single change-point and the changed interval alternatives are provided. Simulations reveal that the new approach has better power than existing approaches when the dimension of the data is moderate to high. The new approach is illustrated on two applications: the determination of authorship of a classic novel, and the detection of change in a network over time.
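
The idea of scanning a similarity graph for a change-point can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the authors' statistic: it assumes the graph is a Euclidean minimum spanning tree, counts the edges that join an observation before a candidate split to one after it, and compares that count with its mean under random permutation of the time order. The abstract does not specify these choices, the sketch omits any variance standardization and significance approximation, and all function names are hypothetical.

```python
import math
import random


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns n - 1 edges (i, j)."""
    n = len(points)
    in_tree = [False] * n
    best_dist = [math.inf] * n
    best_from = [-1] * n
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the vertex outside the tree that is closest to the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges


def cross_edge_counts(edges, n):
    """For t = 1..n-1, count edges joining an index < t to an index >= t."""
    return [sum(1 for i, j in edges if (i < t) != (j < t)) for t in range(1, n)]


def expected_cross_edges(num_edges, n, t):
    """Permutation-null mean: each edge straddles the split with probability 2t(n-t)/(n(n-1))."""
    return num_edges * 2 * t * (n - t) / (n * (n - 1))


def estimate_change_point(points):
    """Return the split t (size of the first segment) with the largest deficit of cross edges."""
    n = len(points)
    edges = mst_edges(points)
    counts = cross_edge_counts(edges, n)
    deficits = [expected_cross_edges(len(edges), n, t) - counts[t - 1] for t in range(1, n)]
    return 1 + max(range(n - 1), key=lambda k: deficits[k])


if __name__ == "__main__":
    rng = random.Random(0)
    before = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(30)]
    after = [[rng.gauss(10, 1) for _ in range(5)] for _ in range(30)]
    print("estimated change-point:", estimate_change_point(before + after))
```

Few cross edges at a split suggest that observations on either side are more similar to their own segment than to the other. Because the raw deficit is not standardized, this sketch favors splits near the middle of the sequence; a practical statistic would need to correct for that.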