Prediction of Atomization Energy Using Graph Kernel and Active Learning
Data-driven prediction of molecular properties presents unique challenges to the design of machine learning methods concerning data structure/dimensionality, symmetry adaptation, and confidence management. In this paper, we present a kernel-based pipeline that can learn and predict the atomization energy of molecules with high accuracy. The framework employs Gaussian process regression to perform predictions based on the similarity between molecules, which is computed using the marginalized graph kernel. To apply the marginalized graph kernel, a spatial adjacency rule is first employed to convert molecules into graphs whose vertices and edges are labeled by elements and interatomic distances, respectively. We then derive formulas for the efficient evaluation of the kernel. Specific functional components for the marginalized graph kernel are proposed, while the effect of the associated hyperparameters on accuracy and predictive confidence is examined. We show that the graph kernel is particularly suitable for predicting extensive properties because its convolutional structure coincides with that of the covariance formula between sums of random variables. Using an active learning procedure, we demonstrate that the proposed method can achieve a mean absolute error of 0.62 ± 0.01 kcal/mol using as few as 2000 training samples on the QM7 data set.
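The spatial adjacency rule described above can be sketched in a few lines: given element symbols and 3D coordinates, connect any pair of atoms closer than a distance cutoff, labeling vertices by element and edges by interatomic distance. The function name, the cutoff value, and the dictionary-based graph representation below are illustrative assumptions, not the paper's actual implementation.

```python
import math

def molecule_to_graph(symbols, coords, cutoff=1.2):
    """Convert a molecule into a labeled graph via a spatial adjacency rule.

    Vertices are labeled by element symbol; an edge labeled by the
    interatomic distance connects any pair of atoms closer than `cutoff`
    (same length units as `coords`, e.g. angstroms). The cutoff here is
    an illustrative choice, not the value used in the paper.
    """
    vertices = list(symbols)
    edges = {}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = math.dist(coords[i], coords[j])
            if d < cutoff:
                edges[(i, j)] = d  # edge label = interatomic distance
    return vertices, edges

# Example: a water molecule (coordinates in angstroms)
symbols = ["O", "H", "H"]
coords = [(0.0, 0.0, 0.0), (0.757, 0.586, 0.0), (-0.757, 0.586, 0.0)]
V, E = molecule_to_graph(symbols, coords)
# With a 1.2 A cutoff, only the two O-H pairs are adjacent;
# the H-H distance (~1.51 A) exceeds the cutoff.
```

The resulting vertex- and edge-labeled graph is the input on which the marginalized graph kernel compares pairs of molecules.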
Smoothing Hazard Functions and Time-Varying Effects in Discrete Duration and Competing Risks Models
State space or dynamic approaches to discrete or grouped duration data with competing risks or multiple terminating events allow simultaneous modelling and smooth estimation of hazard functions and time-varying effects in a flexible way. Full Bayesian or posterior mean estimation, using numerical integration techniques or Monte Carlo methods, can become computationally rather demanding or even infeasible for higher dimensions and larger data sets. Therefore, based on previous work on filtering and smoothing for multicategorical time series and longitudinal data, our approach uses posterior mode estimation. Thus we have to maximize posterior densities or, equivalently, a penalized likelihood, which enforces smoothness of hazard functions and time-varying effects by a roughness penalty. Dropping the Bayesian smoothness prior and adopting a nonparametric viewpoint, one might also start directly from maximizing this penalized likelihood. We show how Fisher scoring smoothing iterations can be carried out efficiently by iteratively applying linear Kalman filtering and smoothing to a working model. This algorithm can be combined with an EM-type procedure to estimate unknown smoothing or hyperparameters. The methods are applied to a larger set of unemployment duration data with one and, in a further analysis, multiple terminating events from the German socio-economic panel GSOEP.
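The computational core of the Fisher scoring iterations above is a linear Kalman filter followed by a backward smoothing pass. As a minimal sketch of that building block, the code below runs a scalar random-walk Kalman filter and a Rauch-Tung-Striebel smoother; the state model, noise variances, and function name are illustrative assumptions, far simpler than the multicategorical working model used in the paper.

```python
def kalman_filter_smoother(y, q, r, m0=0.0, p0=10.0):
    """Scalar Kalman filter plus RTS smoother for a random-walk state.

    State model:  x_t = x_{t-1} + w_t,  w_t ~ N(0, q)
    Observation:  y_t = x_t + v_t,      v_t ~ N(0, r)
    Returns the smoothed state means E[x_t | y_1..y_n].
    """
    n = len(y)
    m_f, p_f = [0.0] * n, [0.0] * n  # filtered means / variances
    m_p, p_p = [0.0] * n, [0.0] * n  # one-step-ahead predictions
    m, p = m0, p0
    for t in range(n):
        # Prediction step: random walk keeps the mean, inflates variance.
        mp, pp = m, p + q
        m_p[t], p_p[t] = mp, pp
        # Update step: Kalman gain blends prediction and observation.
        k = pp / (pp + r)
        m = mp + k * (y[t] - mp)
        p = (1.0 - k) * pp
        m_f[t], p_f[t] = m, p
    # Rauch-Tung-Striebel backward smoothing pass.
    m_s = m_f[:]
    for t in range(n - 2, -1, -1):
        g = p_f[t] / p_p[t + 1]
        m_s[t] = m_f[t] + g * (m_s[t + 1] - m_p[t + 1])
    return m_s

# Example: noiseless constant observations pull the smoothed path to 1.
smoothed = kalman_filter_smoother([1.0] * 20, q=0.01, r=1.0)
```

In the paper's setting, one such filter-and-smoother sweep over a linearized working model constitutes a single Fisher scoring iteration; repeating it to convergence maximizes the penalized likelihood.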