Free energies as a function of a selected set of collective variables are
commonly computed in molecular simulation and are of significant value in
understanding and engineering molecular behavior. These free energy surfaces
are most commonly estimated using variants of histogramming techniques, but
such approaches obscure two important facets of these functions. First, the
empirical data along the collective variables are an ensemble of discrete
observations, and coarsening these observations into histogram bins incurs an
unnecessary loss of information. Second, the free energy
surface is itself almost always a continuous function, and its representation
by a histogram introduces inherent approximations due to the discretization. In
this study, we relate the discrete observations from biased
simulations to the inferred underlying continuous probability distribution over
the collective variables and derive histogram-free techniques for estimating
this free energy surface. We reformulate free energy surface estimation as
minimization of a Kullback-Leibler divergence between a continuous trial
function and the discrete empirical distribution and show that this is
equivalent to likelihood maximization of a trial function given a set of
sampled data. We then present a fully Bayesian treatment of this formalism,
which enables the incorporation of powerful Bayesian tools such as the
inclusion of regularizing priors, uncertainty quantification, and model
selection techniques. We demonstrate this new formalism in the analysis of
umbrella sampling simulations for the χ torsion of a valine sidechain in
the L99A mutant of T4 lysozyme with benzene bound in the cavity.

Comment: 24 pages, 5 figures
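
The likelihood-maximization view of free energy surface estimation described above can be illustrated with a minimal sketch. It is not the implementation used in this work, and it makes several simplifying assumptions: the samples are unbiased (no umbrella potentials), the collective variable is a single periodic torsion, and the trial function is a hypothetical two-term Fourier form F(x) = a cos(x) + b sin(x) in units of kT. The function name `fit_free_energy` and all parameters are illustrative only. The fit works directly on the discrete samples, with no histogram.

```python
import math
import random


def fit_free_energy(samples, n_grid=720, n_iter=500, lr=1.0):
    """Maximum-likelihood fit of F(x) = a*cos(x) + b*sin(x), in units of kT.

    Assumes unbiased samples of a periodic variable on [0, 2*pi).
    """
    # Sample averages computed from the raw observations; nothing is binned.
    n = len(samples)
    mean_cos = sum(math.cos(x) for x in samples) / n
    mean_sin = sum(math.sin(x) for x in samples) / n

    # Uniform periodic grid, used only to normalize the trial density.
    grid = [2.0 * math.pi * k / n_grid for k in range(n_grid)]
    cos_g = [math.cos(x) for x in grid]
    sin_g = [math.sin(x) for x in grid]

    a = b = 0.0
    for _ in range(n_iter):
        # Trial density p(x) proportional to exp(-F(x)), evaluated on the grid.
        w = [math.exp(-(a * c + b * s)) for c, s in zip(cos_g, sin_g)]
        z = sum(w)
        model_cos = sum(wi * c for wi, c in zip(w, cos_g)) / z
        model_sin = sum(wi * s for wi, s in zip(w, sin_g)) / z
        # Per-sample negative log-likelihood: L = <F>_data + log Z.
        # Its gradient is <cos>_data - <cos>_model (and likewise for sin).
        a -= lr * (mean_cos - model_cos)
        b -= lr * (mean_sin - model_sin)
    return a, b


# Synthetic data: von Mises samples, p(x) proportional to exp(kappa*cos(x - mu)),
# so the exact surface is F(x) = -kappa*cos(x - mu), i.e.
# a = -kappa*cos(mu) and b = -kappa*sin(mu).
rng = random.Random(0)
samples = [rng.vonmisesvariate(1.0, 2.0) for _ in range(5000)]
a, b = fit_free_energy(samples)
```

Because this trial family is an exponential family, the negative log-likelihood is convex in (a, b), and plain gradient descent with a unit step is sufficient here. Handling biased windows would require adding each window's bias potential to F and normalizing each window separately, which this sketch omits.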