Uncertainty Quantification Using Neural Networks for Molecular Property Prediction
Uncertainty quantification (UQ) is an important component of molecular
property prediction, particularly for drug discovery applications where model
predictions direct experimental design and where unanticipated imprecision
wastes valuable time and resources. The need for UQ is especially acute for
neural models, which are becoming increasingly standard yet are challenging to
interpret. While several approaches to UQ have been proposed in the literature,
there is no clear consensus on the comparative performance of these models. In
this paper, we study this question in the context of regression tasks. We
systematically evaluate several methods on five benchmark datasets using
multiple complementary performance metrics. Our experiments show that none of
the methods we tested is unequivocally superior to all others, and none
produces a particularly reliable ranking of errors across multiple datasets.
While we believe these results show that existing UQ methods are not sufficient
for all common use-cases and demonstrate the benefits of further research, we
conclude with a practical recommendation as to which existing techniques seem
to perform well relative to others.
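The abstract does not name the specific UQ methods compared, but one widely used baseline in this family is a bootstrapped ensemble, where the spread of the members' predictions serves as the uncertainty estimate. A minimal sketch on toy data (the dataset, model class, and ensemble size are illustrative assumptions, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression problem standing in for a molecular property dataset.
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

def fit_poly(X, y, deg=5):
    # Least-squares polynomial fit; returns the coefficient vector.
    return np.polyfit(X[:, 0], y, deg)

# Bootstrap ensemble: each member is trained on a resampled dataset.
members = []
for _ in range(20):
    idx = rng.integers(0, len(X), size=len(X))
    members.append(fit_poly(X[idx], y[idx]))

X_test = np.linspace(-3, 3, 50).reshape(-1, 1)
preds = np.stack([np.polyval(c, X_test[:, 0]) for c in members])

mean = preds.mean(axis=0)  # point prediction
std = preds.std(axis=0)    # ensemble spread as the uncertainty estimate
```

Metrics in the spirit of the paper's evaluation would then ask, for example, how well `std` ranks the actual absolute errors across test points.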
Mapping the Galaxy Color-Redshift Relation: Optimal Photometric Redshift Calibration Strategies for Cosmology Surveys
Calibrating the photometric redshifts of >10^9 galaxies for upcoming weak
lensing cosmology experiments is a major challenge for the astrophysics
community. The path to obtaining the required spectroscopic redshifts for
training and calibration is daunting, given the anticipated depths of the
surveys and the difficulty in obtaining secure redshifts for some faint galaxy
populations. Here we present an analysis of the problem based on the
self-organizing map, a method of mapping the distribution of data in a
high-dimensional space and projecting it onto a lower-dimensional
representation. We apply this method to existing photometric data from the
COSMOS survey selected to approximate the anticipated Euclid weak lensing
sample, enabling us to robustly map the empirical distribution of galaxies in
the multidimensional color space defined by the expected Euclid filters.
Mapping this multicolor distribution lets us determine where - in galaxy color
space - redshifts from current spectroscopic surveys exist and where they are
systematically missing. Crucially, the method lets us determine whether a
spectroscopic training sample is representative of the full photometric space
occupied by the galaxies in a survey. We explore optimal sampling techniques
and estimate the additional spectroscopy needed to map out the color-redshift
relation, finding that sampling the galaxy distribution in color space in a
systematic way can efficiently meet the calibration requirements. While the
analysis presented here focuses on the Euclid survey, similar analysis can be
applied to other surveys facing the same calibration challenge, such as DES,
LSST, and WFIRST.
Comment: ApJ accepted, 17 pages, 10 figures.
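The core tool here, the self-organizing map, can be sketched in a few lines: a grid of weight vectors is iteratively pulled toward the data, so that nearby grid cells end up representing nearby regions of color space. The grid size, number of toy "colors", and training schedule below are illustrative assumptions, not the paper's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "galaxy colors": 500 objects with 4 colors (stand-ins for survey bands).
colors = rng.normal(size=(500, 4))

# A small self-organizing map: an 8x8 grid of 4-dimensional weight vectors.
grid_h, grid_w, dim = 8, 8, 4
weights = rng.normal(size=(grid_h, grid_w, dim))
coords = np.stack(np.meshgrid(np.arange(grid_h), np.arange(grid_w),
                              indexing="ij"), axis=-1)

for step in range(2000):
    x = colors[rng.integers(len(colors))]
    # Best-matching unit: the grid cell whose weights are closest to x.
    d2 = ((weights - x) ** 2).sum(axis=-1)
    bmu = np.unravel_index(np.argmin(d2), d2.shape)
    # Learning rate and neighborhood radius shrink over training.
    lr = 0.5 * (1 - step / 2000)
    sigma = 3.0 * (1 - step / 2000) + 0.5
    g2 = ((coords - np.array(bmu)) ** 2).sum(axis=-1)
    h = np.exp(-g2 / (2 * sigma ** 2))[..., None]
    weights += lr * h * (x - weights)

# Assign each object to its best-matching cell and count the occupancy.
flat = weights.reshape(-1, dim)
assign = np.argmin(((colors[:, None, :] - flat) ** 2).sum(-1), axis=1)
occupancy = np.bincount(assign, minlength=grid_h * grid_w).reshape(grid_h, grid_w)
```

In the spirit of the paper, comparing the occupancy of a photometric sample against that of a spectroscopic sample over the same trained map would flag cells (regions of color space) where spectroscopy is systematically missing.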
Conformal Prediction: a Unified Review of Theory and New Challenges
In this work we provide a review of basic ideas and novel developments about
Conformal Prediction -- an innovative distribution-free, non-parametric
forecasting method based on minimal assumptions -- that is able to yield, in a
very straightforward way, prediction sets that are valid in a statistical
sense even in the finite-sample case. The in-depth discussion provided in the
paper covers the theoretical underpinnings of Conformal Prediction, and then
proceeds to list the more advanced developments and adaptations of the
original idea.
Comment: arXiv admin note: text overlap with arXiv:0706.3188,
arXiv:1604.04173, arXiv:1709.06233, arXiv:1203.5422 by other authors.
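As a concrete illustration, one simple member of this family is split (inductive) conformal prediction for regression: fit any point predictor on one half of the data, compute absolute residuals on the other half, and use their empirical quantile as the interval half-width. The data and the linear predictor below are toy assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data; any point predictor works -- here a plain linear fit.
X = rng.uniform(0, 10, size=300)
y = 2.0 * X + rng.normal(size=300)

# Split the data: fit the predictor on one half, calibrate on the other.
X_fit, y_fit = X[:150], y[:150]
X_cal, y_cal = X[150:], y[150:]

slope, intercept = np.polyfit(X_fit, y_fit, 1)

def predict(x):
    return slope * x + intercept

# Nonconformity scores on the calibration set: absolute residuals.
scores = np.abs(y_cal - predict(X_cal))

# For miscoverage level alpha, the interval half-width is the
# ceil((n + 1) * (1 - alpha))-th smallest calibration score.
alpha = 0.1
n = len(scores)
q = np.sort(scores)[int(np.ceil((n + 1) * (1 - alpha))) - 1]

x_new = 5.0
interval = (predict(x_new) - q, predict(x_new) + q)  # valid at level 1 - alpha
```

Under exchangeability of the calibration and test points, such intervals cover the true response with probability at least 1 - alpha, regardless of how good the underlying predictor is.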
Hedging predictions in machine learning
Recent advances in machine learning make it possible to design efficient
prediction algorithms for data sets with huge numbers of parameters. This paper
describes a new technique for "hedging" the predictions output by many such
algorithms, including support vector machines, kernel ridge regression, kernel
nearest neighbours, and many other state-of-the-art methods. The hedged
predictions for the labels of new objects include quantitative measures of
their own accuracy and reliability. These measures are provably valid under the
assumption of randomness, traditional in machine learning: the objects and
their labels are assumed to be generated independently from the same
probability distribution. In particular, it becomes possible to control (up to
statistical fluctuations) the number of erroneous predictions by selecting a
suitable confidence level. Validity being achieved automatically, the remaining
goal of hedged prediction is efficiency: taking full account of the new
objects' features and other available information to produce as accurate
predictions as possible. This can be done successfully using the powerful
machinery of modern machine learning.
Comment: 24 pages; 9 figures; 2 tables; a version of this paper (with
discussion and rejoinder) is to appear in "The Computer Journal".
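The abstract does not name the technique, but one standard way to produce such hedged predictions with a tunable confidence level is a conformal predictor: for each candidate label, compute a p-value from a nonconformity measure and include the label in the prediction set whenever the p-value exceeds the chosen significance level. The 1-NN nonconformity measure and the two-class toy data below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

# Two-class toy data: 1-D feature, class 0 near -2, class 1 near +2.
X = np.concatenate([rng.normal(-2, 1, 100), rng.normal(2, 1, 100)])
labels = np.array([0] * 100 + [1] * 100)

def nonconformity(x, y, X_ref, y_ref):
    # Distance to the nearest same-class point divided by distance to the
    # nearest other-class point (a classic 1-NN nonconformity measure).
    same = np.min(np.abs(X_ref[y_ref == y] - x))
    other = np.min(np.abs(X_ref[y_ref != y] - x))
    return same / other

def prediction_set(x_new, eps=0.05):
    # Include every label whose conformal p-value exceeds eps; under the
    # randomness (exchangeability) assumption, the true label is excluded
    # at most an eps fraction of the time, up to statistical fluctuations.
    out = []
    for y in (0, 1):
        X_aug = np.append(X, x_new)
        y_aug = np.append(labels, y)
        scores = np.array([
            nonconformity(X_aug[i], y_aug[i],
                          np.delete(X_aug, i), np.delete(y_aug, i))
            for i in range(len(X_aug))
        ])
        p = np.mean(scores >= scores[-1])
        if p > eps:
            out.append(y)
    return out
```

Lowering `eps` tightens the guaranteed error rate at the cost of larger (less efficient) prediction sets, which is exactly the validity/efficiency trade-off the abstract describes.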