A Causal, Data-Driven Approach to Modeling the Kepler Data
Astronomical observations are affected by several kinds of noise, each with
its own causal source; there is photon noise, stochastic source variability,
and residuals coming from imperfect calibration of the detector or telescope.
The precision of NASA Kepler photometry for exoplanet science---the most
precise photometric measurements of stars ever made---appears to be limited by
unknown or untracked variations in spacecraft pointing and temperature, and
unmodeled stellar variability. Here we present the Causal Pixel Model (CPM) for
Kepler data, a data-driven model intended to capture variability but preserve
transit signals. The CPM works at the pixel level so that it can capture very
fine-grained information about the variation of the spacecraft. The CPM
predicts each target pixel value from a large number of pixels of other stars
sharing the instrument variabilities while not containing any information on
possible transits in the target star. In addition, we use the target star's
future and past (auto-regression). By appropriately separating, for each data
point, the data into training and test sets, we ensure that information about
any transit will be perfectly isolated from the model. The method has four
hyper-parameters (the number of predictor stars, the auto-regressive window
size, and two L2-regularization amplitudes for model components), which we set
by cross-validation. We determine a generic set of hyper-parameters that works
well for most of the stars and apply the method to a corresponding set of
target stars. We find that we can consistently outperform (for the purposes of
exoplanet detection) the Kepler Pre-search Data Conditioning (PDC) method for
exoplanet discovery. Comment: Accepted for publication in the PAS
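The abstract above says that, for each data point, the data are separated into training and test sets so that transit information stays out of the model. One plausible way to do this is to hold out a window of samples around the point being predicted. The sketch below illustrates only that idea; the function name `training_indices` and the `half_width` parameter are hypothetical and are not taken from the paper.

```python
def training_indices(n_points, test_index, half_width):
    """Return the indices that may be used to train a model for `test_index`.

    Assumption (not from the paper): every sample within `half_width` steps
    of the test point is held out, so a transit overlapping the test point
    cannot leak into the fitted model.
    """
    if not 0 <= test_index < n_points:
        raise ValueError("test_index out of range")
    lo = test_index - half_width
    hi = test_index + half_width
    # Keep only samples strictly outside the exclusion window [lo, hi].
    return [i for i in range(n_points) if i < lo or i > hi]


train = training_indices(n_points=10, test_index=4, half_width=2)
print(train)  # samples 2..6 are held out
```

In such a scheme the window would presumably need to be at least as wide as the longest transit of interest; the value used here is arbitrary.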
Center-surround filters emerge from optimizing predictivity in a free-viewing task
In which way do the local image statistics at the center of gaze differ from those at randomly chosen image locations? In 1999, Reinagel and Zador [1] showed that RMS contrast is significantly increased around fixated locations in natural images. Since then, numerous additional hypotheses have been proposed, based on edge content, entropy, self-information, higher-order statistics, or sophisticated models such as that of Itti and Koch [2]. While these models differ considerably in the image features they use, they hardly differ in predictive power. This complicates the question of which bottom-up mechanism actually drives human eye movements. To shed some light on this problem, we analyze the nonlinear receptive fields of an eye movement model which is purely data-driven. It consists of a nonparametric radial basis function network, fitted to human eye movement data. To avoid a bias towards specific image features such as edges or corners, we deliberately chose raw pixel values as the input to our model, not the outputs of some filter bank. The learned model is analyzed by computing its optimal stimuli. It turns out that there are two maximally excitatory stimuli, both of which have center-surround structure, and two maximally inhibitory stimuli which are basically flat. We argue that these can be seen as nonlinear receptive fields of the underlying system. In particular, we show that a small radial basis function network with the optimal stimuli as centers predicts unseen eye movements as precisely as the full model. The fact that center-surround filters emerge from a simple optimality criterion, without any prior assumption that would make them more probable than, e.g., edges, corners, or any other configuration of pixel values in a square patch, suggests a special role of these filters in free-viewing of natural images.
Removing systematic errors for exoplanet search via latent causes
We describe a method for removing the effect of confounders in order to
reconstruct a latent quantity of interest. The method, referred to as
half-sibling regression, is inspired by recent work in causal inference using
additive noise models. We provide a theoretical justification and illustrate
the potential of the method in a challenging astronomy application. Comment: Extended version of a paper appearing in the Proceedings of the 32nd
International Conference on Machine Learning, Lille, France, 201
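The idea of removing a confounder by regression can be sketched as follows: regress the contaminated measurement on another measurement that shares the same systematic effect but not the signal of interest, then keep the residual. The sketch below uses one-predictor ordinary least squares purely as the simplest stand-in regressor; the abstract does not specify the regression method, and the name `half_sibling_residual` and the toy data are hypothetical.

```python
def half_sibling_residual(y, x):
    """Return y minus its least-squares prediction from x.

    y: contaminated measurements (signal of interest plus systematics).
    x: measurements sharing the systematics but assumed independent of
       the signal of interest.
    Simple linear regression is an illustrative assumption here.
    """
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    if var_x == 0:
        raise ValueError("predictor x must not be constant")
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    # The residual is the part of y that x cannot explain.
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Toy data: a linear drift shared by both series, plus a dip only in y.
systematics = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
signal = [0.0, 0.0, 0.0, -1.0, 0.0, 0.0, 0.0]
y = [s + 2.0 * d for s, d in zip(signal, systematics)]
residual = half_sibling_residual(y, systematics)
print([round(r, 3) for r in residual])
```

In this toy case the residual equals the signal up to a constant offset (its mean), because the dip is uncorrelated with the drift. With real data the recovery would only be approximate.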
Machine Learning for Quantum Mechanical Properties of Atoms in Molecules
We introduce machine learning models of quantum mechanical observables of
atoms in molecules. Instant out-of-sample predictions for proton and carbon
nuclear chemical shifts, atomic core level excitations, and forces on atoms
reach accuracies on par with density functional theory reference. Locality is
exploited within non-linear regression via local atom-centered coordinate
systems. The approach is validated on a diverse set of 9k small organic
molecules. Linear scaling of computational cost in system size is demonstrated
for saturated polymers with up to sub-mesoscale lengths.
Joint Kernel Maps
We develop a methodology for solving high dimensional dependency estimation problems between pairs of data types, which is viable in the case where the output of interest has very high dimension, e.g. thousands of dimensions. This is achieved by mapping the objects into continuous or discrete spaces, using joint kernels. Known correlations between input and output can be defined by such kernels, some of which can maintain linearity in the outputs to provide simple (closed form) pre-images. We provide examples of such kernels and empirical results on mass spectrometry prediction and mapping between images
Detecting Generalized Synchronization Between Chaotic Signals: A Kernel-based Approach
A unified framework for analyzing generalized synchronization in coupled
chaotic systems from data is proposed. The key of the proposed approach is the
use of the kernel methods recently developed in the field of machine learning.
Several successful applications are presented, which show the capability of the
kernel-based approach for detecting generalized synchronization. It is also
shown that the dynamical change of the coupling coefficient between two chaotic
systems can be captured by the proposed approach. Comment: 20 pages, 15 figures. massively revised as a full paper; issues on
the choice of parameters by cross validation, tests by surrogated data, etc.
are added as well as additional examples and figure
Towards fully covariant machine learning
Any representation of data involves arbitrary investigator choices. Because
those choices are external to the data-generating process, each choice leads to
an exact symmetry, corresponding to the group of transformations that takes one
possible representation to another. These are the passive symmetries; they
include coordinate freedom, gauge symmetry, and units covariance, all of which
have led to important results in physics. In machine learning, the most visible
passive symmetry is the relabeling or permutation symmetry of graphs. Our goal
is to understand the implications for machine learning of the many passive
symmetries in play. We discuss dos and don'ts for machine learning practice if
passive symmetries are to be respected. We discuss links to causal modeling,
and argue that the implementation of passive symmetries is particularly
valuable when the goal of the learning problem is to generalize out of sample.
This paper is conceptual: It translates among the languages of physics,
mathematics, and machine-learning. We believe that consideration and
implementation of passive symmetries might help machine learning in the same
ways that it transformed physics in the twentieth century. Comment: substantial revision from v1; submitted to TML
Quantifying the Effects of Contact Tracing, Testing, and Containment Measures in the Presence of Infection Hotspots
Multiple lines of evidence strongly suggest that infection hotspots, where a single individual infects many others, play a key role in the transmission dynamics of COVID-19. However, most of the existing epidemiological models fail to capture this aspect by neither representing the sites visited by individuals explicitly nor characterizing disease transmission as a function of individual mobility patterns. In this work, we introduce a temporal point process modeling framework that specifically represents visits to the sites where individuals get in contact and infect each other. Under our model, the number of infections caused by an infectious individual naturally emerges to be overdispersed. Using an efficient sampling algorithm, we demonstrate how to apply Bayesian optimization with longitudinal case data to estimate the transmission rate of infectious individuals at the sites they visit and in their households. Simulations using fine-grained and publicly available demographic data and site locations from Bern, Switzerland showcase the flexibility of our framework. To facilitate research and analyses of other cities and regions, we release an open-source implementation of our framework
Improving music genre classification using automatically induced harmony rules
We present a new genre classification framework using both low-level signal-based features and high-level harmony features. A state-of-the-art statistical genre classifier based on timbral features is extended using a first-order random forest containing for each genre rules derived from harmony or chord sequences. This random forest has been automatically induced, using the first-order logic induction algorithm TILDE, from a dataset, in which for each chord the degree and chord category are identified, and covering classical, jazz and pop genre classes. The audio descriptor-based genre classifier contains 206 features, covering spectral, temporal, energy, and pitch characteristics of the audio signal. The fusion of the harmony-based classifier with the extracted feature vectors is tested on three-genre subsets of the GTZAN and ISMIR04 datasets, which contain 300 and 448 recordings, respectively. Machine learning classifiers were tested using 5 × 5-fold cross-validation and feature selection. Results indicate that the proposed harmony-based rules combined with the timbral descriptor-based genre classification system lead to improved genre classification rates
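The evaluation protocol mentioned above, 5 × 5-fold cross-validation, is read here as five repetitions of 5-fold cross-validation, each with a fresh shuffle. That reading is an assumption; the abstract does not spell it out. The sketch below only generates index splits, and the function name `repeated_kfold` and its parameters are hypothetical.

```python
import random


def repeated_kfold(n_samples, n_folds=5, n_repeats=5, seed=0):
    """Yield (train, test) index lists for repeated k-fold cross-validation.

    Each repetition reshuffles the sample order, cuts it into `n_folds`
    disjoint folds, and uses every fold once as the test set.
    """
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        # Striding by n_folds gives folds whose sizes differ by at most one.
        folds = [order[i::n_folds] for i in range(n_folds)]
        for k in range(n_folds):
            test = sorted(folds[k])
            train = sorted(
                i for j, fold in enumerate(folds) if j != k for i in fold
            )
            yield train, test


# 300 matches the size of the GTZAN subset quoted in the abstract.
splits = list(repeated_kfold(300))
print(len(splits))  # 25 train/test splits
```

A classifier would then be fitted on each `train` list and scored on the matching `test` list, with the 25 scores averaged.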