
465 research outputs found

A Causal, Data-Driven Approach to Modeling the Kepler Data

Author: Foreman-Mackey Dan
Hogg David W.
Schölkopf Bernhard
Wang Dun
Publication venue: 'IOP Publishing'
Publication date: 25/04/2016

Astronomical observations are affected by several kinds of noise, each with its own causal source; there is photon noise, stochastic source variability, and residuals coming from imperfect calibration of the detector or telescope. The precision of NASA Kepler photometry for exoplanet science, the most precise photometric measurements of stars ever made, appears to be limited by unknown or untracked variations in spacecraft pointing and temperature, and unmodeled stellar variability. Here we present the Causal Pixel Model (CPM) for Kepler data, a data-driven model intended to capture variability but preserve transit signals. The CPM works at the pixel level so that it can capture very fine-grained information about the variation of the spacecraft. The CPM predicts each target pixel value from a large number of pixels of other stars sharing the instrument variabilities while not containing any information on possible transits in the target star. In addition, we use the target star's future and past (auto-regression). By appropriately separating, for each data point, the data into training and test sets, we ensure that information about any transit will be perfectly isolated from the model. The method has four hyper-parameters (the number of predictor stars, the auto-regressive window size, and two L2-regularization amplitudes for model components), which we set by cross-validation. We determine a generic set of hyper-parameters that works well for most of the stars and apply the method to a corresponding set of target stars. We find that we can consistently outperform (for the purposes of exoplanet detection) the Kepler Pre-search Data Conditioning (PDC) method for exoplanet discovery.
Comment: Accepted for publication in the PAS
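The train/test separation described in this abstract can be illustrated with a short sketch. It is not the authors' implementation: it assumes a plain L2-regularized (ridge) linear fit, omits the auto-regressive component, and uses hypothetical names (`cpm_predict`, `exclude_half_width`, `l2`) chosen only for this example.

```python
import numpy as np


def cpm_predict(target, predictors, exclude_half_width=5, l2=1.0):
    """Predict each target sample from other-star pixels (illustrative sketch).

    target:      shape (n_times,), one target pixel's light curve
    predictors:  shape (n_times, n_pixels), pixels from other stars
    exclude_half_width, l2: hypothetical hyper-parameter names for this sketch
    """
    n_times, n_pixels = predictors.shape
    times = np.arange(n_times)
    prediction = np.empty(n_times)
    for t in range(n_times):
        # Train only on samples outside a window around t, so a transit
        # at time t cannot leak into the model that predicts time t.
        train = np.abs(times - t) > exclude_half_width
        X, y = predictors[train], target[train]
        # Closed-form ridge solution: (X^T X + l2 * I) w = X^T y
        w = np.linalg.solve(X.T @ X + l2 * np.eye(n_pixels), X.T @ y)
        prediction[t] = predictors[t] @ w
    return prediction
```

Subtracting the prediction from the target leaves a residual in which shared instrumental trends are suppressed while a short transit dip is largely retained, provided the excluded window is wider than the transit.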

arXiv.org e-Print Archive

MPG.PuRe

Center-surround filters emerge from optimizing predictivity in a free-viewing task

Author: Franz M.
Kienzle W.
Schölkopf B.
Wichmann F.
Publication date: 01/01/2007

In which way do the local image statistics at the center of gaze differ from those at randomly chosen image locations? In 1999, Reinagel and Zador [1] showed that RMS contrast is significantly increased around fixated locations in natural images. Since then, numerous additional hypotheses have been proposed, based on edge content, entropy, self-information, higher-order statistics, or sophisticated models such as that of Itti and Koch [2]. While these models are rather different in terms of the image features used, they hardly differ in terms of their predictive power. This complicates the question of which bottom-up mechanism actually drives human eye movements. To shed some light on this problem, we analyze the nonlinear receptive fields of an eye movement model which is purely data-driven. It consists of a nonparametric radial basis function network, fitted to human eye movement data. To avoid a bias towards specific image features such as edges or corners, we deliberately chose raw pixel values as the input to our model, not the outputs of some filter bank. The learned model is analyzed by computing its optimal stimuli. It turns out that there are two maximally excitatory stimuli, both of which have center-surround structure, and two maximally inhibitory stimuli which are basically flat. We argue that these can be seen as nonlinear receptive fields of the underlying system. In particular, we show that a small radial basis function network with the optimal stimuli as centers predicts unseen eye movements as precisely as the full model. The fact that center-surround filters emerge from a simple optimality criterion (without any prior assumption that would make them more probable than, e.g., edges, corners, or any other configuration of pixel values in a square patch) suggests a special role of these filters in free-viewing of natural images.

CiteSeerX

MPG.PuRe

Removing systematic errors for exoplanet search via latent causes

Author: Foreman-Mackey Daniel
Hogg David W.
Janzing Dominik
Peters Jonas
Schölkopf Bernhard
Simon-Gabriel Carl-Johann
Wang Dun
Publication date: 12/05/2015

We describe a method for removing the effect of confounders in order to reconstruct a latent quantity of interest. The method, referred to as half-sibling regression, is inspired by recent work in causal inference using additive noise models. We provide a theoretical justification and illustrate the potential of the method in a challenging astronomy application.
Comment: Extended version of a paper appearing in the Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 201

arXiv.org e-Print Archive

MPG.PuRe

Machine Learning for Quantum Mechanical Properties of Atoms in Molecules

Author: Hastie T.
Helgaker T.
Li L.
Matthias Rupp
O. Anatole von Lilienfeld
Ochterski J. W.
Raghunathan Ramakrishnan
Schölkopf B.
Publication venue: 'American Chemical Society (ACS)'
Publication date: 01/01/2015

We introduce machine learning models of quantum mechanical observables of atoms in molecules. Instant out-of-sample predictions for proton and carbon nuclear chemical shifts, atomic core level excitations, and forces on atoms reach accuracies on par with density functional theory reference. Locality is exploited within non-linear regression via local atom-centered coordinate systems. The approach is validated on a diverse set of 9k small organic molecules. Linear scaling of computational cost in system size is demonstrated for saturated polymers with up to sub-mesoscale lengths

arXiv.org e-Print Archive

Joint Kernel Maps

Author: Bousquet O.
Mann
Noble W.
Schölkopf B.
Weston J.
Publication venue: Max Planck Institute for Biological Cybernetics
Publication date: 01/11/2004

We develop a methodology for solving high dimensional dependency estimation problems between pairs of data types, which is viable in the case where the output of interest has very high dimension, e.g. thousands of dimensions. This is achieved by mapping the objects into continuous or discrete spaces, using joint kernels. Known correlations between input and output can be defined by such kernels, some of which can maintain linearity in the outputs to provide simple (closed form) pre-images. We provide examples of such kernels and empirical results on mass spectrometry prediction and mapping between images

MPG.PuRe

Detecting Generalized Synchronization Between Chaotic Signals: A Kernel-based Approach

Author: Akaho S
Buja A
Fukumizu K
Bach F R
Gretton A
Hiromichi Suetani
Kazuyuki Aihara
Melzer T
Reiter M
Bischof H
Pikovsky A
Schölkopf B
Shawe-Taylor J
Silverman B W
Stone M
Suetani H
Wahba G
Yukito Iba
Publication venue: 'IOP Publishing'
Publication date: 14/08/2006

A unified framework for analyzing generalized synchronization in coupled chaotic systems from data is proposed. The key of the proposed approach is the use of the kernel methods recently developed in the field of machine learning. Several successful applications are presented, which show the capability of the kernel-based approach for detecting generalized synchronization. It is also shown that the dynamical change of the coupling coefficient between two chaotic systems can be captured by the proposed approach.
Comment: 20 pages, 15 figures. massively revised as a full paper; issues on the choice of parameters by cross validation, tests by surrogated data, etc. are added as well as additional examples and figure

arXiv.org e-Print Archive

Crossref

Towards fully covariant machine learning

Author: Hogg David W.
Kevrekidis George A.
Schölkopf Bernhard
Villar Soledad
Yao Weichi
Publication date: 28/06/2023

Any representation of data involves arbitrary investigator choices. Because those choices are external to the data-generating process, each choice leads to an exact symmetry, corresponding to the group of transformations that takes one possible representation to another. These are the passive symmetries; they include coordinate freedom, gauge symmetry, and units covariance, all of which have led to important results in physics. In machine learning, the most visible passive symmetry is the relabeling or permutation symmetry of graphs. Our goal is to understand the implications for machine learning of the many passive symmetries in play. We discuss dos and don'ts for machine learning practice if passive symmetries are to be respected. We discuss links to causal modeling, and argue that the implementation of passive symmetries is particularly valuable when the goal of the learning problem is to generalize out of sample. This paper is conceptual: It translates among the languages of physics, mathematics, and machine-learning. We believe that consideration and implementation of passive symmetries might help machine learning in the same ways that it transformed physics in the twentieth century.
Comment: substantial revision from v1; submitted to TML

arXiv.org e-Print Archive

Quantifying the Effects of Contact Tracing, Testing, and Containment Measures in the Presence of Infection Hotspots

Author: Gomez Rodriguez M.
Kremer H.
Lorch L.
Schölkopf B.
Szanto A.
Trouleau W.
Tsirtsis S.
Publication date: 01/01/2021

Multiple lines of evidence strongly suggest that infection hotspots, where a single individual infects many others, play a key role in the transmission dynamics of COVID-19. However, most existing epidemiological models fail to capture this aspect, because they neither represent the sites visited by individuals explicitly nor characterize disease transmission as a function of individual mobility patterns. In this work, we introduce a temporal point process modeling framework that specifically represents visits to the sites where individuals come into contact and infect each other. Under our model, the number of infections caused by an infectious individual naturally emerges as overdispersed. Using an efficient sampling algorithm, we demonstrate how to apply Bayesian optimization with longitudinal case data to estimate the transmission rate of infectious individuals at the sites they visit and in their households. Simulations using fine-grained and publicly available demographic data and site locations from Bern, Switzerland showcase the flexibility of our framework. To facilitate research and analyses of other cities and regions, we release an open-source implementation of our framework.

MPG.PuRe


Improving music genre classification using automatically induced harmony rules

Author: Amélie Anglade
Aucouturier J.-J.
Cataltepe Z.
Emmanouil Benetos
Fukunaga K.
Lawson C. L.
Matthias Mauch
Piston W.
Pérez-Sancho C.
Quinlan J. R.
Schölkopf B.
Simon Dixon
Tzanetakis G.
van der Hedjen F.
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2009

We present a new genre classification framework using both low-level signal-based features and high-level harmony features. A state-of-the-art statistical genre classifier based on timbral features is extended using a first-order random forest containing for each genre rules derived from harmony or chord sequences. This random forest has been automatically induced, using the first-order logic induction algorithm TILDE, from a dataset, in which for each chord the degree and chord category are identified, and covering classical, jazz and pop genre classes. The audio descriptor-based genre classifier contains 206 features, covering spectral, temporal, energy, and pitch characteristics of the audio signal. The fusion of the harmony-based classifier with the extracted feature vectors is tested on three-genre subsets of the GTZAN and ISMIR04 datasets, which contain 300 and 448 recordings, respectively. Machine learning classifiers were tested using 5 × 5-fold cross-validation and feature selection. Results indicate that the proposed harmony-based rules combined with the timbral descriptor-based genre classification system lead to improved genre classification rates

City Research Online

Crossref

Ghent University Academic Bibliography

University of Miami: Scholarship Miami

The University of Manchester - Institutional Repository

Radboud Repository