LOCA: LOcal Conformal Autoencoder for standardized data coordinates
We propose a deep-learning based method for obtaining standardized data
coordinates from scientific measurements. Data observations are modeled as
samples from an unknown, non-linear deformation of an underlying Riemannian
manifold, which is parametrized by a few normalized latent variables. By
leveraging a repeated-measurement sampling strategy, we present a method for
learning an embedding that is isometric to the latent variables of the
manifold. These data coordinates, being invariant under smooth changes of
variables, enable matching between different instrumental observations of the
same phenomenon. Our embedding is obtained using a LOcal Conformal Autoencoder
(LOCA), an algorithm that constructs an embedding to rectify deformations by
using a local z-scoring procedure while preserving relevant geometric
information. We demonstrate the isometric embedding properties of LOCA in
various model settings and observe that it exhibits promising interpolation
and extrapolation capabilities. Finally, we apply LOCA to single-site Wi-Fi
localization data, and to curved-surface estimation based on a
lower-dimensional projection.
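The local z-scoring step can be illustrated in isolation: each burst of repeated measurements forms a small local cloud, and whitening that cloud by its own sample mean and covariance removes the local linear distortion introduced by the instrument. A minimal NumPy sketch, with a synthetic linear deformation standing in for the unknown measurement map (this shows only the z-scoring ingredient, not the full autoencoder):

```python
import numpy as np

def local_z_score(bursts):
    """Whiten each burst of repeated measurements by its own sample
    mean and covariance (Cholesky-based local z-scoring).
    bursts: array of shape (n_points, n_repeats, dim)."""
    whitened = np.empty_like(bursts)
    for i, cloud in enumerate(bursts):
        mu = cloud.mean(axis=0)
        cov = np.cov(cloud, rowvar=False)
        L = np.linalg.cholesky(cov)
        # y = L^{-1} (x - mu) has (sample) identity covariance
        whitened[i] = np.linalg.solve(L, (cloud - mu).T).T
    return whitened

rng = np.random.default_rng(0)
# synthetic deformed bursts: isotropic latent noise pushed through a
# fixed linear distortion A, centered at random locations
A = np.array([[2.0, 0.5], [0.0, 0.3]])
bursts = rng.normal(size=(10, 500, 2)) @ A.T + rng.normal(size=(10, 1, 2)) * 5
w = local_z_score(bursts)
print(np.round(np.cov(w[0], rowvar=False), 3))  # ~ identity
```

After this whitening each local cloud is isotropic; LOCA's contribution is a network that achieves this simultaneously for all bursts with a single smooth embedding.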
An Emulator Toolbox to Approximate Radiative Transfer Models with Statistical Learning
Physically-based radiative transfer models (RTMs) help in understanding the processes occurring on the Earth’s surface and their interactions with vegetation and atmosphere. When it comes to studying vegetation properties, RTMs allow us to study light interception by plant canopies and are used in the retrieval of biophysical variables through model inversion. However, advanced RTMs can take a long computational time, which makes them unfeasible in many real applications. To overcome this problem, it has been proposed to substitute RTMs with so-called emulators. Emulators are statistical models that approximate the functioning of RTMs. Emulators are advantageous in real practice because of their computational efficiency and excellent accuracy and flexibility for extrapolation. We hereby present an “Emulator toolbox” that enables analysing multi-output machine learning regression algorithms (MO-MLRAs) on their ability to approximate an RTM. The toolbox is included in the free-access ARTMO’s MATLAB suite for parameter retrieval and model inversion and currently contains both linear and non-linear MO-MLRAs, namely partial least squares regression (PLSR), kernel ridge regression (KRR) and neural networks (NN). These MO-MLRAs have been evaluated on their precision and speed to approximate the soil vegetation atmosphere transfer model SCOPE (Soil Canopy Observation, Photochemistry and Energy balance). SCOPE generates, amongst others, sun-induced chlorophyll fluorescence as the output signal. KRR and NN were evaluated as capable of reconstructing fluorescence spectra with great precision. Relative errors fell below 0.5% when trained with 500 or more samples using cross-validation and principal component analysis to alleviate the underdetermination problem. Moreover, NN reconstructed fluorescence spectra about 50 times faster and KRR about 800 times faster than SCOPE.
The Emulator toolbox is foreseen to open new opportunities in the use of advanced RTMs, in which both consistent physical assumptions and data-driven machine learning algorithms live together
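The KRR-plus-PCA recipe described above can be sketched in a few lines with scikit-learn. The `toy_rtm` function below is a hypothetical stand-in for an RTM (the real SCOPE model is far more complex); the point is the workflow: compress the multi-band output with PCA, fit a kernel ridge emulator on the component scores, and reconstruct spectra cheaply:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.kernel_ridge import KernelRidge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def toy_rtm(x):
    """Hypothetical stand-in for an RTM: maps 3 'biophysical' inputs
    to a 200-band spectrum (NOT the actual SCOPE model)."""
    wl = np.linspace(0, 1, 200)
    return (x[:, :1] * np.sin(4 * np.pi * wl)
            + x[:, 1:2] * np.exp(-((wl - 0.5) ** 2) / 0.02)
            + x[:, 2:3] * wl)

rng = np.random.default_rng(1)
X_train = rng.uniform(size=(500, 3))       # 500 training samples, as in the text
Y_train = toy_rtm(X_train)

# PCA alleviates the underdetermination of the multi-output problem
pca = PCA(n_components=5).fit(Y_train)
model = make_pipeline(StandardScaler(),
                      KernelRidge(kernel="rbf", alpha=1e-3, gamma=1.0))
model.fit(X_train, pca.transform(Y_train))  # one KRR over the PC scores

X_test = rng.uniform(size=(100, 3))
Y_pred = pca.inverse_transform(model.predict(X_test))
rel_err = np.linalg.norm(Y_pred - toy_rtm(X_test)) / np.linalg.norm(toy_rtm(X_test))
print(f"relative error: {rel_err:.4f}")
```

Once fitted, the emulator answers in milliseconds what the physical model computes in minutes, which is the speed-up the toolbox exploits.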
Approximating Likelihood Ratios with Calibrated Discriminative Classifiers
In many fields of science, generalized likelihood ratio tests are established
tools for statistical inference. At the same time, it has become increasingly
common that a simulator (or generative model) is used to describe complex
processes that tie parameters of an underlying theory and measurement
apparatus to high-dimensional observations.
However, simulators often do not provide a way to evaluate the likelihood
function for a given observation, which motivates a new class of
likelihood-free inference algorithms. In this paper, we show that likelihood
ratios are invariant under a specific class of dimensionality reduction maps.
As a direct consequence, we show that
discriminative classifiers can be used to approximate the generalized
likelihood ratio statistic when only a generative model for the data is
available. This leads to a new machine learning-based approach to
likelihood-free inference that is complementary to Approximate Bayesian
Computation, and which does not require a prior on the model parameters.
Experimental results on artificial problems with known exact likelihoods
illustrate the potential of the proposed method.
Comment: 35 pages, 5 figures
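The core idea, often called the likelihood-ratio trick, is that a probabilistic classifier trained to separate samples drawn at two parameter settings approximates s(x) = p1(x)/(p0(x) + p1(x)), from which the ratio is recovered as s/(1 - s). A minimal sketch with two Gaussians standing in for the simulator, chosen so the exact ratio is known in closed form:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Two 1-D Gaussians as a stand-in "simulator" at parameters theta_0, theta_1
rng = np.random.default_rng(0)
x0 = rng.normal(0.0, 1.0, size=5000)   # samples under theta_0
x1 = rng.normal(1.0, 1.0, size=5000)   # samples under theta_1

X = np.concatenate([x0, x1]).reshape(-1, 1)
y = np.concatenate([np.zeros(5000), np.ones(5000)])

# A calibrated classifier approximates s(x) = p1 / (p0 + p1),
# so log r(x) = log(s / (1 - s))
clf = LogisticRegression().fit(X, y)
grid = np.linspace(-2, 3, 50).reshape(-1, 1)
s = clf.predict_proba(grid)[:, 1]
approx_llr = np.log(s / (1 - s))

# Exact log-likelihood ratio for unit-variance Gaussians: x - 1/2
exact_llr = grid.ravel() - 0.5
print(np.max(np.abs(approx_llr - exact_llr)))
```

In realistic likelihood-free settings the exact curve is unavailable, but the same recipe applies with any well-calibrated classifier on simulator output.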
Acoustic Space Learning for Sound Source Separation and Localization on Binaural Manifolds
In this paper we address the problems of modeling the acoustic space
generated by a full-spectrum sound source and of using the learned model for
the localization and separation of multiple sources that simultaneously emit
sparse-spectrum sounds. We lay theoretical and methodological grounds in order
to introduce the binaural manifold paradigm. We perform an in-depth study of
the latent low-dimensional structure of the high-dimensional interaural
spectral data, based on a corpus recorded with a human-like audiomotor robot
head. A non-linear dimensionality reduction technique is used to show that
these data lie on a two-dimensional (2D) smooth manifold parameterized by the
motor states of the listener, or equivalently, the sound source directions. We
propose a probabilistic piecewise affine mapping model (PPAM) specifically
designed to deal with high-dimensional data exhibiting an intrinsic piecewise
linear structure. We derive a closed-form expectation-maximization (EM)
procedure for estimating the model parameters, followed by Bayes inversion for
obtaining the full posterior density function of a sound source direction. We
extend this solution to deal with missing data and redundancy in real world
spectrograms, and hence for 2D localization of natural sound sources such as
speech. We further generalize the model to the challenging case of multiple
sound sources and we propose a variational EM framework. The associated
algorithm, referred to as variational EM for source separation and localization
(VESSL), yields a Bayesian estimation of the 2D locations and time-frequency
masks of all the sources. Comparisons of the proposed approach with several
existing methods reveal that the combination of acoustic-space learning with
Bayesian inference enables our method to outperform state-of-the-art methods.
Comment: 19 pages, 9 figures, 3 tables
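The first step of the pipeline, verifying that high-dimensional interaural data lie on a 2-D manifold parameterized by source direction, can be probed with any nonlinear dimensionality-reduction method. The sketch below uses Isomap on synthetic direction-driven spectra; both the data generator and the choice of Isomap are illustrative stand-ins for the paper's recorded corpus and method:

```python
import numpy as np
from sklearn.manifold import Isomap

# Synthetic stand-in for interaural spectral vectors: 64-channel
# observations driven by two latent source-direction parameters
rng = np.random.default_rng(2)
az = rng.uniform(-1, 1, size=800)      # azimuth-like latent variable
el = rng.uniform(-1, 1, size=800)      # elevation-like latent variable
freqs = np.linspace(1, 8, 64)

# each frequency channel responds smoothly to the source direction
X = np.cos(np.outer(az, freqs)) + np.sin(np.outer(el, freqs))
X += 0.01 * rng.normal(size=X.shape)   # small measurement noise

# nonlinear dimensionality reduction recovers a 2-D parameterization
emb = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
print(emb.shape)
```

Confirming the 2-D structure is what licenses the subsequent piecewise affine mapping between the low-dimensional directions and the high-dimensional spectra.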
Forecasting Time Series with VARMA Recursions on Graphs
Graph-based techniques have emerged as a means to deal with the dimensionality
issues in modeling multivariate time series. However, there is as yet no complete
understanding of how the underlying structure could be exploited to ease this
task. This work provides contributions in this direction by considering the
forecasting of a process evolving over a graph. We make use of the
(approximate) time-vertex stationarity assumption, i.e., time-varying graph
signals whose first and second order statistical moments are invariant over
time and correlated to a known graph topology. The latter is combined with VAR
and VARMA models to tackle the dimensionality issues present in predicting the
temporal evolution of multivariate time series. We find that by projecting
the data to the graph spectral domain: (i) the multivariate model estimation
reduces to that of fitting a number of uncorrelated univariate ARMA models and
(ii) an optimal low-rank data representation can be exploited so as to further
reduce the estimation costs. In the case that the multivariate process can be
observed at a subset of nodes, the proposed models extend naturally to Kalman
filtering on graphs allowing for optimal tracking. Numerical experiments with
both synthetic and real data validate the proposed approach and highlight its
benefits over state-of-the-art alternatives.
Comment: submitted to the IEEE Transactions on Signal Processing
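The computational point (i) above, that projecting onto the graph Fourier basis decouples the multivariate model into independent univariate recursions, can be sketched with a ring graph and per-frequency AR(1) processes. This is a simplified stand-in for the paper's VARMA setting, using the Laplacian eigenbasis as the graph Fourier transform:

```python
import numpy as np

# Toy known topology: a ring of N nodes; the Laplacian eigenvectors U
# serve as the graph Fourier basis
N, T = 8, 400
A = np.zeros((N, N))
for i in range(N):
    A[i, (i + 1) % N] = A[(i + 1) % N, i] = 1
L = np.diag(A.sum(1)) - A
_, U = np.linalg.eigh(L)

# Simulate a time-vertex signal: one independent AR(1) per graph frequency
rng = np.random.default_rng(3)
phi_true = rng.uniform(0.3, 0.9, size=N)
Z = np.zeros((N, T))
for t in range(1, T):
    Z[:, t] = phi_true * Z[:, t - 1] + rng.normal(size=N)
X = U @ Z                              # observed series in the vertex domain

# Project to the spectral domain; estimation reduces to N univariate fits
Z_hat = U.T @ X
phi_est = np.array([z[1:] @ z[:-1] / (z[:-1] @ z[:-1]) for z in Z_hat])

# one-step-ahead forecast, mapped back to the vertex domain
x_next = U @ (phi_est * Z_hat[:, -1])
print(np.round(np.abs(phi_est - phi_true).max(), 3))
```

Each spectral channel is fitted in isolation, which is exactly the decoupling that makes the joint multivariate estimation tractable.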
Slepian Beamforming: Broadband Beamforming using Streaming Least Squares
In this paper we revisit the classical problem of estimating a signal as it
impinges on a multi-sensor array. We focus on the case where the impinging
signal's bandwidth is appreciable, i.e., the array operates in a broadband regime.
Estimating broadband signals, often termed broadband (or wideband) beamforming,
is traditionally done through filter and summation, true time delay, or a
coupling of the two. Our proposed method deviates substantially from these
paradigms in that it requires no notion of filtering or true time delay. We use
blocks of samples taken directly from the sensor outputs to fit a robust
Slepian subspace model using a least squares approach. We then leverage this
model to estimate uniformly spaced samples of the impinging signal. Alongside a
careful discussion of this model and how to choose its parameters we show how
to fit the model to new blocks of samples as they are received, producing a
streaming output. We then go on to show how this method naturally extends to
adaptive beamforming scenarios, where we leverage signal statistics to
attenuate interfering sources. Finally, we discuss how to use our model to
estimate the signal from dimensionality-reducing measurements. Accompanying these
discussions are extensive numerical experiments establishing that our method
outperforms existing filter based approaches while being comparable in terms of
computational complexity.
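The Slepian-subspace least-squares idea can be sketched for a single channel: a block of samples of a bandlimited signal is well approximated by roughly 2MW discrete prolate spheroidal sequences (DPSS), so a least-squares fit onto that subspace denoises the block. The sketch below uses SciPy's DPSS windows and omits the paper's multi-sensor, streaming, and adaptive machinery:

```python
import numpy as np
from scipy.signal.windows import dpss

M = 256                   # block length
W = 0.05                  # half-bandwidth in cycles/sample
K = int(2 * M * W) + 2    # Slepian subspace dimension ~ 2MW
S = dpss(M, M * W, K).T   # (M, K); columns span the bandlimited subspace

rng = np.random.default_rng(4)
n = np.arange(M)
# in-band test signal plus white noise
signal = np.cos(2 * np.pi * 0.03 * n) + 0.5 * np.sin(2 * np.pi * 0.04 * n)
y = signal + 0.3 * rng.normal(size=M)

# least-squares fit of the noisy block to the Slepian subspace
alpha, *_ = np.linalg.lstsq(S, y, rcond=None)
estimate = S @ alpha

snr_in = np.linalg.norm(signal) / np.linalg.norm(y - signal)
snr_out = np.linalg.norm(signal) / np.linalg.norm(estimate - signal)
print(f"SNR in {snr_in:.1f}, out {snr_out:.1f}")
```

Because the subspace has only ~2MW dimensions, the fit rejects the out-of-band portion of the noise while preserving any in-band signal, which is the property the beamformer builds on.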