FADO: A Deterministic Detection/Learning Algorithm
This paper proposes and studies a detection technique for adversarial
scenarios (dubbed deterministic detection). This technique provides an
alternative detection methodology in case the usual stochastic methods are not
applicable: this can be because the studied phenomenon does not follow a
stochastic sampling scheme, samples are high-dimensional and subsequent
multiple-testing corrections render results overly conservative, sample sizes
are too low for asymptotic results (such as the central limit theorem) to kick
in, or one cannot allow for the small probability of failure inherent to
stochastic approaches. This paper instead designs a method based on insights
from machine learning and online learning theory: this detection algorithm -
named Online FAult Detection (FADO) - comes with theoretical guarantees of its
detection capabilities. A version of the margin is found to regulate the
detection performance of FADO. A precise expression is derived for bounding the
performance, and experimental results are presented assessing the influence of
involved quantities. A case study of scene detection is used to illustrate the
approach. The technique is closely related to the linear perceptron rule and
inherits its computational attractiveness and its flexibility towards various
extensions.
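The abstract relates FADO to the linear perceptron rule. As a point of reference only, the following is a minimal sketch of the classical mistake-driven perceptron update. It is not the FADO algorithm itself; the function name `perceptron` and the toy data are assumptions made for illustration.

```python
def perceptron(samples, labels, epochs=100):
    """Classical perceptron: the weights change only when a sample is misclassified."""
    w = [0.0] * len(samples[0])
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            score = sum(wi * xi for wi, xi in zip(w, x))
            if y * score <= 0:  # wrong side of (or exactly on) the hyperplane
                w = [wi + y * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:  # a full pass without mistakes: stop
            break
    return w


# Hypothetical, linearly separable toy data; the last coordinate is a constant bias feature.
X = [[2.0, 1.0, 1.0], [1.0, 3.0, 1.0], [-1.0, -2.0, 1.0], [-2.0, -1.0, 1.0]]
y = [1, 1, -1, -1]
w = perceptron(X, y)
```

For the classical perceptron, the number of updates on separable data is bounded in terms of the margin of the data, which is the kind of quantity the abstract says governs FADO's detection performance.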
MINLIP for the Identification of Monotone Wiener Systems
This paper studies the MINLIP estimator for the identification of Wiener
systems consisting of a sequence of a linear FIR dynamical model, and a
monotonically increasing (or decreasing) static function. Given
observations, this algorithm boils down to solving a convex quadratic program
with variables and inequality constraints, implementing an inference
technique which is based entirely on model complexity control. The resulting
estimates of the linear submodel are found to be almost consistent when no
noise is present in the data, under a condition of smoothness of the true
nonlinearity and local Persistency of Excitation (local PE) of the data. This
result is novel as it does not rely on classical tools such as a 'linearization'
using a Taylor decomposition, nor does it exploit stochastic properties of the data.
It is indicated how to extend the method to cope with noisy data, and empirical
evidence contrasts performance of the estimator against other recently proposed
techniques.
On the Nuclear Norm heuristic for a Hankel matrix Recovery Problem
This note addresses the question of whether and why the nuclear norm heuristic can
recover an impulse response generated by a stable single-real-pole system when
the elements of the upper triangle of the associated Hankel matrix are given.
Since the setting is deterministic, theories based on stochastic assumptions
for low-rank matrix recovery do not apply here. A 'certificate' which
guarantees the completion is constructed by exploring the structural
information of the hidden matrix. Experimental results and discussions
regarding the nuclear norm heuristic applied to a more general setting are also
given.
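To illustrate the structure that such a recovery result can exploit: if the impulse response of a single-real-pole system is normalized as h_k = a^k with |a| < 1 (an assumed normalization, not taken from the note), then the Hankel matrix with entries h_{i+j} equals the outer product of the vector (1, a, a^2, ...) with itself and therefore has rank one. The sketch below only checks this property; it does not implement the nuclear norm heuristic, and the helper name `hankel_from_impulse` is hypothetical.

```python
def hankel_from_impulse(h, n):
    """Build the n x n Hankel matrix with entries H[i][j] = h[i + j]."""
    return [[h[i + j] for j in range(n)] for i in range(n)]


a = 0.5   # assumed stable real pole, |a| < 1
n = 4
h = [a ** k for k in range(2 * n - 1)]   # impulse response samples h_0 .. h_{2n-2}
H = hankel_from_impulse(h, n)

# Rank-one check: H should equal the outer product v v^T with v_i = a^i.
v = [a ** i for i in range(n)]
is_rank_one = all(
    abs(H[i][j] - v[i] * v[j]) < 1e-12 for i in range(n) for j in range(n)
)
```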
Sparse Estimation From Noisy Observations of an Overdetermined Linear System
This note studies a method for the efficient estimation of a finite number of
unknown parameters from linear equations, which are perturbed by Gaussian
noise.
In case the unknown parameters have only few nonzero entries, the proposed
estimator performs more efficiently than a traditional approach.
The method consists of three steps:
(1) a classical Least Squares Estimate (LSE),
(2) the support is recovered through a Linear Programming (LP) optimization
problem which can be computed using a soft-thresholding step,
(3) a de-biasing step using an LSE on the estimated support set.
The main contribution of this note is a formal derivation of an associated
ORACLE property of the final estimate.
That is, when the number of samples is large enough, the estimate is shown to
equal the LSE based on the support of the true parameters.
Comment: This paper is provisionally accepted by Automatic
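The three steps listed in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the threshold `lam`, the synthetic data, and the helper names (`solve`, `lse`, `soft_threshold`, `sparse_estimate`) are all hypothetical, and step (2) is carried out directly as soft-thresholding rather than by solving an LP.

```python
import random


def solve(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    A = [row[:] + [bi] for row, bi in zip(M, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n + 1):
                A[r][k] -= f * A[c][k]
    z = [0.0] * n
    for r in reversed(range(n)):
        s = sum(A[r][k] * z[k] for k in range(r + 1, n))
        z[r] = (A[r][n] - s) / A[r][r]
    return z


def lse(X, y, cols):
    """Least squares estimate restricted to the given columns (normal equations)."""
    G = [[sum(row[a] * row[b] for row in X) for b in cols] for a in cols]
    g = [sum(row[a] * yi for row, yi in zip(X, y)) for a in cols]
    return solve(G, g)


def soft_threshold(v, lam):
    """Shrink v towards zero by lam; values with |v| <= lam become zero."""
    return max(abs(v) - lam, 0.0) * (1.0 if v >= 0 else -1.0)


def sparse_estimate(X, y, lam):
    d = len(X[0])
    theta_ls = lse(X, y, list(range(d)))                     # step 1: full LSE
    shrunk = [soft_threshold(t, lam) for t in theta_ls]      # step 2: soft-threshold
    support = [j for j, t in enumerate(shrunk) if t != 0.0]  # estimated support
    theta = [0.0] * d
    if support:
        for j, t in zip(support, lse(X, y, support)):        # step 3: de-biasing LSE
            theta[j] = t
    return theta, support


# Synthetic overdetermined system (assumed for illustration): 200 equations, 5 unknowns.
rng = random.Random(0)
true_theta = [2.0, 0.0, 0.0, -1.5, 0.0]
X = [[rng.gauss(0, 1) for _ in true_theta] for _ in range(200)]
y = [sum(t * x for t, x in zip(true_theta, row)) + rng.gauss(0, 0.1) for row in X]
theta_hat, support = sparse_estimate(X, y, lam=0.5)
```

In this toy setting the threshold is set by hand, well above the noise level of the initial LSE and well below the nonzero coefficients; how to choose it in general is what the formal analysis in the note would have to settle.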
On the Randomized Kaczmarz Algorithm
The Randomized Kaczmarz Algorithm is a randomized method which aims at
solving a consistent, overdetermined system of linear equations. This note
discusses how to find an optimized randomization scheme for this algorithm,
which is related to the question raised by \cite{c2}. Illustrative experiments
are conducted to support the findings.
Comment: This paper will appear in IEEE Signal Processing Letters, vol. 21,
no. 3, March 201
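For orientation, here is a minimal sketch of the basic Randomized Kaczmarz iteration with the commonly used scheme that samples each row with probability proportional to its squared norm. This is the standard baseline, not the optimized randomization scheme the note discusses; the function name, the iteration count, and the small test system are assumptions for illustration.

```python
import random


def randomized_kaczmarz(A, b, iters=2000, seed=0):
    """Project the iterate onto the hyperplane of one randomly chosen equation per step."""
    rng = random.Random(seed)
    norms = [sum(a * a for a in row) for row in A]   # squared row norms
    x = [0.0] * len(A[0])
    for _ in range(iters):
        # Sample a row index with probability proportional to its squared norm.
        i = rng.choices(range(len(A)), weights=norms)[0]
        row = A[i]
        r = (b[i] - sum(a * xi for a, xi in zip(row, x))) / norms[i]
        x = [xi + r * a for xi, a in zip(x, row)]
    return x


# Small consistent, overdetermined system (4 equations, 2 unknowns), built from a known solution.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
x_true = [1.0, -2.0]
b = [sum(a * t for a, t in zip(row, x_true)) for row in A]
x_hat = randomized_kaczmarz(A, b)
```

Changing the `weights` argument is where a different randomization scheme would plug in.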
A machine-learning approach to measuring the escape of ionizing radiation from galaxies in the reionization epoch
Recent observations of galaxies at , along with the low value of
the electron scattering optical depth measured by the Planck mission, make
galaxies plausible as dominant sources of ionizing photons during the epoch of
reionization. However, scenarios of galaxy-driven reionization hinge on the
assumption that the average escape fraction of ionizing photons is
significantly higher for galaxies in the reionization epoch than in the local
Universe. The NIRSpec instrument on the James Webb Space Telescope (JWST) will
enable spectroscopic observations of large samples of reionization-epoch
galaxies. While the leakage of ionizing photons will not be directly measurable
from these spectra, the leakage is predicted to have an indirect effect on the
spectral slope and the strength of nebular emission lines in the rest-frame
ultraviolet and optical. Here, we apply a machine learning technique known as
lasso regression on mock JWST/NIRSpec observations of simulated galaxies
in order to obtain a model that can predict the escape fraction from
JWST/NIRSpec data. Barring systematic biases in the simulated spectra, our
method is able to retrieve the escape fraction with a mean absolute error of
for spectra with at a
rest-frame wavelength of 1500 Å for our fiducial simulation. This
prediction accuracy represents a significant improvement over previous similar
approaches.
Comment: 13 pages, 11 figures. Accepted for publication in Ap
Support and Quantile Tubes
This correspondence studies an estimator of the conditional support of a
distribution underlying a set of i.i.d. observations. The relation with mutual
information is shown via an extension of Fano's theorem in combination with a
generalization bound based on a compression argument. Extensions to estimating
the conditional quantile interval, and statistical guarantees on the minimal
convex hull are given.
Componentwise Least Squares Support Vector Machines
This chapter describes componentwise Least Squares Support Vector Machines
(LS-SVMs) for the estimation of additive models consisting of a sum of
nonlinear components. The primal-dual derivations characterizing LS-SVMs for
the estimation of the additive model result in a single set of linear equations
with size growing in the number of data-points. The derivation is elaborated
for the classification as well as the regression case. Furthermore, different
techniques are proposed to discover structure in the data by looking for sparse
components in the model based on dedicated regularization schemes on the one
hand and fusion of the componentwise LS-SVMs training with a validation
criterion on the other hand. (keywords: LS-SVMs, additive models,
regularization, structure detection)
Comment: 22 pages. Accepted for publication in Support Vector Machines: Theory
and Applications, ed. L. Wang, 200
An efficient method for sorting and selecting for social behaviour
In this article we provide a systematic experimental method for sorting
animals according to socially relevant traits, without assaying them or even
tagging them individually. Instead, they are repeatedly subjected to
behavioural assays in groups, between which the group memberships are
rearranged, in order to test the effect of many different combinations of
individuals on a group-level property or feature. We analyse this method using
a general model for the group feature, and simulate a variety of specific cases
to track how individuals are sorted in each case. We find that in the case
where the members of a group contribute equally to the group feature, the
sorting procedure increases the between-group behavioural variation well above
what is expected for groups randomly sampled from a population. For a wide
class of group feature models, the individual phenotypes are efficiently sorted
across the groups and thus become available for further analysis on how
individual properties affect group behaviour. We also show that the
experimental data can be used to estimate the individual-level repeatability of
the underlying traits.
Comment: 16 pages, 3 figures + supplementary information (3 pages
Identifying reionization-epoch galaxies with extreme levels of Lyman continuum leakage in James Webb Space Telescope surveys
The James Webb Space Telescope (JWST) NIRSpec instrument will allow
rest-frame ultraviolet/optical spectroscopy of galaxies in the epoch of
reionization (EoR). Some galaxies may exhibit significant leakage of
hydrogen-ionizing photons into the intergalactic medium, resulting in faint
nebular emission lines. We present a machine learning framework for identifying
cases of very high hydrogen-ionizing photon escape from galaxies based on the
data quality expected from potential NIRSpec observations of EoR galaxies in
lensed fields. We train our algorithm on mock samples of JWST/NIRSpec data for
galaxies at redshifts --10. To make the samples more realistic, we combine
synthetic galaxy spectra based on cosmological galaxy simulations with
observational noise relevant for objects of a brightness similar
to EoR galaxy candidates uncovered in Frontier Fields observations of galaxy
clusters Abell-2744 and MACS-J0416. We find that ionizing escape fractions
() of galaxies brighter than
mag may be retrieved with mean absolute error 0.09(0.12) for 24h (1.5h) JWST/NIRSpec exposures at
resolution R=100. For 24h exposure time, even fainter galaxies
( mag) can be processed with 0.14. This framework simultaneously estimates the
redshift of these galaxies with a relative error less than 0.03 for both 24h
( mag) and 1.5h ( mag)
exposure times. We also consider scenarios where just a minor fraction of
galaxies attain high and present the conditions required for
detecting a subpopulation of high galaxies within the dataset.
Comment: 10 pages, 7 figures. Accepted to be published in MNRA