Metric-Based Model Selection For Time-Series Forecasting

Author: Nicolas Chapados
Yoshua Bengio

Metric-based methods, which use unlabeled data to detect gross differences in behavior away from the training points, have recently been introduced for model selection, often yielding very significant improvements over alternatives (including cross-validation). We introduce extensions that take advantage of the particular case of time-series data in which the task involves prediction with a horizon "h". The ideas are (i) to use at time "t" the "h" unlabeled examples that precede "t" for model selection, and (ii) to take advantage of the different error distributions of cross-validation and the metric methods. Experimental results establish the effectiveness of these extensions in the context of feature subset selection.

Keywords: unlabeled data, model selection, time-series

Research Papers in Economics

Scaling-up Empirical Risk Minimization: Optimization of Incomplete U-statistics

Author: Bellet Aurélien
Clémençon Stéphan
Colin Igor
Publication date: 01/01/2016

In a wide range of statistical learning problems such as ranking, clustering or metric learning among others, the risk is accurately estimated by $U$-statistics of degree $d \geq 1$, i.e. functionals of the training data with low variance that take the form of averages over $k$-tuples. From a computational perspective, the calculation of such statistics is highly expensive even for a moderate sample size $n$, as it requires averaging $O(n^d)$ terms. This makes learning procedures relying on the optimization of such data functionals hardly feasible in practice. It is the major goal of this paper to show that, strikingly, such empirical risks can be replaced by drastically computationally simpler Monte-Carlo estimates based on $O(n)$ terms only, usually referred to as incomplete $U$-statistics, without damaging the $O_{\mathbb{P}}(1/\sqrt{n})$ learning rate of Empirical Risk Minimization (ERM) procedures. For this purpose, we establish uniform deviation results describing the error made when approximating a $U$-process by its incomplete version under appropriate complexity assumptions. Extensions to model selection, fast rate situations and various sampling techniques are also considered, as well as an application to stochastic gradient descent for ERM. Finally, numerical examples are displayed in order to provide strong empirical evidence that the approach we promote largely surpasses more naive subsampling techniques.

Comment: To appear in Journal of Machine Learning Research. 34 pages. v2: minor correction to Theorem 4 and its proof, added 1 reference. v3: typo corrected in Proposition 3. v4: improved presentation, added experiments on model selection for clustering, fixed minor typo
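As a rough illustration of the idea in this abstract, the sketch below compares a complete degree-2 U-statistic with an incomplete one built from only n randomly drawn pairs. The kernel (half the squared difference, whose U-statistic is the unbiased sample variance), the sampling with replacement, the sample size, and all function names are assumptions chosen for the example; they are not taken from the paper.

```python
import itertools
import random


def kernel(x, y):
    """Degree-2 kernel whose U-statistic is the unbiased sample variance (assumed example)."""
    return 0.5 * (x - y) ** 2


def complete_u_statistic(data):
    """Average the kernel over all n*(n-1)/2 unordered pairs: O(n^2) terms."""
    pairs = list(itertools.combinations(data, 2))
    return sum(kernel(x, y) for x, y in pairs) / len(pairs)


def incomplete_u_statistic(data, num_terms, rng):
    """Average the kernel over num_terms pairs drawn at random, with replacement."""
    n = len(data)
    total = 0.0
    for _ in range(num_terms):
        i, j = rng.sample(range(n), 2)  # two distinct indices
        total += kernel(data[i], data[j])
    return total / num_terms


rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(400)]

full = complete_u_statistic(data)                      # 79,800 kernel evaluations
approx = incomplete_u_statistic(data, len(data), rng)  # 400 kernel evaluations
print(f"complete: {full:.4f}  incomplete: {approx:.4f}")
```

Both quantities estimate the same variance; the incomplete version trades a somewhat noisier estimate for a number of kernel evaluations that grows linearly rather than quadratically in n.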

arXiv.org e-Print Archive

HAL - Lille 3

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Practical Provably Secure Multi-node Communication

Author: Ali Omar
Ayoub Mahmoud F.
Youssef Moustafa
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 17/10/2013

We present a practical and provably secure multi-node communication scheme in the presence of a passive eavesdropper. The scheme is based on a random scheduling approach that hides the identity of the transmitter from the eavesdropper. This random scheduling leads to ambiguity at the eavesdropper with regard to the origin of the transmitted frame. We present the details of the technique and analyze it to quantify the secrecy-fairness-overhead trade-off. Implementation of the scheme over Crossbow TelosB motes, equipped with CC2420 radio chips, shows that the scheme can achieve significant secrecy gain with vanishing outage probability. In addition, it has a significant overhead advantage over direct extensions of two-node schemes. The technique also has the advantage of allowing inactive nodes to leverage sleep mode to further save energy.

Comment: Proceedings of the IEEE International Conference on Computing, Networking and Communications (ICNC 2014)

arXiv.org e-Print Archive

Crossref

Closed-Loop Statistical Verification of Stochastic Nonlinear Systems Subject to Parametric Uncertainties

Author: ang
bishop
clarke
desautels
elliott
gotovos
kozarev
kulesza
maler
quindlen
rasmussen
tipping
topcu
Publication date: 01/10/2017

This paper proposes a statistical verification framework using Gaussian processes (GPs) for simulation-based verification of stochastic nonlinear systems with parametric uncertainties. Given a small number of stochastic simulations, the proposed framework constructs a GP regression model and predicts the system's performance over the entire set of possible uncertainties. Included in the framework is a new metric to estimate the confidence in those predictions based on the variance of the GP's cumulative distribution function. This variance-based metric forms the basis of active sampling algorithms that aim to minimize prediction error through careful selection of simulations. In three case studies, the new active sampling algorithms demonstrate up to a 35% improvement in prediction error over other approaches and are able to correctly identify regions with low prediction confidence through the variance metric.

Comment: 8 pages, submitted to ACC 201
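The general shape of variance-driven active sampling with a GP can be sketched as follows. This is not the paper's metric: the abstract describes a criterion based on the variance of the GP's cumulative distribution function, whereas this example swaps in the plain GP posterior variance as a simpler stand-in. The squared-exponential kernel, unit length scale, diagonal jitter, training inputs, candidate grid, and all function names are assumptions made for illustration.

```python
import math


def rbf(a, b, length_scale=1.0):
    """Squared-exponential covariance between two scalar inputs (assumed kernel)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def forward_solve(lower, rhs):
    """Solve lower @ x = rhs by forward substitution."""
    x = []
    for i, row in enumerate(lower):
        x.append((rhs[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x


def posterior_variance(train_x, query, jitter=1e-6):
    """GP posterior variance at query; jitter on the diagonal aids numerical stability."""
    gram = [[rbf(a, b) + (jitter if i == j else 0.0)
             for j, b in enumerate(train_x)] for i, a in enumerate(train_x)]
    lower = cholesky(gram)
    v = forward_solve(lower, [rbf(a, query) for a in train_x])
    # k(q, q) - k_*^T K^{-1} k_*, computed via the Cholesky factor
    return rbf(query, query) - sum(t * t for t in v)


def next_sample(train_x, candidates):
    """Pick the candidate input where the GP is least certain."""
    return max(candidates, key=lambda c: posterior_variance(train_x, c))


train_x = [0.0, 1.0, 2.0, 5.0]             # parameter values already simulated
candidates = [0.5 * i for i in range(11)]  # 0.0, 0.5, ..., 5.0
chosen = next_sample(train_x, candidates)
print("next simulation at parameter value", chosen)
```

With these inputs the selected point falls in the unsampled gap between 2.0 and 5.0, which is the behavior an uncertainty-driven sampler is meant to show; a criterion such as the one in the abstract would replace `posterior_variance` with its own confidence measure.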

arXiv.org e-Print Archive

Crossref

DSpace@MIT