Metric-Based Model Selection For Time-Series Forecasting
Metric-based methods, which use unlabeled data to detect gross differences in behavior away from the training points, have recently been introduced for model selection, often yielding significant improvements over alternatives (including cross-validation). We introduce extensions that exploit the particular case of time-series data in which the task involves prediction with a horizon "h". The ideas are (i) to use at time "t" the "h" unlabeled examples that precede "t" for model selection, and (ii) to take advantage of the different error distributions of cross-validation and the metric methods. Experimental results establish the effectiveness of these extensions in the context of feature subset selection.
Keywords: unlabeled data, model selection, time-series
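Idea (i) rests on the fact that, with a forecasting horizon "h", the targets of the last "h" inputs are not yet observed at time "t". The sketch below illustrates only that split, not the model-selection method itself. The function name `split_at`, the use of lag windows as inputs, and the `lags` parameter are assumptions made for illustration and do not come from the paper.

```python
def split_at(series, t, h, lags=1):
    """Split the data available at time index t into labeled pairs and
    unlabeled inputs, for a forecasting horizon h.

    Assumption (hypothetical setup): the input at time s is the window of
    the last `lags` values ending at s, and its target is series[s + h].
    """
    labeled, unlabeled = [], []
    for s in range(lags - 1, t + 1):
        x = tuple(series[s - lags + 1 : s + 1])
        if s + h <= t:
            # The target lies at or before t, so it has already been observed.
            labeled.append((x, series[s + h]))
        else:
            # The target lies in the future: the input exists but has no label.
            unlabeled.append(x)
    return labeled, unlabeled


series = list(range(10))
labeled, unlabeled = split_at(series, t=9, h=3, lags=2)
print(len(labeled), len(unlabeled))
```

Provided enough history is available, the unlabeled set contains exactly "h" inputs, which are the examples a metric-based criterion could use in addition to the labeled pairs.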
Scaling-up Empirical Risk Minimization: Optimization of Incomplete U-statistics
In a wide range of statistical learning problems such as ranking, clustering
or metric learning among others, the risk is accurately estimated by
U-statistics of degree d ≥ 1, i.e. functionals of the training data with
low variance that take the form of averages over d-tuples. From a
computational perspective, the calculation of such statistics is highly
expensive even for a moderate sample size n, as it requires averaging
O(n^d) terms. This makes learning procedures relying on the optimization of
such data functionals hardly feasible in practice. It is the major goal of this
paper to show that, strikingly, such empirical risks can be replaced by
drastically computationally simpler Monte-Carlo estimates based on O(n) terms
only, usually referred to as incomplete U-statistics, without damaging the
O_P(1/√n) learning rate of Empirical Risk Minimization (ERM)
procedures. For this purpose, we establish uniform deviation results describing
the error made when approximating a U-process by its incomplete version under
appropriate complexity assumptions. Extensions to model selection, fast rate
situations and various sampling techniques are also considered, as well as an
application to stochastic gradient descent for ERM. Finally, numerical examples
are displayed in order to provide strong empirical evidence that the approach
we promote largely surpasses more naive subsampling techniques.
Comment: To appear in Journal of Machine Learning Research. 34 pages. v2:
minor correction to Theorem 4 and its proof, added 1 reference. v3: typo
corrected in Proposition 3. v4: improved presentation, added experiments on
model selection for clustering, fixed minor typo
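To make the contrast between a complete and an incomplete U-statistic concrete, here is a minimal sketch for degree 2. It assumes the kernel (a - b)^2 / 2, whose complete U-statistic is the unbiased sample variance, and it draws pairs uniformly with replacement. The function names are hypothetical, and the kernel and sampling scheme are chosen for illustration rather than taken from the paper.

```python
import itertools
import random
import statistics


def variance_kernel(a, b):
    # Symmetric degree-2 kernel; its complete U-statistic is the
    # unbiased sample variance.
    return 0.5 * (a - b) ** 2


def complete_u_statistic(sample, kernel):
    # Average of the kernel over all n * (n - 1) / 2 unordered pairs.
    pairs = list(itertools.combinations(sample, 2))
    return sum(kernel(a, b) for a, b in pairs) / len(pairs)


def incomplete_u_statistic(sample, kernel, n_terms, rng):
    # Monte-Carlo estimate: average the kernel over n_terms pairs of
    # distinct indices, with pairs drawn independently (with replacement).
    n = len(sample)
    total = 0.0
    for _ in range(n_terms):
        i, j = rng.sample(range(n), 2)
        total += kernel(sample[i], sample[j])
    return total / n_terms


rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(200)]
full = complete_u_statistic(data, variance_kernel)  # 19,900 terms
approx = incomplete_u_statistic(data, variance_kernel, len(data), rng)  # 200 terms
print(full, approx)
```

With n = 200 the complete statistic averages 19,900 kernel evaluations while the incomplete one uses 200, which is the cost gap the abstract refers to. The sketch only estimates a fixed quantity and does not perform any risk minimization.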
Practical Provably Secure Multi-node Communication
We present a practical and provably-secure multi-node communication scheme in
the presence of a passive eavesdropper. The scheme is based on a random
scheduling approach that hides the identity of the transmitter from the
eavesdropper. This random scheduling leads to ambiguity at the eavesdropper
with regard to the origin of the transmitted frame. We present the details of
the technique and analyze it to quantify the secrecy-fairness-overhead
trade-off. Implementation of the scheme over Crossbow TelosB motes, equipped
with CC2420 radio chips, shows that the scheme can achieve significant secrecy
gain with vanishing outage probability. In addition, it has a significant
overhead advantage over direct extensions of two-node schemes. The technique
also has the advantage of allowing inactive nodes to leverage sleep mode to
further save energy.
Comment: Proceedings of the IEEE International Conference on Computing,
Networking and Communications (ICNC 2014)
Closed-Loop Statistical Verification of Stochastic Nonlinear Systems Subject to Parametric Uncertainties
This paper proposes a statistical verification framework using Gaussian
processes (GPs) for simulation-based verification of stochastic nonlinear
systems with parametric uncertainties. Given a small number of stochastic
simulations, the proposed framework constructs a GP regression model and
predicts the system's performance over the entire set of possible
uncertainties. Included in the framework is a new metric to estimate the
confidence in those predictions based on the variance of the GP's cumulative
distribution function. This variance-based metric forms the basis of active
sampling algorithms that aim to minimize prediction error through careful
selection of simulations. In three case studies, the new active sampling
algorithms demonstrate up to a 35% improvement in prediction error over other
approaches and are able to correctly identify regions with low prediction
confidence through the variance metric.
Comment: 8 pages, submitted to ACC 201