
    Metric-Based Model Selection For Time-Series Forecasting

    Metric-based methods, which use unlabeled data to detect gross differences in behavior away from the training points, have recently been introduced for model selection, often yielding very significant improvements over alternatives (including cross-validation). We introduce extensions that exploit the particular case of time-series data in which the task involves prediction with a horizon "h". The ideas are (i) to use at time "t" the "h" unlabeled examples that precede "t" for model selection, and (ii) to take advantage of the different error distributions of cross-validation and the metric methods. Experimental results establish the effectiveness of these extensions in the context of feature subset selection.
    Keywords: unlabeled data, model selection, time-series

    Scaling-up Empirical Risk Minimization: Optimization of Incomplete U-statistics

    In a wide range of statistical learning problems such as ranking, clustering or metric learning among others, the risk is accurately estimated by $U$-statistics of degree $d\geq 1$, i.e. functionals of the training data with low variance that take the form of averages over $k$-tuples. From a computational perspective, the calculation of such statistics is highly expensive even for a moderate sample size $n$, as it requires averaging $O(n^d)$ terms. This makes learning procedures relying on the optimization of such data functionals hardly feasible in practice. It is the major goal of this paper to show that, strikingly, such empirical risks can be replaced by drastically computationally simpler Monte-Carlo estimates based on $O(n)$ terms only, usually referred to as incomplete $U$-statistics, without damaging the $O_{\mathbb{P}}(1/\sqrt{n})$ learning rate of Empirical Risk Minimization (ERM) procedures. For this purpose, we establish uniform deviation results describing the error made when approximating a $U$-process by its incomplete version under appropriate complexity assumptions. Extensions to model selection, fast rate situations and various sampling techniques are also considered, as well as an application to stochastic gradient descent for ERM. Finally, numerical examples are displayed in order to provide strong empirical evidence that the approach we promote largely surpasses more naive subsampling techniques.
    Comment: To appear in Journal of Machine Learning Research. 34 pages. v2: minor correction to Theorem 4 and its proof, added 1 reference. v3: typo corrected in Proposition 3. v4: improved presentation, added experiments on model selection for clustering, fixed minor typo
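To make the contrast between a complete and an incomplete $U$-statistic concrete, here is a minimal sketch. It is not the paper's code. It assumes a degree-2 kernel, $h(x, y) = (x - y)^2 / 2$, whose complete $U$-statistic is the unbiased sample variance, and it assumes pairs are sampled with replacement. The function names are hypothetical.

```python
import itertools
import random


def kernel(x, y):
    # Symmetric degree-2 kernel; its complete U-statistic is the
    # unbiased sample variance (an assumption chosen for illustration).
    return 0.5 * (x - y) ** 2


def complete_u_statistic(data):
    # Average the kernel over all n*(n-1)/2 unordered pairs: O(n^2) terms.
    pairs = list(itertools.combinations(data, 2))
    return sum(kernel(x, y) for x, y in pairs) / len(pairs)


def incomplete_u_statistic(data, num_pairs, rng):
    # Monte Carlo version: average the kernel over num_pairs pairs of
    # distinct indices, drawn independently (with replacement across draws).
    n = len(data)
    total = 0.0
    for _ in range(num_pairs):
        i, j = rng.sample(range(n), 2)  # two distinct indices
        total += kernel(data[i], data[j])
    return total / num_pairs


rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(500)]

full = complete_u_statistic(data)  # 124,750 kernel evaluations
approx = incomplete_u_statistic(data, num_pairs=len(data), rng=rng)  # 500
print(f"complete: {full:.4f}  incomplete: {approx:.4f}")
```

With `num_pairs` equal to $n$, the incomplete estimate uses $O(n)$ kernel evaluations instead of $O(n^2)$, at the price of extra Monte Carlo variance.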

    Practical Provably Secure Multi-node Communication

    We present a practical and provably secure multi-node communication scheme in the presence of a passive eavesdropper. The scheme is based on a random scheduling approach that hides the identity of the transmitter from the eavesdropper. This random scheduling leads to ambiguity at the eavesdropper with regard to the origin of the transmitted frame. We present the details of the technique and analyze it to quantify the secrecy-fairness-overhead trade-off. Implementation of the scheme over Crossbow TelosB motes, equipped with CC2420 radio chips, shows that the scheme can achieve significant secrecy gain with vanishing outage probability. In addition, it has a significant overhead advantage over direct extensions of two-node schemes. The technique also allows inactive nodes to use sleep mode to save further energy.
    Comment: Proceedings of the IEEE International Conference on Computing, Networking and Communications (ICNC 2014)

    Closed-Loop Statistical Verification of Stochastic Nonlinear Systems Subject to Parametric Uncertainties

    This paper proposes a statistical verification framework using Gaussian processes (GPs) for simulation-based verification of stochastic nonlinear systems with parametric uncertainties. Given a small number of stochastic simulations, the proposed framework constructs a GP regression model and predicts the system's performance over the entire set of possible uncertainties. Included in the framework is a new metric to estimate the confidence in those predictions based on the variance of the GP's cumulative distribution function. This variance-based metric forms the basis of active sampling algorithms that aim to minimize prediction error through careful selection of simulations. In three case studies, the new active sampling algorithms demonstrate up to a 35% improvement in prediction error over other approaches and are able to correctly identify regions with low prediction confidence through the variance metric.
    Comment: 8 pages, submitted to ACC 201