55,769 research outputs found

    On Model Selection in Cosmology

    Full text link
    We review some of the common methods for model selection: the goodness of fit, the likelihood ratio test, Bayesian model selection using Bayes factors, and the classical as well as the Bayesian information theoretic approaches. We illustrate these different approaches by comparing models for the expansion history of the Universe. In the discussion we highlight the premises and objectives entering these different approaches to model selection and finally recommend the information theoretic approach. Comment: expanded version, 26 pages, 12 figures
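    To make the classical information criteria concrete, here is a minimal, self-contained sketch (not from the review; the model names, parameter counts, and log-likelihood values are hypothetical stand-ins for fits of expansion-history models to a common data set of size n):

```python
# Compare candidate models by AIC and BIC. Lower is better; BIC penalizes
# extra parameters more strongly, so the two criteria can disagree.
import math

n = 580  # hypothetical number of data points in the fit
models = {                    # name: (max log-likelihood, # free parameters)
    "LambdaCDM": (-272.5, 2),
    "wCDM":      (-271.9, 3),
    "w0waCDM":   (-271.6, 4),
}
for name, (logL, k) in models.items():
    aic = -2 * logL + 2 * k
    bic = -2 * logL + k * math.log(n)
    print(f"{name:10s}  AIC={aic:7.1f}  BIC={bic:7.1f}")
```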

    Information Theoretic Feature Transformation Learning for Brain Interfaces

    Full text link
    Objective: A variety of pattern analysis techniques for model training in brain interfaces exploit neural feature dimensionality reduction based on feature ranking and selection heuristics. In light of broad evidence demonstrating the potential sub-optimality of ranking-based feature selection by any criterion, we propose to extend this focus with an information theoretic learning driven feature transformation concept. Methods: We present a maximum mutual information linear transformation (MMI-LinT), and a nonlinear transformation (MMI-NonLinT) framework derived from a general definition of the feature transformation learning problem. Empirical assessments are performed based on electroencephalographic (EEG) data recorded during a four-class motor imagery brain-computer interface (BCI) task. Exploiting state-of-the-art methods for initial feature vector construction, we compare the proposed approaches with conventional feature selection based dimensionality reduction techniques which are widely used in brain interfaces. Furthermore, for the multi-class problem, we present and exploit a hierarchical graphical model based BCI decoding system. Results: Both binary and multi-class decoding analyses demonstrate significantly better performance with the proposed methods. Conclusion: Information theoretic feature transformations are capable of tackling potential confounders of conventional approaches in various settings. Significance: We argue that this concept provides significant insights to extend the focus on feature selection heuristics to a broader definition of feature transformation learning in brain interfaces. Comment: Accepted for publication by IEEE Transactions on Biomedical Engineering
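    The sub-optimality of ranking-based selection that motivates the paper can be seen in a toy example. The sketch below is my own illustration, not the MMI-LinT algorithm; the data and projection vector are contrived. A label depends on a sum of two features, so each feature alone scores modestly under per-feature mutual information, while a one-dimensional linear projection captures the joint effect:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 10))            # stand-in for EEG feature vectors
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # class depends on a SUM of features

# Ranking features one by one understates the joint structure.
scores = mutual_info_classif(X, y, random_state=0)
print("best single-feature MI:", round(scores.max(), 3))

# A linear transform w^T x captures it in one dimension (here w is the
# known optimal direction; a method like MMI-LinT would learn it).
w = np.zeros(10)
w[:2] = 1.0
z = (X @ w).reshape(-1, 1)
print("MI of 1-D projection:", round(mutual_info_classif(z, y, random_state=0)[0], 3))
```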

    Statistical and Information Theoretic Approaches to Model Selection and Averaging

    Get PDF
    In this thesis we consider model selection (MS) and its alternative, model averaging (MA), in seven research articles and in an introductory summary of the articles. The utilization of the minimum description length (MDL) principle is a common theme in five articles. In three articles we approach MA by estimating model weights using MDL and by making use of the idea of shrinkage estimation, with special emphasis on the weighted average least squares (WALS) and penalized least squares (PenLS) estimation. We also apply MS and MA techniques to data on hip fracture treatment costs in seven hospital districts in Finland. Implementation of the MDL principle for MS is put into action by using the normalized maximum likelihood (NML). However, the straightforward use of the NML technique in Gaussian linear regression fails because the normalization coefficient is not finite. Rissanen has proposed an elegant solution to the problem by constraining the data space properly. We demonstrate the effect of data constraints on the MS criterion, present a general convex constraint in data space, and discuss two particular cases: the rhomboidal and ellipsoidal constraints. From these findings we derive four new NML-based criteria. One particular constraint is related to the case when collinearity is present in the data. We study WALS estimation, which has the potential for a good risk profile. WALS is attractive in regression especially when the number of explanatory variables is large, because its computational burden is light. We also apply WALS to estimation and comparison of hip fracture treatment costs between hospital districts in Finland. We present the WALS estimators as a special case of shrinkage estimators and we characterize a class of shrinkage estimators for which we derive the efficiency bound. We demonstrate how shrinkage estimators are obtained by using the PenLS technique and we prove sufficient conditions for the PenLS estimator to belong to the class of shrinkage estimators. Through this connection we may derive new MA estimators and effectively utilize certain previously known estimators in MA. We also study the performance of the estimators by using simulation experiments based on hip fracture treatment cost data. We propose an MA estimator with weights selected by the NML criterion. The resulting mixture estimator usually performs better than the corresponding MS estimator. We report on simulation experiments where the performance potential of MDL weight selection is compared with the corresponding potential of the AIC, BIC and Mallows' MA estimators. We also exploit the finding that a smoothing spline estimator may be rewritten as a linear mixed model (LMM). We present the NML criterion for LMMs and propose an automatic data-based smoothing method based on this criterion. The performance of the MDL criterion is compared to the AIC, BIC and generalized cross-validation criteria in simulation experiments. Finally we consider the sequential NML (sNML) criterion in logistic regression. We show that while the NML criterion quickly becomes computationally infeasible as the number of covariates and the amount of data increase, the sNML criterion can still be exploited in MS. We also develop a risk adjustment model for hip fracture mortality in Finland by choosing comorbidities that have an effect on mortality after hip fracture.
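    As one concrete piece of the MDL machinery discussed above, here is a minimal sketch of the sequential NML (sNML) predictor for a Bernoulli model, my own illustration under the standard sNML-1 definition rather than code from the thesis: the probability of the next symbol is proportional to the maximized likelihood of the extended sequence.

```python
import numpy as np

def bernoulli_ml(k, n):
    """Maximized Bernoulli likelihood of a sequence with k ones in n trials."""
    if k == 0 or k == n:
        return 1.0
    p = k / n
    return p**k * (1 - p)**(n - k)

def snml_predict_one(k, n):
    """sNML probability that the next symbol is 1, given k ones in n so far."""
    num = bernoulli_ml(k + 1, n + 1)      # extend the sequence with a 1
    den = num + bernoulli_ml(k, n + 1)    # ... or with a 0
    return num / den

# Code length = sum of -log2 predictive probabilities; in the MDL sense,
# a shorter total code length indicates a better model.
x = np.array([1, 1, 0, 1, 1, 1, 0, 1])
k, bits = 0, 0.0
for n, sym in enumerate(x):
    p1 = snml_predict_one(k, n)
    bits += -np.log2(p1 if sym == 1 else 1 - p1)
    k += sym
print(f"sNML code length: {bits:.2f} bits")
```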

    Ranking by Dependence - A Fair Criteria

    Full text link
    Estimating the dependences between random variables, and ranking them accordingly, is a prevalent problem in machine learning. Pursuing frequentist and information-theoretic approaches, we first show that the p-value and the mutual information can fail even in simplistic situations. We then propose two conditions for regularizing an estimator of dependence, which leads to a simple yet effective new measure. We discuss its advantages and compare it to well-established model-selection criteria. Apart from that, we derive a simple constraint for regularizing parameter estimates in a graphical model. This results in an analytical approximation for the optimal value of the equivalent sample size, which agrees very well with the more involved Bayesian approach in our experiments. Comment: Appears in Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence (UAI 2006)
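    The small-sample failure of raw mutual information alluded to above is easy to reproduce. The sketch below is my own illustration, not the paper's regularized measure: the plug-in MI estimate between two independent variables (true MI = 0) is biased upward, and the bias shrinks only as the sample grows, which can corrupt a dependence ranking.

```python
import numpy as np

def plugin_mi(x, y, bins=8):
    """Plug-in (histogram) estimate of mutual information, in nats."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p = joint / joint.sum()
    px, py = p.sum(1, keepdims=True), p.sum(0, keepdims=True)
    nz = p > 0
    return float((p[nz] * np.log(p[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(1)
for n in (50, 500, 5000):
    mis = [plugin_mi(rng.normal(size=n), rng.normal(size=n)) for _ in range(200)]
    print(f"n={n:5d}: mean plug-in MI of independent data = {np.mean(mis):.3f}")
```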

    Minimax rank estimation for subspace tracking

    Full text link
    Rank estimation is a classical model order selection problem that arises in a variety of important statistical signal and array processing systems, yet is addressed relatively infrequently in the extant literature. Here we present sample covariance asymptotics stemming from random matrix theory, and bring them to bear on the problem of optimal rank estimation in the context of the standard array observation model with additive white Gaussian noise. The most significant of these results demonstrates the existence of a phase transition threshold, below which eigenvalues and associated eigenvectors of the sample covariance fail to provide any information on population eigenvalues. We then develop a decision-theoretic rank estimation framework that leads to a simple ordered selection rule based on thresholding; in contrast to competing approaches, however, it admits asymptotic minimax optimality and is free of tuning parameters. We analyze the asymptotic performance of our rank selection procedure and conclude with a brief simulation study demonstrating its practical efficacy in the context of subspace tracking. Comment: 10 pages, 4 figures; final version
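    A minimal sketch of the flavor of rule involved (my own illustration, not the paper's exact minimax procedure): threshold the sample-covariance eigenvalues at the Marchenko-Pastur bulk edge sigma^2 (1 + sqrt(p/n))^2, which is the phase-transition point below which signal eigenvalues are not detectable from the sample covariance.

```python
import numpy as np

rng = np.random.default_rng(2)
p, n, sigma2, true_rank = 50, 200, 1.0, 3
A = rng.normal(size=(p, true_rank)) * 3.0        # strong signal directions
X = A @ rng.normal(size=(true_rank, n)) + np.sqrt(sigma2) * rng.normal(size=(p, n))

S = X @ X.T / n                                  # sample covariance
eigs = np.sort(np.linalg.eigvalsh(S))[::-1]
threshold = sigma2 * (1 + np.sqrt(p / n)) ** 2   # Marchenko-Pastur bulk edge
rank_hat = int((eigs > threshold).sum())
print("estimated rank:", rank_hat)               # typically 3 at this SNR
```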

    Deep Active Localization

    Full text link
    Active localization is the problem of generating robot actions that allow it to maximally disambiguate its pose within a reference map. Traditional approaches to this use an information-theoretic criterion for action selection and hand-crafted perceptual models. In this work we propose an end-to-end differentiable method for learning to take informative actions that is trainable entirely in simulation and then transferable to real robot hardware with zero refinement. The system is composed of two modules: a convolutional neural network for perception, and a deep reinforcement learned planning module. We introduce a multi-scale approach to the learned perceptual model, since the accuracy needed to perform action selection with reinforcement learning is much less than the accuracy needed for robot control. We demonstrate that the resulting system outperforms systems that use the traditional approach for either perception or planning. We also demonstrate our approach's robustness to different map configurations and other nuisance parameters through the use of domain randomization in training. The code is also compatible with the OpenAI gym framework, as well as the Gazebo simulator. Comment: 10 pages
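    For contrast with the learned planner, the classical information-theoretic baseline mentioned above can be sketched in a few lines. This is my own toy illustration (the belief, actions, and observation models are invented): choose the action whose expected observation minimizes the posterior entropy of the pose belief.

```python
import numpy as np

def entropy(b):
    b = b[b > 0]
    return float(-(b * np.log(b)).sum())

def expected_entropies(belief, obs_models):
    """obs_models[a][z, s] = p(z | pose s, action a); expected H per action."""
    out = []
    for p_z_given_s in obs_models:
        p_z = p_z_given_s @ belief                   # predictive obs. dist.
        h = 0.0
        for z, pz in enumerate(p_z):
            if pz > 0:
                post = p_z_given_s[z] * belief / pz  # Bayes update
                h += pz * entropy(post)
        out.append(h)
    return out

belief = np.full(4, 0.25)                            # uniform over 4 poses
look = np.array([[0.9, 0.1, 0.1, 0.1],               # action 0: informative
                 [0.1, 0.9, 0.9, 0.9]])
spin = np.array([[0.5, 0.5, 0.5, 0.5],               # action 1: uninformative
                 [0.5, 0.5, 0.5, 0.5]])
h = expected_entropies(belief, [look, spin])
print("expected posterior entropies:", np.round(h, 3), "-> pick action", int(np.argmin(h)))
```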

    Optimal model-free prediction from multivariate time series

    Get PDF
    Forecasting a time series from multivariate predictors constitutes a challenging problem, especially using model-free approaches. Most techniques, such as nearest-neighbor prediction, quickly suffer from the curse of dimensionality and overfitting for more than a few predictors, which has limited their application mostly to the univariate case. Therefore, selection strategies are needed that harness the available information as efficiently as possible. Since often the right combination of predictors matters, ideally all subsets of possible predictors should be tested for their predictive power, but the exponentially growing number of combinations makes such an approach computationally prohibitive. Here a prediction scheme that overcomes this strong limitation is introduced, utilizing a causal preselection step which drastically reduces the number of possible predictors to the most predictive set of causal drivers, making a globally optimal search scheme tractable. The information-theoretic optimality is derived and practical selection criteria are discussed. As demonstrated for multivariate nonlinear stochastic delay processes, the optimal scheme can even be less computationally expensive than commonly used suboptimal schemes like forward selection. The method suggests a general framework to apply the optimal model-free approach to select variables and subsequently fit a model to further improve a prediction or learn statistical dependencies. The performance of this framework is illustrated on a climatological index of the El Niño Southern Oscillation.
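    The benefit of preselection for model-free prediction can be illustrated with a toy example (mine, not the paper's causal algorithm, which uses conditional-independence tests rather than the marginal MI screening shown here): nearest-neighbor regression degrades when irrelevant predictors are included, and recovers once the predictor set is reduced to the true drivers.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(3)
n = 2000
X = rng.normal(size=(n, 12))                                    # 12 candidates
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.normal(size=n)  # 2 true drivers

tr, te = slice(0, 1500), slice(1500, None)
knn = KNeighborsRegressor(n_neighbors=10)

mse_all = ((knn.fit(X[tr], y[tr]).predict(X[te]) - y[te]) ** 2).mean()

keep = np.argsort(mutual_info_regression(X[tr], y[tr], random_state=0))[-2:]
mse_sel = ((knn.fit(X[tr][:, keep], y[tr]).predict(X[te][:, keep]) - y[te]) ** 2).mean()

print(f"k-NN MSE, all 12 predictors:     {mse_all:.3f}")
print(f"k-NN MSE, 2 preselected drivers: {mse_sel:.3f}")
```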

    Higher-order asymptotics for the parametric complexity

    Full text link
    The parametric complexity is the key quantity in the minimum description length (MDL) approach to statistical model selection. Rissanen and others have shown that the parametric complexity of a statistical model approaches a simple function of the Fisher information volume of the model as the sample size $n$ goes to infinity. This paper derives higher-order asymptotic expansions for the parametric complexity, in the case of exponential families and independent and identically distributed data. These higher-order approximations are calculated for some examples and are shown to have better finite-sample behaviour than Rissanen's approximation. The higher-order terms are given as expressions involving cumulants (or, more naturally, the Amari-Chentsov tensors), and these terms are likely to be interesting in themselves since they arise naturally from the general information-theoretic principles underpinning MDL. The derivation given here specializes to an alternative and arguably simpler proof of Rissanen's result (for the case considered here), proving for the first time that his approximation is $O(n^{-1})$. Comment: Version 3: Fixed a minor error in the introduction
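    For reference, Rissanen's leading-order result that the paper refines has the standard form, for a model with $k$ free parameters, Fisher information matrix $I(\theta)$, and parametric complexity $C_n$:

```latex
\log C_n \;=\; \frac{k}{2}\log\frac{n}{2\pi}
        \;+\; \log \int_{\Theta} \sqrt{\det I(\theta)}\, d\theta
        \;+\; O(n^{-1})
```

    The second term is the Fisher information volume mentioned in the abstract; the paper makes the $O(n^{-1})$ and higher-order correction terms explicit via cumulants and the Amari-Chentsov tensors.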

    Context-Aware Query Selection for Active Learning in Event Recognition

    Full text link
    Activity recognition is a challenging problem with many practical applications. In addition to the visual features, recent approaches have benefited from the use of context, e.g., inter-relationships among the activities and objects. However, these approaches require data to be labeled and entirely available beforehand, and they are not designed to be updated continuously, which makes them unsuitable for surveillance applications. In contrast, we propose a continuous-learning framework for context-aware activity recognition from unlabeled video, which has two distinct advantages over existing methods. First, it employs a novel active-learning technique that not only exploits the informativeness of the individual activities but also utilizes their contextual information during query selection; this leads to a significant reduction in expensive manual annotation effort. Second, the learned models can be adapted online as more data becomes available. We formulate a conditional random field model that encodes the context and devise an information-theoretic approach that utilizes the entropy and mutual information of the nodes to compute the set of most informative queries, which are labeled by a human. These labels are combined with graphical inference techniques for incremental updates. We provide a theoretical formulation of the active learning framework with an analytic solution. Experiments on six challenging datasets demonstrate that our framework achieves superior performance with significantly less manual labeling. Comment: To appear in IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
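    The entropy half of the query-selection criterion can be sketched compactly. This is my own simplification (the paper's criterion additionally uses mutual information between linked CRF nodes, omitted here): ask a human to label the node whose predicted label distribution is most uncertain.

```python
import numpy as np

def entropy(p, eps=1e-12):
    return float(-(p * np.log(p + eps)).sum())

# Hypothetical marginal label distributions for 4 unlabeled activity nodes.
marginals = np.array([
    [0.96, 0.02, 0.02],   # confident -> uninformative query
    [0.40, 0.35, 0.25],   # uncertain -> informative query
    [0.70, 0.20, 0.10],
    [0.85, 0.10, 0.05],
])
query = int(np.argmax([entropy(m) for m in marginals]))
print("query node:", query)   # node 1
```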