23,593 research outputs found

A Multi-objective Exploratory Procedure for Regression Model Selection

Author: Ankur Sinha
Davidson R.
Deb K.
Freund Y.
Goldberg D.
Jeffreys H.
Leamer E.E.
MacKay D. J.C.
Murata N.
Paterlini S.
Pekka Malo
Redmond M.
Takeuchi K.
Tibshirani R.
Timo Kuosmanen
Zitzler E.
Publication venue: 'Informa UK Limited'
Publication date: 13/07/2016

Variable selection is recognized as one of the most critical steps in statistical modeling. The problems encountered in engineering and social sciences are commonly characterized by an over-abundance of explanatory variables, non-linearities, and unknown interdependencies between the regressors. An added difficulty is that analysts may have little or no prior knowledge of the relative importance of the variables. To provide a robust method for model selection, this paper introduces the Multi-objective Genetic Algorithm for Variable Selection (MOGA-VS), which provides the user with an optimal set of regression models for a given data-set. The algorithm treats the regression problem as a two-objective task and explores the Pareto-optimal (best subset) models by preferring models that have fewer regression coefficients and better goodness of fit. The model exploration can be based on in-sample or generalization error minimization. Model selection is performed in two steps. First, we generate the frontier of Pareto-optimal regression models by eliminating the dominated models without any user intervention. Second, a decision-making process is executed that allows the user to choose the most preferred model using visualisations and simple metrics. The method has been evaluated on a recently published real dataset on Communities and Crime within the United States. Comment: in Journal of Computational and Graphical Statistics, Vol. 24, Iss. 1, 201
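
The first step described in the abstract, discarding dominated models to obtain a Pareto frontier, can be illustrated with a minimal sketch. It is not the MOGA-VS algorithm itself (no genetic search is performed). The names `Model`, `dominates`, and `pareto_front` and the candidate values are hypothetical, and each model is assumed to be summarised by its number of coefficients and an error value, both to be minimized.

```python
from typing import List, Tuple

# Assumption: each candidate model is summarised as
# (number_of_coefficients, error), and both values should be small.
Model = Tuple[int, float]


def dominates(a: Model, b: Model) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a[0] <= b[0] and a[1] <= b[1]
    strictly_better = a[0] < b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(models: List[Model]) -> List[Model]:
    """Return the non-dominated models, sorted by number of coefficients."""
    front = [m for m in models if not any(dominates(o, m) for o in models)]
    return sorted(set(front))


# Made-up candidates for illustration only, not results from the paper.
candidates = [(1, 0.90), (2, 0.55), (2, 0.70), (3, 0.56), (4, 0.30), (5, 0.30)]
front = pareto_front(candidates)
print(front)
```

The pairwise comparison is quadratic in the number of candidates, which is adequate for a small illustration. The second step in the abstract, choosing one model from the frontier, is left to the user.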

arXiv.org e-Print Archive

CiteSeerX

Crossref

Disentangling causal webs in the brain using functional Magnetic Resonance Imaging: A review of current approaches

Author: Anderson Paul
Bielczyk Natalia Z.
Buitelaar Jan K.
Glennon Jeffrey C.
Uithol Sebo
van Mourik Tim
Publication date: 01/01/2019

In the past two decades, functional Magnetic Resonance Imaging has been used to relate neuronal network activity to cognitive processing and behaviour. Recently, this approach has been augmented by algorithms that allow us to infer causal links between component populations of neuronal networks. Multiple inference procedures have been proposed to approach this research question, but so far each method has limitations when it comes to establishing whole-brain connectivity patterns. In this work, we discuss eight ways to infer causality in fMRI research: Bayesian Nets, Dynamical Causal Modelling, Granger Causality, Likelihood Ratios, LiNGAM, Patel's Tau, Structural Equation Modelling, and Transfer Entropy. We conclude by formulating some recommendations for future directions in this area.

arXiv.org e-Print Archive

Directory of Open Access Journals

Radboud Repository

A Mathematical Programming Approach for Integrated Multiple Linear Regression Subset Selection and Validation

Author: Cheong Taesu
Chung Seokhyun
Park Young-Woong
Publication venue: 'Elsevier BV'
Publication date: 26/07/2020

Subset selection for multiple linear regression aims to construct a regression model that minimizes errors by selecting a small number of explanatory variables. Once a model is built, various statistical tests and diagnostics are conducted to validate the model and to determine whether the regression assumptions are met. Most traditional approaches require human decisions at this step. For example, the user adds or removes a variable until a satisfactory model is obtained. However, this trial-and-error strategy cannot guarantee that a subset that minimizes the errors while satisfying all regression assumptions will be found. In this paper, we propose a fully automated model building procedure for multiple linear regression subset selection that integrates model building and validation based on mathematical programming. The proposed model minimizes mean squared errors while ensuring that the majority of the important regression assumptions are met. We also propose an efficient constraint to approximate the constraint for the coefficient t-test. When no subset satisfies all of the considered regression assumptions, our model provides an alternative subset that satisfies most of these assumptions. Computational results show that our model yields better solutions (i.e., satisfying more regression assumptions) compared to the state-of-the-art benchmark models while maintaining similar explanatory power.

arXiv.org e-Print Archive

Digital Repository @ Iowa State University (ISU)