SAS® Code to Select the Best Multiple Linear Regression Model for Multivariate Data Using Information Criteria

By Dennis J. Beal

Abstract

Multiple linear regression is a standard statistical tool that regresses a single dependent variable on p independent variables. The objective is to find a linear model that best predicts the dependent variable from the independent variables. Information criteria use the covariance matrix and the number of parameters in a model to calculate a statistic that summarizes the information represented by the model, balancing a lack-of-fit term against a penalty term. SAS® calculates Akaike’s Information Criterion (AIC) for each of the 2^p possible models for p ≤ 10 independent variables. AIC estimates a measure of the difference between a given model and the “true” model. The model with the smallest AIC among all competing models is deemed the best model. This paper provides SAS code that can be used to simultaneously evaluate up to 1024 (2^10) models to determine the subset of variables that minimizes the information criterion among all possible subsets. Simulated multivariate data are used to compare how well AIC selects the true model against standard statistical techniques such as minimizing RMSE, forward selection, backward elimination, and stepwise regression. This paper is for intermediate SAS users of SAS/STAT who understand multivariate data analysis.
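
The all-subsets search described in the abstract can be sketched outside SAS as follows. This is a minimal illustration in Python with NumPy, not the paper's SAS code. It assumes the common least-squares form AIC = n ln(SSE/n) + 2k, where k counts the intercept plus the selected predictors. The function names, the simulated data, and the coefficients are hypothetical and chosen only for the example.

```python
from itertools import combinations

import numpy as np


def aic_least_squares(y, X):
    """Return n*ln(SSE/n) + 2k for an OLS fit with intercept (assumed AIC form)."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])  # intercept plus chosen predictors
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    sse = float(np.sum((y - design @ beta) ** 2))
    return n * np.log(sse / n) + 2 * design.shape[1]


def best_subset_by_aic(y, X):
    """Fit all 2**p subsets of the p columns of X; return (best_aic, best_columns)."""
    p = X.shape[1]
    best = (np.inf, ())
    for size in range(p + 1):
        for cols in combinations(range(p), size):
            score = aic_least_squares(y, X[:, np.array(cols, dtype=int)])
            if score < best[0]:
                best = (score, cols)
    return best


# Hypothetical simulated data: only columns 0 and 2 enter the true model.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 2] + rng.normal(scale=0.5, size=200)

best_aic, best_cols = best_subset_by_aic(y, X)
print(best_cols, round(best_aic, 2))
```

With p = 10 the loop visits 1024 candidate models, matching the count quoted in the abstract. The cost doubles with each added variable, which is one plausible reason for capping the search at p ≤ 10.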

Year: 2009
OAI identifier: oai:CiteSeerX.psu:10.1.1.135.4675
Provided by: CiteSeerX