Multiple linear regression is a standard statistical tool that regresses a single dependent variable on p independent variables. The objective is to find the linear model that best predicts the dependent variable from the independent variables. Information criteria use the covariance matrix and the number of parameters in a model to calculate a statistic that summarizes the information represented by the model. The statistic balances a lack-of-fit term against a penalty term. SAS® calculates Akaike's Information Criterion (AIC) for every one of the 2^p possible models when there are p ≤ 10 independent variables. AIC estimates a measure of the difference between a given model and the "true" model, and the model with the smallest AIC among all competing models is deemed the best. This paper provides SAS code that can evaluate up to 1024 (2^10) models simultaneously, in order to find the subset of variables that minimizes the information criterion among all possible subsets. Simulated multivariate data are used to compare how well AIC selects the true model against standard techniques such as minimizing RMSE, forward selection, backward elimination, and stepwise regression. This paper is intended for intermediate users of SAS/STAT who understand multivariate data analysis.
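The all-subsets search described above can be illustrated outside SAS. The following is a minimal Python sketch, not the paper's SAS code. It assumes the common least-squares form of AIC, n·ln(SSE/n) + 2k, where k counts the intercept plus the selected slopes. The helper names (`solve`, `sse_for_subset`, `best_subset_by_aic`) and the simulated data are hypothetical and serve only as an illustration. The sketch fits every one of the 2^p subsets by ordinary least squares and ranks them by AIC.

```python
import itertools
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def sse_for_subset(X, y, subset):
    """Least-squares fit of y on an intercept plus the columns in `subset`; return SSE."""
    rows = [[1.0] + [row[j] for j in subset] for row in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))


def aic(n, sse, k):
    """Assumed least-squares AIC: n*ln(SSE/n) + 2k (k includes the intercept)."""
    return n * math.log(sse / n) + 2 * k


def best_subset_by_aic(X, y):
    """Evaluate all 2^p subsets and return (AIC, subset) pairs, smallest AIC first."""
    n, p = len(X), len(X[0])
    results = []
    for size in range(p + 1):
        for subset in itertools.combinations(range(p), size):
            sse = sse_for_subset(X, y, subset)
            results.append((aic(n, sse, len(subset) + 1), subset))
    results.sort()
    return results


# Hypothetical simulation: p = 4 candidates, but only x0 and x2 enter the true model.
random.seed(1)
X = [[random.gauss(0, 1) for _ in range(4)] for _ in range(200)]
y = [1.0 + 3.0 * r[0] - 2.0 * r[2] + random.gauss(0, 1) for r in X]
ranked = best_subset_by_aic(X, y)
for score, subset in ranked[:3]:
    print(f"AIC = {score:8.2f}  variables = {subset}")
```

Exhaustive enumeration grows as 2^p. That growth is why a bound such as p ≤ 10 (1024 models) keeps an all-subsets search practical. Note also that AIC can occasionally favor a superset of the true model, which is the behavior the simulation comparison in this paper is designed to measure.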