3 research outputs found

    Comparison between best subset and lasso regression on consumer price index Malaysia

    This research aims to determine the factors contributing to the prediction of the total Consumer Price Index (CPI) in Malaysia through model selection using two methods: best subset regression and LASSO regression. Outliers are identified using leverage values and studentized deleted residuals, while variables exhibiting multicollinearity, identified through their Variance Inflation Factor (VIF) values, are removed by progressive elimination. The two methods are compared using the Mean Square Error of Prediction (MSE(P)) to find the better approach for modelling the CPI data, and the model with the smallest MSE(P) is chosen as the best model. The results showed that the best models from best subset regression and LASSO regression have almost the same MSE(P). LASSO regression is therefore chosen as the preferred approach because its process for identifying the best model is simpler. The best LASSO model consists of nine major categories: food and non-alcoholic beverages (X1), alcoholic beverages and tobacco (X2), clothing and footwear (X3), transport (X7), communication (X8), recreation service and culture (X9), education (X10), restaurants and hotels (X11), and miscellaneous goods and services (X12).
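The workflow in the abstract above (outlier screening, VIF elimination, then best subset versus LASSO compared by MSE(P)) can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' code: the data are synthetic rather than the Malaysian CPI series; MSE(P) is taken to mean the mean squared error on held-out rows; the cut-offs (h > 2p/n, |t| > 3, VIF > 10), the 70/30 split, dropping flagged rows, and the λ grid are all assumptions; LASSO is fitted with plain coordinate descent rather than any particular library. All helper names (`leverage`, `eliminate_by_vif`, `lasso_cd`, `mse_p`, and so on) are hypothetical.

```python
import itertools
import numpy as np

def leverage(X):
    """Hat-matrix diagonal h_ii of H = X (X'X)^-1 X', via a reduced QR of X."""
    Q, _ = np.linalg.qr(X)
    return np.sum(Q ** 2, axis=1)

def studentized_deleted_residuals(X, y):
    """t_i = e_i * sqrt((n - p - 1) / (SSE * (1 - h_ii) - e_i^2)) for an OLS fit of y on X."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    return e * np.sqrt((n - p - 1) / ((e @ e) * (1 - leverage(X)) - e ** 2))

def vif(Z):
    """VIF_j = 1 / (1 - R_j^2), regressing column j on the other columns plus an intercept."""
    n, k = Z.shape
    out = np.empty(k)
    for j in range(k):
        A = np.column_stack([np.ones(n), np.delete(Z, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, Z[:, j], rcond=None)
        resid = Z[:, j] - A @ coef
        r2 = 1 - (resid @ resid) / np.sum((Z[:, j] - Z[:, j].mean()) ** 2)
        out[j] = 1 / (1 - r2)
    return out

def eliminate_by_vif(Z, names, threshold=10.0):
    """Progressively drop the column with the largest VIF until every VIF <= threshold."""
    names = list(names)
    while Z.shape[1] > 1:
        v = vif(Z)
        worst = int(np.argmax(v))
        if v[worst] <= threshold:
            break
        Z = np.delete(Z, worst, axis=1)
        names.pop(worst)
    return Z, names

def ols(X, y):
    """Least-squares intercept and slopes."""
    coef, *_ = np.linalg.lstsq(np.column_stack([np.ones(len(y)), X]), y, rcond=None)
    return coef[0], coef[1:]

def lasso_cd(X, y, lam, n_iter=500):
    """Cyclic coordinate descent for (1/2n)||y - b0 - Xb||^2 + lam * ||b||_1."""
    n, k = X.shape
    xm, ym = X.mean(axis=0), y.mean()
    Xc = X - xm
    col_sq = (Xc ** 2).sum(axis=0) / n
    b, r = np.zeros(k), y - ym
    for _ in range(n_iter):
        for j in range(k):
            r += Xc[:, j] * b[j]  # partial residual excluding feature j
            rho = Xc[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft-threshold
            r -= Xc[:, j] * b[j]
    return ym - xm @ b, b

def mse_p(b0, b, X, y):
    """MSE(P), assumed here to be the mean squared prediction error on held-out rows."""
    return float(np.mean((y - b0 - X @ b) ** 2))

# Synthetic stand-in data, NOT the Malaysian CPI series: X6 nearly duplicates X1.
rng = np.random.default_rng(0)
n, k = 120, 6
Z = rng.normal(size=(n, k))
Z[:, 5] = Z[:, 0] + 0.01 * rng.normal(size=n)
y = 2 * Z[:, 0] - Z[:, 2] + 0.5 * rng.normal(size=n)

# 1. Outliers: drop rows with h_ii > 2p/n or |t_i| > 3 (assumed cut-offs).
X = np.column_stack([np.ones(n), Z])
h, t = leverage(X), studentized_deleted_residuals(X, y)
keep = (h <= 2 * X.shape[1] / n) & (np.abs(t) <= 3)
Z, y = Z[keep], y[keep]

# 2. Multicollinearity: progressive VIF elimination (assumed threshold 10).
Z, names = eliminate_by_vif(Z, [f"X{j + 1}" for j in range(k)])

# 3. Fit on the first 70% of rows; pick each method's model by smallest MSE(P) on the rest.
m = int(0.7 * len(y))
Xtr, ytr, Xte, yte = Z[:m], y[:m], Z[m:], y[m:]
subset_err, subset_cols = min(
    (mse_p(*ols(Xtr[:, list(c)], ytr), Xte[:, list(c)], yte), c)
    for s in range(1, Z.shape[1] + 1)
    for c in itertools.combinations(range(Z.shape[1]), s)
)
fits = {lam: lasso_cd(Xtr, ytr, lam) for lam in (0.001, 0.01, 0.05, 0.1, 0.2)}
lam = min(fits, key=lambda v: mse_p(*fits[v], Xte, yte))
lasso_err, lasso_cols = mse_p(*fits[lam], Xte, yte), np.flatnonzero(fits[lam][1])

print("best subset:", [names[i] for i in subset_cols], round(subset_err, 3))
print("LASSO:", [names[i] for i in lasso_cols], "lambda =", lam, round(lasso_err, 3))
```

Because X6 nearly duplicates X1, the VIF step removes one of them before either selection method runs, so both methods are compared on the same cleaned predictors. Best subset needs one fit per subset (2^k - 1 fits), whereas LASSO needs one fit per λ value, which is one way to read the abstract's remark that LASSO offers the simpler selection process.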

    Empirical bayesian binary classification forests using bootstrap prior

    In this paper, we present a new method called Empirical Bayesian Random Forest (EBRF) for binary classification problems. The prior for the method is obtained using the bootstrap prior technique. EBRF explicitly addresses the low accuracy of the Random Forest (RF) classifier when the number of relevant input variables is small relative to the total number of input variables. The improvement is achieved by replacing the arbitrary subsample variable size with an empirical Bayesian estimate. The proposed and existing methods were illustrated on five high-dimensional microarray datasets derived from colon, breast, lymphoma, and Central Nervous System (CNS) cancer tumours. The results showed that EBRF achieves higher accuracy, sensitivity, specificity, and Area Under the Receiver Operating Characteristic Curve (AUC) than RF on most of the datasets used.
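The problem EBRF targets can be made concrete with a short calculation, and the four evaluation metrics named in the abstract can be written out. This is a hedged sketch only: it does not implement EBRF or the bootstrap prior, since the abstract does not give the estimator. It assumes the common RF default subsample size mtry = ⌊√p⌋ and uses illustrative numbers (p = 2000 variables, r = 10 relevant) that are not taken from the paper. The function names `prob_no_relevant` and `binary_metrics` are hypothetical.

```python
from math import comb, isqrt

def prob_no_relevant(p, r, m):
    """Probability that m variables drawn without replacement from p contain none of the r relevant ones."""
    return comb(p - r, m) / comb(p, m)

def binary_metrics(y_true, scores, threshold=0.5):
    """Accuracy, sensitivity, specificity and AUC for 0/1 labels and predicted P(class 1)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    tp = sum(s >= threshold for s in pos)  # positives predicted positive
    tn = sum(s < threshold for s in neg)   # negatives predicted negative
    # AUC = P(a random positive scores above a random negative), ties counted as 1/2.
    auc = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg) / (len(pos) * len(neg))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / len(pos),
        "specificity": tn / len(neg),
        "auc": auc,
    }

# Illustrative numbers only: p = 2000 candidate genes, r = 10 relevant, mtry = floor(sqrt(p)).
p, r = 2000, 10
print(f"P(no relevant variable among mtry candidates) = {prob_no_relevant(p, r, isqrt(p)):.3f}")

# Toy labels and scores, not data from the paper.
print(binary_metrics([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.4, 0.3, 0.6, 0.1]))
```

With these illustrative numbers the probability is about 0.80, so at most splits the randomly drawn candidate set contains no relevant gene at all. This is the regime in which, according to the abstract, a fixed, arbitrary subsample size hurts RF accuracy and EBRF substitutes an empirical Bayesian estimate.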