20 research outputs found

    Associated factors of mortality rate amongst patients with AIDS and HIV-TB co-infections using the zero-inflated negative binomial method

    Many data sets are characterized as count data with a preponderance of zeros. Data in the form of counts and proportions arise in many fields, such as medicine, public health, toxicology, epidemiology, sociology, psychology, engineering and agriculture. When the dependent variable is a non-negative count variable, a Poisson regression model is commonly used to explain the relationship between the outcome variable and a set of explanatory variables. However, if extra-zero Poisson counts are observed, it has been suggested that a zero-inflated Poisson regression model is more appropriate than the classical Poisson regression model. One frequently encountered problem with these data is that simple models such as the Poisson and binomial models fail to explain the variation that exists. Often, data exhibit extra-dispersion (over- or under-dispersion). Another complication in data in the form of counts and proportions is that they are sometimes too sparse, that is, smaller values have a greater tendency to occur. In the Poisson case the counts that occur are generally small, and in the binomial case the binomial denominators are often small. Therefore, valid procedures are needed to detect departures from the simple models. Hence, when many extra zeros exist and overdispersion is present, the zero-inflated negative binomial model has been suggested as more appropriate than the classical negative binomial regression model. The general objective of this thesis is therefore to compare the zero-inflated negative binomial and negative binomial models in identifying associated factors. The specific objectives are to fit a zero-inflated negative binomial death rate regression model for mortality rate among AIDS/HIV co-infection patients; to compare the zero-inflated negative binomial death rate regression with the negative binomial death rate regression and determine which model is better when the data contain excess zeros; to assess overdispersion in the model; and lastly, to investigate the potential confounding factors affecting mortality rate among HIV-TB and AIDS co-infection patients in disease mapping. In this thesis, the mortality rate, categorized by age in years, is the dependent variable of interest. The data on AIDS patients and HIV-TB mortality cases are analyzed to compare the negative binomial mortality model with the Zero-Inflated Negative Binomial Mortality (ZINBM) model and determine which is better. Beyond this substantive concern, the choice should be based on the model providing the closest fit between the observed and predicted values. Unfortunately, the literature presents anomalous findings in terms of model superiority. In addition, the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) values were used to compare the fit between models. The results suggested that the literature is not entirely anomalous; however, the accuracy of the findings depended on the proportion of zeros and the distribution of the non-zeros. The ZINBDR model tended to be superior to the negative binomial model. The findings suggest that the proportion of zeros and the distribution of the non-zeros should be considered when selecting a model to accommodate zero-inflated data.
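
    As a rough, hypothetical illustration of the comparison described above (not the thesis's data or code), the sketch below fits a classical negative binomial model and a zero-inflated negative binomial model to simulated zero-inflated counts with statsmodels and compares their AIC and BIC; all variable names and values are made up.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.discrete_model import NegativeBinomial
from statsmodels.discrete.count_model import ZeroInflatedNegativeBinomialP

# Simulate counts with excess zeros (hypothetical data, not the study's)
rng = np.random.default_rng(0)
n_obs = 500
age = rng.integers(20, 70, size=n_obs).astype(float)   # hypothetical covariate
mu = np.exp(0.02 * (age - 45))                          # mean of the count process
counts = rng.negative_binomial(1.5, 1.5 / (1.5 + mu))
counts[rng.random(n_obs) < 0.4] = 0                     # inflate the zeros

X = sm.add_constant(age)

# Classical negative binomial regression
nb_fit = NegativeBinomial(counts, X).fit(disp=False)

# Zero-inflated negative binomial: a logit model for the excess-zero part
zinb_fit = ZeroInflatedNegativeBinomialP(
    counts, X, exog_infl=X, inflation="logit"
).fit(disp=False, maxiter=200)

# Smaller AIC/BIC indicates the better-fitting model
print(f"NB   AIC={nb_fit.aic:.1f}  BIC={nb_fit.bic:.1f}")
print(f"ZINB AIC={zinb_fit.aic:.1f}  BIC={zinb_fit.bic:.1f}")
```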

    Undergraduate students’ errors on interval estimation based on variance neglect

    Interval estimation is an important topic, especially in drawing conclusions about an event. Mathematics education students must possess the skill to formulate and use interval estimates. Errors in formulating interval estimates indicate a low understanding of interval estimation. This study explores how mathematics education students interpret the variance given in questions when selecting the proper test statistic to formulate an interval estimate of the mean accurately. Respondents in this study were 36 mathematics education students (n = 9 male, n = 27 female). This is qualitative research with a qualitative descriptive approach. Data collection was carried out using a respondents' ability test and interviews. The ability test instrument was tested on 36 students and declared valid, with the computed r exceeding the critical r-table value of 0.3291, and declared reliable, with a Cronbach's alpha of 0.876 (above 0.6). Through an exploratory approach, data were analyzed by categorizing, reducing, and interpreting to draw conclusions about students' abilities and thinking methods in formulating an interval estimate of the mean based on the variance given in the questions. The results showed that mathematics education students neglected the variance, so they could not determine the test statistic correctly, resulting in erroneous interval estimates. This study provides insight into the thinking methods of mathematics education students regarding variance in interval estimation problems, in the hope of anticipating errors in formulating interval estimates.
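
    The statistical point at issue, choosing between the z and t statistics for an interval estimate of the mean depending on whether the population variance is known, can be shown with a minimal sketch (my own illustration, not taken from the paper; the data are hypothetical).

```python
import numpy as np
from scipy import stats

def mean_ci(sample, confidence=0.95, pop_variance=None):
    """Confidence interval for the mean.

    Uses the z statistic when the population variance is known, and the
    t statistic (with the sample variance) when it is not.
    """
    sample = np.asarray(sample, dtype=float)
    n = len(sample)
    xbar = sample.mean()
    alpha = 1 - confidence
    if pop_variance is not None:                       # variance known -> z interval
        se = np.sqrt(pop_variance / n)
        crit = stats.norm.ppf(1 - alpha / 2)
    else:                                              # variance unknown -> t interval
        se = sample.std(ddof=1) / np.sqrt(n)
        crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
    return xbar - crit * se, xbar + crit * se

scores = [72, 68, 75, 80, 66, 71, 77, 69, 74, 70]      # hypothetical test scores
print(mean_ci(scores))                                 # t interval (variance unknown)
print(mean_ci(scores, pop_variance=16))                # z interval (variance known)
```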

    Interaction effects on prediction of children weight at school entry using model averaging

    Model selection introduces uncertainty into the model building process; therefore, model averaging was introduced as an alternative to overcome the problem of underestimated standard errors in model selection. This research focuses on comparing the corrected Akaike Information Criterion (AICc) and the Bayesian Information Criterion (BIC) as weights for model averaging when interaction effects are involved. The mean squared error of prediction (MSE(P)) was used to determine the best model for model averaging. Gateshead Millennium Study (GMS) data on children's weight were used to illustrate the comparison between AICc and BIC. The results showed that the AICc criterion performs better than BIC when the sample is small and a large number of parameters is included in the model. The presence of an interaction variable in the model is not significant compared with the main factor variables, owing to the lower coefficient values of the interaction variables. In conclusion, interaction variables contribute less information to the model, as their coefficient values are lower than those of the main factors.
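
    As a minimal sketch of information-criterion-weighted model averaging (my own illustration under simple assumptions, not the study's GMS analysis), the code below fits two OLS candidate models, one with and one without the interaction term, computes AICc and BIC weights, and forms a model-averaged prediction; the variables and data are hypothetical stand-ins for the children's weight data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data standing in for the GMS children's weight data
rng = np.random.default_rng(1)
n = 80
df = pd.DataFrame({"birthweight": rng.normal(3.4, 0.5, n),
                   "gestation": rng.normal(39.0, 1.5, n)})
df["weight5"] = (2 + 3.5 * df.birthweight + 0.3 * df.gestation
                 + 0.1 * df.birthweight * df.gestation + rng.normal(0, 1, n))

# Candidate models: main effects only, and main effects plus interaction
formulas = ["weight5 ~ birthweight + gestation",
            "weight5 ~ birthweight * gestation"]
fits = [smf.ols(f, data=df).fit() for f in formulas]

def aicc(fit):
    # Small-sample correction applied to statsmodels' AIC (k = mean parameters)
    k = fit.df_model + 1
    return fit.aic + 2 * k * (k + 1) / (fit.nobs - k - 1)

def ic_weights(criteria):
    delta = np.array(criteria) - np.min(criteria)
    w = np.exp(-0.5 * delta)
    return w / w.sum()

w_aicc = ic_weights([aicc(f) for f in fits])
w_bic = ic_weights([f.bic for f in fits])

# Model-averaged prediction = weighted sum of the candidate model predictions
pred = sum(w * f.fittedvalues for w, f in zip(w_aicc, fits))
mse_p = np.mean((df.weight5 - pred) ** 2)
print("AICc weights:", w_aicc, "BIC weights:", w_bic, "MSE(P):", round(mse_p, 3))
```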

    A robust vector autoregressive model for forecasting economic growth in Malaysia

    Economic indicators measure how solid or strong a country's economy is. Basically, economic growth can be measured using economic indicators, as they give an account of the strengths or shortcomings of an economy. The Vector Autoregressive (VAR) method is commonly used for forecasting economic growth involving a large number of economic indicators. However, problems arise when its parameters are estimated using the least squares method, which is very sensitive to the existence of outliers. Thus, the aim of this study is to propose the best method for dealing with outlying data so that the forecasting results are not biased. The data used in this study are monthly economic indicators from January 1998 to January 2016. Two approaches are considered: a filtering technique via least median of squares (LMS), least trimmed squares (LTS) and least quartile difference (LQD), and an imputation technique via the mean and median. Using the mean absolute percentage error (MAPE) as the forecasting performance measure, this study concludes that the robust VAR with LQD filtering is more appropriate than the other models.
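
    The sketch below is a simplified, hypothetical illustration of the filter-then-forecast idea: it flags outliers with a plain median/MAD rule (a stand-in for the paper's LMS/LTS/LQD filters, which are not reproduced here), replaces them with the median, and fits a VAR with statsmodels; the series are simulated, not the Malaysian indicator data.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

def mad_filter(series, threshold=3.5):
    """Replace outlying observations with the series median.

    A simple median/MAD rule standing in for the paper's LMS/LTS/LQD filters.
    """
    med = series.median()
    mad = (series - med).abs().median()
    z = 0.6745 * (series - med) / mad
    return series.where(z.abs() <= threshold, med)

# Hypothetical monthly indicator levels (the study uses Jan 1998 - Jan 2016 data)
rng = np.random.default_rng(2)
idx = pd.date_range("1998-01-01", periods=217, freq="MS")
levels = pd.DataFrame(100 + rng.normal(size=(217, 3)).cumsum(axis=0), index=idx,
                      columns=["cic", "exchange_rate", "reserves"])

growth = levels.diff().dropna()              # model the month-on-month changes
growth.iloc[50, 0] += 15                     # inject a single large outlier

filtered = growth.apply(mad_filter)          # column-wise outlier filtering

# Fit a VAR(2) on the filtered series and forecast 12 months ahead
fit = VAR(filtered).fit(2)
forecast = fit.forecast(filtered.values[-fit.k_ar:], steps=12)
print(forecast[:3])
```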

    ARIMA and VAR Modeling to Forecast Malaysian Economic Growth

    This study presents a comparison of univariate time series modelling via the Autoregressive Integrated Moving Average (ARIMA) model and multivariate time series modelling via the Vector Autoregressive (VAR) model for forecasting economic growth in Malaysia. The study used monthly economic indicator prices from January 1998 to January 2016, and the indicators used to measure economic growth are Currency in Circulation, Exchange Rate, External Reserve and Reserve Money. The aim is to evaluate the VAR and ARIMA models for forecasting economic growth and to suggest the best time series model among the existing models for forecasting economic growth in Malaysia. The forecast performances of these models were evaluated based on an out-of-sample forecast procedure using an error measurement, the Mean Absolute Percentage Error (MAPE). Results revealed that the VAR model outperforms the ARIMA model in predicting economic growth, achieving the lowest forecast error.
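
    A minimal sketch of the out-of-sample comparison described above, using simulated stand-ins for the four indicators (not the study's data) and arbitrary model orders: it fits an ARIMA to one series and a VAR to all four, then compares the 12-month-ahead forecasts by MAPE.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.api import VAR

def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return np.mean(np.abs((actual - predicted) / actual)) * 100

# Hypothetical stand-ins for the four monthly indicators used in the study
rng = np.random.default_rng(3)
idx = pd.date_range("1998-01-01", periods=217, freq="MS")
data = pd.DataFrame(500 + rng.normal(size=(217, 4)).cumsum(axis=0), index=idx,
                    columns=["cic", "exchange_rate", "external_reserve", "reserve_money"])

h = 12                                           # out-of-sample horizon
train, test = data.iloc[:-h], data.iloc[-h:]

# Univariate ARIMA for a single indicator (order chosen arbitrarily here)
arima_fc = ARIMA(train["cic"], order=(1, 1, 1)).fit().forecast(steps=h)

# Multivariate VAR(2) using all indicators jointly
var_fit = VAR(train).fit(2)
var_fc = var_fit.forecast(train.values[-var_fit.k_ar:], steps=h)[:, 0]

print("ARIMA MAPE:", mape(test["cic"], arima_fc))
print("VAR   MAPE:", mape(test["cic"], var_fc))
```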

    Using Historical Return Data in the Black-Litterman Model for Optimal Portfolio Decision

    In this paper, the Black-Litterman model, which is an improvement on the mean-variance optimization model, is discussed. Basically, the views given by the investors were incorporated into this model so that their views on risk and return, and their risk tolerance, could be quantified. To do so, the market rates of return for the assets were calculated from the geometric mean. Moreover, the views of the investors were expressed in matrix form. Then, the covariance matrix and the diagonal covariance matrix of the asset returns were calculated. Accordingly, the mean rate of asset return was computed. On this basis, the Black-Litterman optimization model was constructed. The model formulation was done by taking a set of possible rates of return for the assets. In particular, the corresponding optimal portfolios of the assets with lower risk and higher expected return were then determined. For illustration, the historical return data for the S&P 500, the 3-month Treasury bill, and the 10-year Treasury bond from 1928 to 2016 were employed to demonstrate the formulation of the ideal investment portfolio model. As a result, the efficient frontier of the portfolio is presented and discussed. In conclusion, the Black-Litterman model can provide practical optimal investment decisions.
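
    The core of the model is the posterior expected-return formula that blends the equilibrium returns with the investors' views. The numpy sketch below evaluates that formula for three assets standing in for the S&P 500, the 3-month Treasury bill and the 10-year Treasury bond; all numbers (prior returns, covariance, view) are illustrative, not the paper's 1928-2016 estimates.

```python
import numpy as np

# Illustrative equilibrium (prior) returns and covariance for three assets
Pi = np.array([0.08, 0.03, 0.05])
Sigma = np.array([[0.040, 0.002, 0.004],
                  [0.002, 0.001, 0.001],
                  [0.004, 0.001, 0.009]])
tau = 0.05                                   # scalar weighting the prior uncertainty

# One investor view: equities will outperform the 10-year bond by 4%
P = np.array([[1.0, 0.0, -1.0]])             # view "pick" matrix
Q = np.array([0.04])                         # view return
Omega = P @ (tau * Sigma) @ P.T              # (diagonal) view uncertainty

# Black-Litterman posterior expected returns:
# E[R] = [(tau*Sigma)^-1 + P' Omega^-1 P]^-1 [(tau*Sigma)^-1 Pi + P' Omega^-1 Q]
inv_tS = np.linalg.inv(tau * Sigma)
inv_Om = np.linalg.inv(Omega)
posterior = np.linalg.solve(inv_tS + P.T @ inv_Om @ P,
                            inv_tS @ Pi + P.T @ inv_Om @ Q)
print("Posterior expected returns:", posterior)
```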

    Forecasting currency in circulation in Malaysia using ARCH and GARCH models

    Monthly economic time series commonly contain periods of volatility, making it suitable to apply a heteroscedastic model in which the conditional variance is not constant over time. The aim of this study is to model and forecast the currency in circulation (CIC) in Malaysia over the period from January 1998 to January 2016. Two methods are considered: the Autoregressive Conditional Heteroscedasticity (ARCH) model and the Generalized Autoregressive Conditional Heteroscedasticity (GARCH) model. Using the root mean square error (RMSE) as the forecasting performance measure, this study concludes that GARCH is a more appropriate model than ARCH.
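
    A minimal sketch of the comparison (my own illustration with a simulated series, not the CIC data): both models are fitted with the arch package using an AR(1) mean, and their out-of-sample mean forecasts are compared by RMSE.

```python
import numpy as np
import pandas as pd
from arch import arch_model

# Hypothetical monthly growth series standing in for currency in circulation
rng = np.random.default_rng(4)
idx = pd.date_range("1998-01-01", periods=217, freq="MS")
e = rng.normal(size=217)
x = np.zeros(217)
for t in range(1, 217):                      # simple AR(1) series
    x[t] = 0.6 * x[t - 1] + e[t]
growth = pd.Series(x, index=idx)

h = 12
train, test = growth.iloc[:-h], growth.iloc[-h:]

def rmse(actual, predicted):
    return float(np.sqrt(np.mean((np.asarray(actual) - np.asarray(predicted)) ** 2)))

# AR(1) mean with ARCH(1) versus GARCH(1,1) conditional variance
arch_fit = arch_model(train, mean="AR", lags=1, vol="ARCH", p=1).fit(disp="off")
garch_fit = arch_model(train, mean="AR", lags=1, vol="GARCH", p=1, q=1).fit(disp="off")

for name, fit in [("ARCH", arch_fit), ("GARCH", garch_fit)]:
    fc = fit.forecast(horizon=h).mean.iloc[-1].values   # 12-step mean forecasts
    print(name, "RMSE:", rmse(test, fc))
```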

    The post hoc procedure in survival analysis for undergraduate students performance

    Survival analysis is a term used to describe the analysis of data in the form of times from a well-defined time origin until the occurrence of a specific event. In academic research, the time origin often corresponds to the recruitment of an individual into an experimental study. There is an unmeasured chance of finding a falsely significant difference between two or more groups, and comparing more than two groups simultaneously increases the chance of making a Type I error. This paper proposes survival analysis with multiple comparisons to address this issue, namely to identify the best undergraduate student performance based on three qualification certificates: Diploma, Matriculation and STPM. Undergraduate student achievement data are used to illustrate the methodology. Kaplan-Meier curves are plotted together with the log-rank survival comparison test to elaborate the application of the Scheffé test. The results reveal that undergraduate students from STPM perform better in their degree studies. The Kaplan-Meier curves show a significant difference in survival among the three qualification certificates. However, the Scheffé-adjusted p-value for the Matriculation and Diploma pair showed no significant difference. This study therefore shows the importance of p-value adjustment with the Scheffé test when comparing more than two groups in order to draw the right conclusion.
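
    A minimal sketch of the workflow with the lifelines and statsmodels packages, using hypothetical time-to-graduation data (not the study's records); because lifelines does not provide a Scheffé adjustment, the sketch substitutes a Bonferroni correction for the pairwise log-rank p-values, so it illustrates the multiple-comparison idea rather than the paper's exact procedure.

```python
import numpy as np
import pandas as pd
from lifelines import KaplanMeierFitter
from lifelines.statistics import multivariate_logrank_test, pairwise_logrank_test
from statsmodels.stats.multitest import multipletests

# Hypothetical data: semesters until graduation, by qualification certificate
rng = np.random.default_rng(5)
certs = ["STPM", "Matriculation", "Diploma"]
groups = np.repeat(certs, 60)
scale = {"STPM": 7.0, "Matriculation": 8.0, "Diploma": 8.5}
durations = np.concatenate([rng.exponential(scale[c], 60) for c in certs])
observed = rng.random(180) < 0.9             # roughly 10% censored

df = pd.DataFrame({"duration": durations, "event": observed, "group": groups})

# Kaplan-Meier estimate per qualification group
for g, sub in df.groupby("group"):
    KaplanMeierFitter().fit(sub.duration, sub.event, label=g).plot_survival_function()

# Overall log-rank test across the three groups
print(multivariate_logrank_test(df.duration, df.group, df.event).p_value)

# Pairwise log-rank tests with a Bonferroni adjustment (stand-in for Scheffé)
pw = pairwise_logrank_test(df.duration, df.group, df.event)
adjusted = multipletests(pw.summary["p"], method="bonferroni")[1]
print(pw.summary.assign(p_adjusted=adjusted))
```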

    Empirical Bayesian binary classification forests using bootstrap prior

    In this paper, we present a new method, the Empirical Bayesian Random Forest (EBRF), for the binary classification problem. The prior ingredient for the method was obtained using the bootstrap prior technique. EBRF explicitly addresses the low-accuracy problem of the Random Forest (RF) classifier when the number of relevant input variables is small relative to the total number of input variables. The improvement was achieved by replacing the arbitrary subsample variable size with an empirical Bayesian estimate. An illustration of the proposed and existing methods was performed using five high-dimensional microarray datasets derived from colon, breast, lymphoma and Central Nervous System (CNS) cancer tumours. Results from the data analysis revealed that EBRF provides reasonably higher accuracy, sensitivity, specificity and Area Under the Receiver Operating Characteristic Curve (AUC) than RF in most of the datasets used.
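
    The paper's empirical Bayesian bootstrap-prior estimate of the subsample variable size is not reproduced here; as a hedged sketch of the surrounding workflow only, the code below evaluates a scikit-learn random forest on simulated high-dimensional data with few informative variables, reporting accuracy, sensitivity, specificity and AUC, and shows where an alternative max_features value (which EBRF would supply) is plugged in.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score

# Hypothetical high-dimensional data: few relevant variables among many
X, y = make_classification(n_samples=200, n_features=2000, n_informative=20,
                           n_redundant=0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

def evaluate(max_features):
    rf = RandomForestClassifier(n_estimators=500, max_features=max_features,
                                random_state=0).fit(X_tr, y_tr)
    pred = rf.predict(X_te)
    tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
    return {"accuracy": accuracy_score(y_te, pred),
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "auc": roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1])}

# Default subsample variable size (sqrt of p) versus a larger fixed choice;
# EBRF would replace this fixed value with its bootstrap-prior estimate.
print("RF (sqrt):  ", evaluate("sqrt"))
print("RF (larger):", evaluate(200))
```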