49 research outputs found

    ZERO INFLATED NEGATIVE BINOMIAL MODELS IN SMALL AREA ESTIMATION

    The problem of over-dispersion in Poisson data is usually solved by introducing prior distributions, which lead to negative binomial models. Poisson data sometimes also suffer from excess zeros, a condition in which the data contain more zeros than the distribution predicts. The Zero-Inflated Negative Binomial (ZINB) method can be used to address this problem. This paper demonstrates the use of ZINB methods in Small Area Estimation with excess-zero data. It is shown that the excess-zero problem substantially influences the Empirical Bayes (EB) estimates, and that adopting ZINB methods improves the precision and reliability of the estimates. Key words: Small Area Estimation, Zero-Inflation, Poisson-Gamma, Negative Binomial Regression, Empirical Bayes
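
As a rough illustration of the zero-inflation idea in the abstract above, the sketch below writes out a ZINB probability mass function. The mean/size parameterization and the names `nb_pmf`, `zinb_pmf`, `mu`, `r`, and `pi` are assumptions made for this example; the paper's own parameterization and estimation procedure are not given in the abstract.

```python
import math


def nb_pmf(y, mu, r):
    """Negative binomial pmf with mean mu and size (dispersion) r.

    Assumed parameterization: variance = mu + mu**2 / r.
    """
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu))
             + y * math.log(mu / (r + mu)))
    return math.exp(log_p)


def zinb_pmf(y, mu, r, pi):
    """ZINB pmf: a structural zero with probability pi, otherwise NB(mu, r)."""
    base = (1.0 - pi) * nb_pmf(y, mu, r)
    # Zeros come from two sources: the structural-zero component and the NB.
    return pi + base if y == 0 else base
```

With `pi = 0` the function reduces to the plain negative binomial, which makes the extra mass placed at zero easy to see when comparing `zinb_pmf(0, ...)` with `nb_pmf(0, ...)`.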

    AUTOREGRESSIVE MOVING AVERAGE (ARMA) MODEL FOR DETECTING SPATIAL DEPENDENCE IN INDONESIAN INFANT MORTALITY DATA

    Infant mortality is an important indicator that must be monitored seriously. It is associated with several determinants, such as the infant’s characteristics, maternal and fertility factors, housing conditions, geographical area, and policy. It can also be influenced by spatial dependence between regencies in Indonesia, because social and economic activity in one regency depends on social and economic activity in other regencies, especially neighboring ones. The infant mortality data were obtained from the Indonesian Demographic and Health Survey (IDHS) published by Statistics Indonesia (BPS). In BPS publications, data are always sorted by regency code from smallest to largest, so closeness of regency codes reflects closeness of the regencies themselves. The infant mortality data by regency can therefore be treated as analogous to time series data, and the relationship between regencies can be examined using an Autoregressive Moving Average (ARMA) model. If the ARMA parameters are significant, we can conclude that there is spatial dependence in infant mortality in Indonesia. This paper focuses on whether there is spatial dependence in Indonesia’s infant mortality data using the ARMA approach. The Autocorrelation Function (ACF) showed a significant effect up to lag 3, and the Partial Autocorrelation Function (PACF) showed a significant effect up to lag 1. Based on the Bayesian Information Criterion (BIC), the AR(1) model fitted the data well. This shows that the probability of infant mortality in one regency is affected by the probability of infant mortality in neighboring regencies. Key words: ARMA, spatial dependence, infant mortality, IDHS
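
The ACF check described in the abstract above can be sketched as follows. This is a minimal illustration, assuming the regency-ordered values are held in a plain list; the function names and the approximate 95% band of ±1.96/√n are assumptions of this example, not details taken from the paper.

```python
import math


def sample_acf(x, max_lag):
    """Sample autocorrelations of x for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(max_lag + 1)]


def significant_lags(x, max_lag):
    """Lags (>= 1) whose sample ACF falls outside the approximate 95% band."""
    band = 1.96 / math.sqrt(len(x))
    acf = sample_acf(x, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(acf[k]) > band]
```

For an AR(1) series the lag-1 sample autocorrelation also serves as a simple moment estimate of the autoregressive coefficient.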

    COMPARISON WILLIAMS METHOD AND BETA-BINOMIAL IN OVERDISPERSION OF LOGISTIC REGRESSION: A CASE OF INDONESIA GENERAL ELECTION DATA 2014

    Democratization in Indonesia has so far resulted in increasingly rational voters. The rationality of voters varies across the districts and cities of Indonesia due to many factors. The current election system in Indonesia is a direct election system in which every citizen is free to vote for the preferred candidates or not to vote at all. Twelve political parties participated in the 2014 legislative election, whereas two pairs of presidential and vice-presidential candidates competed in the presidential election. This research aimed to obtain district-level models that properly relate the votes gained by the two candidates to other variables such as the human development index and the results of the legislative election, especially the voting results of the political party coalitions. Since the vote data were binary and showed over-dispersion, a logistic model accounting for over-dispersion was used. Over-dispersion is present whenever observations that might be expected to follow the binomial distribution have greater variance than n_i π_i (1 - π_i). In this research, the Williams method and beta-binomial regression were used to overcome the problem. The results showed that the Williams method provided better estimates than beta-binomial regression. Key words: Logistic regression, Overdispersion, Williams’ Method, Beta-Binomial Regression, General Election
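
To make the variance comparison above concrete, the sketch below computes the binomial variance n_i π_i (1 - π_i) and an inflated version. The inflation factor 1 + (n_i - 1)φ is the form commonly used with both the Williams method and the beta-binomial model; it is stated here as an assumption, since the abstract gives only the baseline binomial variance. The function name and the symbol `phi` are illustrative.

```python
def overdispersed_variance(n, pi, phi=0.0):
    """Variance of a count out of n trials with success probability pi.

    phi = 0 gives the plain binomial variance n * pi * (1 - pi);
    phi > 0 inflates it by the assumed factor 1 + (n - 1) * phi.
    """
    binomial_var = n * pi * (1.0 - pi)
    return binomial_var * (1.0 + (n - 1) * phi)
```

Any positive `phi` with more than one trial gives a variance above the binomial value, which is the over-dispersion condition the abstract describes.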

    E-BLUP METHOD IN SMALL AREA ESTIMATION FOR MODELS CONTAINING A RANDOM WALK

    Two main topics concern statisticians when discussing survey problems: the development of sampling techniques and the development of methodology for estimating population parameters (estimation methods). The current problem in estimation methodology concerns estimation for survey areas or domains that have small samples or even no sample at all, Rao (2003). An example is a survey of household units within a national-scale survey. In such surveys, the number of sampled households for each regency within a province is generally very small (small area). It can even happen that a particular regency is not selected as a survey area, so that there is no household sample from that regency. Direct estimation for the area or regency concerned becomes unsuitable because the sample is too small. This paper presents small area estimation methods using a model-based indirect estimation approach, in particular for models that contain a random walk. Key words: direct estimation, indirect estimation, generalized regression, general linear mixed model, empirical best linear unbiased prediction, block diagonal covariance, random walk

    AUTOLOGISTICS MODELS FOR MODELLING AND MAPPING INFANT MORTALITY IN INDONESIA

    According to Indonesia Demographic and Health Survey (IDHS) data, child mortality is highest during the first year of life. Infant mortality is an important indicator that must be monitored seriously. It is associated with several determinants, such as the infant’s characteristics, maternal and fertility factors, housing conditions, and geographical area. The aim of this research is to develop models that can be used to explain the effects of explanatory variables on infant mortality in Indonesia. A further aim is to develop a thematic map describing the distribution pattern of infant mortality probabilities at district level across the country. The response variable is a binary categorical variable with two outcomes, success and failure. The outcome is a success if the infant died before reaching one year of age, and a failure if the infant is still alive after one year of age. The data are modeled with a Logistic Regression model and an Autologistics Regression model. The results showed that the Autologistics Regression model fitted the data reasonably well and that all of the explanatory variables affect infant mortality except the infant’s sex. The results also showed that the probability of infant mortality was higher on the islands of Kalimantan and Papua. Key words: Infant, mortality, autologistic, binary, IDHS

    COMPARISON OF LOW BIRTH WEIGHT RATE ESTIMATES BASED ON DIFFERENT AGGREGATE LEVELS DATA USING LOGISTIC REGRESSION MODEL

    Low Birth-Weight (LBW) is defined as a birth weight of a live-born infant of less than 2,500 grams regardless of gestational age. LBW is associated with infant mortality, infant morbidity, inhibited growth, slow cognitive development, and chronic diseases in later life. It matters because, with a high LBW rate, a generation can hardly grow to its full potential. Many risk factors, direct or indirect, can put a birth at high risk of LBW. These factors are genetics, obstetrics, nutrition intake, diseases, toxic exposures, pregnancy care, and social factors. With these factors measured, statistical modelling can be used to estimate the rate of LBW at group level or the probability of LBW at individual level. As the outcome is a binary response, the Logistic Regression Model is commonly used. Data on LBW cases and the risk factors came from the Indonesian Demographic and Health Survey (IDHS) 2012. The published national rate of LBW was 7.3%, with provincial rates ranging from 4.7% to 15.7%. Although the national rate was considered low, the wide variation in provincial rates showed that the problem was not being handled well. However, these rates cannot be measured yearly because the survey is conducted every 5 years. With risk factor data available, a model can be built to estimate the LBW rates. A further problem for the model arises when only aggregate-level data are available instead of individual-level data. The purpose of this study was therefore to compare models based on different aggregate levels and their estimated provincial rates. The comparison covered individual birth level, mother level, household level, and census block (cluster) level. Models from the first three levels were quite similar, with an adequate number of significant parameters, while the cluster-level model yielded only a few significant parameters. Nevertheless, LBW rate estimates from the cluster-level model were the closest to the direct estimates, although the variance of these estimates was still higher than that of the other models. Key words: Low Birth-Weight, IDHS, Logistic Regression, GLM, Aggregate Data

    Implementation of Winsorizing and random oversampling on data containing outliers and unbalanced data with the random forest classification method

    Many researchers use classification methods to find the best method for predicting the class of an observation, and some of these studies report that random forest is the best method. However, classifying data that contain outliers and are unbalanced is a complicated problem, and many researchers are also studying how to deal with it. In this study, we propose winsorizing to deal with outliers, replacing outlier values with the upper and lower limit values obtained from the interquartile range method, together with random oversampling to balance the data. The Human Development Index (HDI) of regencies/cities in eastern Indonesia is known to vary widely, so HDI in these areas can serve as a case study of data containing outliers and unbalanced classes. The purpose of this study was to compare the performance of random forest before and after winsorizing and random oversampling were applied to the data, in predicting HDI in regencies/cities in eastern Indonesia. After the outliers and class imbalance were handled, the random forest classifier performed better in terms of accuracy and kappa, at 96.43% and 93.41%, respectively. The variables of expenditure per capita and the mean years of schooling are the most important
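
The two preprocessing steps named in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the fence multiplier `k=1.5`, the quartile convention of `statistics.quantiles`, and the function names are choices made for this example, since the abstract does not specify them.

```python
import random
import statistics
from collections import Counter


def winsorize_iqr(values, k=1.5):
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR] (k=1.5 is an assumed default)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lower), upper) for v in values]


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [r for r, lab in zip(rows, labels) if lab == cls]
        # Sample with replacement; k is 0 for the majority class.
        out_rows.extend(rng.choices(pool, k=target - count))
        out_labels.extend([cls] * (target - count))
    return out_rows, out_labels
```

In practice oversampling would normally be applied to the training split only, so that duplicated rows do not leak into the evaluation data.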

    ASSOCIATION RULES IN RANDOM FOREST FOR THE MOST INTERPRETABLE MODEL

    Random forest is one of the most popular ensemble methods and has many advantages. However, random forest is a "black-box" model, so it is difficult to interpret. This study discusses the interpretation of random forest with the association rules technique, using rules extracted from each decision tree in the random forest model. The analysis involves simulated and empirical data, the latter used to determine the factors that affect the poverty status of households in Tasikmalaya. The empirical data were sourced from Badan Pusat Statistik (BPS), namely the National Socio-Economic Survey (SUSENAS) data for West Java Province in 2019. The results on simulated data show that the association rules technique can extract the set of rules that characterize the target variable. The application of interpretable random forest to the empirical data shows that the rules that most distinguish the poverty status of households in Tasikmalaya are house wall materials and the main source of drinking water, house wall materials and cooking fuel, and house wall materials and motorcycle ownership.

    MODELLING THE AVERAGE SCORES OF NATIONAL EXAMINATION IN WEST JAVA

    Formal education in Indonesia is commonly divided into stages such as preschool, primary school (SD), secondary school (SMP-SMA), and universities/colleges. The Indonesian government has been making serious efforts to improve the quality of education in Indonesia. The roadmap for continuous improvement of education quality can be designed based on the results of the National Examination (UN) taken regularly by high school students. This research aimed to explore how UN scores can be linked with other explanatory variables. The research used panel data consisting of average UN scores for all public senior high schools (SMA Negeri) in West Java Province during 2011-2013 and other related variables such as total accreditation scores, regional domestic product, human development index, scores for school facilities and infrastructure, scores for school educators, and average scores of final school exams. The average UN scores depended on variation between high schools and time periods as well as on other explanatory variables whose effects were either fixed or random. The data were modelled with linear mixed models and with the Generalized Estimating Equation (GEE) approach, both of which are commonly used to analyse panel data. This paper showed that the GEE provided a better-performing model than the linear mixed models in explaining the variability of the response variable, the average UN score. The GEE also showed a significant correlation between the explanatory variables and the response. Key words: fixed effects, GEE, linear mixed model, national examination, random effects

    Dynamic Time Warping Techniques for Time Series Clustering of Covid-19 Cases in DKI Jakarta

    Positive cases of Covid-19 in DKI Jakarta have contributed substantially to the national total, reaching 25% of all cases in Indonesia. The research examined and modeled the distribution pattern of Covid-19 positive cases in DKI Jakarta across 44 districts spread over six administrative areas. The data covered positive Covid-19 cases in DKI Jakarta over one year, from April 2020 to April 2021. The pattern of positive Covid-19 cases in the 44 districts was studied by time series clustering using Dynamic Time Warping (DTW) distances and agglomerative hierarchical methods. The effectiveness of the clustering was then evaluated by comparing the predicted number of Covid-19 cases from clustering and non-clustering forecasts at the city level for the next 14 days, using the Autoregressive Integrated Moving Average (ARIMA) model. The results group the 44 districts into 6 optimal clusters based on the pattern of positive Covid-19 cases in each district. The highest distribution rate is in cluster A, and the lowest is in cluster F. Geographical characteristics are also indicated by clusters A, B, E, and F. The results show that the Mean Absolute Percentage Error (MAPE) of the clustering model ranges from 16% to 20%. The difference in MAPE from the non-clustering model, around 5% to 6%, implies that the forecasting accuracy of the two approaches is not far apart.
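
The DTW distance and the MAPE measure mentioned in the abstract above can be sketched as below. This is a minimal textbook version, assuming an absolute-difference local cost and no warping window; the study's actual DTW settings and implementation are not stated in the abstract, and the function names are illustrative.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # step in a only
                                 cost[i][j - 1],      # step in b only
                                 cost[i - 1][j - 1])  # step in both
    return cost[n][m]


def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    total = sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)
```

A pairwise matrix of `dtw_distance` values between district series is the kind of input an agglomerative hierarchical clustering routine would take.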