38 research outputs found

    Distance-based regression for non-normal data

    Distance-based regression (DBR) is a good alternative for estimating the unknown parameters of a regression model when the explanatory variables are of mixed type. The concept of DBR is similar to classical linear regression (LR), but the explanatory variables enter the model through distances instead of raw values. This study extends the earlier study by Cuadras, which investigated DBR on normal data, to non-normal data. We also propose a new DBR approach that focuses on categorical explanatory variables and treats binomial, nominal and ordinal data separately. A Monte Carlo study was set up to compare the performance of DBR with bootstrapping regression (nonparametric) based on R-squared (R2), mean square error (MSE) and Bayesian information criterion (BIC). The findings indicate that both DBR and the new DBR outperformed LR for numerical explanatory variables and for mixed-type explanatory variables.

    Type I Error Rates of the Two-Sample Pseudo-Median Procedure

    The performance of the pseudo-median-based procedure is examined in terms of controlling Type I error in a test of two independent groups. The procedure is a modification of the one-sample Wilcoxon statistic that uses the pseudo-median of differences between group values as the central measure of location. The proposed procedure was shown to have good control of Type I error rates under the study conditions, regardless of distribution type.

    Robust Correlation Procedure via Sn Estimator

    The Pearson correlation coefficient is the most widely used statistic for measuring the relationship between two variables that follow a bivariate normal distribution, provided its assumptions are fulfilled. However, this classical correlation coefficient performs poorly in the presence of outliers. This study therefore proposes a new robust correlation coefficient based on the robust scale estimator Sn. The performance of the proposed coefficient is assessed by correlation value, average bias and standard error, and is compared with the classical correlation coefficient and an existing robust correlation coefficient. The classical correlation coefficient performs well on uncontaminated data, but its performance becomes worse when the data are contaminated. Under data contamination, the robust correlation coefficient performed better than the classical correlation coefficient.
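
The abstract does not say how the correlation is built from Sn, so the following is only a sketch under stated assumptions. It computes the Sn scale estimator in its naive O(n^2) form (low median of high medians of absolute pairwise differences, times the consistency constant 1.1926, without a finite-sample correction) and then plugs it into the Gnanadesikan-Kettenring construction, which is one standard way to turn a robust scale into a correlation. The function names `sn_scale` and `robust_corr_sn` are hypothetical, and the construction is not necessarily the one used in the study.

```python
from statistics import median_high, median_low

# Consistency constant that makes Sn estimate the standard deviation
# under a normal distribution.
SN_CONSISTENCY = 1.1926


def sn_scale(values):
    """Naive O(n^2) Sn: c * lomed_i( himed_j |x_i - x_j| ).

    No finite-sample correction factor is applied.
    """
    xs = list(values)
    if len(xs) < 2:
        raise ValueError("need at least two observations")
    # For each observation, the high median of its distances to all points.
    inner = [median_high([abs(xi - xj) for xj in xs]) for xi in xs]
    return SN_CONSISTENCY * median_low(inner)


def robust_corr_sn(x, y):
    """Gnanadesikan-Kettenring style correlation using Sn as the scale.

    Assumed construction for illustration, not the paper's stated formula.
    """
    sx, sy = sn_scale(x), sn_scale(y)
    if sx == 0 or sy == 0:
        raise ValueError("Sn is zero for one variable; correlation undefined")
    # Standardize each variable, then compare the scale of sum and difference.
    u = [a / sx + b / sy for a, b in zip(x, y)]
    v = [a / sx - b / sy for a, b in zip(x, y)]
    su2, sv2 = sn_scale(u) ** 2, sn_scale(v) ** 2
    return (su2 - sv2) / (su2 + sv2)


if __name__ == "__main__":
    x = list(range(1, 11))
    y = [2 * a + 1 for a in x]
    y_contaminated = y[:-1] + [-50]  # one gross outlier
    print(robust_corr_sn(x, y), robust_corr_sn(x, y_contaminated))
```

For perfectly linear data the difference series `v` is constant up to rounding, so its scale is essentially zero and the coefficient is essentially 1.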

    New procedure in testing differences between two groups

    Despite the theoretical correctness of the t-test for testing differences between two groups and the existence of a nonparametric backup, i.e. the Mann-Whitney-Wilcoxon test, these tests fail to simultaneously control Type I error and maintain adequate power under certain conditions. This study intends to alleviate this problem by applying the pseudo-median as the location measure of interest in the one-sample nonparametric Wilcoxon procedure in a two-group setting. The pseudo-median is the median of all possible differences of observations from the two groups. Since the sampling distribution of this procedure is intractable, the bootstrap method was used to obtain the significance level. The findings show that the new procedure is able to control Type I error rates and maintain high power regardless of distributional shape, whether symmetrical or asymmetrical. The performance of the new procedure is comparable to that of the t-test and the Mann-Whitney-Wilcoxon test.
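
The pseudo-median described above, the median of all differences between observations from the two groups, can be sketched as follows. The abstract does not give the exact test statistic or bootstrap scheme, so the percentile-bootstrap interval here is an assumption chosen for illustration; `pseudo_median_diff` and `bootstrap_ci` are hypothetical names.

```python
import random
from statistics import median


def pseudo_median_diff(x, y):
    """Median of all pairwise differences x_i - y_j between the two groups."""
    return median(xi - yj for xi in x for yj in y)


def bootstrap_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap interval for the pseudo-median of differences.

    Assumed scheme: resample each group independently with replacement.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        xb = rng.choices(x, k=len(x))
        yb = rng.choices(y, k=len(y))
        estimates.append(pseudo_median_diff(xb, yb))
    estimates.sort()
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot))
    return estimates[lo_idx], estimates[hi_idx]


if __name__ == "__main__":
    group_a = [12.1, 9.8, 11.4, 13.0, 10.7, 12.6]
    group_b = [9.9, 10.2, 8.7, 11.1, 9.4, 10.0]
    print(pseudo_median_diff(group_a, group_b))
    print(bootstrap_ci(group_a, group_b))
```

Under this sketch, the hypothesis of no location difference would be rejected at level `alpha` when the interval excludes zero.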

    Sycophant Curve Model and Pearson Correlation Coefficient: An Application to Behavioral Change in Nigeria

    This study investigates the behavioural switch and relationship of people (associates) towards a transiting chief executive officer (CEO). During the tenure of a new CEO, the rate of patronage by different categories of people seeking political and economic relevance increases over time, but as the CEO’s tenure wanes, the patronage decreases. The sycophant curve model (SCM) was proposed to determine the change in the behavioural pattern of patronage at the onset of the CEO's tenure and during the transition to a new CEO. The Pearson correlation coefficient (PCC) was also investigated. The results revealed that during the tenure of any CEO, the associates' behaviour is increasingly positive, while during the transition period and beyond, the rate of the behavioural switch by associates decreases gradually. The PCC (rp = 0.9966) affirmed a strong positive relationship between the CEO and associates for four years. Meanwhile, rp = –0.9966 indicated a strong negative behavioural switch between the CEO and associates after four years, during the transition and beyond. This study demonstrated that during the tenure of any CEO, the behavioural switch of the associates towards the CEO is extremely minimal and gradually increases after the transition period and beyond.

    Recursive prediction model: a preliminary application to Lassa fever outbreak in Nigeria

    Lassa fever (LF) is endemic in West Africa, and in Nigeria in particular. Since the disease was discovered in 1969, seasonal outbreaks have often been reported in Nigeria. Many researchers have reported inconsistent or varying numbers of suspected cases, confirmed cases and deaths from 2012 to date. To improve this reporting, and because of the high mortality rate associated with LF, it is pertinent to design a suitable and robust model that can predict or estimate the number of LF cases based on the onset data. To achieve this, we propose a recursive prediction (RP) model that makes predictions from the onset data. The Pearson correlation coefficient (R) and R2 are used to assess the performance of the model. The RP model predicted 96.7% of confirmed cases and 89.6% of death cases for the first three months of 2022 based on the onset data. The model was also applied to predict COVID-19 death cases during the six weeks of the outbreak in India. The result was comparable to the regression output for the COVID-19 death cases. This study demonstrated that the proposed model could be applied to predict cases of any disease of unknown etiology at the onset of an outbreak for which no treatment exists, as with the COVID-19 outbreak. The performance analysis showed that the RP model is useful for predicting the increasing trend of an outbreak of a disease with unknown etiology and without prior treatment experience or vaccines.

    Performance analysis and discrimination procedure of two-group location model with some continuous and high-dimensional of binary variables

    The primary goal of this research was to evaluate the performance of recently constructed smoothed location models (SLMs) for discrimination purposes, combined with two kinds of multiple correspondence analysis (MCA) to handle the high dimensionality arising from the binary variables. A previous study of SLM with MCA and with principal component analysis (PCA) showed that the misclassification rate was still very high when the number of binary variables was large. Thus, two new SLMs are constructed in this paper to solve this particular problem. The first model combines SLM with Burt MCA (denoted SLM+Burt), and the second combines SLM with joint correspondence analysis (denoted SLM+JCA). The findings showed that both models performed well for all sample sizes (n) and all numbers of binary variables (b) under investigation, except n=60 and b=25 for the SLM+JCA model. Overall, the SLM+JCA model performs better than the SLM+Burt model. Moreover, the concept and procedures of discrimination for the two-group classification in this paper can be extended to multi-class classification, as practitioners often deal with many groups and complex sets of variables.

    Enhanced Robust Univariate Classification Methods for Solving Outliers and Overfitting Problems

    The robustness of some classical univariate classifiers is hampered if the data are contaminated. Overfitting is another problem when the data sets are uncontaminated and the sample size is considerable. The performance of classification models can easily be biased by outliers, and the constructed model then tends to be overfitted. Previous studies often used the Bayes Classifier (BC) and the Predictive Classifier (PC) to address two-group univariate classification problems. Unfortunately, for substantially large sample sizes and uncontaminated data, the BC method overfits when the Optimal Probability of Exact Classification (OPEC) is used as an evaluation benchmark. Meanwhile, for small sample sizes, the BC and PC methods are extremely susceptible to outliers. To overcome these two problems, we proposed two methods: the Smart Univariate Classifier (SUC) and a hybrid classifier. The latter is a combination of the SUC and the BC methods, known as the Smart Univariate Bayes Classifier (SUBC). The performance of the new classification methods was evaluated and compared with the conventional BC and PC methods using the OPEC as a benchmark value. To validate the performance of these classification methods, the Probability of Exact Classification (PEC) was compared with the OPEC value. The results showed that the proposed methods outperformed the conventional BC and PC methods on the real data sets used. Numerical results also revealed that the SUC method could solve the overfitting problem. The results further indicated that the two proposed methods were robust against outliers. Therefore, these new methods are helpful when practitioners are confronted with overfitting and data contamination problems.

    Measuring the relationship of bivariate data using Hodges-Lehman estimator

    The relationship between bivariate data is ordinarily measured using a correlation coefficient. The most commonly used is the Pearson correlation coefficient, which is well known as the best coefficient for interval or ratio bivariate data with a linear relationship. Although this coefficient is good under that condition, it is also very sensitive to small departures from linearity, usually because of the existence of an outlier. For that reason, this paper provides new robust correlation coefficients that combine the nonparametric technique of the Hodges-Lehmann estimator with the parametric technique of the Pearson correlation coefficient. This paper also introduces different scale estimators, namely the median and the median absolute deviation (MADn), giving coefficients denoted by rHL(med) and rHL(MADn), respectively. The performance of the proposed correlation coefficients is measured by the coefficient values, which are compared with the Pearson correlation coefficient and several existing robust correlation coefficients. The results show that the Pearson correlation coefficient (r) is very good under perfect data conditions, but with only 10% outliers it not only gives a poor correlation value but also turns the direction of the relationship negative. In contrast, rHL(med) and rHL(MADn) offer the highest coefficient values, and these values are robust to outliers of up to 30%. With very good performance under all data conditions and simple calculation, rHL(med) and rHL(MADn) are considered good alternatives to r when dealing with outliers.
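
The abstract does not give the formulas for rHL(med) or rHL(MADn), so the sketch below covers only their two standard building blocks: the one-sample Hodges-Lehmann location estimator (the median of all Walsh averages (x_i + x_j)/2 with i <= j) and MADn, taken here as 1.4826 times the median absolute deviation from the median. The function names are hypothetical, and treating MADn as this normal-consistent scaling is an assumption.

```python
from itertools import combinations_with_replacement
from statistics import median


def hodges_lehmann(values):
    """Median of the Walsh averages (x_i + x_j) / 2 over all pairs i <= j."""
    xs = list(values)
    return median((a + b) / 2 for a, b in combinations_with_replacement(xs, 2))


def madn(values):
    """1.4826 * median(|x - median(x)|), consistent for sigma under normality."""
    xs = list(values)
    centre = median(xs)
    return 1.4826 * median(abs(v - centre) for v in xs)


if __name__ == "__main__":
    clean = [1, 2, 3, 4]
    contaminated = [1, 2, 3, 100]  # one gross outlier
    # The Hodges-Lehmann estimate moves far less than the mean would.
    print(hodges_lehmann(clean), hodges_lehmann(contaminated))
    print(madn(clean), madn(contaminated))
```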