Robust Statistical Procedures For Testing The Equality Of Central Tendency Parameters Under Skewed Distributions
This study examined the Type I error rates and power of two types of robust methods. The first method is known as the S1 statistic, first studied by Babu et al. (1999). This statistic uses the median as its measure of central tendency. An interesting characteristic of the S1 statistic is that the data need no trimming even when skewed. The second method, proposed by Othman et al. (2004), is known as the MOM-H statistic. In contrast to the S1 method, the MOM-H statistic trims extreme values, and unlike trimmed means, it empirically determines the amount of trimming needed, thus avoiding unnecessary trimming. The central tendency measure for this statistic is the modified one-step M-estimator (MOM) proposed by Wilcox and Keselman (2003). In this study, we modified the two statistical methods by incorporating some of the more robust scale estimators into these statistics. We identified four robust scale estimators with the highest breakdown points and bounded influence functions, as ascertained by Rousseeuw and Croux (1993): MADn, Qn, Sn, and Tn. These scale estimators functioned differently in each of the two statistical methods. For the S1 statistic, the estimators replaced the default scale estimator to form modified S1 procedures; for the MOM-H statistic, they served as the trimming criterion used to determine the sample values for the modified one-step M-estimator (MOM). To identify the sturdiness or robustness of each procedure, variables were manipulated to create conditions known to highlight the strengths and weaknesses of tests designed to assess the equality of central tendency measures.
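To make the trimming criterion concrete, the following is a minimal sketch of a MOM-type estimate with MADn as the scale estimator; the cutoff constant 2.24 follows Wilcox and Keselman's usual formulation, but treat the exact constants as assumptions rather than the thesis's definitions.

```python
import numpy as np

def mad_n(x, c=1.4826):
    # MADn: median absolute deviation, rescaled for consistency
    # at the normal distribution.
    med = np.median(x)
    return c * np.median(np.abs(x - med))

def mom(x, k=2.24):
    # Modified one-step M-estimator: average only the values that
    # survive the trimming criterion |x_i - median| <= k * scale.
    x = np.asarray(x, dtype=float)
    med, scale = np.median(x), mad_n(x)
    return x[np.abs(x - med) <= k * scale].mean()
```

Swapping `mad_n` for Qn, Sn, or Tn changes only the scale function, which is exactly how the modified procedures described above vary.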
On robust Mahalanobis distance issued from minimum vector variance
Detecting outliers in high-dimension datasets remains a challenging task. Under this circumstance, robust location and scale estimators are usually proposed in place of the classical estimators. Recently, a new robust estimator for multivariate data known as minimum vector variance (MVV) was introduced. Besides inheriting the nice properties of the famous MCD estimator, MVV is computationally more efficient. This paper proposes MVV for detecting outliers via the Mahalanobis squared distance (MSD). The results revealed that MVV is more effective in detecting outliers and in controlling Type I error than MCD.
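As a sketch of the detection scheme: MVV is not available in common libraries, so scikit-learn's MCD stands in for the robust estimator here; the chi-squared cutoff is the conventional choice and is assumed, not taken from the paper.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet

def flag_outliers(X, alpha=0.025):
    # Squared Mahalanobis distances from a robust fit; MVV would supply
    # its own location vector and scatter matrix in place of MCD's.
    d2 = MinCovDet().fit(X).mahalanobis(X)
    return d2 > chi2.ppf(1 - alpha, df=X.shape[1])
```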
Robust Linear Discriminant Analysis with Highest Breakdown Point Estimator
Linear discriminant analysis (LDA) is a supervised classification technique concerned with the relationship between a categorical variable and a set of interrelated variables. The main objective of LDA is to create a rule to distinguish between populations and to allocate future observations to previously defined populations. LDA yields the optimal discriminant rule between two or more groups under the assumptions of normality and homoscedasticity. Nevertheless, the classical estimates, the sample mean and sample covariance matrix, are highly affected when these ideal conditions are violated. To abate these problems, a new robust LDA rule using high breakdown point estimators is proposed in this article. A winsorized approach is used to estimate the location measure, while the product of Spearman's rho and the rescaled median absolute deviation is used to estimate the scatter measure, replacing the sample mean and sample covariance matrix, respectively. Simulation and real-data studies were conducted to evaluate the performance of the proposed model, measured in terms of misclassification error rates. The computational results showed that the proposed LDA is always better than the classical LDA and is comparable with existing robust LDAs.
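A rough sketch of the proposed plug-ins, assuming the scatter entry for a pair of variables is Spearman's rho multiplied by each variable's rescaled MAD (the paper's exact rescaling is not reproduced here, and the helper names are illustrative):

```python
import numpy as np
from scipy.stats import spearmanr
from scipy.stats.mstats import winsorize

def robust_center(X, limits=0.1):
    # Winsorized mean of each column as the location estimate.
    return np.asarray(winsorize(X, limits=limits, axis=0).mean(axis=0))

def robust_scatter(X, c=1.4826):
    # Spearman's rho rescaled by the MADn of each variable,
    # replacing the sample covariance matrix (assumes > 2 columns).
    rho, _ = spearmanr(X)
    med = np.median(X, axis=0)
    mad = c * np.median(np.abs(X - med), axis=0)
    return rho * np.outer(mad, mad)
```

These two estimates then slot into the usual LDA rule wherever the sample mean and covariance matrix would appear.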
New discrimination procedure of location model for handling large categorical variables
The location model proposed in the past is a predictive discriminant rule that can classify new observations into one of two predefined groups based on mixtures of continuous and categorical variables. The ability of the location model to discriminate new observations correctly is highly dependent on the number of multinomial cells created by the categorical variables. This study conducts a preliminary investigation showing that the location model using maximum likelihood estimation has a high misclassification rate, up to 45% on average, when dealing with more than six categorical variables, for all 36 datasets tested. The model predicts poorly for large numbers of categorical variables even with large sample sizes. To alleviate the high misclassification rate, a new strategy is embedded in the discriminant rule by introducing nonlinear principal component analysis (NPCA) into the classical location model (cLM), mainly to handle the large number of categorical variables. This new strategy is investigated on simulated and real datasets through estimation of the misclassification rate using the leave-one-out method. The results of the numerical investigations demonstrate the feasibility of the proposed model, as the misclassification rate is dramatically decreased compared to the cLM for all 18 data settings. A practical application using a real dataset demonstrates a significant improvement and obtains results comparable to the best of the methods compared. The overall findings reveal that the proposed model extends the applicability range of the location model, which was previously limited to six categorical variables to achieve acceptable performance. This study shows that the proposed model with the new discrimination procedure can serve as an alternative for mixed-variable classification problems, particularly when facing a large number of categorical variables.
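For context, the cell explosion is easy to see: b binary variables generate 2^b multinomial cells, so beyond six variables many cells are empty and the maximum likelihood estimates break down. A bare sketch of the classical allocation rule for two groups with a common covariance matrix (notation and helper names are mine, not the paper's):

```python
import numpy as np

def allocate(x, cell, mu, Sigma_inv, prob):
    # mu[g][cell]: mean vector of group g within this multinomial cell;
    # prob[g][cell]: estimated probability of the cell in group g.
    # Linear discriminant applied within the cell the observation falls in.
    m1, m2 = mu[1][cell], mu[2][cell]
    score = (m1 - m2) @ Sigma_inv @ (x - 0.5 * (m1 + m2))
    return 1 if score >= np.log(prob[2][cell] / prob[1][cell]) else 2
```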
A multivariate EWMA control chart for skewed populations using weighted variance method
This article proposes a multivariate exponentially weighted moving average (MEWMA) control chart for skewed populations using the heuristic weighted variance (WV) method, obtained by decomposing the variance into upper and lower segments according to the direction and degree of skewness. This method adjusts the variance-covariance matrix of the quality characteristics. The proposed chart, called WV-MEWMA hereafter, reduces to the standard MEWMA control chart when the underlying distribution is symmetric. In-control and out-of-control ARLs of the proposed WV-MEWMA control chart are compared with those of the weighted standard deviation MEWMA (WSD-MEWMA) and standard MEWMA control charts for multivariate normal, lognormal and gamma distributions. In general, the simulation results show that the proposed WV-MEWMA chart performs better than the WSD-MEWMA and standard MEWMA charts when the underlying distributions are skewed.
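A sketch of the WV split itself, following the common weighted variance formulation of Bai and Choi; the paper's adjustment of the full variance-covariance matrix is not reproduced here.

```python
import numpy as np

def wv_sigmas(x):
    # Split one standard deviation into upper and lower parts by the
    # probability mass at or below the mean; the two halves are equal
    # when the distribution is symmetric.
    sd, p = x.std(ddof=1), np.mean(x <= x.mean())
    return sd * np.sqrt(2 * p), sd * np.sqrt(2 * (1 - p))  # (upper, lower)
```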
Type I Error Rates of the Two-Sample Pseudo-Median Procedure
The performance of the pseudo-median based procedure is examined in terms of its control of Type I error in a test of two independent groups. The procedure is a modification of the one-sample Wilcoxon statistic that uses the pseudo-median of the differences between group values as the central measure of location. The proposed procedure showed good control of Type I error rates under the study conditions regardless of distribution type.
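For reference, the pseudo-median is the Hodges-Lehmann-type estimate tied to the Wilcoxon statistic: the median of all Walsh averages. A minimal sketch for the two-group setting (function names are illustrative, and the brute-force enumeration is only sensible for small samples):

```python
import numpy as np

def pseudo_median(d):
    # Median of all Walsh averages (d_i + d_j) / 2 with i <= j.
    d = np.asarray(d, dtype=float)
    i, j = np.triu_indices(len(d))
    return np.median((d[i] + d[j]) / 2.0)

def two_sample_pseudo_median(x, y):
    # Pseudo-median of the pairwise differences between group values.
    return pseudo_median(np.subtract.outer(x, y).ravel())
```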
Winsorized modified one-step M-estimator in Alexander-Govern test
This research centres on independent-groups tests comparing two or more means using a parametric method, namely the Alexander-Govern test. The Alexander-Govern (AG) test uses the mean as its measure of central tendency. It is a good alternative to the Welch test, the James test and the ANOVA, because it controls Type I error rates well and produces high power efficiency for normal data under variance heterogeneity, but not for non-normal data. As a result, the trimmed mean was applied to the test under non-normal data in the two-group condition, but as the number of groups increased above two, the test failed to remain robust. When the MOM estimator was applied to the test instead, it was not influenced by the number of groups, but it failed to give good control of Type I error rates under skewed heavy-tailed distributions. In this research, the winsorized MOM estimator was applied in the AG test as its measure of central tendency. 5,000 data sets were simulated and analysed using Statistical Analysis Software (SAS). The results show that with the pairing of unbalanced sample sizes with unequal variances of (1:36), and with combinations of balanced and unbalanced sample sizes with both equal and unequal variances, under the six-group condition, for g = 0.5 and h = 0.5, for both positive and negative pairing conditions, the test gives remarkable control of Type I error rates. Overall, the AGWMOM test has the best control of Type I error rates across the distributions and across the groups, compared to the AG test, the AGMOM test and the ANOVA.
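A sketch of the central-tendency estimate under test, assuming that winsorizing replaces MOM's trimming, i.e. observations flagged by the MOM criterion are pulled back to the cutoffs rather than discarded; the paper's exact definition may differ.

```python
import numpy as np

def winsorized_mom(x, k=2.24, c=1.4826):
    # Flag values beyond k * MADn of the median (MOM's trimming rule),
    # then winsorize them to the cutoffs instead of removing them.
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    madn = c * np.median(np.abs(x - med))
    return np.clip(x, med - k * madn, med + k * madn).mean()
```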
Type I error rates of t-test and T1 statistic under balanced and unbalanced designs
The t-test, used prominently for comparing means between two groups, is restricted by the assumptions of normality and variance homogeneity. However, these assumptions are violated in much real-world data. In this study, two methods of comparing the means of two samples were examined: the first used the traditional t-test, while the second used the T1 statistic. The performances of the t-test and the T1 statistic were evaluated under different conditions, namely sample size, distribution type (normal or non-normal), and unequal group variances. The Type I error rates of the two test statistics were obtained and compared. Based on the investigations, the T1 statistic was able to produce good Type I error rates, with values near the nominal level, α = 0.05.
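The evaluation logic is standard Monte Carlo: generate many samples under a true null hypothesis, apply the test, and record how often it rejects. A minimal sketch with the classical t-test (the T1 statistic itself is not reproduced here):

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
reps, alpha, rejections = 5000, 0.05, 0
for _ in range(reps):
    x = rng.normal(0.0, 1.0, size=10)  # group 1
    y = rng.normal(0.0, 4.0, size=25)  # group 2: equal mean, larger variance
    rejections += ttest_ind(x, y, equal_var=True).pvalue < alpha

print(rejections / reps)  # empirical Type I error rate vs. nominal 0.05
```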