    Diagnostic measures for linear mixed measurement error models

    In this paper, we present case deletion and mean shift outlier models for linear mixed measurement error models using the corrected likelihood of Nakamura (1990). We derive the corrected score test statistic for outlier detection based on mean shift outlier models. Furthermore, several case deletion diagnostics are constructed as tools for influence diagnostics. It is found that they can be written in terms of the studentized residuals of the model, the error contrast matrix and the inverse of the response variable covariance matrix. Our influence diagnostics are illustrated through a real data set.
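The paper works with Nakamura's corrected likelihood for mixed measurement error models. As a much simpler point of reference only: in ordinary least squares without measurement error, fitting a mean shift for case i and testing it gives the externally studentized residual of that case. The NumPy sketch below illustrates just that classical special case; the function name `mean_shift_t` is hypothetical and this is not the paper's corrected score test.

```python
import numpy as np

def mean_shift_t(X, y):
    """Mean shift outlier t-statistics for an ordinary least-squares fit.

    Hypothetical helper (classical OLS case only, no measurement error).
    Fitting y = X beta + gamma * d_i + e, where d_i indicates case i, and
    testing gamma = 0 gives the externally studentized residual of case i.
    """
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)      # hat matrix
    h = np.diag(H)
    e = y - H @ y                              # ordinary residuals
    rss = e @ e
    # Residual variance with case i deleted, computed for every i at once
    s2_del = (rss - e**2 / (1.0 - h)) / (n - p - 1)
    return e / np.sqrt(s2_del * (1.0 - h))
```

The vectorised deletion formula avoids refitting the model n times; the check below compares it with an explicit fit that includes the indicator column.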

    A variance shift model for outlier detection and estimation in linear and linear mixed models

    Outliers are data observations that fall outside the usual conditional ranges of the response data. They are common in experimental research data, for example, due to transcription errors or faulty experimental equipment. Often outliers are quickly identified and addressed, that is, corrected, removed from the data, or retained for subsequent analysis. However, in many cases they are completely anomalous and it is unclear how to treat them. Case deletion techniques are established methods for detecting outliers in linear fixed effects analysis. The extension of these methods to detecting outliers in linear mixed models has not been entirely successful in the literature. This thesis focuses on a variance shift outlier model as an approach to detecting and assessing outliers in both linear fixed effects and linear mixed effects analysis. A variance shift outlier model assumes a variance shift parameter, $\omega_i$, for the $i$th observation, where $\omega_i$ is unknown and estimated from the data. Large estimated values of $\omega_i$ indicate observations with possibly inflated variances relative to the remainder of the observations in the data set, and hence outliers. When outliers lurk within anomalous elements in the data set, a variance shift outlier model offers an opportunity to include anomalies in the analysis, but down-weighted using the variance shift estimate $\hat{\omega}_i$. This down-weighting might be considered preferable to omitting data points (as in case-deletion methods). For very large values of $\omega_i$, a variance shift outlier model is approximately equivalent to the case deletion approach. We commence with a detailed review of parameter estimation and inferential procedures for the linear mixed model. The review is necessary for the development of the variance shift outlier model as a method for detecting outliers in linear fixed and linear mixed models. This review is followed by a discussion of the status of current research into linear mixed model diagnostics. 
Different types of residuals in the linear mixed model are defined. A decomposition of the leverage matrix for the linear mixed model leads to interpretable leverage measures. A detailed review of a variance shift outlier model in linear fixed effects analysis is given. The purpose of this review is, firstly, to gain insight into the general case (the linear mixed model) and, secondly, to develop the model further in linear fixed effects analysis. A variance shift outlier model can be formulated as a linear mixed model, so the calculations required to estimate its parameters are those associated with fitting a linear mixed model, and hence the model can be fitted using standard software packages. Likelihood ratio and score test statistics are developed as objective measures for the variance shift estimates. The proposed test statistics initially assume balanced longitudinal data with a Gaussian-distributed response variable. The dependence of the proposed test statistics on the second derivatives of the log-likelihood function is also examined. For the single-case outlier in linear fixed effects analysis, analytical expressions for the proposed test statistics are obtained. A resampling algorithm is proposed for assessing the significance of the proposed test statistics and for handling the problem of multiple testing. A variance shift outlier model is then adapted to detect a group of outliers in a fixed effects model. Properties and performance of the likelihood ratio and score test statistics are also investigated. A variance shift outlier model for detecting single-case outliers is also extended to linear mixed effects analysis under Gaussian assumptions for the random effects and the random errors. The variance parameters are estimated using the residual maximum likelihood method. Likelihood ratio and score tests are also constructed for this extended model. 
Two distinct computing algorithms that constrain the variance parameter estimates to be positive are given. Properties of the resulting variance parameter estimates from each computing algorithm are also investigated. A variance shift outlier model for detecting single-case outliers in linear mixed effects analysis is extended to detect groups of outliers, or subjects having outlying profiles with random intercepts and random slopes that are inconsistent with the corresponding model elements for the remaining subjects in the data set. The issue of influence on the fixed effects under a variance shift outlier model is also discussed.
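The thesis fits the variance shift outlier model as a linear mixed model using REML. As a simplified illustration only, the sketch below treats the single-case model in linear fixed effects analysis, where Var(e_i) = σ²(1 + ω_i) and all other errors have variance σ², and estimates ω_i by maximum likelihood (not REML) with a grid search over non-negative values, profiling out β and σ². The helper names and the grid are assumptions made for this sketch.

```python
import numpy as np

# Hypothetical helpers: an ML grid-search sketch, not the thesis's REML fit.

def vsom_profile_loglik(y, X, i, omega):
    """Profile log-likelihood (up to a constant) of the single-case variance
    shift model Var(e_i) = sigma^2 * (1 + omega), Var(e_j) = sigma^2 (j != i).
    beta and sigma^2 are replaced by their ML estimates for the given omega."""
    n = len(y)
    w = np.ones(n)
    w[i] = 1.0 / (1.0 + omega)            # precision weight for case i
    sw = np.sqrt(w)
    # Weighted least squares for beta given omega
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    r = y - X @ beta
    sigma2 = np.sum(w * r**2) / n         # ML estimate of sigma^2
    # log|V| = n*log(sigma2) + log(1 + omega); quadratic form equals n at the MLE
    return -0.5 * (n * np.log(sigma2) + np.log1p(omega) + n)

def vsom_estimate(y, X, i, grid=np.linspace(0.0, 500.0, 5001)):
    """Grid-search ML estimate of omega_i >= 0, plus the log-likelihood gain
    over omega_i = 0 (half the likelihood ratio statistic)."""
    ll = np.array([vsom_profile_loglik(y, X, i, om) for om in grid])
    k = int(np.argmax(ll))
    return grid[k], ll[k] - ll[0]
```

Restricting the grid to ω_i ≥ 0 mirrors the positivity constraint discussed above; a case whose estimate stays at zero receives full weight, while a large estimate effectively down-weights the case towards deletion.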

    Perturbation and scaled Cook's distance

    Cook's distance [Technometrics 19 (1977) 15-18] is one of the most important diagnostic tools for detecting influential individual observations or subsets of observations in linear regression for cross-sectional data. However, for many complex data structures (e.g., longitudinal data), no rigorous approach has been developed to address a fundamental issue: deleting subsets with different numbers of observations introduces different degrees of perturbation to the current model fitted to the data, and the magnitude of Cook's distance is associated with the degree of the perturbation. The aim of this paper is to address this issue in general parametric models with complex data structures. We propose a new quantity for measuring the degree of the perturbation introduced by deleting a subset. We use stochastic ordering to quantify the stochastic relationship between the degree of the perturbation and the magnitude of Cook's distance. We develop several scaled Cook's distances to resolve the comparison of Cook's distance for different subset deletions. Theoretical and numerical examples are examined to highlight the broad spectrum of applications of these scaled Cook's distances in a formal influence analysis. Comment: Published at http://dx.doi.org/10.1214/12-AOS978 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
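For orientation, the unscaled quantity being discussed can be written for a least-squares fit as D_I = (b - b_(I))' X'X (b - b_(I)) / (p s²), where b_(I) is the estimate after deleting the subset I and s² comes from the full fit. The NumPy sketch below computes D_I by refitting, plus the familiar single-case closed form; the function names are hypothetical, and the paper's scaled Cook's distances are not implemented here.

```python
import numpy as np

# Hypothetical helpers: classical (unscaled) Cook's distance for OLS only.

def cooks_distance_subset(X, y, subset):
    """Cook's distance for deleting the rows in `subset` from an OLS fit:
    D_I = (b - b_(I))' X'X (b - b_(I)) / (p * s^2), s^2 from the full fit."""
    n, p = X.shape
    XtX = X.T @ X
    b = np.linalg.solve(XtX, X.T @ y)
    resid = y - X @ b
    s2 = resid @ resid / (n - p)
    keep = np.setdiff1d(np.arange(n), subset)
    b_del = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]   # refit without I
    diff = b - b_del
    return diff @ XtX @ diff / (p * s2)

def cooks_distance_single(X, y):
    """Closed-form single-case Cook's distances e_i^2 h_ii / (p s^2 (1 - h_ii)^2)."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)
    h = np.diag(H)
    e = y - H @ y
    s2 = e @ e / (n - p)
    return e**2 * h / (p * s2 * (1.0 - h) ** 2)
```

Calling `cooks_distance_subset` with subsets of different sizes shows the comparison problem the paper raises: the raw values are computed on the same scale, even though deleting more observations perturbs the fit more.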

    Robust Henderson III estimators of variance components in the nested error model

    Common methods for estimating variance components in Linear Mixed Models include Maximum Likelihood (ML) and Restricted Maximum Likelihood (REML). These methods are based on the strong assumption of a multivariate normal distribution, and it is well known that they are very sensitive to outlying observations with respect to any of the random components. Several robust alternatives to these methods have been proposed (e.g. Fellner 1986, Richardson and Welsh 1995). In this work we present several robust alternatives based on the Henderson method III, which do not rely on the normality assumption and provide explicit solutions for the variance component estimators. These estimators can later be used to derive robust estimators of regression coefficients. Finally, we describe an application of this procedure to small area estimation, in which the main target is the estimation of the means of areas or domains when the within-area sample sizes are small. Keywords: Henderson method III, Linear mixed model, Robust estimators, Variance component estimators
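The robust estimators in this work are built on Henderson method III. For reference only, the sketch below implements the classical, non-robust method III (fitting-of-constants) estimator for the one-way nested error model y_ij = x_ij'β + u_i + e_ij: σ_e² comes from the residual sum of squares of the model with area effects, and σ_u² from the reduction in sum of squares due to the areas after fitting X, equated to its expectation (rank difference)·σ_e² + tr(Z'(I - P_X)Z)·σ_u². It is the least-squares baseline, not the authors' robust estimator; the helper names are hypothetical.

```python
import numpy as np

# Hypothetical helpers: classical (non-robust) Henderson method III.

def _resid(A, B):
    """Residuals of B after least-squares projection onto the columns of A."""
    coef, *_ = np.linalg.lstsq(A, B, rcond=None)
    return B - A @ coef

def henderson3_nested(y, X, area):
    """Henderson method III estimates (sigma2_e, sigma2_u) for y = X b + u_area + e."""
    n = len(y)
    labels, idx = np.unique(area, return_inverse=True)
    Z = np.zeros((n, len(labels)))
    Z[np.arange(n), idx] = 1.0                  # area indicator matrix
    XZ = np.column_stack([X, Z])
    r_full = _resid(XZ, y)                      # residuals with area effects
    r_red = _resid(X, y)                        # residuals with X only
    rank_full = np.linalg.matrix_rank(XZ)
    rank_x = np.linalg.matrix_rank(X)
    sse_full = r_full @ r_full
    sigma2_e = sse_full / (n - rank_full)
    # Reduction in sum of squares due to the area effects after fitting X
    reduction = r_red @ r_red - sse_full
    MZ = _resid(X, Z)                           # (I - P_X) Z
    trace = np.sum(Z * MZ)                      # tr(Z'(I - P_X) Z)
    sigma2_u = (reduction - (rank_full - rank_x) * sigma2_e) / trace
    return sigma2_e, sigma2_u
```

Both estimators are explicit, which is the property the robust versions retain. The moment equation for σ_u² is unconstrained, so the classical estimate can be negative in small samples and is often truncated at zero in practice.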

    Diagnostics for joint models for longitudinal and survival data

    Joint models for longitudinal and survival data are a class of models that jointly analyse an outcome repeatedly observed over time, such as a biomarker, and associated event times. These models are useful in two practical applications: first, focusing on the survival outcome whilst accounting for time-varying covariates measured with error, and second, focusing on the longitudinal outcome while controlling for informative censoring. Interest in the estimation of these joint models has grown in the past two and a half decades. However, minimal effort has been directed towards developing diagnostic assessment tools for these models. The available diagnostic tools have mainly been based on separate analysis of residuals for the longitudinal and survival sub-models, which could be sub-optimal. In this thesis we make four contributions towards the body of knowledge. We first developed influence diagnostics for the shared parameter joint model for longitudinal and survival data based on Cook's statistics. We evaluated the performance of the diagnostics using simulation studies under different scenarios. We then illustrated these diagnostics using a real data set from a multi-center clinical trial on TB pericarditis (IMPI). The second contribution was to implement a variance shift outlier model (VSOM) in the two-stage joint survival model. This was achieved by identifying outlying subjects in the longitudinal sub-model and down-weighting them before the second stage of the joint model. The third contribution was to develop influence diagnostics for the multivariate joint model for longitudinal and survival data. In this setting we considered two longitudinal outcomes: square root CD4 cell count, which was Gaussian, and antiretroviral therapy (ART) uptake, which was binary. We achieved this by extending the univariate case based on Cook's statistics for all parameters. 
The fourth contribution was to implement influence diagnostics in joint models for longitudinal and survival data with multiple failure types (competing risks). Using the IMPI data set, we considered two competing events in the joint model: death and constrictive pericarditis. In simulation studies and on the IMPI data set, the developed diagnostics identified influential subjects as well as influential observations. The performance of the diagnostics exceeded 98% in simulation studies. We further conducted sensitivity analyses to check the impact of influential subjects and/or observations on parameter estimates by excluding them and re-fitting the joint model. Overall, we observed only subtle differences in the parameter estimates, which gives confidence that the initial inferences are credible and can be relied on. We illustrated case deletion diagnostics in the IMPI trial setting; these diagnostics can also be applied to clinical trials with similar settings. We therefore strongly recommend that analysts conduct influence diagnostics in joint models for longitudinal and survival data to ascertain the reliability of parameter estimates. We also recommend implementing VSOM in the longitudinal part of the two-stage joint model before the second stage.

    Wage Determination in Russia: An Econometric Investigation

    Using a firm-level dataset from four regions of Russia covering 1996/97, an investigation was carried out into how the surplus created within the firm is divided between profits and wages. An efficient bargaining framework based on the work of Svejnar (1986) is employed, which takes into account the alternative wage or outside option available to employees in the firm as well as the value added per employee. Statistical differences are found in the share of the surplus taken by employees in state, private and mixed forms of firms. In addition, the results prove sensitive to the presence of outliers and influential observations. A variety of diagnostic methods are employed to identify these influential observations, and robust methods are employed to lessen their influence. Whereas in practice some of the diagnostic and robust methods utilised proved incapable of identifying or accommodating the gross outlier(s) in the data, the more successful methods included robust regression, Winsorising, the Hadi and Simonoff algorithm, Cook's Distance and Covratio. Keywords: Russian labour markets, efficient bargaining, outliers, regression diagnostics, robust regression. http://deepblue.lib.umich.edu/bitstream/2027.42/39679/3/wp295.pd
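COVRATIO, one of the diagnostics named above, compares the determinant of the estimated coefficient covariance matrix with and without case i; values far from 1 flag influential cases. A minimal NumPy sketch for an ordinary least-squares fit follows, using the Belsley, Kuh and Welsch definition, which reduces to (s_(i)²/s²)^p / (1 - h_ii). The paper's wage equation and data are not reproduced, and the function name is hypothetical.

```python
import numpy as np

def covratio(X, y):
    """COVRATIO for each case of an OLS fit (hypothetical helper):
    det(s_(i)^2 (X_(i)'X_(i))^{-1}) / det(s^2 (X'X)^{-1})
      = (s_(i)^2 / s^2)^p / (1 - h_ii)."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)
    h = np.diag(H)                               # leverages
    e = y - H @ y                                # residuals
    s2 = e @ e / (n - p)
    # Residual variance with case i deleted, without refitting
    s2_del = (e @ e - e**2 / (1.0 - h)) / (n - p - 1)
    return (s2_del / s2) ** p / (1.0 - h)
```

A commonly cited rule of thumb flags cases with |COVRATIO - 1| > 3p/n. A gross outlier typically yields a value well below 1, because deleting it shrinks the residual variance sharply.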

    Prevalence of Inherited Hemoglobin Disorders and Relationships with Anemia and Micronutrient Status among Children in Yaoundé and Douala, Cameroon.

    Information on the etiology of anemia is necessary to design effective anemia control programs. Our objective was to measure the prevalence of inherited hemoglobin disorders (IHD) in a representative sample of children in urban Cameroon, and examine the relationships between IHD and anemia. In a cluster survey of children 12-59 months of age (n = 291) in Yaoundé and Douala, we assessed hemoglobin (Hb), malaria infection, and plasma indicators of inflammation and micronutrient status. Hb S was detected by HPLC, and α⁺-thalassemia (3.7 kb deletions) by PCR. Anemia (Hb < 110 g/L), inflammation, and malaria were present in 45%, 46%, and 8% of children, respectively. A total of 13.7% of children had HbAS, 1.6% had HbSS, and 30.6% and 3.1% had heterozygous and homozygous α⁺-thalassemia, respectively. The prevalence of anemia was greater among HbAS compared to HbAA children (60.3 vs. 42.0%, p = 0.038), although mean Hb concentrations did not differ (p = 0.38). Hb and anemia prevalence did not differ among children with or without single gene deletion α⁺-thalassemia. In multivariable models, anemia was independently predicted by HbAS, HbSS, malaria, iron deficiency (ID; inflammation-adjusted ferritin < 12 µg/L), higher C-reactive protein, lower plasma folate, and younger age. Elevated soluble transferrin receptor concentration (> 8.3 mg/L) was associated with younger age, malaria, greater mean reticulocyte counts, inflammation, HbSS genotype, and ID. IHD are prevalent but contribute modestly to anemia among children in urban Cameroon.

    Perturbation selection and influence measures in local influence analysis

    Cook's [J. Roy. Statist. Soc. Ser. B 48 (1986) 133-169] local influence approach based on normal curvature is an important diagnostic tool for assessing the local influence of minor perturbations to a statistical model. However, no rigorous approach has been developed to address two fundamental issues: the selection of an appropriate perturbation and the development of influence measures for objective functions at a point with a nonzero first derivative. The aim of this paper is to develop a differential-geometrical framework of a perturbation model (called the perturbation manifold) and utilize the associated metric tensor and affine curvatures to resolve these issues. We will show that the metric tensor of the perturbation manifold provides important information about selecting an appropriate perturbation of a model. Moreover, we will introduce new influence measures that are applicable to objective functions at any point. Examples including linear regression models and linear mixed models are examined to demonstrate the effectiveness of the new influence measures for identifying influential observations. Comment: Published at http://dx.doi.org/10.1214/009053607000000343 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
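As background for the starting point of this paper, the sketch below computes Cook's (1986) classical normal curvature for case-weight perturbation of an ordinary least-squares fit, with σ² held fixed at its usual estimate s². In that setting the curvature in a unit direction l is l'Fl with F = 2 D(e) H D(e) / s², where D(e) is the diagonal matrix of residuals and H is the hat matrix. C_max and its direction l_max are the leading eigenvalue and eigenvector of F. The perturbation-manifold measures proposed in the paper are not implemented here, and the function name is hypothetical.

```python
import numpy as np

def case_weight_curvature(X, y):
    """Cook's (1986) normal curvature for case-weight perturbation of an OLS
    fit, with sigma^2 fixed at s^2 (hypothetical helper, classical case only).
    Returns C_max and the unit direction l_max of maximum curvature."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)     # hat matrix
    e = y - H @ y                             # residuals
    s2 = e @ e / (n - p)
    # F = 2 * D(e) H D(e) / s^2; curvature in unit direction l is l' F l
    F = 2.0 * (e[:, None] * H * e[None, :]) / s2
    vals, vecs = np.linalg.eigh(F)            # eigenvalues in ascending order
    return vals[-1], vecs[:, -1]
```

Large absolute components of l_max point to the cases whose weights most affect the fit. The sign of an eigenvector is arbitrary, so only the magnitudes are interpreted.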
