8 research outputs found

    A permutation test for determining significance of clusters with applications to spatial and gene expression data

    Hierarchical clustering is a common procedure for identifying structure in a data set, and it is frequently used for organizing genomic data. Although more advanced clustering algorithms are available, the simplicity and visual appeal of hierarchical clustering have made it ubiquitous in gene expression data analysis. Hence, even minor improvements in this framework would have significant impact. There is currently no simple and systematic way of assessing and displaying the significance of various clusters in a resulting dendrogram without making certain distributional assumptions or ignoring gene-specific variances. In this work, we introduce a permutation test based on comparing the within-cluster structure of the observed data with that of sample data sets obtained by permuting the cluster membership. We carry out this test at each node of the dendrogram using a statistic derived from the singular value decomposition of variance matrices. The p-values thus obtained provide insight into the significance of each cluster division. Given these values, one can also modify the dendrogram by combining non-significant branches. By adjusting the cut-off level of significance for branches, one can produce dendrograms with a desired level of detail for ease of interpretation. We demonstrate the usefulness of this approach by applying it to illustrative data sets.
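
    The idea of testing one cluster division by permuting cluster membership can be sketched as follows. This is a minimal illustration, not the authors' method: the abstract's statistic is derived from the singular value decomposition of variance matrices, whereas this sketch swaps in a plain within-cluster sum of squares. The function names, the toy data, and the choice of 999 permutations are all assumptions made for the example.

```python
import random


def within_cluster_ss(points, labels):
    """Sum of squared distances from each point to its cluster centroid."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        total += sum((m[d] - centroid[d]) ** 2 for m in members for d in range(dim))
    return total


def permutation_p_value(points, labels, n_perm=999, seed=0):
    """Share of label permutations whose clusters are at least as tight as observed.

    Stand-in statistic (assumption): within-cluster sum of squares,
    not the SVD-based statistic described in the abstract.
    """
    rng = random.Random(seed)
    observed = within_cluster_ss(points, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute cluster membership, keep cluster sizes
        if within_cluster_ss(points, shuffled) <= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data: two well-separated groups of five 2-D points.
left = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0), (0.1, 0.2), (0.0, -0.2)]
right = [(5.0, 5.1), (5.2, 4.9), (4.9, 5.0), (5.1, 5.2), (5.0, 4.8)]
points = left + right
labels = [0] * 5 + [1] * 5
p_value = permutation_p_value(points, labels)
```

    In a dendrogram setting, a call like this would be repeated at each node, with `labels` marking the two branches below that node; branches whose p-value exceeds the chosen cut-off could then be merged.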

    Risk Adjustment for Lumbar Dysfunction: Comparison of Linear Mixed Models With and Without Inclusion of Between-Clinic Variation as a Random Effect

    Background. Valid comparison of patient outcomes of physical therapy care requires risk adjustment for patient characteristics using statistical models. Because patients are clustered within clinics, results of risk adjustment models are likely to be biased by random, unobserved between-clinic differences. Such bias could lead to inaccurate prediction and interpretation of outcomes. Purpose. The purpose of this study was to determine if including between-clinic variation as a random effect would improve the performance of a risk adjustment model for patient outcomes following physical therapy for low back dysfunction. Design. This was a secondary analysis of data from a longitudinal cohort of 147,623 patients with lumbar dysfunction receiving physical therapy in 1,470 clinics in 48 states of the United States. Methods. Three linear mixed models predicting patients' functional status (FS) at discharge, controlling for FS at intake, age, sex, number of comorbidities, surgical history, and health care payer, were developed. Models were: (1) a fixed-effect model, (2) a random-intercept model that allowed clinics to have different intercepts, and (3) a random-slope model that allowed different intercepts and slopes for each clinic. Goodness of fit, residual error, and coefficient estimates were compared across the models. Results. The random-effect model fit the data better and explained an additional 11% to 12% of the between-patient differences compared with the fixed-effect model. Effects of payer, acuity, and number of comorbidities were confounded by random clinic effects. Limitations. Models may not have included some variables associated with FS at discharge. The clinics studied may not be representative of all US physical therapy clinics. Conclusions. Risk adjustment models for functional outcome of patients with lumbar dysfunction that control for between-clinic variation performed better than a model that does not.
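
    The three models named in the Methods can be written out in a generic form. This is a sketch in assumed notation, not the authors' specification: $i$ indexes patients and $j$ clinics, $\mathbf{x}_{ij}$ collects the listed covariates (FS at intake, age, sex, comorbidities, surgical history, payer), and $z_{ij}$ stands for whichever covariate is given a clinic-specific slope, which the abstract does not name.

```latex
\begin{align*}
\text{(1) fixed-effect:} \quad
  & \mathrm{FS}_{ij} = \beta_0 + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + \varepsilon_{ij} \\
\text{(2) random-intercept:} \quad
  & \mathrm{FS}_{ij} = \beta_0 + u_{0j} + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + \varepsilon_{ij} \\
\text{(3) random-slope:} \quad
  & \mathrm{FS}_{ij} = \beta_0 + u_{0j} + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_{1j}\, z_{ij} + \varepsilon_{ij}
\end{align*}
% Assumed distributions: clinic effects and residuals are independent,
% (u_{0j}, u_{1j}) \sim N(\mathbf{0}, \boldsymbol{\Sigma}_u), \quad \varepsilon_{ij} \sim N(0, \sigma^2).
```

    Under this reading, between-clinic variation enters only through $u_{0j}$ and $u_{1j}$; setting both to zero recovers model (1).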

    Identifying multidrug resistant tuberculosis transmission hotspots using routinely collected data

    In most countries with large drug-resistant tuberculosis epidemics, only those cases at highest risk of having multidrug-resistant tuberculosis (MDRTB) receive a drug sensitivity test (DST) at the time of diagnosis. Because of this prioritized testing, identification of MDRTB transmission hotspots in communities where TB cases do not receive DST is challenging, as any observed aggregation of MDRTB may reflect systematic differences in how testing is distributed in communities. We introduce a new disease mapping method, which estimates this missing information through probability-weighted locations, to identify geographic areas of increased risk of MDRTB transmission. We apply this method to routinely collected data from two districts in Lima, Peru, over three consecutive years. This method identifies an area in the eastern part of Lima where previously untreated cases have increased risk of MDRTB. This may indicate an area of increased transmission of drug-resistant disease, a finding that may otherwise have been missed by routine analysis of programmatic data. The risk of MDR among retreatment cases is also highest in these probable transmission hotspots, though a high level of MDR among retreatment cases is present throughout the study area. Identifying potential MDRTB transmission hotspots may allow for targeted investigation and deployment of resources.

    Methods used in the spatial analysis of tuberculosis epidemiology: a systematic review
