
    Model-based clustering of DNA methylation array data: a recursive-partitioning algorithm for high-dimensional data arising as a mixture of beta distributions

    BACKGROUND: Epigenetics is the study of heritable changes in gene function that cannot be explained by changes in DNA sequence. One of the most commonly studied epigenetic alterations is cytosine methylation, which is a well-recognized mechanism of epigenetic gene silencing and often occurs at tumor suppressor gene loci in human cancer. Arrays are now being used to study DNA methylation at a large number of loci; for example, the Illumina GoldenGate platform assesses DNA methylation at 1505 loci associated with over 800 cancer-related genes. Model-based cluster analysis is often used to identify DNA methylation subgroups in data, but it is unclear how to cluster DNA methylation data from arrays in a scalable and reliable manner. RESULTS: We propose a novel model-based recursive-partitioning algorithm to navigate clusters in a beta mixture model. We present simulations showing that the method is more reliable than competing nonparametric clustering approaches and at least as reliable as conventional mixture model methods. We also show that our proposed method is more computationally efficient than conventional mixture model approaches. We demonstrate our method on normal tissue samples and show that the clusters are associated with tissue type as well as age. CONCLUSION: Our proposed recursively partitioned mixture model is an effective and computationally efficient method for clustering DNA methylation data.
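
    The abstract does not spell out the model, so the following is only a minimal sketch of one ingredient of a beta mixture: computing class membership probabilities for a single sample. It is not the recursive-partitioning algorithm itself. The function names, the parameter layout, and the assumption that loci are independent within a class are all illustrative and not taken from the paper.

```python
import math

def beta_logpdf(x, a, b):
    """Log density of a Beta(a, b) distribution at x, with 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x)

def class_posteriors(sample, weights, params):
    """Posterior class probabilities for one sample under a beta mixture.

    sample  -- methylation values in (0, 1), one per locus
    weights -- mixing proportion of each class (should sum to 1)
    params  -- params[k][j] is the (a, b) pair for class k at locus j
    Assumption (hypothetical): loci are independent given the class.
    """
    log_joint = [
        math.log(w) + sum(beta_logpdf(x, a, b) for x, (a, b) in zip(sample, cls))
        for w, cls in zip(weights, params)
    ]
    # Subtract the maximum before exponentiating to avoid underflow.
    top = max(log_joint)
    unnormalized = [math.exp(v - top) for v in log_joint]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

# Two hypothetical classes: mostly methylated vs. mostly unmethylated loci.
posteriors = class_posteriors(
    sample=[0.9, 0.8],
    weights=[0.5, 0.5],
    params=[[(8, 2), (8, 2)], [(2, 8), (2, 8)]],
)
```

    In this toy call the sample has high methylation at both loci, so nearly all of the posterior mass falls on the first class.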

    Using a summary measure for multiple quality indicators in primary care: the Summary QUality InDex (SQUID)

    BACKGROUND: Assessing the quality of primary care is becoming a priority in national healthcare agendas. Audit and feedback on healthcare quality performance indicators can help improve the quality of care provided. In some instances, a smaller number of more comprehensive indicators may be preferable. This paper describes the use of the Summary Quality Index (SQUID) in tracking quality of care among patients and primary care practices that use an electronic medical record (EMR). All practices are part of the Practice Partner Research Network, representing over 100 ambulatory care practices throughout the United States. METHODS: The SQUID comprises 36 process and outcome measures, all of which are obtained from the EMR. This paper describes algorithms for the SQUID calculations, various statistical properties, and use of the SQUID within the context of a multi-practice quality improvement (QI) project. RESULTS: At any given time point, the patient-level SQUID reflects the proportion of recommended care received, while the practice-level SQUID reflects the average proportion of recommended care received by that practice's patients. Using quarterly reports, practice- and patient-level SQUIDs are provided routinely to practices within the network. The SQUID is responsive, exhibiting highly significant (p < 0.0001) increases during a major QI initiative, and its internal consistency is excellent (Cronbach's alpha = 0.93). Feedback from physicians has been extremely positive, providing a high degree of face validity. CONCLUSION: The SQUID algorithm is feasible and straightforward, and provides a useful QI tool. Its statistical properties and clear interpretation make it appealing to providers, health plans, and researchers.
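
    The two levels of the SQUID described in the RESULTS can be sketched as follows. This is a hedged illustration, not the paper's published algorithm: the function names, the dictionary layout, and the rule that a patient is scored only on the measures they are eligible for are assumptions made for the example.

```python
from statistics import mean

def patient_squid(indicators):
    """Proportion of recommended care received by one patient.

    indicators -- dict mapping a measure name to True (met) or False (not met),
                  containing only measures the patient is eligible for
                  (the eligibility rule is an assumption of this sketch).
    Returns None when the patient is eligible for no measures.
    """
    if not indicators:
        return None
    return sum(indicators.values()) / len(indicators)

def practice_squid(patients):
    """Average of the patient-level scores across a practice's patients."""
    scores = [patient_squid(p) for p in patients]
    scores = [s for s in scores if s is not None]
    return mean(scores) if scores else None

# Hypothetical measures for two patients in one practice.
patient_a = {"bp_control": True, "hba1c_test": False, "flu_vaccine": True, "lipid_panel": True}
patient_b = {"bp_control": True, "flu_vaccine": False}
practice_score = practice_squid([patient_a, patient_b])
```

    Here patient A meets 3 of 4 measures (0.75) and patient B meets 1 of 2 (0.5), so the practice-level value is their average.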

    Global tests for multiple binary outcomes

    The applied statistician often encounters the need to compare two or more groups with respect to more than one outcome or response. Several options are generally available, including reducing the dimension of the problem by averaging or summarizing the outcomes, using Bonferroni or other adjustments for multiple comparisons, or applying a global test based on a suitable multivariate model. For normally distributed data, it is well established that global tests tend to be significantly more sensitive than other procedures. While global tests have also been proposed for multiple binary outcomes, their properties have not been well studied, nor have they been widely discussed in the context of clustered data. In this paper, we derive a class of quasi-likelihood score tests for multiple binary outcomes, and show that special cases of this class correspond to other tests that have been proposed. We discuss extensions to allow for clustered data, and compare the results to the simple approach of collapsing the data to a single binary outcome, indicating the presence or absence of at least one response. The asymptotic relative efficiencies of the tests are shown to depend not only on the correlation between the outcomes, but also on the response probabilities. Although global tests based on a multivariate model are generally recommended, our findings suggest that a test based on the collapsed data can maintain surprisingly high efficiency, especially when the outcomes of interest are rare. Data from several developmental toxicity studies illustrate our results.
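
    The "collapsed data" approach mentioned above can be illustrated with a small sketch. Each subject's binary outcomes are reduced to a single indicator of at least one response, and two groups are then compared. The comparison used here is an ordinary pooled two-proportion z statistic that ignores clustering; it is a stand-in chosen for simplicity, not the quasi-likelihood score test derived in the paper, and all names and data are hypothetical.

```python
import math

def collapse(outcomes):
    """Reduce each subject's binary outcomes to 1 if any response occurred, else 0."""
    return [int(any(row)) for row in outcomes]

def two_proportion_z(y_control, y_treated):
    """Pooled two-sample z statistic for a difference in response proportions."""
    n0, n1 = len(y_control), len(y_treated)
    p0 = sum(y_control) / n0
    p1 = sum(y_treated) / n1
    pooled = (sum(y_control) + sum(y_treated)) / (n0 + n1)
    variance = pooled * (1 - pooled) * (1 / n0 + 1 / n1)
    if variance == 0:
        raise ValueError("no variation in the collapsed outcome; z is undefined")
    return (p1 - p0) / math.sqrt(variance)

# Hypothetical data: rows are subjects, columns are three binary outcomes.
control = [[0, 0, 0], [0, 1, 0], [0, 0, 0], [0, 0, 0]]
treated = [[1, 0, 0], [0, 1, 1], [0, 0, 0], [0, 0, 1]]
z = two_proportion_z(collapse(control), collapse(treated))
```

    With clustered data, such as litters in a developmental toxicity study, the variance would need to account for within-cluster correlation, which this sketch deliberately omits.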

    Efficiency and power of tests for multiple binary outcomes

    Global tests provide a useful tool for comparing two or more groups with respect to multiple correlated outcomes. We adapt tests that have been suggested for use with multiple continuous outcomes to the case of multiple binary outcomes and compare their performance. Comparisons and guidelines are based on asymptotic relative efficiencies (AREs) and simulations. These results are illustrated using an application from teratology. We extend the work of Lefkopoulou and Ryan to include general M-group comparisons and alternatives in which group effects may differ for each outcome. A concise form for this general class of score tests is derived. To compute the AREs for this class of tests, we devise a useful characterization of the alternative space based on multivariate polar coordinates. Our findings indicate that the common outcome effect tests are efficient for a remarkably large range of circumstances. A simple formula gives the maximum number of unaffected outcomes that can be included in a set of outcomes for which the common outcome effect tests remain more efficient than those derived under multidimensional alternatives. For comparison, other global tests are also considered in the simulations: two tests based on resampling (maximal and minimal z tests), a rank-sum test, a generalized least squares test, and a test based on collapsing multiple endpoints to a single binary outcome. Besides the common outcome effect tests, the resampling tests and the rank-sum test are found to perform very well for the cases under consideration. © 1995 Taylor & Francis Group, LLC