
    Non-transitivity of the Win Ratio and Area Under the Receiver Operating Characteristics Curve (AUC): a case for evaluating the strength of stochastic comparisons

    The win ratio (WR) is a novel statistic used in randomized controlled trials that can account for hierarchies within event outcomes. In this paper we report and study the long-run non-transitive behavior of the win ratio and the closely related Area Under the Receiver Operating Characteristics Curve (AUC) and argue that their transitivity cannot be taken for granted. Crucially, traditional within-group statistics (i.e., comparison of means) are always transitive, while the WR can detect non-transitivity. Non-transitivity provides valuable information on the stochastic relationship between two treatment groups, which should be tested and reported. We specify the necessary conditions for transitivity and the sufficient conditions for non-transitivity, and demonstrate non-transitivity in a large real-life randomized controlled trial for the WR of time-to-death. Our results can be used to rule out or evaluate the possibility of non-transitivity and show the importance of studying the strength of stochastic relationships.
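As a minimal sketch of what non-transitivity means here, the toy example below compares three hypothetical groups pairwise. The outcome values are invented for illustration (a classic non-transitive dice set) and are not taken from the paper; a higher value counts as a win, and the helper names `count_wins`, `win_ratio` and `auc` are assumptions of this sketch.

```python
from itertools import product

def count_wins(a, b):
    """Count pairwise wins, losses and ties of group a against group b."""
    wins = sum(x > y for x, y in product(a, b))
    losses = sum(x < y for x, y in product(a, b))
    ties = len(a) * len(b) - wins - losses
    return wins, losses, ties

def win_ratio(a, b):
    """Win ratio: number of winning pairs divided by number of losing pairs."""
    wins, losses, _ = count_wins(a, b)
    return wins / losses

def auc(a, b):
    """AUC as the probability that a beats b, counting ties as one half."""
    wins, _, ties = count_wins(a, b)
    return (wins + 0.5 * ties) / (len(a) * len(b))

# Hypothetical outcomes (higher is better); all three groups have mean 5.
A = [2, 4, 9]
B = [1, 6, 8]
C = [3, 5, 7]

for name, (g1, g2) in {"A vs B": (A, B), "B vs C": (B, C), "C vs A": (C, A)}.items():
    print(name, "WR =", win_ratio(g1, g2), "AUC =", round(auc(g1, g2), 3))
```

In this toy data every group has the same mean, so a comparison of means ranks them as equal, yet A beats B, B beats C, and C beats A in five of nine pairs each. That cycle is the kind of behavior the abstract describes.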

    Quantum approximate Bayesian computation for NMR model inference

    Recent technological advances may lead to the development of small scale quantum computers capable of solving problems that cannot be tackled with classical computers. A limited number of algorithms has been proposed and their relevance to real world problems is a subject of active investigation. Analysis of many-body quantum systems is particularly challenging for classical computers due to the exponential scaling of Hilbert space dimension with the number of particles. Hence, solving problems relevant to chemistry and condensed matter physics is expected to be the first successful application of quantum computers. In this paper, we propose another class of problems from the quantum realm that can be solved efficiently on quantum computers: model inference for nuclear magnetic resonance (NMR) spectroscopy, which is important for biological and medical research. Our results are based on the cumulation of three interconnected studies. Firstly, we use methods from classical machine learning to analyze a dataset of NMR spectra of small molecules. We perform a stochastic neighborhood embedding and identify clusters of spectra, and demonstrate that these clusters are correlated with the covalent structure of the molecules. Secondly, we propose a simple and efficient method, aided by a quantum simulator, to extract the NMR spectrum of any hypothetical molecule described by a parametric Heisenberg model. Thirdly, we propose an efficient variational Bayesian inference procedure for extracting Hamiltonian parameters of experimentally relevant NMR spectra.

    Statistical Workflow for Feature Selection in Human Metabolomics Data

    High-throughput metabolomics investigations, when conducted in large human cohorts, represent a potentially powerful tool for elucidating the biochemical diversity underlying human health and disease. Large-scale metabolomics data sources, generated using either targeted or nontargeted platforms, are becoming more common. Appropriate statistical analysis of these complex high-dimensional data will be critical for extracting meaningful results from such large-scale human metabolomics studies. Therefore, we consider the statistical analytical approaches that have been employed in prior human metabolomics studies. Based on the lessons learned and collective experience to date in the field, we offer a step-by-step framework for pursuing statistical analyses of cohort-based human metabolomics data, with a focus on feature selection. We discuss the range of options and approaches that may be employed at each stage of data management, analysis, and interpretation and offer guidance on the analytical decisions that need to be considered over the course of implementing a data analysis workflow. Certain pervasive analytical challenges facing the field warrant ongoing focused research. Addressing these challenges, particularly those related to analyzing human metabolomics data, will allow for more standardization of, as well as advances in, how research in the field is practiced. In turn, such major analytical advances will lead to substantial improvements in the overall contributions of human metabolomics investigations.

    Age of onset and cumulative risk of mental disorders: a cross-national analysis of population surveys from 29 countries

    Background: Information on the frequency and timing of mental disorder onsets across the lifespan is of fundamental importance for public health planning. Broad, cross-national estimates of this information from coordinated general population surveys were last updated in 2007. We aimed to provide updated and improved estimates of age-of-onset distributions, lifetime prevalence, and morbid risk. Methods: In this cross-national analysis, we analysed data from respondents aged 18 years or older to the World Mental Health surveys, a coordinated series of cross-sectional, face-to-face community epidemiological surveys administered between 2001 and 2022. In the surveys, the WHO Composite International Diagnostic Interview, a fully structured psychiatric diagnostic interview, was used to assess age of onset, lifetime prevalence, and morbid risk of 13 DSM-IV mental disorders until age 75 years across surveys by sex. We did not assess ethnicity. The surveys were geographically clustered and weighted to adjust for selection probability, and standard errors of incidence rates and cumulative incidence curves were calculated using the jackknife repeated replications simulation method, taking weighting and geographical clustering of data into account. Findings: We included 156 331 respondents from 32 surveys in 29 countries, including 12 low-income and middle-income countries and 17 high-income countries, and including 85 308 (54·5%) female respondents and 71 023 (45·4%) male respondents. The lifetime prevalence of any mental disorder was 28·6% (95% CI 27·9–29·2) for male respondents and 29·8% (29·2–30·3) for female respondents. Morbid risk of any mental disorder by age 75 years was 46·4% (44·9–47·8) for male respondents and 53·1% (51·9–54·3) for female respondents. Conditional probabilities of first onset peaked at approximately age 15 years, with a median age of onset of 19 years (IQR 14–32) for male respondents and 20 years (12–36) for female respondents. 
The two most prevalent disorders were alcohol use disorder and major depressive disorder for male respondents and major depressive disorder and specific phobia for female respondents. Interpretation: By age 75 years, approximately half the population can expect to develop one or more of the 13 mental disorders considered in this Article. These disorders typically first emerge in childhood, adolescence, or young adulthood. Services should have the capacity to detect and treat common mental disorders promptly and to optimise care that suits people at these crucial parts of the life course. Funding: None.

    Impact of new variables on discrimination of risk prediction models

    Thesis (Ph.D.)--Boston University. Risk prediction models for binary outcomes (such as the Framingham Risk Score for cardiovascular disease or the Gail Model for 5-year risk of breast cancer) have become the standard tools for health practitioners and policy makers. Rapid scientific progress in genetics and biochemistry has led to numerous new variables being proposed as candidates to improve existing models. The quality of risk prediction models is usually measured by the area under the receiver operating characteristic curve (AUC). The increase in AUC is used to evaluate how much an added variable contributes to model performance. However, the following paradox has often been reported in the literature: the new predictor is statistically significant in the multivariable model, but does not lead to a statistically significant change in the AUC. In the first part of this thesis we prove that the paradox outlined above does not arise when the data are normally distributed. We demonstrate that in this setting statistical significance of the new predictor(s) is always equivalent to the statistical significance of the increase in the AUC. In the second part, we show rigorously that the DeLong test, which is typically used to compare two AUCs, is invalid for nested models for any distribution of the data and for general types of risk prediction models, including logistic regression. Invalidity is the likely explanation for the paradox outlined above and results in the DeLong test being overly conservative. 
In the third part of the thesis we focus on understanding what kind of statistical properties of the new predictor are beneficial for model performance. Using multivariate normal data we prove that, contrary to common wisdom, new variables uncorrelated with the old risk score are not always the strongest contributors to discrimination, while negatively correlated ones are always beneficial. We also show that a new predictor that has a very high multiple R-square when linearly regressed on the old predictors can also be beneficial for a risk prediction model. All results are illustrated using real-life Framingham data, and conclusions and future directions are presented at the end.
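The role of correlation described above can be illustrated with a small numerical sketch. It is not the thesis's own derivation. It assumes two multivariate normal groups (events and non-events) sharing one covariance matrix, for which the optimal linear score has AUC = Φ(√(M²/2)), where M² is the squared Mahalanobis distance between the group means. The effect sizes `d1`, `d2` and the within-group correlation `rho` between the old and new predictor are hypothetical values, and the function names are assumptions of this sketch.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def auc_one_predictor(d1):
    """AUC of a single normal predictor with standardized mean difference d1."""
    return normal_cdf(d1 / sqrt(2.0))

def auc_two_predictors(d1, d2, rho):
    """AUC of the optimal linear score built from two unit-variance predictors.

    d1, d2: standardized mean differences between events and non-events.
    rho: within-group correlation between the predictors (assumed equal in both groups).
    """
    m2 = (d1**2 + d2**2 - 2.0 * rho * d1 * d2) / (1.0 - rho**2)  # squared Mahalanobis distance
    return normal_cdf(sqrt(m2 / 2.0))

# Hypothetical effect sizes: old predictor d1 = 1.0, new predictor d2 = 0.5.
d1, d2 = 1.0, 0.5
print("old model only:", round(auc_one_predictor(d1), 4))
for rho in (-0.5, 0.0, 0.5, 0.9):
    print("rho =", rho, "AUC =", round(auc_two_predictors(d1, d2, rho), 4))
```

Under these assumptions, the negatively correlated predictor adds more than the uncorrelated one, the gain vanishes when `rho` equals `d2 / d1`, and a strongly positively correlated predictor (`rho = 0.9`) again outperforms the uncorrelated case.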