
    Ranking USRDS Provider-Specific SMRs from 1998-2001

    Provider profiling (ranking, league tables) is prevalent in health services research. Similarly, comparing educational institutions and identifying differentially expressed genes depend on ranking. Effective ranking procedures must be structured by a hierarchical (Bayesian) model and guided by a ranking-specific loss function; however, even optimal methods can perform poorly, and estimates must be accompanied by uncertainty assessments. We use the 1998-2001 Standardized Mortality Ratio (SMR) data from the United States Renal Data System (USRDS) as a platform to identify issues and approaches. Our analyses extend Liu et al. (2004) by combining evidence over multiple years via an AR(1) model; by considering estimates that minimize errors in classifying providers above or below a percentile cutpoint in addition to those that minimize rank-based, squared-error loss; by considering ranks based on the posterior probability that a provider's SMR exceeds a threshold; by comparing these ranks to those produced by ranking MLEs and ranking P-values associated with testing whether a provider's SMR = 1; by comparing results for a parametric and a non-parametric prior; and by reporting on a suite of uncertainty measures. Results show that MLE-based and hypothesis-test-based ranks are far from optimal; that uncertainty measures effectively calibrate performance; that in the USRDS context ranks based on single-year data perform poorly, but that performance improves substantially when using the AR(1) model; and that ranks based on posterior probabilities of exceeding a properly chosen SMR threshold are essentially identical to those produced by minimizing classification loss.
    These findings highlight areas requiring additional research and the need to educate stakeholders on the uses and abuses of ranks; on their proper role in science and policy; and on the absolute necessity of accompanying estimated ranks with uncertainty assessments and ensuring that these uncertainties influence decisions.
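
Two of the ranking rules named in this abstract can be illustrated with a minimal sketch. It assumes posterior draws of each provider's SMR are already available from some fitted hierarchical model; the layout `draws[s][k]` and every function name below are hypothetical and are not taken from the paper. Ranking the posterior mean ranks is the standard way to obtain integer ranks under rank-based squared-error loss, and the exceedance rule ranks providers by the estimated posterior probability that the SMR is above a threshold.

```python
from typing import List, Sequence


def ranks_of(values: Sequence[float]) -> List[int]:
    """Return ranks with 1 = smallest value; ties are broken by position."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0] * len(values)
    for rank, k in enumerate(order, start=1):
        ranks[k] = rank
    return ranks


def exceedance_probabilities(draws: Sequence[Sequence[float]],
                             threshold: float) -> List[float]:
    """Estimate P(SMR_k > threshold) as the share of draws above the threshold.

    draws[s][k] is assumed to hold the SMR of provider k in posterior draw s.
    """
    n_draws = len(draws)
    n_providers = len(draws[0])
    return [sum(d[k] > threshold for d in draws) / n_draws
            for k in range(n_providers)]


def posterior_mean_ranks(draws: Sequence[Sequence[float]]) -> List[float]:
    """Average each provider's within-draw rank over all posterior draws."""
    n_providers = len(draws[0])
    totals = [0] * n_providers
    for d in draws:
        for k, rank in enumerate(ranks_of(d)):
            totals[k] += rank
    return [t / len(draws) for t in totals]


# Toy draws for three providers, made up purely to exercise the functions.
toy_draws = [
    [0.8, 1.2, 1.5],
    [0.9, 1.6, 1.1],
    [1.3, 1.4, 1.0],
]
exceed = exceedance_probabilities(toy_draws, threshold=1.0)
exceedance_ranks = ranks_of(exceed)
sel_ranks = ranks_of(posterior_mean_ranks(toy_draws))
```

Neither set of ranks carries an uncertainty measure, which the abstract argues must accompany any reported ranks; the spread of each provider's within-draw ranks would be one natural place to start.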

    Enhancing the effectiveness of ligand-based virtual screening using data fusion

    Data fusion is being increasingly used to combine the outputs of different types of sensor. This paper reviews the application of the approach to ligand-based virtual screening, where the sensors to be combined are functions that score molecules in a database on their likelihood of exhibiting some required biological activity. Much of the literature to date involves the combination of multiple similarity searches, although there is also increasing interest in the combination of multiple machine learning techniques. Both approaches are reviewed here, focusing on the extent to which fusion can improve the effectiveness of searching when compared with a single screening mechanism, and on the reasons that have been suggested for the observed performance enhancement.

    RANK: Large-Scale Inference with Graphical Nonlinear Knockoffs

    Power and reproducibility are key to enabling refined scientific discoveries in contemporary big data applications with general high-dimensional nonlinear models. In this paper, we provide theoretical foundations on the power and robustness for the model-free knockoffs procedure introduced recently in Candès, Fan, Janson and Lv (2016) in the high-dimensional setting when the covariate distribution is characterized by a Gaussian graphical model. We establish that under mild regularity conditions, the power of the oracle knockoffs procedure with known covariate distribution in high-dimensional linear models is asymptotically one as sample size goes to infinity. When moving away from the ideal case, we suggest the modified model-free knockoffs method called graphical nonlinear knockoffs (RANK) to accommodate the unknown covariate distribution. We provide theoretical justifications on the robustness of our modified procedure by showing that the false discovery rate (FDR) is asymptotically controlled at the target level and the power is asymptotically one with the estimated covariate distribution. To the best of our knowledge, this is the first formal theoretical result on the power for the knockoffs procedure. Simulation results demonstrate that compared to existing approaches, our method performs competitively in both FDR control and power. A real data set is analyzed to further assess the performance of the suggested knockoffs procedure. Comment: 37 pages, 6 tables, 9 pages supplementary material.

    Two-Sided Infinite Systems of Competing Brownian Particles

    Two-sided infinite systems of Brownian particles with rank-dependent dynamics, indexed by all integers, exhibit different properties from their one-sided infinite counterparts, indexed by positive integers, and from finite systems. Consider the gap process, which is formed by spacings between adjacent particles. In stark contrast with finite and one-sided infinite systems, two-sided infinite systems can have a one- or two-parameter family of stationary gap distributions, or a gap process that converges weakly to zero as time goes to infinity. Comment: 32 pages. Keywords: Competing Brownian particles, gap process, weak convergence, stationary distribution, named particles, ranked particles, stochastic domination, interacting particle system.

    A comparison of score, rank and probability-based fusion methods for video shot retrieval

    It is now accepted that the most effective video shot retrieval is based on indexing and retrieving clips using multiple, parallel modalities such as text-matching, image-matching and feature-matching, and then combining or fusing these parallel retrieval streams in some way. In this paper we investigate a range of fusion methods for combining results based on multiple visual features (colour, edge and texture), for combining results based on multiple visual examples in the query, and for combining multiple modalities (text and visual). Using three TRECVid collections and the TRECVid search task, we specifically compare fusion methods based on normalised score and rank that use either the average, weighted average or maximum of retrieval results from a discrete Jelinek-Mercer smoothed language model. We also compare these results with a simple probability-based combination of the language model results that assumes all features and visual examples are fully independent.
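
The score- and rank-based fusion rules listed in this abstract (average, weighted average, maximum) can be sketched as follows. This is an illustrative sketch only: min-max scaling as the score normalisation, the linear rank-to-score mapping, the choice to give a shot missing from a stream a score of 0, and all names are assumptions, since the abstract does not specify these details.

```python
from typing import Dict, List, Optional, Sequence

Scores = Dict[str, float]  # shot id -> retrieval score from one stream


def minmax_normalise(scores: Scores) -> Scores:
    """Rescale one stream's scores to [0, 1] (assumed normalisation)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {shot: 0.0 for shot in scores}
    return {shot: (v - lo) / (hi - lo) for shot, v in scores.items()}


def rank_normalise(scores: Scores) -> Scores:
    """Replace scores by a rank-based value: best shot gets 1.0, worst 1/n."""
    ordered = sorted(scores, key=lambda shot: scores[shot], reverse=True)
    n = len(ordered)
    return {shot: 1.0 - i / n for i, shot in enumerate(ordered)}


def fuse(streams: Sequence[Scores], method: str = "average",
         weights: Optional[List[float]] = None) -> Scores:
    """Combine normalised streams by average, weighted average or maximum."""
    if weights is None:
        weights = [1.0] * len(streams)
    shots = set().union(*streams)  # every shot seen in any stream
    fused: Scores = {}
    for shot in shots:
        # Assumption: a shot absent from a stream contributes 0 there.
        vals = [s.get(shot, 0.0) for s in streams]
        if method == "average":
            fused[shot] = sum(vals) / len(vals)
        elif method == "weighted":
            fused[shot] = sum(w * v for w, v in zip(weights, vals)) / sum(weights)
        elif method == "max":
            fused[shot] = max(vals)
        else:
            raise ValueError(f"unknown fusion method: {method}")
    return fused


# Toy scores for a text stream and a visual stream (made-up values).
text = minmax_normalise({"a": 2.0, "b": 1.0, "c": 0.0})
visual = minmax_normalise({"a": 0.0, "b": 10.0, "c": 5.0})
avg = fuse([text, visual], "average")
wavg = fuse([text, visual], "weighted", weights=[3.0, 1.0])
best = fuse([text, visual], "max")
```

Swapping `minmax_normalise` for `rank_normalise` before calling `fuse` gives the rank-based variants of the same three combination rules.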

    Robust Tests in Genome-Wide Scans under Incomplete Linkage Disequilibrium

    Under complete linkage disequilibrium (LD), robust tests often have greater power than Pearson's chi-square test and trend tests for the analysis of case-control genetic association studies. Robust statistics have been used in candidate-gene and genome-wide association studies (GWAS) when the genetic model is unknown. We consider here a more general incomplete LD model, and examine the impact of penetrances at the marker locus when the genetic models are defined at the disease locus. Robust statistics are then reviewed and their efficiency and robustness are compared through simulations in GWAS of 300,000 markers under the incomplete LD model. Applications of several robust tests to the Wellcome Trust Case-Control Consortium [Nature 447 (2007) 661--678] are presented. Comment: Published at http://dx.doi.org/10.1214/09-STS314 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).

    An Overview of Classifier Fusion Methods

    A number of classifier fusion methods have recently been developed, opening an alternative approach that may improve classification performance. As there is little theory of information fusion itself, we are currently faced with different methods designed for different problems and producing different results. This paper gives an overview of classifier fusion methods and attempts to identify new trends that may dominate this area of research in future. A taxonomy of fusion methods, which tries to bring some order into the existing “pudding of diversities”, is also provided.