
    Rank discriminants for predicting phenotypes from RNA expression

    Statistical methods for analyzing large-scale biomolecular data are commonplace in computational biology. A notable example is phenotype prediction from gene expression data, for instance, detecting human cancers, differentiating subtypes and predicting clinical outcomes. Still, clinical applications remain scarce. One reason is that the complexity of the decision rules that emerge from standard statistical learning impedes biological understanding, in particular, any mechanistic interpretation. Here we explore decision rules for binary classification utilizing only the ordering of expression among several genes; the basic building blocks are then two-gene expression comparisons. The simplest example, just one comparison, is the TSP classifier, which has appeared in a variety of cancer-related discovery studies. Decision rules based on multiple comparisons can better accommodate class heterogeneity, and thereby increase accuracy, and might provide a link with biological mechanism. We consider a general framework ("rank-in-context") for designing discriminant functions, including a data-driven selection of the number and identity of the genes in the support ("context"). We then specialize to two examples: voting among several pairs and comparing the median expression in two groups of genes. Comprehensive experiments assess accuracy relative to other, more complex, methods, and reinforce earlier observations that simple classifiers are competitive. Comment: Published at http://dx.doi.org/10.1214/14-AOAS738 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
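
    The pairwise-comparison rules described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy expression matrix and the sign convention of the median rule are assumptions made for illustration, and the sketch neither restricts the selected pairs to be disjoint nor applies any tie-breaking between equally scored pairs.

```python
from itertools import combinations
from statistics import median


def pair_score(X, y, i, j):
    """Score a gene pair by |P(x_i < x_j | class 0) - P(x_i < x_j | class 1)|."""
    def freq(label):
        rows = [x for x, lab in zip(X, y) if lab == label]
        return sum(x[i] < x[j] for x in rows) / len(rows)
    p0, p1 = freq(0), freq(1)
    # The second value is the class the pair votes for when x_i < x_j holds.
    return abs(p0 - p1), (0 if p0 > p1 else 1)


def fit_pairs(X, y, k=1):
    """Keep the k highest-scoring pairs; k=1 gives a single-comparison rule."""
    scored = []
    for i, j in combinations(range(len(X[0])), 2):
        score, direction = pair_score(X, y, i, j)
        scored.append((score, i, j, direction))
    scored.sort(key=lambda t: -t[0])  # stable sort, highest score first
    return [(i, j, d) for _, i, j, d in scored[:k]]


def predict_vote(pairs, x):
    """Majority vote; each pair looks only at the ordering of two genes."""
    votes = sum(d if x[i] < x[j] else 1 - d for i, j, d in pairs)
    return 1 if 2 * votes > len(pairs) else 0


def predict_median(group_a, group_b, x):
    """Compare median expression in two gene groups (sign convention assumed)."""
    return 1 if median(x[g] for g in group_a) < median(x[g] for g in group_b) else 0


# Hypothetical toy data: rows are samples, columns are three genes.
X = [
    [1.0, 5.0, 3.0],  # class 0
    [2.0, 6.0, 1.0],  # class 0
    [7.0, 3.0, 4.0],  # class 1
    [8.0, 2.0, 1.0],  # class 1
]
y = [0, 0, 1, 1]

pairs = fit_pairs(X, y, k=1)
print(pairs)
print(predict_vote(pairs, [9.0, 1.0, 5.0]))
```

    Because only the ordering of two expression values enters each vote, the rule is unchanged by any monotone rescaling of a sample. Using an odd k avoids tied votes.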

    Computer-Aided Detection and diagnosis for prostate cancer based on mono and multi-parametric MRI: A review

    Prostate cancer is the second most frequently diagnosed cancer in men worldwide. In recent decades, new imaging techniques based on Magnetic Resonance Imaging (MRI) have been developed to improve diagnosis. In practice, diagnosis can be affected by multiple factors, such as observer variability and the visibility and complexity of the lesions. In this regard, computer-aided detection and computer-aided diagnosis systems have been designed to help radiologists in their clinical practice. Research on computer-aided systems specifically for prostate cancer is young and has been part of a dynamic field of research for the last ten years. This survey aims to provide a comprehensive review of the state of the art over this period, focusing on the different stages composing the workflow of a computer-aided system. We also provide a comparison between studies and a discussion of potential avenues for future research. In addition, this paper presents a new public online dataset, which is made available to the research community with the aim of providing a common evaluation framework to overcome some of the current limitations identified in this survey.

    Overview of Random Forest Methodology and Practical Guidance with Emphasis on Computational Biology and Bioinformatics

    The Random Forest (RF) algorithm by Leo Breiman has become a standard data analysis tool in bioinformatics. It has shown excellent performance in settings where the number of variables is much larger than the number of observations, can cope with complex interaction structures as well as highly correlated variables and returns measures of variable importance. This paper synthesizes ten years of RF development with emphasis on applications to bioinformatics and computational biology. Special attention is given to practical aspects such as the selection of parameters, available RF implementations, and important pitfalls and biases of RF and its variable importance measures (VIMs). The paper surveys recent developments of the methodology relevant to bioinformatics as well as some representative examples of RF applications in this context and possible directions for future research
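
    One family of variable importance measures mentioned in this abstract is permutation importance. The following is a minimal sketch of the idea only, under stated assumptions: it scores an arbitrary fitted predictor on a fixed evaluation set, whereas RF implementations typically compute the measure per tree on out-of-bag samples. The names `accuracy`, `permutation_importance` and `rule`, and the toy data, are hypothetical.

```python
import random


def accuracy(predict, X, y):
    """Fraction of samples for which the predictor returns the true label."""
    return sum(predict(x) == label for x, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling one feature column at a time."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [x[j] for x in X]
            rng.shuffle(column)  # breaks the link between feature j and y
            X_perm = [x[:j] + [v] + x[j + 1:] for x, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical predictor that uses only feature 0; feature 1 is noise.
def rule(x):
    return int(x[0] > 0.5)


X = [[0, 3], [0, 1], [0, 4], [0, 1], [1, 5], [1, 9], [1, 2], [1, 6]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

importances = permutation_importance(rule, X, y)
print(importances)
```

    A feature the predictor never reads gets an importance of exactly zero here. With correlated predictors the picture is less clean, which is one of the biases the abstract alludes to.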

    An ensemble approach to accurately detect somatic mutations using SomaticSeq

    SomaticSeq is an accurate somatic mutation detection pipeline implementing a stochastic boosting algorithm to produce highly accurate somatic mutation calls for both single nucleotide variants and small insertions and deletions. The workflow currently incorporates five state-of-the-art somatic mutation callers, and extracts over 70 individual genomic and sequencing features for each candidate site. A training set is provided to an adaptively boosted decision tree learner to create a classifier for predicting mutation statuses. We validate our results with both synthetic and real data. We report that SomaticSeq is able to achieve better overall accuracy than any individual tool incorporated. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13059-015-0758-2) contains supplementary material, which is available to authorized users
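
    The idea of training a boosted classifier on per-site features can be sketched as follows. This is an illustration under stated assumptions, not SomaticSeq's code: it uses plain AdaBoost over one-feature threshold rules (decision stumps) instead of full decision trees, and the two features (number of callers reporting the site, variant allele fraction), the labels and all function names are hypothetical.

```python
import math


def stump_output(x, j, t, sign):
    """Threshold rule on feature j: returns sign if x[j] > t, else -sign."""
    return sign if x[j] > t else -sign


def fit_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            for sign in (1, -1):
                err = sum(wi for x, yi, wi in zip(X, y, w)
                          if stump_output(x, j, t, sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    return best


def fit_adaboost(X, y, rounds=5):
    """AdaBoost with labels in {-1, +1}; returns a list of weighted stumps."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, j, t, sign = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, t, sign))
        # Up-weight misclassified sites, down-weight correct ones, renormalize.
        w = [wi * math.exp(-alpha * yi * stump_output(x, j, t, sign))
             for x, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict_boosted(model, x):
    """Sign of the weighted sum of stump votes."""
    score = sum(a * stump_output(x, j, t, s) for a, j, t, s in model)
    return 1 if score > 0 else -1


# Hypothetical candidate sites: [callers reporting the site, allele fraction].
X = [[5, 0.40], [4, 0.35], [1, 0.02], [2, 0.05]]
y = [1, 1, -1, -1]  # +1 = true somatic mutation, -1 = false call

model = fit_adaboost(X, y)
print(predict_boosted(model, [3, 0.30]))
```

    The toy data are separable by a single threshold, so every boosting round selects the same stump. Real candidate sites would need many more features and held-out validation.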

    Classification of clinical outcomes using high-throughput and clinical informatics.

    It is widely recognized that many cancer therapies are effective only for a subset of patients. However, clinical studies are most often powered to detect an overall treatment effect. To address this issue, classification methods are increasingly being used to predict a subset of patients that responds differently to treatment. This study begins with a brief history of classification methods with an emphasis on applications involving melanoma. Nonparametric methods suitable for predicting subsets of patients responding differently to treatment are then reviewed. Each method has different ways of incorporating continuous, categorical, clinical and high-throughput covariates. For nonparametric and parametric methods, distance measures specific to the method are used to make classification decisions. Approaches are outlined which employ these distances to measure treatment interactions and predict patients more sensitive to treatment. Simulations are also carried out to examine the empirical power of some of these classification methods in an adaptive signature design. Results were compared with logistic regression models. It was found that parametric and nonparametric methods performed reasonably well. Relative performance of the methods depends on the simulation scenario. Finally, a method was developed to evaluate the power and sample size needed for an adaptive signature design in order to predict the subset of patients sensitive to treatment. It is hoped that this study will stimulate more development of nonparametric and parametric methods to predict subsets of patients responding differently to treatment.

    Automated histopathological analyses at scale

    Thesis: S.M., Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2017. Cataloged from PDF version of thesis. Includes bibliographical references (pages 68-73). Histopathology is the microscopic examination of processed human tissues to diagnose conditions like cancer, tuberculosis, anemia and myocardial infarctions. The diagnostic procedure is, however, very tedious, time-consuming and prone to misinterpretation. It also requires highly trained pathologists to operate, making it unsuitable for large-scale screening in resource-constrained settings, where experts are scarce and expensive. In this thesis, we present a software system for automated screening, backed by deep learning algorithms. This cost-effective, easily scalable solution can be operated by minimally trained health workers and would extend the reach of histopathological analyses to settings such as rural villages, mass-screening camps and mobile health clinics. With metastatic breast cancer as our primary case study, we describe how the system could be used to test for the presence of a tumor and to determine the precise location of a lesion as well as the severity stage of a patient. We examine how the algorithms are combined into an end-to-end pipeline for utilization by hospitals, doctors and clinicians on a Software as a Service (SaaS) model. Finally, we discuss potential deployment strategies for the technology, as well as an analysis of the market and distribution chain in the specific case of the current Indian healthcare ecosystem. by Mrinal Mohit. S.M.

    Gd-EOB-DTPA-enhanced MRC in the preoperative percutaneous management of intra and extrahepatic biliary leakages: Does it matter?

    Postoperative bile leakage is a common complication of abdominal surgical procedures, and precise localization of the leak is important for choosing the best management. Many techniques are available to correctly identify bile leaks, including ultrasound (US), computed tomography (CT) and magnetic resonance imaging (MRI), the latter being the best at clearly depicting "active" bile leakages. This paper presents the state-of-the-art algorithm for the detection of biliary leakages in order to plan a percutaneous biliary drainage, focusing on a widely available and safe contrast agent, Gd-EOB-DTPA. We consider its pharmacokinetic properties and impact on biliary imaging, and explain current debates on optimizing image quality. We report common sites of leakage after surgery, with special considerations in the cirrhotic liver, to show what interventional radiologists should look for to easily detect bile leaks.

    Personalized risk schemes and machine learning to empower genomic prognostication models in myelodysplastic syndromes

    Myelodysplastic syndromes (MDS) are characterized by variable clinical manifestations and outcomes. Several prognostic systems relying on clinical factors and cytogenetic abnormalities have been developed to help stratify MDS patients into different risk categories of distinct prognoses and therapeutic implications. The current abundance of molecular information poses the challenges of precisely defining patients' molecular profiles and their incorporation in clinically established diagnostic and prognostic schemes. Perhaps the prognostic power of the current systems can be boosted by incorporating molecular features. Machine learning (ML) algorithms can be helpful in developing more precise prognostication models that integrate complex genomic interactions at a higher dimensional level. These techniques can potentially generate automated diagnostic and prognostic models and assist in advancing personalized therapies. This review highlights the current prognostication models used in MDS while shedding light on the latest achievements in ML-based research