Rank discriminants for predicting phenotypes from RNA expression
Statistical methods for analyzing large-scale biomolecular data are
commonplace in computational biology. A notable example is phenotype prediction
from gene expression data, for instance, detecting human cancers,
differentiating subtypes and predicting clinical outcomes. Still, clinical
applications remain scarce. One reason is that the complexity of the decision
rules that emerge from standard statistical learning impedes biological
understanding, in particular, any mechanistic interpretation. Here we explore
decision rules for binary classification utilizing only the ordering of
expression among several genes; the basic building blocks are then two-gene
expression comparisons. The simplest example, just one comparison, is the TSP
classifier, which has appeared in a variety of cancer-related discovery
studies. Decision rules based on multiple comparisons can better accommodate
class heterogeneity, and thereby increase accuracy, and might provide a link
with biological mechanism. We consider a general framework ("rank-in-context")
for designing discriminant functions, including a data-driven selection of the
number and identity of the genes in the support ("context"). We then specialize
to two examples: voting among several pairs and comparing the median expression
in two groups of genes. Comprehensive experiments assess accuracy relative to
other, more complex, methods, and reinforce earlier observations that simple
classifiers are competitive. Comment: Published at http://dx.doi.org/10.1214/14-AOAS738 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
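The rank-based rules described in this abstract (a single two-gene comparison, voting among several pairs, and comparing the median expression of two gene groups) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the gene identifiers, the direction of each comparison, and the 0/1 class labels are all assumptions made for illustration, and the data-driven selection of genes mentioned in the abstract is not shown.

```python
from statistics import median

def tsp_predict(sample, gene_a, gene_b):
    """Single comparison: predict class 1 if gene_a is expressed below gene_b.

    The direction of the comparison and the 0/1 labels are illustrative assumptions.
    """
    return 1 if sample[gene_a] < sample[gene_b] else 0

def vote_predict(sample, pairs):
    """Majority vote over several two-gene comparisons (hypothetical pair list)."""
    votes = sum(tsp_predict(sample, a, b) for a, b in pairs)
    return 1 if votes > len(pairs) / 2 else 0

def median_predict(sample, group_low, group_high):
    """Compare the median expression of two gene groups (hypothetical groups)."""
    m_low = median(sample[g] for g in group_low)
    m_high = median(sample[g] for g in group_high)
    return 1 if m_low < m_high else 0

# Toy expression profile for one sample; the values are made up.
sample = {"g1": 2.0, "g2": 5.0, "g3": 7.0, "g4": 1.0, "g5": 3.0, "g6": 4.0}
pairs = [("g1", "g2"), ("g3", "g4"), ("g5", "g6")]

single = tsp_predict(sample, "g1", "g2")
voted = vote_predict(sample, pairs)
by_median = median_predict(sample, ["g1", "g4", "g5"], ["g2", "g3", "g6"])
print(single, voted, by_median)
```

Because each rule depends only on the ordering of expression values within a sample, any monotone rescaling of that sample's measurements leaves the prediction unchanged.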
Computer-Aided Detection and diagnosis for prostate cancer based on mono and multi-parametric MRI: A review
Prostate cancer is the second most diagnosed cancer in men worldwide. In recent decades, new imaging techniques based on Magnetic Resonance Imaging (MRI) have been developed, improving diagnosis. In practice, diagnosis can be affected by multiple factors such as observer variability and the visibility and complexity of the lesions. In this regard, computer-aided detection and computer-aided diagnosis systems have been designed to help radiologists in their clinical practice. Research on computer-aided systems specifically focused on prostate cancer is young and has been part of a dynamic field of research for the last ten years. This survey aims to provide a comprehensive review of the state of the art over this period, focusing on the different stages composing the workflow of a computer-aided system. We also provide a comparison between studies and a discussion of the potential avenues for future research. In addition, this paper presents a new public online dataset, which is made available to the research community with the aim of providing a common evaluation framework to overcome some of the current limitations identified in this survey
Overview of Random Forest Methodology and Practical Guidance with Emphasis on Computational Biology and Bioinformatics
The Random Forest (RF) algorithm by Leo Breiman has become a
standard data analysis tool in bioinformatics. It has shown excellent performance in settings where the number of variables is much larger than the number of observations, can cope with complex interaction structures as well as highly correlated variables and returns measures of variable importance. This paper synthesizes ten years of RF development with emphasis on applications to bioinformatics and computational biology. Special attention is given to practical aspects such as the selection of parameters, available RF implementations, and important pitfalls and biases of RF and its variable importance measures (VIMs). The paper surveys recent developments of the methodology relevant to bioinformatics as well as some representative examples of RF applications in this context and possible directions for future research
An ensemble approach to accurately detect somatic mutations using SomaticSeq
SomaticSeq is an accurate somatic mutation detection pipeline implementing a stochastic boosting algorithm to produce highly accurate somatic mutation calls for both single nucleotide variants and small insertions and deletions. The workflow currently incorporates five state-of-the-art somatic mutation callers, and extracts over 70 individual genomic and sequencing features for each candidate site. A training set is provided to an adaptively boosted decision tree learner to create a classifier for predicting mutation statuses. We validate our results with both synthetic and real data. We report that SomaticSeq is able to achieve better overall accuracy than any individual tool incorporated. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13059-015-0758-2) contains supplementary material, which is available to authorized users
Classification of clinical outcomes using high-throughput and clinical informatics.
It is widely recognized that many cancer therapies are effective only for a subset of patients. However, clinical studies are most often powered to detect an overall treatment effect. To address this issue, classification methods are increasingly being used to predict a subset of patients that responds differently to treatment. This study begins with a brief history of classification methods with an emphasis on applications involving melanoma. Nonparametric methods suitable for predicting subsets of patients responding differently to treatment are then reviewed. Each method has different ways of incorporating continuous, categorical, clinical and high-throughput covariates. For nonparametric and parametric methods, distance measures specific to the method are used to make classification decisions. Approaches are outlined which employ these distances to measure treatment interactions and predict patients more sensitive to treatment. Simulations are also carried out to examine the empirical power of some of these classification methods in an adaptive signature design. Results were compared with logistic regression models. It was found that parametric and nonparametric methods performed reasonably well. The relative performance of the methods depends on the simulation scenario. Finally, a method was developed to evaluate the power and sample size needed for an adaptive signature design in order to predict the subset of patients sensitive to treatment. It is hoped that this study will stimulate more development of nonparametric and parametric methods to predict subsets of patients responding differently to treatment
Automated histopathological analyses at scale
Thesis: S.M., Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2017. Cataloged from PDF version of thesis. Includes bibliographical references (pages 68-73). Histopathology is the microscopic examination of processed human tissues to diagnose conditions like cancer, tuberculosis, anemia and myocardial infarctions. The diagnostic procedure is, however, very tedious, time-consuming and prone to misinterpretation. It also requires highly trained pathologists to operate, making it unsuitable for large-scale screening in resource-constrained settings, where experts are scarce and expensive. In this thesis, we present a software system for automated screening, backed by deep learning algorithms. This cost-effective, easily scalable solution can be operated by minimally trained health workers and would extend the reach of histopathological analyses to settings such as rural villages, mass-screening camps and mobile health clinics. With metastatic breast cancer as our primary case study, we describe how the system could be used to test for the presence of a tumor, determine the precise location of a lesion, as well as the severity stage of a patient. We examine how the algorithms are combined into an end-to-end pipeline for utilization by hospitals, doctors and clinicians on a Software as a Service (SaaS) model. Finally, we discuss potential deployment strategies for the technology, as well as an analysis of the market and distribution chain in the specific case of the current Indian healthcare ecosystem. By Mrinal Mohit. S.M.
Gd-EOB-DTPA-enhanced MRC in the preoperative percutaneous management of intra and extrahepatic biliary leakages: Does it matter?
Postoperative bile leakage is a common complication of abdominal surgical procedures, and precise localization of the leak is important for choosing the best management. Many techniques are available to correctly identify bile leaks, including ultrasound (US), computed tomography (CT) and magnetic resonance imaging (MRI), the latter being the best at clearly depicting "active" bile leakages. This paper presents the state-of-the-art algorithm for the detection of biliary leakages in order to plan percutaneous biliary drainage, focusing on a widely available and safe contrast agent, Gd-EOB-DTPA. We consider its pharmacokinetic properties and impact on biliary imaging, and explain current debates on optimizing image quality. We report common sites of leakage after surgery, with special considerations for the cirrhotic liver, to show what interventional radiologists should look for to easily detect bile leaks
Personalized risk schemes and machine learning to empower genomic prognostication models in myelodysplastic syndromes
Myelodysplastic syndromes (MDS) are characterized by variable clinical manifestations and outcomes. Several prognostic systems relying on clinical factors and cytogenetic abnormalities have been developed to help stratify MDS patients into different risk categories of distinct prognoses and therapeutic implications. The current abundance of molecular information poses the challenges of precisely defining patients' molecular profiles and their incorporation in clinically established diagnostic and prognostic schemes. Perhaps the prognostic power of the current systems can be boosted by incorporating molecular features. Machine learning (ML) algorithms can be helpful in developing more precise prognostication models that integrate complex genomic interactions at a higher dimensional level. These techniques can potentially generate automated diagnostic and prognostic models and assist in advancing personalized therapies. This review highlights the current prognostication models used in MDS while shedding light on the latest achievements in ML-based research