
    Rank discriminants for predicting phenotypes from RNA expression

    Statistical methods for analyzing large-scale biomolecular data are commonplace in computational biology. A notable example is phenotype prediction from gene expression data, for instance, detecting human cancers, differentiating subtypes and predicting clinical outcomes. Still, clinical applications remain scarce. One reason is that the complexity of the decision rules that emerge from standard statistical learning impedes biological understanding, in particular, any mechanistic interpretation. Here we explore decision rules for binary classification utilizing only the ordering of expression among several genes; the basic building blocks are then two-gene expression comparisons. The simplest example, just one comparison, is the TSP classifier, which has appeared in a variety of cancer-related discovery studies. Decision rules based on multiple comparisons can better accommodate class heterogeneity, and thereby increase accuracy, and might provide a link with biological mechanism. We consider a general framework ("rank-in-context") for designing discriminant functions, including a data-driven selection of the number and identity of the genes in the support ("context"). We then specialize to two examples: voting among several pairs and comparing the median expression in two groups of genes. Comprehensive experiments assess accuracy relative to other, more complex, methods, and reinforce earlier observations that simple classifiers are competitive. Comment: Published at http://dx.doi.org/10.1214/14-AOAS738 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org
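
The single-comparison rule described in this abstract can be illustrated with a small sketch. The abstract only states that the TSP classifier uses one two-gene expression comparison; the selection score below (the difference between classes in how often gene i is expressed below gene j) and the names `fit_tsp`, `predict_tsp` and the toy data are assumptions made for illustration, not the authors' implementation.

```python
from itertools import combinations

def fit_tsp(X, y):
    """Pick one gene pair whose ordering differs most between classes.

    X: list of samples, each a list of expression values.
    y: list of class labels (0 or 1).
    Assumption: the pair is scored by |P(x_i < x_j | class 0) - P(x_i < x_j | class 1)|.
    """
    class0 = [x for x, label in zip(X, y) if label == 0]
    class1 = [x for x, label in zip(X, y) if label == 1]
    best = None
    for i, j in combinations(range(len(X[0])), 2):
        p0 = sum(x[i] < x[j] for x in class0) / len(class0)
        p1 = sum(x[i] < x[j] for x in class1) / len(class1)
        score = abs(p0 - p1)
        if best is None or score > best[0]:
            # Remember whether "gene i below gene j" points to class 1.
            best = (score, i, j, p1 > p0)
    _, i, j, less_means_class1 = best
    return i, j, less_means_class1

def predict_tsp(model, x):
    """Classify one sample using only the ordering of the selected pair."""
    i, j, less_means_class1 = model
    return int((x[i] < x[j]) == less_means_class1)

# Hypothetical toy data: gene 0 exceeds gene 1 in class 0 and not in class 1.
X = [[5, 1, 3], [6, 2, 9], [4, 3, 1], [1, 5, 3], [2, 6, 0], [3, 4, 9]]
y = [0, 0, 0, 1, 1, 1]
model = fit_tsp(X, y)
```

Because the rule depends only on which of two values is larger, it is unchanged by any monotone rescaling of a sample's expression values. Voting among several pairs, as the abstract mentions, would repeat the same comparison over more than one selected pair.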

    Classification Based Analysis on Cancer Datasets Using Predictor Measures

    Cancer is a life-threatening disease. Probably the most effective way to reduce cancer deaths is to detect the disease earlier. Earlier diagnosis requires an accurate and reliable procedure that physicians can use to distinguish benign tumors from malignant ones without resorting to surgical biopsy. Data mining offers a solution for problems of this kind, where a large quantity of information about patients and their conditions is stored in clinical databases. This paper focuses on the prediction of such diseases, including leukemia and breast cancer. Naïve Bayes and SVM models are built for prediction and classification. The proposed models produced results above 96% when compared with other models in terms of accuracy, computational time and convergence. Keywords: Prediction, Data Mining, Diagnosis, Cancer, Naïve Bayes, Support Vector Machine (SVM). DOI: 10.7176/CEIS/10-6-05 Publication date: July 31st 201

    Early Detection of Ovarian Cancer in Samples Pre-Diagnosis Using CA125 and MALDI-MS Peaks

    Aim: A nested case-control discovery study was undertaken to test whether information within the serum peptidome can improve on the utility of CA125 for early ovarian cancer detection. Materials and Methods: High-throughput matrix-assisted laser desorption ionisation mass spectrometry (MALDI-MS) was used to profile 295 serum samples from women pre-dating their ovarian cancer diagnosis and from 585 matched control samples. Classification rules incorporating CA125 and MS peak intensities were tested for discriminating ability. Results: Two peaks were found which in combination with CA125 discriminated cases from controls up to 15 and 11 months before diagnosis, respectively, and earlier than using CA125 alone. One peak was identified as connective tissue-activating peptide III (CTAPIII), whilst the other was putatively identified as platelet factor 4 (PF4). ELISA data supported the down-regulation of PF4 in early cancer cases. Conclusion: Serum peptide information with CA125 improves lead time for early detection of ovarian cancer. The candidate markers are platelet-derived chemokines, suggesting a link between platelet function and tumour development

    Evolving classification of intensive care patients from event data

    Objective: This work aims at predicting the patient discharge outcome on each hospitalization day by introducing a new paradigm: evolving classification of event data streams. Most classification algorithms implicitly assume the values of all predictive features to be available at the time of making the prediction. This assumption does not necessarily hold in the evolving classification setting (such as intensive care patient monitoring), where we may be interested in classifying the monitored entities as early as possible, based on the attributes initially available to the classifier, and then keep refining our classification model at each time step (e.g., on a daily basis) with the arrival of additional attributes. / Materials and methods: An oblivious read-once decision-tree algorithm, called information network (IN), is extended to deal with evolving classification. The new algorithm, named incremental information network (IIN), restricts the order of selected features by the temporal order of feature arrival. The IIN algorithm is compared to six other evolving classification approaches on an 8-year dataset of adult patients admitted to two Intensive Care Units (ICUs) in the United Kingdom. / Results: Retrospective study of 3452 episodes of adult patients (≥ 16 years of age) admitted to the ICUs of Guy’s and St. Thomas’ hospitals in London between 2002 and 2009. Random partition (66:34) into a development (training) set n = 2287 and validation set n = 1165. Episode-related time steps: Day 0 = time of ICU admission, Day x = end of the x-th day at ICU. The most accurate decision-tree models, based on the area under curve (AUC): Day 0: IN (AUC = 0.652), Day 1: IIN (AUC = 0.660), Day 2: J48 decision-tree algorithm (AUC = 0.678), Days 3–7: regenerative IN (AUC = 0.717–0.772). Logistic regression AUC: 0.582 (Day 0) to 0.827 (Day 7). / Conclusions: Our experimental results have not identified a single optimal approach for evolving classification of ICU episodes. On Days 0 and 1, the IIN algorithm has produced the simplest and the most accurate models, which incorporate the temporal order of feature arrival. However, starting with Day 2, regenerative approaches have reached better performance in terms of predictive accuracy

    A Framework to Discover Emerging Patterns for Application in Microarray Data

    Various supervised learning and gene selection methods have been used for cancer diagnosis. Most of these methods do not consider interactions between genes, although this might be interesting biologically and improve classification accuracy. Here we introduce a new CART-based method to discover emerging patterns. Emerging patterns are structures of the form (X1 > a1) AND (X2 < a2) that have differing frequencies in the considered classes. Interaction structures of this kind are of great interest in cancer research. Moreover, they can be used to define new variables for classification. Using simulated data sets, we show that our method allows the identification of emerging patterns with high efficiency. We also perform classification using two publicly available data sets (leukemia and colon cancer). For each data set, the method allows efficient classification as well as the identification of interesting patterns
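
To make the definition of an emerging pattern concrete, the sketch below measures how often a fixed pattern of the form (X1 > a1) AND (X2 < a2) holds in each class. It does not reproduce the CART-based discovery method, which the abstract does not detail; the thresholds, class labels, sample values and the helper `pattern_support` are all hypothetical.

```python
def pattern_support(samples, labels, pattern, target):
    """Fraction of samples in class `target` that satisfy `pattern`."""
    in_class = [s for s, lab in zip(samples, labels) if lab == target]
    return sum(pattern(s) for s in in_class) / len(in_class)

# Hypothetical thresholds a1 and a2 for the two variables.
a1, a2 = 2.0, 1.0

def pattern(sample):
    """The conjunction (X1 > a1) AND (X2 < a2)."""
    return sample["X1"] > a1 and sample["X2"] < a2

# Hypothetical expression values; the class labels are illustrative only.
samples = [
    {"X1": 3.0, "X2": 0.5}, {"X1": 2.5, "X2": 0.2},
    {"X1": 1.0, "X2": 0.4}, {"X1": 4.0, "X2": 0.1},
    {"X1": 3.0, "X2": 1.5}, {"X1": 0.5, "X2": 2.0},
    {"X1": 2.2, "X2": 0.9}, {"X1": 1.5, "X2": 1.2},
]
labels = ["tumor"] * 4 + ["normal"] * 4

support_tumor = pattern_support(samples, labels, pattern, "tumor")
support_normal = pattern_support(samples, labels, pattern, "normal")
# A large gap between the two frequencies is what makes the pattern "emerging".
gap = support_tumor - support_normal
```

The boolean value of `pattern(sample)` could also serve as a new variable for a downstream classifier, which is the use the abstract mentions.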