Rank discriminants for predicting phenotypes from RNA expression

Author: Afsari Bahman
Braga-Neto Ulisses M.
Geman Donald
Publication venue: Institute of Mathematical Statistics
Publication date: 01/01/2014

Statistical methods for analyzing large-scale biomolecular data are commonplace in computational biology. A notable example is phenotype prediction from gene expression data, for instance, detecting human cancers, differentiating subtypes and predicting clinical outcomes. Still, clinical applications remain scarce. One reason is that the complexity of the decision rules that emerge from standard statistical learning impedes biological understanding, in particular, any mechanistic interpretation. Here we explore decision rules for binary classification utilizing only the ordering of expression among several genes; the basic building blocks are then two-gene expression comparisons. The simplest example, just one comparison, is the TSP classifier, which has appeared in a variety of cancer-related discovery studies. Decision rules based on multiple comparisons can better accommodate class heterogeneity, and thereby increase accuracy, and might provide a link with biological mechanism. We consider a general framework ("rank-in-context") for designing discriminant functions, including a data-driven selection of the number and identity of the genes in the support ("context"). We then specialize to two examples: voting among several pairs and comparing the median expression in two groups of genes. Comprehensive experiments assess accuracy relative to other, more complex, methods, and reinforce earlier observations that simple classifiers are competitive. Comment: Published at http://dx.doi.org/10.1214/14-AOAS738 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)
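
The single-comparison rule mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes the commonly described TSP scoring rule (pick the gene pair whose ordering frequency differs most between the two classes), which the abstract itself does not spell out. The function names `fit_tsp` and `predict_tsp` and the toy expression values are hypothetical and are not taken from the paper.

```python
from itertools import combinations

def fit_tsp(samples, labels):
    """Pick the gene pair (i, j) whose ordering differs most between classes 0 and 1.

    Assumed score: |P(x[i] < x[j] | class 0) - P(x[i] < x[j] | class 1)|.
    """
    n_genes = len(samples[0])
    best = None
    for i, j in combinations(range(n_genes), 2):
        freq = {}
        for c in (0, 1):
            rows = [x for x, y in zip(samples, labels) if y == c]
            freq[c] = sum(x[i] < x[j] for x in rows) / len(rows)
        score = abs(freq[0] - freq[1])
        if best is None or score > best[0]:
            # Last entry: the class in which x[i] < x[j] is more frequent.
            best = (score, i, j, 0 if freq[0] > freq[1] else 1)
    return best[1:]

def predict_tsp(model, x):
    """Classify one sample using only the ordering of the selected pair."""
    i, j, class_if_less = model
    return class_if_less if x[i] < x[j] else 1 - class_if_less

# Toy data (made up): 3 genes, 3 samples per class.
samples = [[1, 5, 3], [2, 6, 9], [0, 4, 1], [7, 2, 3], [8, 1, 9], [6, 3, 0]]
labels = [0, 0, 0, 1, 1, 1]
model = fit_tsp(samples, labels)
pred_a = predict_tsp(model, [3, 10, 4])
pred_b = predict_tsp(model, [9, 1, 5])
```

Because the rule depends only on which of two values is larger, it is unchanged by any monotone rescaling of a sample's expression values. Voting among several pairs, as the abstract describes, would repeat the comparison for more than one selected pair and take the majority.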

arXiv.org e-Print Archive

CiteSeerX

Texas A&M Repository

Classification Based Analysis on Cancer Datasets Using Predictor Measures

Author: Kumar Dinesh
Raj Y. Sunil
Samuel Raj J. Adarsh
Savarimuthu Charles
Publication venue: The International Institute for Science, Technology and Education (IISTE)
Publication date: 30/07/2019

Cancer is a life-threatening disease. Probably the most effective way to reduce cancer deaths is to detect the disease early. Early diagnosis needs an accurate and reliable procedure that physicians can use to distinguish benign tumors from malignant ones without resorting to surgical biopsy. Data mining offers solutions for such problems, where a large quantity of information about patients and their conditions is stored in clinical databases. This paper focuses on the prediction of diseases such as leukemia and breast cancer. Naïve Bayes and SVM models are built for prediction and classification. The proposed models achieved results above 96% when compared with other models in terms of accuracy, computational time and convergence. Keywords: Prediction, Data Mining, Diagnosis, Cancer, Naïve Bayes, Support Vector Machine (SVM). DOI: 10.7176/CEIS/10-6-05 Publication date: July 31st 201

International Institute for Science, Technology and Education (IISTE): E-Journals

Early Detection of Ovarian Cancer in Samples Pre-Diagnosis Using CA125 and MALDI-MS Peaks

Author: Burford B
Camuzeaux S
Cramer R
Devetyarov D
Ford J
Gammerman A
Gentry-Maharaj A
Hallett R
Jacobs I
Luo ZY
Mccurrie K
Menon U
Nouretdinov I
Smith C
Timms JF
Tiss A
Vovk V
Publication venue: INT INST ANTICANCER RESEARCH
Publication date: 01/01/2011

Aim: A nested case-control discovery study was undertaken to test whether information within the serum peptidome can improve on the utility of CA125 for early ovarian cancer detection. Materials and Methods: High-throughput matrix-assisted laser desorption ionisation mass spectrometry (MALDI-MS) was used to profile 295 serum samples from women pre-dating their ovarian cancer diagnosis and from 585 matched control samples. Classification rules incorporating CA125 and MS peak intensities were tested for discriminating ability. Results: Two peaks were found which in combination with CA125 discriminated cases from controls up to 15 and 11 months before diagnosis, respectively, and earlier than using CA125 alone. One peak was identified as connective tissue-activating peptide III (CTAPIII), whilst the other was putatively identified as platelet factor 4 (PF4). ELISA data supported the down-regulation of PF4 in early cancer cases. Conclusion: Serum peptide information with CA125 improves lead time for early detection of ovarian cancer. The candidate markers are platelet-derived chemokines, suggesting a link between platelet function and tumour development.

Central Archive at the University of Reading

LSHTM Research Online

UCL Discovery

The University of Manchester - Institutional Repository

Evolving classification of intensive care patients from event data

Author: Cassarino TG
Edgeworth J
Kozlakidis Z
Last M
Tosas O
Publication date: 01/01/2016

Objective: This work aims at predicting the patient discharge outcome on each hospitalization day by introducing a new paradigm: evolving classification of event data streams. Most classification algorithms implicitly assume the values of all predictive features to be available at the time of making the prediction. This assumption does not necessarily hold in the evolving classification setting (such as intensive care patient monitoring), where we may be interested in classifying the monitored entities as early as possible, based on the attributes initially available to the classifier, and then keep refining our classification model at each time step (e.g., on a daily basis) with the arrival of additional attributes. / Materials and methods: An oblivious read-once decision-tree algorithm, called information network (IN), is extended to deal with evolving classification. The new algorithm, named incremental information network (IIN), restricts the order of selected features by the temporal order of feature arrival. The IIN algorithm is compared to six other evolving classification approaches on an 8-year dataset of adult patients admitted to two Intensive Care Units (ICUs) in the United Kingdom. / Results: Retrospective study of 3452 episodes of adult patients (≥ 16 years of age) admitted to the ICUs of Guy’s and St. Thomas’ hospitals in London between 2002 and 2009. Random partition (66:34) into a development (training) set n = 2287 and validation set n = 1165. Episode-related time steps: Day 0 is the time of ICU admission, Day x is the end of the x-th day at ICU. The most accurate decision-tree models, based on the area under curve (AUC): Day 0: IN (AUC = 0.652), Day 1: IIN (AUC = 0.660), Day 2: J48 decision-tree algorithm (AUC = 0.678), Days 3–7: regenerative IN (AUC = 0.717–0.772). Logistic regression AUC: 0.582 (Day 0) to 0.827 (Day 7). / Conclusions: Our experimental results have not identified a single optimal approach for evolving classification of ICU episodes. On Days 0 and 1, the IIN algorithm has produced the simplest and the most accurate models, which incorporate the temporal order of feature arrival. However, starting with Day 2, regenerative approaches have reached better performance in terms of predictive accuracy.
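
The results in this abstract are reported as area under the curve (AUC). As a reminder of what that number measures, here is a minimal sketch in Python using the standard pairwise definition: the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as one half. The function name `auc` and the scores below are made up for illustration and have no connection to the study's data.

```python
def auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted risks and outcomes (1 = event, 0 = no event).
scores = [0.9, 0.8, 0.4, 0.35, 0.2]
labels = [1, 0, 1, 0, 0]
value = auc(scores, labels)
```

Under this definition 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the per-day figures above can be read.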

UCL Discovery

Oxford University Research Archive

A Framework to Discover Emerging Patterns for Application in Microarray Data

Author: Boulesteix Anne-Laure
Tutz Gerhard
Publication date: 01/01/2003

Various supervised learning and gene selection methods have been used for cancer diagnosis. Most of these methods do not consider interactions between genes, although this might be interesting biologically and improve classification accuracy. Here we introduce a new CART-based method to discover emerging patterns. Emerging patterns are structures of the form (X1 > a1) AND (X2 < a2) that have differing frequencies in the considered classes. Interaction structures of this kind are of great interest in cancer research. Moreover, they can be used to define new variables for classification. Using simulated data sets, we show that our method allows the identification of emerging patterns with high efficiency. We also perform classification using two publicly available data sets (leukemia and colon cancer). For each data set, the method allows efficient classification as well as the identification of interesting patterns.
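
To make the pattern definition in this abstract concrete, the following Python sketch evaluates how often one fixed pattern of the form (X1 > a1) AND (X2 < a2) holds in each class. It only scores a pattern that is already given; it does not implement the CART-based discovery step the abstract refers to. The function name `pattern_frequencies`, the thresholds, the class names and the data are all hypothetical.

```python
def pattern_frequencies(rows, labels, a1, a2):
    """Fraction of samples in each class satisfying (X1 > a1) AND (X2 < a2)."""
    counts, totals = {}, {}
    for (x1, x2), y in zip(rows, labels):
        totals[y] = totals.get(y, 0) + 1
        counts[y] = counts.get(y, 0) + int(x1 > a1 and x2 < a2)
    return {y: counts[y] / totals[y] for y in totals}

# Toy expression values (made up) for two genes X1 and X2.
rows = [(5.0, 1.0), (6.0, 0.5), (4.5, 3.0), (1.0, 1.0), (2.0, 4.0), (5.5, 0.8)]
labels = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]
freq = pattern_frequencies(rows, labels, a1=4.0, a2=2.0)
```

A pattern whose frequency differs strongly between the classes is the kind of structure the abstract calls emerging, and an indicator for the pattern could then serve as a new binary variable for a classifier.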

CiteSeerX

Open Access LMU