3,878 research outputs found

    Semi-supervised Classification of Breast Cancer Expression Profiles Using Neural Networks

    Get PDF
    In classification tasks of biological data, there are usually fewer labeled than unlabeled samples because labeling samples is costly or time-consuming. In addition, labeled data sets can be re-used in different contexts as additional unlabeled data sets. For example, when searching the Gene Expression Omnibus (GEO) repository for microarray data sets of drug sensitivity and resistance experiments, the largest one has 2,522 samples, but the median has only 12 samples. In machine learning in general, utilizing unlabeled data in classification tasks is called semi-supervised learning. Artificial neural networks can be used to pre-train on unlabeled data before fine-tuning via back-propagation with labeled data. Such artificial neural networks enabling deep learning have gained attention since around 2010, since when they have been among the best-performing algorithms in visual object recognition. We measured accuracies in the task of classifying tissue taken from breast cancer patients at reductive surgery as chemotherapy-resistant or -sensitive. Different data sets were constructed by subsampling from GEO data set GSE25055 and GSE25065. Using these data sets, we compared classification accuracy of the neural networks autoencoder, Restricted Boltzmann Machine, Deep Belief Network (DBN) and support vector machine (SVM), and Transductive SVM (TSVM). Training was done both in supervised and semi-supervised mode. For the neural networks, we tried several different network architectures. Smoothing the validation set accuracies obtained during training iterations to alleviate low sample numbers helped in model selection of the best classifier. We also investigated the effect of different normalization procedures on the classification accuracy. The data were normalized with either RMA or MAS5, followed by either no batch-effect correction or Combat batch-effect correction. Only MAS5 profited from added Combat batch-effect correction, but normalization with RMA alone yielded the best classification accuracy. We were particularly interested whether classification accuracies improve when adding unlabeled samples in semi-supervised learning. Overall, neural networks and support vector machines performed similar. We found a slight improvement of classification accuracy when the number of unlabeled samples presented to DBN and TSVM was increased to the maximal number of samples in our data sets. However, this effect was only observed when the learning algorithms were presented the expression values of all 22,283 genes, not just the 500 most variable genes

    Machine learning methods for histopathological image analysis

    Full text link
    Abundant accumulation of digital histopathological images has led to the increased demand for their analysis, such as computer-aided diagnosis using machine learning techniques. However, digital pathological images and related tasks have some issues to be considered. In this mini-review, we introduce the application of digital pathological image analysis using machine learning algorithms, address some problems specific to such analysis, and propose possible solutions.Comment: 23 pages, 4 figure

    CAMUR: Knowledge extraction from RNA-seq cancer data through equivalent classification rules

    Get PDF
    Nowadays, knowledge extraction methods from Next Generation Sequencing data are highly requested. In this work, we focus on RNA-seq gene expression analysis and specifically on case-control studies with rule-based supervised classification algorithms that build a model able to discriminate cases from controls. State of the art algorithms compute a single classification model that contains few features (genes). On the contrary, our goal is to elicit a higher amount of knowledge by computing many classification models, and therefore to identify most of the genes related to the predicted class

    Pathway-Based Genomics Prediction using Generalized Elastic Net.

    Get PDF
    We present a novel regularization scheme called The Generalized Elastic Net (GELnet) that incorporates gene pathway information into feature selection. The proposed formulation is applicable to a wide variety of problems in which the interpretation of predictive features using known molecular interactions is desired. The method naturally steers solutions toward sets of mechanistically interlinked genes. Using experiments on synthetic data, we demonstrate that pathway-guided results maintain, and often improve, the accuracy of predictors even in cases where the full gene network is unknown. We apply the method to predict the drug response of breast cancer cell lines. GELnet is able to reveal genetic determinants of sensitivity and resistance for several compounds. In particular, for an EGFR/HER2 inhibitor, it finds a possible trans-differentiation resistance mechanism missed by the corresponding pathway agnostic approach

    Spatial Organization and Molecular Correlation of Tumor-Infiltrating Lymphocytes Using Deep Learning on Pathology Images

    Get PDF
    Beyond sample curation and basic pathologic characterization, the digitized H&E-stained images of TCGA samples remain underutilized. To highlight this resource, we present mappings of tumorinfiltrating lymphocytes (TILs) based on H&E images from 13 TCGA tumor types. These TIL maps are derived through computational staining using a convolutional neural network trained to classify patches of images. Affinity propagation revealed local spatial structure in TIL patterns and correlation with overall survival. TIL map structural patterns were grouped using standard histopathological parameters. These patterns are enriched in particular T cell subpopulations derived from molecular measures. TIL densities and spatial structure were differentially enriched among tumor types, immune subtypes, and tumor molecular subtypes, implying that spatial infiltrate state could reflect particular tumor cell aberration states. Obtaining spatial lymphocytic patterns linked to the rich genomic characterization of TCGA samples demonstrates one use for the TCGA image archives with insights into the tumor-immune microenvironment

    Knowledge-Informed Machine Learning for Cancer Diagnosis and Prognosis: A review

    Full text link
    Cancer remains one of the most challenging diseases to treat in the medical field. Machine learning has enabled in-depth analysis of rich multi-omics profiles and medical imaging for cancer diagnosis and prognosis. Despite these advancements, machine learning models face challenges stemming from limited labeled sample sizes, the intricate interplay of high-dimensionality data types, the inherent heterogeneity observed among patients and within tumors, and concerns about interpretability and consistency with existing biomedical knowledge. One approach to surmount these challenges is to integrate biomedical knowledge into data-driven models, which has proven potential to improve the accuracy, robustness, and interpretability of model results. Here, we review the state-of-the-art machine learning studies that adopted the fusion of biomedical knowledge and data, termed knowledge-informed machine learning, for cancer diagnosis and prognosis. Emphasizing the properties inherent in four primary data types including clinical, imaging, molecular, and treatment data, we highlight modeling considerations relevant to these contexts. We provide an overview of diverse forms of knowledge representation and current strategies of knowledge integration into machine learning pipelines with concrete examples. We conclude the review article by discussing future directions to advance cancer research through knowledge-informed machine learning.Comment: 41 pages, 4 figures, 2 table
    • …
    corecore