140,954 research outputs found

    Knowledge-based variable selection for learning rules from proteomic data

    Get PDF
    Background: The incorporation of biological knowledge can enhance the analysis of biomedical data. We present a novel method that uses a proteomic knowledge base to enhance the performance of a rule-learning algorithm in identifying putative biomarkers of disease from high-dimensional proteomic mass spectral data. In particular, we use the Empirical Proteomics Ontology Knowledge Base (EPO-KB), which contains previously identified and validated proteomic biomarkers, to select m/z values in a proteomic dataset prior to analysis to increase performance. Results: We show that using EPO-KB as a pre-processing method, specifically selecting all biomarkers found only in the biofluid of the proteomic dataset, reduces the dimensionality by 95% and provides a statistically significant increase in performance over both no variable selection and random variable selection. Conclusion: Knowledge-based variable selection, even with a sparsely populated resource such as the EPO-KB, increases the overall performance of rule learning for disease classification from high-dimensional proteomic mass spectra.
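    As a rough illustration of the pre-processing step described above, the sketch below filters m/z channels against a set of knowledge-base biomarker masses. The channel values, the biomarker set, and the 500 ppm matching tolerance are all invented for the example; they do not come from the paper or from EPO-KB itself.

```python
import numpy as np

def knowledge_based_mz_selection(mz_channels, known_biomarker_mzs, tolerance_ppm=500):
    """Keep only the m/z channels that lie within a mass tolerance of a biomarker
    already recorded in the knowledge base (here a plain set of m/z values standing
    in for EPO-KB entries restricted to the dataset's biofluid)."""
    mz_channels = np.asarray(mz_channels, dtype=float)
    known = np.sort(np.asarray(list(known_biomarker_mzs), dtype=float))
    selected = []
    for i, mz in enumerate(mz_channels):
        tol = mz * tolerance_ppm * 1e-6          # ppm window -> absolute tolerance (assumed)
        j = np.searchsorted(known, mz)           # nearest knowledge-base masses via binary search
        neighbours = known[max(j - 1, 0):j + 1]
        if neighbours.size and np.min(np.abs(neighbours - mz)) <= tol:
            selected.append(i)
    return selected

# toy usage: five spectral channels, two knowledge-base biomarkers (hypothetical values)
channels = [1020.4, 3442.1, 6631.7, 8937.2, 11725.0]
kb_mzs = {3441.9, 11724.3}
print(knowledge_based_mz_selection(channels, kb_mzs))  # -> [1, 4]
```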

    Incorporating Figure Captions and Descriptive Text into MeSH Term Indexing: A Deep Learning Approach

    Get PDF
    The exponential increase in the number of documents available online makes document classification an important application of natural language processing. The goal of text classification is to automatically assign categories to documents. Traditional text classifiers depend on features such as vocabulary and user-specified information, which rely mainly on prior knowledge. In contrast, deep learning automatically learns effective features from data instead of adopting human-designed features. In this thesis, we focus specifically on biomedical document classification. Beyond the text of the abstract and title, we also consider image and table captions, as well as the paragraphs associated with images and tables, which we demonstrate to be an important feature source for our method.
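    The thesis itself uses deep learning; purely to show where caption and caption-adjacent paragraph text would enter a classification pipeline, the sketch below fuses those fields into one text input and trains a simple bag-of-words multi-label classifier instead. The record fields, example documents, and MeSH terms are all invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

def fuse_fields(record):
    # concatenate title/abstract text with captions and caption-adjacent paragraphs,
    # so the extra evidence simply widens the text the classifier sees
    return " ".join([record["title"], record["abstract"],
                     record.get("captions", ""), record.get("caption_paragraphs", "")])

records = [  # hypothetical documents
    {"title": "Gene expression in lung cancer", "abstract": "We profile tumour transcriptomes.",
     "captions": "Figure 1. Heatmap of differentially expressed genes."},
    {"title": "Serum biomarkers of heart failure", "abstract": "Serum markers were measured.",
     "captions": "Table 2. Cohort characteristics."},
]
mesh_terms = [["Lung Neoplasms", "Gene Expression Profiling"],
              ["Heart Failure", "Biomarkers"]]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(mesh_terms)

model = make_pipeline(TfidfVectorizer(),
                      OneVsRestClassifier(LogisticRegression(max_iter=1000)))
model.fit([fuse_fields(r) for r in records], Y)
print(mlb.inverse_transform(model.predict([fuse_fields(records[0])])))
```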

    Transfer Learning for Bayesian Case Detection Systems

    Get PDF
    In this age of big biomedical data, a wide variety of data is being produced worldwide. If we could combine that data more effectively, we might well develop a deeper understanding of biomedical problems and their solutions. Compared to traditional machine learning techniques, transfer learning techniques explicitly model differences among the origins of data to provide a smooth transfer of knowledge. Most techniques focus on the transfer of data, while more recent techniques have begun to explore the possibility of transferring models. Model-transfer techniques are especially appealing in biomedicine because they involve fewer privacy risks. Unfortunately, most model-transfer techniques are unable to handle heterogeneous scenarios in which models differ in the features they contain, a situation that arises commonly with biomedical data. This dissertation develops an innovative transfer learning framework to share both data and models under a variety of conditions, while allowing the inclusion of features that are unique to and informative about the target context. I used both synthetic and real-world datasets to test two hypotheses: 1) a transfer learning model that is learned using source knowledge and target data performs classification in the target context better than a target model that is learned solely from target data; 2) a transfer learning model performs classification in the target context better than a source model. I conducted a comprehensive analysis to investigate the conditions under which these two hypotheses hold and, more generally, the factors that affect the effectiveness of transfer learning, providing empirical guidance about when and what to share. My research enables knowledge sharing under heterogeneous scenarios and provides an approach for understanding transfer learning performance in terms of differences in features, distributions, and sample sizes between source and target. The model-transfer algorithm can be viewed as a new Bayesian network learning algorithm with a flexible representation of prior knowledge. In concrete terms, this work shows the potential for transfer learning to assist in the rapid development of a case detection system for an emergent unknown disease. More generally, to my knowledge, this research is the first investigation of model-based transfer learning in biomedicine under heterogeneous scenarios.
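    One way to picture the model-transfer idea with heterogeneous features is sketched below: a source model's conditional probabilities act as Dirichlet pseudo-counts when estimating a target conditional probability table for features both contexts share, while target-only features fall back to target data with light smoothing. This is a minimal illustration of prior-knowledge blending, not the dissertation's actual algorithm, and the prior_strength knob is an assumed parameter.

```python
import numpy as np

def blend_cpt(target_counts, source_probs=None, prior_strength=10.0):
    """Estimate P(feature value | class) for one feature in the target context.
    Shared feature: source-model probabilities become Dirichlet pseudo-counts
    scaled by prior_strength. Target-only feature: a weak uniform (Laplace) prior."""
    target_counts = np.asarray(target_counts, dtype=float)
    if source_probs is not None:
        pseudo = prior_strength * np.asarray(source_probs, dtype=float)
    else:
        pseudo = np.ones_like(target_counts)
    blended = target_counts + pseudo
    return blended / blended.sum()

# shared feature: the source model was confident, the target data are sparse
print(blend_cpt(target_counts=[3, 1], source_probs=[0.9, 0.1]))
# target-only feature: no source knowledge, so only smoothed target counts
print(blend_cpt(target_counts=[3, 1]))
```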

    LITERATURE MINING SUSTAINS AND ENHANCES KNOWLEDGE DISCOVERY FROM OMIC STUDIES

    Get PDF
    Genomic, proteomic, and other experimentally generated data from studies of biological systems aiming to discover disease biomarkers are currently analyzed without sufficient supporting evidence from the literature, due to the complexities of automated processing. Extracting prior knowledge about markers associated with biological sample types and disease states from the literature is tedious, and little research has been performed on how to use this knowledge to inform the generation of classification models from ‘omic’ data. Using pathway analysis methods to better understand the underlying biology of complex diseases such as breast and lung cancers is state-of-the-art. However, how to combine literature-mining evidence with pathway analysis evidence remains an open problem in biomedical informatics research. This dissertation presents a novel semi-automated framework, named Knowledge Enhanced Data Analysis (KEDA), which incorporates the following components: 1) literature mining of text; 2) classification modeling; and 3) pathway analysis. This framework aids researchers in assigning literature-mining-based prior knowledge values to genes and proteins associated with disease biology. It incorporates prior knowledge into the modeling of experimental datasets, enriching the development process with current findings from the scientific community. New knowledge is presented in the form of lists of known disease-specific biomarkers and their accompanying scores obtained through literature mining of millions of lung and breast cancer abstracts. These scores can subsequently be used as prior knowledge values in Bayesian modeling and pathway analysis. Ranked, newly discovered biomarker-disease-biofluid relationships that identify biomarker specificity across biofluids are presented. A novel method of identifying biomarker relationships, which examines the attributes of the best-performing models, is also discussed. Pathway analysis results from the addition of prior information ultimately lead to more robust evidence for pathway involvement in diseases of interest, based on statistically significant standard measures of impact factor and p-values. The outcome of implementing the KEDA framework is enhanced modeling and pathway analysis findings. Enhanced knowledge discovery analysis leads to new disease-specific entities and relationships that otherwise would not have been identified. Increased disease understanding, as well as identification of biomarkers for disease diagnosis, treatment, or therapy targets, should ultimately lead to validation and clinical implementation.
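    As a toy stand-in for the literature-mining component, the sketch below scores each marker by how often it is co-mentioned with a disease term across a set of abstracts, returning values that could serve as prior-knowledge weights in downstream Bayesian modeling or pathway analysis. The scoring rule, the floor value, and the example abstracts are invented; KEDA's actual text-mining pipeline is not reproduced here.

```python
from collections import Counter

def literature_prior_scores(abstracts, markers, disease_terms, floor=0.01):
    """Score each marker by the fraction of disease-relevant abstracts that also
    mention it, clamped to a small floor so every marker keeps a nonzero prior."""
    co_mentions = Counter()
    relevant = 0
    for text in abstracts:
        text = text.lower()
        if any(term.lower() in text for term in disease_terms):
            relevant += 1
            for marker in markers:
                if marker.lower() in text:
                    co_mentions[marker] += 1
    return {m: max(floor, co_mentions[m] / relevant) if relevant else floor
            for m in markers}

abstracts = [  # hypothetical abstract snippets
    "EGFR mutations are frequent in lung cancer and predict treatment response.",
    "We measured HER2 expression in breast cancer tissue samples.",
    "EGFR and KRAS status was assessed in non-small cell lung cancer cohorts.",
]
print(literature_prior_scores(abstracts, ["EGFR", "KRAS", "HER2"], ["lung cancer"]))
```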

    Cell Type Classification Via Deep Learning On Single-Cell Gene Expression Data

    Get PDF
    Single-cell sequencing is a recently advanced, revolutionary technology that enables researchers to obtain genomic, transcriptomic, or multi-omics information through gene expression analysis. It offers the advantage of analyzing highly heterogeneous cell type information compared to traditional sequencing methods, and it is gaining popularity in the biomedical area. Moreover, this analysis can aid the early diagnosis and drug development of tumor cells and cancer cell types. In the workflow of gene expression data profiling, identification of cell types is an important task, but it faces many challenges such as the curse of dimensionality, sparsity, batch effects, and overfitting. These challenges can be mitigated by performing feature selection, which selects the most relevant features and reduces feature dimensionality. In this research work, a recurrent neural network-based feature selection model is proposed to extract relevant features from high-dimensional, low-sample-size data. Moreover, a deep learning-based gene embedding model is also proposed to reduce the data sparsity of single-cell data for cell type identification. The proposed frameworks have been implemented with different recurrent neural network architectures and demonstrated on real-world microarray datasets and single-cell RNA-seq data, and the proposed models are observed to perform better than other feature selection models. A semi-supervised model is also implemented using the same gene embedding workflow, since labeling data is cumbersome, time consuming, and requires manual effort and domain expertise. Therefore, different ratios of labeled data are used in the experiments to validate the concept. Experimental results show that the proposed semi-supervised approach performs encouragingly even when only a limited amount of labeled data is used via the gene embedding concept. In addition, a graph attention-based autoencoder model has also been studied to learn latent features by incorporating prior knowledge with gene expression data for cell type classification. Index Terms — Single-Cell Gene Expression Data, Gene Embedding, Semi-Supervised Model, Incorporating Prior Knowledge, Gene-Gene Interaction Network, Deep Learning, Graph Autoencoder.
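    The work proposes recurrent neural network-based feature selection and gene embedding models; as a minimal stand-in for the embedding idea only, the sketch below trains a small dense autoencoder in PyTorch that compresses sparse expression profiles into low-dimensional embeddings suitable for a downstream cell type classifier. Layer sizes, the synthetic data, and the training loop are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class GeneEmbeddingAE(nn.Module):
    """Dense autoencoder: compress a sparse expression profile into a low-dimensional
    embedding, then reconstruct it; the bottleneck serves as the gene embedding."""
    def __init__(self, n_genes, embed_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_genes, 256), nn.ReLU(),
                                     nn.Linear(256, embed_dim))
        self.decoder = nn.Sequential(nn.Linear(embed_dim, 256), nn.ReLU(),
                                     nn.Linear(256, n_genes))

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

n_cells, n_genes = 64, 2000
# synthetic expression matrix, roughly 90% zeros to mimic single-cell sparsity
x = torch.rand(n_cells, n_genes) * (torch.rand(n_cells, n_genes) > 0.9).float()

model = GeneEmbeddingAE(n_genes)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(20):                       # a few reconstruction steps for illustration
    recon, z = model(x)
    loss = nn.functional.mse_loss(recon, x)
    opt.zero_grad()
    loss.backward()
    opt.step()

embeddings = model.encoder(x).detach()    # features handed to a cell type classifier
print(embeddings.shape)                   # torch.Size([64, 32])
```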

    A Robust Interpretable Deep Learning Classifier for Heart Anomaly Detection Without Segmentation

    Full text link
    Traditionally, abnormal heart sound classification is framed as a three-stage process. The first stage involves segmenting the phonocardiogram to detect the fundamental heart sounds, after which features are extracted and classification is performed. Some researchers in the field argue that the segmentation step is an unwanted computational burden, whereas others embrace it as a prior step to feature extraction. When comparing the accuracies achieved by studies that segmented heart sounds before analysis with those that overlooked that step, the question of whether to segment heart sounds before feature extraction remains open. In this study, we explicitly examine the importance of heart sound segmentation as a prior step for heart sound classification, and then seek to apply the obtained insights to propose a robust classifier for abnormal heart sound detection. Furthermore, recognizing the pressing need for explainable Artificial Intelligence (AI) models in the medical domain, we also unveil the hidden representations learned by the classifier using model interpretation techniques. Experimental results demonstrate that segmentation plays an essential role in abnormal heart sound classification. Our new classifier is also shown to be robust, stable and, most importantly, explainable, with an accuracy of almost 100% on the widely used PhysioNet dataset.
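    To make the segmentation stage concrete, the sketch below runs a crude envelope-threshold segmenter over a synthetic phonocardiogram and returns candidate heart-sound intervals that a feature extractor could consume. This is not the segmentation algorithm used in the study; the threshold, smoothing window, and minimum duration are arbitrary illustrative choices.

```python
import numpy as np
from scipy.signal import hilbert

def segment_heart_sounds(pcg, fs, threshold_ratio=0.5, min_len_s=0.05):
    """Return (start, end) sample indices of loud segments in a phonocardiogram,
    approximating the fundamental heart sounds via a smoothed Hilbert envelope."""
    envelope = np.abs(hilbert(pcg))
    win = max(1, int(0.02 * fs))                       # ~20 ms moving average
    envelope = np.convolve(envelope, np.ones(win) / win, mode="same")
    active = envelope > threshold_ratio * envelope.max()
    segments, start = [], None
    for i, on in enumerate(active):
        if on and start is None:
            start = i
        elif not on and start is not None:
            if i - start >= min_len_s * fs:            # drop very short blips
                segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(active)))
    return segments

# synthetic 2-second recording with five short S1/S2-like bursts
fs = 2000
t = np.arange(0, 2.0, 1 / fs)
pcg = np.zeros_like(t)
for onset in (0.1, 0.4, 0.9, 1.2, 1.7):
    burst = (t > onset) & (t < onset + 0.08)
    pcg[burst] += np.sin(2 * np.pi * 60 * t[burst])
print(segment_heart_sounds(pcg, fs))
```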