22 research outputs found

    Identification of integrated proteomics and transcriptomics signature of alcohol-associated liver disease using machine learning.

    Distinguishing between alcohol-associated hepatitis (AH) and alcohol-associated cirrhosis (AC) remains a diagnostic challenge. In this study, we used machine learning with transcriptomics and proteomics data from liver tissue and peripheral blood mononuclear cells (PBMCs) to classify patients with alcohol-associated liver disease. The conditions in the study were AH, AC, and healthy controls. We processed 98 PBMC RNAseq samples, 55 PBMC proteomic samples, 48 liver RNAseq samples, and 53 liver proteomic samples. First, we built separate classification and feature selection pipelines for transcriptomics and proteomics data. The liver tissue models were validated in independent liver tissue datasets. Next, we built integrated gene and protein expression models that allowed us to identify combined gene-protein biomarker panels. For liver tissue, we attained 90% nested cross-validation accuracy in our dataset and 82% accuracy in the independent validation dataset using transcriptomic data. We attained 100% nested cross-validation accuracy in our dataset and 61% accuracy in the independent validation dataset using proteomic data. For PBMCs, we attained 83% and 89% accuracy with transcriptomic and proteomic data, respectively. The integration of the two data types resulted in improved classification accuracy for PBMCs, but not liver tissue. We also identified the following gene-protein matches within the gene-protein biomarker panels: CLEC4M-CLC4M, GSTA1-GSTA2 for liver tissue and SELENBP1-SBP1 for PBMCs. In this study, machine learning models had high classification accuracy for both transcriptomics and proteomics data, across liver tissue and PBMCs. The integration of transcriptomics and proteomics into a multi-omics model yielded improvement in classification accuracy for the PBMC data. The set of integrated gene-protein biomarkers for PBMCs shows promise toward developing a liquid biopsy for alcohol-associated liver disease.
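    The abstract reports nested cross-validation accuracy but does not describe the pipeline itself. As a rough illustration only, the sketch below shows the general idea of nested cross-validation with feature selection kept inside the training folds. Everything in it is an assumption for illustration: the synthetic data, the mean-difference feature ranking, the nearest-centroid classifier, and all function names are hypothetical and are not the authors' method.

```python
import random
from statistics import mean


def k_folds(n, k, rng):
    """Split indices 0..n-1 into k shuffled, non-overlapping folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def top_features(X, y, rows, n_feat):
    """Rank features by the spread of class means, using only `rows`."""
    classes = sorted(set(y[i] for i in rows))
    scores = []
    for j in range(len(X[0])):
        means = [mean(X[i][j] for i in rows if y[i] == c) for c in classes]
        scores.append(max(means) - min(means))
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:n_feat]


def accuracy(X, y, train, test, n_feat):
    """Select features and fit class centroids on `train`, score on `test`."""
    feats = top_features(X, y, train, n_feat)
    cents = {}
    for c in set(y[i] for i in train):
        members = [i for i in train if y[i] == c]
        cents[c] = [mean(X[i][j] for i in members) for j in feats]

    def predict(x):
        return min(cents, key=lambda c: sum(
            (x[j] - m) ** 2 for j, m in zip(feats, cents[c])))

    return sum(predict(X[i]) == y[i] for i in test) / len(test)


def nested_cv(X, y, outer_k=5, inner_k=3, grid=(1, 2, 5), seed=0):
    """Outer loop estimates accuracy; inner loop picks the feature count."""
    rng = random.Random(seed)
    outer_scores = []
    for test in k_folds(len(X), outer_k, rng):
        held_out = set(test)
        train = [i for i in range(len(X)) if i not in held_out]
        inner = k_folds(len(train), inner_k, rng)

        def inner_score(n_feat):
            scores = []
            for fold in inner:
                val = [train[p] for p in fold]
                val_set = set(val)
                fit = [i for i in train if i not in val_set]
                scores.append(accuracy(X, y, fit, val, n_feat))
            return mean(scores)

        best = max(grid, key=inner_score)  # tuned without the outer test fold
        outer_scores.append(accuracy(X, y, train, test, best))
    return mean(outer_scores)


def make_data(n=40, p=20, seed=1):
    """Synthetic two-class data (hypothetical): features 0 and 1 carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(p)]
        row[0] += 3 * label
        row[1] -= 3 * label
        X.append(row)
        y.append(label)
    return X, y


X, y = make_data()
score = nested_cv(X, y)
```

    The point of the nesting is that the samples used to report accuracy never influence feature selection or tuning, which matters for small-sample, high-dimensional data of the kind described here.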

    Towards integrated genomics data analyses to facilitate identification of diagnostic biomarkers

    While the total amount of genomic data has rapidly increased over the past decade, most individual biomedical research studies are still limited to small numbers of participant samples due to the high costs of recruitment, sequencing, data storage, and data analysis. This results in many datasets with a low number of samples, but a very large number of features across multiple genomic data types. Appropriately handling the small sample size datasets and integrating multiple genomic data types is essential for identifying actionable diagnostic biomarkers. The overarching goal of my dissertation is to address some of these challenges using software engineering, bioinformatics, and machine learning methods. In this document, I will cover the three major projects of my dissertation. First, I will describe A-Lister, a software tool that I developed to filter, compare, and combine items across multiple differential expression files, to facilitate data integration and feature selection. Second, I implemented a multiclass machine learning approach to classify liver disease and identify gene expression biomarkers using a transcriptomics liver disease dataset. As part of this analysis, I have implemented a variety of bioinformatic pipelines, feature selection techniques, and machine learning classifiers to classify small sample size RNAseq data. Third, I created an integrated model using both transcriptomics and proteomics data to identify a combined gene and protein biomarker panel to classify liver disease. The tools and methods developed in my dissertation are not specific to liver disease, but are intended for use with any small sample size genomics datasets to aid in biomarker discovery.

    Evolving Simple Models of Diverse Intrinsic Dynamics in Hippocampal Neuron Types

    The diversity of intrinsic dynamics observed in neurons may enhance the computations implemented in the circuit by enriching network-level emergent properties such as synchronization and phase locking. Large-scale spiking network models of entire brain regions offer a platform to test theories of neural computation and cognitive function, providing useful insights on information processing in the nervous system. However, a systematic in-depth investigation requires network simulations to capture the biological intrinsic diversity of individual neurons at a sufficient level of accuracy. The computationally efficient Izhikevich model can reproduce a wide range of neuronal behaviors qualitatively. Previous studies using optimization techniques, however, were less successful in quantitatively matching experimentally recorded voltage traces. In this article, we present an automated pipeline based on evolutionary algorithms to quantitatively reproduce features of various classes of neuronal spike patterns using the Izhikevich model. Employing experimental data from Hippocampome.org, a comprehensive knowledgebase of neuron types in the rodent hippocampus, we demonstrate that our approach reliably fits Izhikevich models to nine distinct classes of experimentally recorded spike patterns, including delayed spiking, spiking with adaptation, stuttering, and bursting. Importantly, by leveraging the parameter-exploration capabilities of evolutionary algorithms, and by representing qualitative spike pattern class definitions in the error landscape, our approach creates several suitable models for each neuron type, exhibiting appropriate feature variabilities among neurons. Moreover, we demonstrate the flexibility of our methodology by creating multi-compartment Izhikevich models for each neuron type in addition to single-point versions. Although the results presented here focus on hippocampal neuron types, the same strategy is broadly applicable to any neural system.
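    To make the idea of fitting an Izhikevich model with an evolutionary algorithm concrete, here is a minimal sketch. It assumes the classic two-variable form of the model with the commonly quoted regular-spiking parameters, and a toy (1+1) evolutionary search that tunes only the reset parameter `d` to match a target spike count. The abstract does not state which model variant, features, or algorithm settings the authors used, so every name and value below is hypothetical.

```python
import random


def simulate_izhikevich(a=0.02, b=0.2, c=-65.0, d=8.0,
                        current=10.0, t_max=1000.0, dt=0.25):
    """Euler integration of the two-variable Izhikevich model.

    v' = 0.04 v^2 + 5 v + 140 - u + I,  u' = a (b v - u);
    when v >= 30 mV: v <- c, u <- u + d. Returns spike times in ms.
    """
    v = c
    u = b * v
    spike_times = []
    for k in range(int(t_max / dt)):
        v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + current)
        u += dt * a * (b * v - u)  # uses the freshly updated v
        if v >= 30.0:
            spike_times.append(k * dt)
            v = c
            u += d
    return spike_times


def spike_count_error(params, target_count):
    """Toy error: absolute difference between simulated and target spike count."""
    return abs(len(simulate_izhikevich(**params)) - target_count)


def evolve_d(target_count, generations=40, seed=0):
    """(1+1) evolutionary search: mutate d, keep the child if it is no worse."""
    rng = random.Random(seed)
    parent = {"d": 8.0}
    parent_err = spike_count_error(parent, target_count)
    for _ in range(generations):
        child = {"d": max(0.1, parent["d"] + rng.gauss(0.0, 1.0))}
        child_err = spike_count_error(child, target_count)
        if child_err <= parent_err:
            parent, parent_err = child, child_err
    return parent, parent_err


# Hypothetical target: the spike count produced by a "ground truth" d = 4.
target = len(simulate_izhikevich(d=4.0))
best_params, best_err = evolve_d(target)
```

    A realistic pipeline would compare several spike-pattern features rather than a single count and would evolve a population over all model parameters; this sketch only shows the mutate-evaluate-select loop.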

    Differentiating between liver diseases by applying multiclass machine learning approaches to transcriptomics of liver tissue or blood-based samples.

    Background & aims: Liver disease carries significant healthcare burden and frequently requires a combination of blood tests, imaging, and invasive liver biopsy to diagnose. Distinguishing between inflammatory liver diseases, which may have similar clinical presentations, is particularly challenging. In this study, we implemented a machine learning pipeline for the identification of diagnostic gene expression biomarkers across several alcohol-associated and non-alcohol-associated liver diseases, using either liver tissue or blood-based samples.
    Methods: We collected peripheral blood mononuclear cells (PBMCs) and liver tissue samples from participants with alcohol-associated hepatitis (AH), alcohol-associated cirrhosis (AC), non-alcohol-associated fatty liver disease, chronic HCV infection, and healthy controls. We performed RNA sequencing (RNA-seq) on 137 PBMC samples and 67 liver tissue samples. Using gene expression data, we implemented a machine learning feature selection and classification pipeline to identify diagnostic biomarkers which distinguish between the liver disease groups. The liver tissue results were validated using a public independent RNA-seq dataset. The biomarkers were computationally validated for biological relevance using pathway analysis tools.
    Results: Utilizing liver tissue RNA-seq data, we distinguished between AH, AC, and healthy conditions with overall accuracies of 90% in our dataset, and 82% in the independent dataset, with 33 genes. Distinguishing 4 liver conditions and healthy controls yielded 91% overall accuracy in our liver tissue dataset with 39 genes, and 75% overall accuracy in our PBMC dataset with 75 genes.
    Conclusions: Our machine learning pipeline was effective at identifying a small set of diagnostic gene biomarkers and classifying several liver diseases using RNA-seq data from liver tissue and PBMCs. The methodologies implemented and genes identified in this study may facilitate future efforts toward a liquid biopsy diagnostic for liver diseases.
    Lay summary: Distinguishing between inflammatory liver diseases without multiple tests can be challenging due to their clinically similar characteristics. To lay the groundwork for the development of a non-invasive blood-based diagnostic across a range of liver diseases, we compared samples from participants with alcohol-associated hepatitis, alcohol-associated cirrhosis, chronic hepatitis C infection, and non-alcohol-associated fatty liver disease. We used a machine learning computational approach to demonstrate that gene expression data generated from either liver tissue or blood samples can be used to discover a small set of gene biomarkers for effective diagnosis of these liver diseases.