1,700 research outputs found

    Deep Learning Models For Biomedical Data Analysis

    The field of biomedical data analysis is a vibrant area of research dedicated to extracting valuable insights from a wide range of biomedical data sources, including biomedical images and genomics data. The emergence of deep learning, an artificial intelligence approach, presents significant prospects for enhancing biomedical data analysis and knowledge discovery. This dissertation focused on exploring innovative deep-learning methods for biomedical image processing and gene data analysis. During the COVID-19 pandemic, biomedical imaging data, including CT scans and chest x-rays, played a pivotal role in identifying COVID-19 cases by categorizing patient chest x-ray outcomes as COVID-19-positive or negative. While supervised deep learning methods have effectively recognized COVID-19 patterns in chest x-ray datasets, the availability of annotated training data remains limited. To address this challenge, the thesis introduced a semi-supervised deep learning model named ssResNet, built upon the Residual Neural Network (ResNet) architecture. The model combines supervised and unsupervised paths, incorporating a weighted supervised loss function to manage data imbalance. The thesis also explored strategies to diminish prediction uncertainty in deep learning models for critical applications such as medical image processing, using an ensemble deep learning model that integrates bagging and model calibration techniques. This ensemble model not only boosts biomedical image segmentation accuracy but also reduces prediction uncertainty, as validated on a comprehensive chest x-ray image segmentation dataset. Furthermore, the thesis introduced an ensemble model integrating Proformer and ensemble learning methodologies. This model constructs multiple independent Proformers for predicting gene expression; their predictions are combined through weighted averaging to generate the final predictions. 
Experimental outcomes underscore the efficacy of this ensemble model in enhancing prediction performance across various metrics. In conclusion, this dissertation advances biomedical data analysis by harnessing the potential of deep learning techniques. It devises innovative approaches for processing biomedical images and gene data. By leveraging deep learning's capabilities, this work paves the way for further progress in biomedical data analytics and its applications within clinical contexts. Index Terms: biomedical data analysis, COVID-19, deep learning, ensemble learning, gene data analytics, medical image segmentation, prediction uncertainty, Proformer, Residual Neural Network (ResNet), semi-supervised learning.
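
The weighted-averaging step mentioned in this abstract can be sketched as follows. This is a minimal illustration in plain Python, not the dissertation's implementation: the function name `weighted_ensemble`, the list-of-lists input layout, and the example weights are assumptions, and the abstract does not say how the weights are chosen.

```python
def weighted_ensemble(predictions, weights):
    """Combine per-model predictions by weighted averaging.

    predictions: list of equal-length lists, one list of outputs per model
                 (hypothetical layout, chosen for illustration).
    weights:     one non-negative weight per model; normalised internally.
    """
    if not predictions or len(predictions) != len(weights):
        raise ValueError("need one weight per model")
    total = float(sum(weights))
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative and not all zero")
    n_outputs = len(predictions[0])
    if any(len(p) != n_outputs for p in predictions):
        raise ValueError("all models must predict the same number of outputs")
    # Weighted mean of the models' predictions, computed output by output.
    return [
        sum(w * p[i] for w, p in zip(weights, predictions)) / total
        for i in range(n_outputs)
    ]


# Two hypothetical models; the second is trusted three times as much.
combined = weighted_ensemble([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

One plausible way to set the weights is from each model's validation score, so that better-performing models contribute more; that choice is an assumption here.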

    Learn to Generate Time Series Conditioned Graphs with Generative Adversarial Nets

    Deep learning based approaches have recently been used to model and generate graphs that follow different distributions. However, they are typically unsupervised, unconditioned generative models, or are conditioned only on graph-level contexts, which are not associated with rich semantic node-level contexts. In contrast, in this paper we are interested in a novel problem named Time Series Conditioned Graph Generation: given an input multivariate time series, we aim to infer a target relation graph that models the underlying interrelationships between the time series, with each node corresponding to one time series. For example, we can study the interrelationships between genes in a gene regulatory network of a certain disease conditioned on their gene expression data recorded as time series. To achieve this, we propose a novel Time Series conditioned Graph Generation-Generative Adversarial Networks (TSGG-GAN) to handle the challenges of conditioning on rich node-level context structures and of measuring similarities directly between graphs and time series. Extensive experiments on synthetic and real-world gene regulatory network datasets demonstrate the effectiveness and generalizability of the proposed TSGG-GAN.

    SeqNet: An R Package for Generating Gene-Gene Networks and Simulating RNA-Seq Data

    Gene expression data provide an abundant resource for inferring connections in gene regulatory networks. While methodologies developed for this task have shown success, a challenge remains in comparing the performance among methods. Gold-standard datasets are scarce and limited in use. And while tools for simulating expression data are available, they are not designed to resemble the data obtained from RNA-seq experiments. SeqNet is an R package that provides tools for generating a rich variety of gene network structures and simulating RNA-seq data from them. This produces in silico RNA-seq data for benchmarking and assessing gene network inference methods. The package is available from the Comprehensive R Archive Network at https://CRAN.R-project.org/package=SeqNet and on GitHub at https://github.com/tgrimes/SeqNet

    GENE REGULATORY NETWORK INFERENCE USING K-NEAREST-NEIGHBOR BASED MUTUAL INFORMATION AND THREE-NODE NETWORK CLASSIFICATION USING DIMENSIONALITY REDUCTION AND MACHINE LEARNING

    Background: A cell exhibits a variety of responses to internal and external cues. These responses are possible, in part, due to the presence of an elaborate gene regulatory network (GRN) in every single cell. In the past twenty years, many groups worked on reconstructing the topological structure of GRNs from large-scale gene expression data using a variety of inference algorithms. Insights gained about participating players in GRNs may ultimately lead to therapeutic benefits. Mutual information (MI) is a widely used metric within this inference/reconstruction pipeline as it can detect any correlation (linear and non-linear) between any number of variables (n-dimensions). However, the use of MI with continuous data (for example, normalized fluorescence intensity measurement of gene expression levels) is sensitive to data size, correlation strength and underlying distributions, and often requires laborious and, at times, ad hoc optimization. Results: In this work, we first show that estimating MI of a bi- and tri-variate Gaussian distribution using k-nearest neighbor (kNN) MI estimation results in significant error reduction as compared to commonly used methods based on fixed binning. Second, we demonstrate that implementing the MI-based kNN Kraskov-Stögbauer-Grassberger (KSG) algorithm leads to a significant improvement in GRN reconstruction for popular inference algorithms, such as Context Likelihood of Relatedness (CLR). Third, through extensive in-silico benchmarking we show that a new inference algorithm CMIA (Conditional Mutual Information Augmentation), inspired by CLR, in combination with the KSG-MI estimator, outperforms commonly used methods. Finally, we compare our three newly developed methods to classify three-node motifs: (i) MI and Z-score profiles, (ii) Dimensionality reduction by PCA and clustering using K-means, (iii) Supervised machine learning algorithms using MI input data. 
We show that at least 22 different 3-node motifs in-silico and 16 motifs on E. coli experimental data can be distinguished by using all 2d and 3d MI quantities and without any a priori knowledge of the regulator (source) genes. Conclusions: Using three canonical datasets containing 15 synthetic networks, the newly developed method for GRN reconstruction, which combines CMIA and the KSG-MI estimator, achieves an improvement of 20-35% in precision-recall measures over the current gold standard in the field. Validated on E. coli gene expression data, our method for three-node motif classification achieves more than 60% overall accuracy, with 9 network motifs reaching as high as 80-100% precision. These new methods will enable researchers to discover new gene interactions or choose gene candidates for experimental validations.
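
The kNN-based MI estimation discussed in this abstract can be illustrated with a small brute-force sketch of the first KSG estimator for two variables, I(X;Y) ≈ ψ(k) + ψ(N) − ⟨ψ(n_x + 1) + ψ(n_y + 1)⟩, using max-norm distances in the joint space. This is not the authors' code: the function names, the O(N²) neighbour search, and the synthetic Gaussian test data are assumptions made for illustration.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def ksg_mutual_information(xs, ys, k=3):
    """Brute-force KSG (algorithm 1) estimate of I(X;Y) in nats."""
    n = len(xs)
    if n != len(ys) or n <= k:
        raise ValueError("need two samples of equal length greater than k")
    acc = 0.0
    for i in range(n):
        # Max-norm distances from point i to every other point in (x, y) space.
        dists = sorted(
            max(abs(xs[i] - xs[j]), abs(ys[i] - ys[j]))
            for j in range(n) if j != i
        )
        eps = dists[k - 1]  # distance to the k-th nearest neighbour
        # Count neighbours strictly closer than eps in each marginal space.
        n_x = sum(1 for j in range(n) if j != i and abs(xs[i] - xs[j]) < eps)
        n_y = sum(1 for j in range(n) if j != i and abs(ys[i] - ys[j]) < eps)
        acc += digamma_int(n_x + 1) + digamma_int(n_y + 1)
    return digamma_int(k) + digamma_int(n) - acc / n


def correlated_gaussian_sample(n, rho, seed=0):
    """Hypothetical test data: n draws from a bivariate Gaussian with correlation rho."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = rho * x + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(y)
    return xs, ys
```

For a bivariate Gaussian the exact value is −½ ln(1 − ρ²), so calling `ksg_mutual_information(*correlated_gaussian_sample(300, 0.9))` gives a quick sanity check against a known answer. No bin width has to be chosen, which is the practical contrast with fixed-binning estimators.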

    Outlook Magazine, Winter 2012

    https://digitalcommons.wustl.edu/outlook/1188/thumbnail.jp

    Transcription factor binding specificity and occupancy : elucidation, modelling and evaluation

    The major contributions of this thesis are addressing the need for an objective quality evaluation of a transcription factor (TF) binding model, demonstrating the value of the tools developed to this end, and elucidating how in vitro and in vivo information can be utilized to improve TF binding specificity models. Accurate elucidation of TF binding specificity remains an ongoing challenge in gene regulatory research. Several in vitro and in vivo experimental techniques have been developed, followed by a proliferation of algorithms and, ultimately, of binding models. This increase led to a choice problem for the end users: which tools to use, and which is the most accurate model for a given TF? Therefore, the first section of this thesis investigates the motif assessment problem: how scoring functions, choice and processing of benchmark data, and statistics used in evaluation affect motif ranking. This analysis revealed that TF motif quality assessment requires a systematic comparative analysis, and that the scoring functions used have a TF-specific effect on motif ranking. These results informed the design of a Motif Assessment and Ranking Suite (MARS), supported by PBM and ChIP-seq benchmark data and an extensive collection of PWM motifs. MARS implements consistency, enrichment, and scoring and classification-based motif evaluation algorithms. Transcription factor binding is also influenced and determined by contextual factors: chromatin accessibility, competition or cooperation with other TFs, cell line or condition specificity, binding locality (e.g. proximity to transcription start sites) and the shape of the binding site (DNA-shape). In vitro techniques do not capture such context; therefore, this thesis also combines PBM and DNase-seq data using a comparative k-mer enrichment approach that compares open chromatin with genome-wide prevalence, achieving a modest performance improvement when benchmarked on ChIP-seq data. 
Finally, since statistical and probabilistic methods cannot capture all the information that determines binding, a machine learning approach (XGBoost) was implemented to investigate how the features contribute to TF specificity and occupancy. This combinatorial approach improves the predictive ability of TF specificity models, with the most predictive feature being chromatin accessibility, while DNA-shape and conservation information both significantly improve on the baseline model of k-mer and DNase data. The results and the tools introduced in this thesis are useful for systematic comparative analysis (via MARS) and a combinatorial approach to modelling TF binding specificity, including appropriate feature engineering practices for machine learning modelling.
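
The idea of comparing k-mer prevalence in open chromatin against a genome-wide background, mentioned in this abstract, can be sketched as a log-ratio of k-mer frequencies. The abstract does not give the thesis's actual scoring scheme, so the log2 ratio, the pseudocount, the function names, and the toy sequences below are all assumptions for illustration only.

```python
import math
from collections import Counter


def kmer_counts(sequences, k):
    """Count every k-mer made only of A/C/G/T across the given sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip windows containing N etc.
                counts[kmer] += 1
    return counts


def kmer_enrichment(foreground, background, k, pseudocount=1.0):
    """Log2 ratio of smoothed k-mer frequencies, foreground vs background.

    foreground: sequences from accessible regions (e.g. DNase-seq peaks).
    background: sequences representing genome-wide prevalence.
    Positive scores mean the k-mer is relatively more common in the foreground.
    """
    fg = kmer_counts(foreground, k)
    bg = kmer_counts(background, k)
    kmers = set(fg) | set(bg)
    # The pseudocount keeps k-mers seen in only one set from giving log(0).
    fg_total = sum(fg.values()) + pseudocount * len(kmers)
    bg_total = sum(bg.values()) + pseudocount * len(kmers)
    return {
        kmer: math.log2(
            ((fg[kmer] + pseudocount) / fg_total)
            / ((bg[kmer] + pseudocount) / bg_total)
        )
        for kmer in kmers
    }


# Toy sequences, invented for this example.
open_chromatin = ["ACGTACGT", "TTACGTAA"]
genome_background = ["AAAAAAAA", "ACGTTTTT", "GGGGCCCC"]
scores = kmer_enrichment(open_chromatin, genome_background, k=4)
```

In this toy input `ACGT` is over-represented in the foreground and scores above zero, while `AAAA` occurs only in the background and scores below zero. Scores like these could serve as one feature among others in a downstream model.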

    Engineering stress resilient plants using gene regulatory network rewiring

    In spite of advances in food production brought on by the Green Revolution, the challenge of providing access to nutritious, safe food that has been grown sustainably is considerable. One barrier to food security is biotic stress: infection with pathogens such as bacteria, fungi and oomycetes impacts negatively on plant growth and survival. Synthetic biology, an interdisciplinary field combining biology, engineering and mathematics, is a promising tool for understanding and developing stress tolerant plants. The response of the model plant Arabidopsis thaliana to biotic and abiotic stresses involves the transcriptional reprogramming of thousands of genes. Among these differentially expressed genes are transcription factors, which form complex causal networks specific to the stress in question. This thesis focuses on network rewiring as a tool for enhancing the Arabidopsis response to stress, in particular to Botrytis cinerea infection. This is a model system for studying plant-necrotrophic pathogen interactions and, as such, a large amount of data is available, including a high-resolution transcriptomic time series of Arabidopsis during B. cinerea infection. This was used to construct gene regulatory networks with hundreds of transcription factors that are differentially expressed, in order to obtain a systems view of the effects of infection and the relationships between these regulators. Rewiring was applied to subnetworks of the original network using two different methodologies: control engineering, and Gaussian process dynamical systems. The former focuses on eliminating the effects of perturbation on a single node in a small 9-gene network, and requires detailed parameterisation of biological processes such as mRNA degradation and transcription rates. 
The latter provides a general modelling framework for optimising the overall expression of genes in a larger 70-gene subnetwork that eschews parameterisation or definition of a precise function for modelling relationships between genes. The process of generating stably transformed and rewired Arabidopsis is long and requires growing hundreds of plants for each construct. In order to test the hypotheses generated by such computational tools quickly and on a large scale, Arabidopsis protoplasts treated with chitin were trialled as a model system for studying plant defence responses to B. cinerea. RNAseq analysis of protoplasts was used to determine the similarities and differences between the defence responses triggered in protoplasts and in Arabidopsis plants. Both protoplasts and plants were also rewired, and gene expression measurements were used to understand the effects of this genetic engineering on the defence response of each.