72 research outputs found

    Hypergraph Models for Discovering Co-Regulatory Genetic Interactions

    Thesis (Ph.D.) -- Seoul National University Graduate School: Interdisciplinary Program in Bioinformatics, 2014. 2. 장병탁 (Byoung-Tak Zhang).
    A comprehensive understanding of biological systems requires the analysis of higher-order interactions among many genomic factors. Various genomic factors cooperate to affect biological processes including cancer occurrence, progression and metastasis. However, the complexity of genomic interactions presents a major barrier to identifying their co-regulatory roles and functional effects. Thus, this dissertation addresses the problem of analyzing complex relationships among many genomic factors in biological processes including cancers. We propose a hypergraph approach for modeling, learning and extracting: explicitly modeling higher-order genomic interactions, efficiently learning based on evolutionary methods, and effectively extracting biological knowledge from the model. A hypergraph model is a higher-order graphical model explicitly representing complex relationships among many variables from high-dimensional data. This property makes the proposed model suitable for the analysis of biological and medical phenomena characterized by higher-order interactions between various genomic factors. This dissertation proposes advanced hypergraph-based models, in terms of both learning methods and model structures, to analyze large-scale biological data with a focus on identifying co-regulatory genomic interactions at a genome-wide level. We introduce an evolutionary approach based on information-theoretic criteria into the learning mechanisms to efficiently search a huge problem space reflecting higher-order interactions between factors. This evolutionary learning is explained from the perspective of a sequential Bayesian sampling framework. Also, a hierarchy is introduced into the hypergraph model for modeling hierarchical genomic relationships.
This hierarchical structure allows the hypergraph model to explicitly represent gene regulatory circuits as functional blocks or groups across the levels of epigenetic, transcriptional, and post-transcriptional regulation. Moreover, the proposed graph-analyzing method is able to grasp the global structures of biological systems such as genomic modules and regulatory networks by analyzing the learned model structures. The proposed model is applied to analyzing cancer genomics, a major topic in current biology and medicine. We show that the performance of our model competes with or outperforms state-of-the-art models on multiple cancer genomic data sets. Furthermore, the proposed model is capable of discovering new or hidden patterns as candidates of potential gene regulatory circuits such as gene modules, miRNA-mRNA networks, and multiple genomic interactions, associated with the specific cancer. The results of these analyses can provide crucial evidence that can pave the way for identifying unknown functions in the cancer system. The proposed hypergraph model will contribute to elucidating core regulatory mechanisms and to a comprehensive understanding of biological processes including cancers.

Contents:
Abstract
1 Introduction: 1.1 Background and Motivation; 1.2 Problems to be Addressed; 1.3 The Proposed Approach and its Contribution; 1.4 Organization of the Dissertation
2 Related Work: 2.1 Analysis of Co-Regulatory Genomic Interactions from Omics Data; 2.2 Probabilistic Graphical Models for Biological Problems (2.2.1 Bayesian Networks; 2.2.2 Markov Random Fields; 2.2.3 Hidden Markov Models); 2.3 Higher-order Graphical Models for Biological Problems (2.3.1 Higher-Order Models; 2.3.2 Hypergraphs)
3 Hypergraph Classifiers for Identifying Prognostic Modules in Cancer: 3.1 Overview; 3.2 Analyzing Gene Modules for Cancer Prognosis Prediction; 3.3 Hypergraph Classifiers for Identifying Cancer Gene Modules (3.3.1 Hypergraph Classifiers; 3.3.2 Bayesian Evolutionary Algorithm; 3.3.3 Bayesian Evolutionary Learning for Hypergraph Classifiers); 3.4 Predicting Cancer Clinical Outcomes Based on Gene Modules (3.4.1 Data and Experimental Settings; 3.4.2 Prediction Performance; 3.4.3 Model Analysis; 3.4.4 Identification of Prognostic Gene Modules); 3.5 Summary
4 Hypergraph-based Models for Constructing Higher-Order miRNA-mRNA Interaction Networks in Cancer: 4.1 Overview; 4.2 Analyzing Relationships between miRNAs and mRNAs from Heterogeneous Data; 4.3 Hypergraph-based Models for Identifying miRNA-mRNA Interactions (4.3.1 Hypergraph-based Models; 4.3.2 Learning Hypergraph-based Models; 4.3.3 Building Interaction Networks from Hypergraphs); 4.4 Constructing miRNA-mRNA Interaction Networks Based on Higher-Order Relationships (4.4.1 Data and Experimental Settings; 4.4.2 Classification Performance; 4.4.3 Model Evaluation; 4.4.4 Constructed Higher-Order miRNA-mRNA Interaction Networks in Prostate Cancer; 4.4.5 Functional Analysis of the Constructed Interaction Networks); 4.5 Summary
5 Hierarchical Hypergraphs for Identifying Higher-Order Genomic Interactions in Multilevel Regulation: 5.1 Overview; 5.2 Analyzing Epigenetic and Genetic Interactions from Multiple Genomic Data; 5.3 Hierarchical Hypergraphs for Identifying Epigenetic and Genetic Interactions (5.3.1 Hierarchical Hypergraphs; 5.3.2 Learning Hierarchical Hypergraphs); 5.4 Identifying Higher-Order Genomic Interactions in Multilevel Regulation (5.4.1 Data and Experimental Settings; 5.4.2 Identified Higher-Order miRNA-mRNA Interactions Induced by DNA Methylation in Ovarian Cancer); 5.5 Summary
6 Concluding Remarks: 6.1 Summary of the Dissertation; 6.2 Directions for Further Research
Bibliography
Abstract in Korean
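
The idea in the abstract above, that a hyperedge ties together any number of genomic factors and that a set of such hyperedges can act as a classifier, can be pictured with a small sketch. This is not the dissertation's model or its Bayesian evolutionary learning: the factor names (`geneA`, `miR1`), the discretized levels, the labels, and the simple majority vote are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical hyperedges: each one joins several factors (a pattern of
# discretized levels) and carries a class label. All names are invented.
hyperedges = [
    ({"geneA": 1, "geneB": 1, "miR1": 0}, "poor"),
    ({"geneA": 0, "geneC": 1}, "good"),
    ({"geneB": 1, "miR1": 0}, "poor"),
]

def matches(pattern, sample):
    """True when every factor in the hyperedge has the same level in the sample."""
    return all(sample.get(var) == val for var, val in pattern.items())

def classify(sample, edges):
    """Majority vote over the labels of all hyperedges that match the sample."""
    votes = Counter(label for pattern, label in edges if matches(pattern, sample))
    return votes.most_common(1)[0][0] if votes else None

sample = {"geneA": 1, "geneB": 1, "geneC": 0, "miR1": 0}
prediction = classify(sample, hyperedges)
```

Here the first and third hyperedges match the sample, so the vote favours the "poor" label. A learned model would also weight hyperedges and search over which ones to keep, which this sketch leaves out.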

    Integration of multi-scale protein interactions for biomedical data analysis

    With the advancement of modern technologies, we observe an increasing accumulation of biomedical data about diseases. There is a need for computational methods to sift through and extract knowledge from the diverse data available in order to improve our mechanistic understanding of diseases and improve patient care. Biomedical data come in various forms, as exemplified by the various omics data. Existing studies have shown that each form of omics data gives only partial information on cell state, which has motivated jointly mining multi-omics, multi-modal data to extract integrated system knowledge. The interactome is of particular importance as it enables the modelling of dependencies arising from molecular interactions. This thesis takes a special interest in the multi-scale protein interactome and its integration with computational models to extract relevant information from biomedical data. We define multi-scale interactions at different omics scales that involve proteins: pairwise protein-protein interactions, multi-protein complexes, and biological pathways. Using hypergraph representations, we motivate considering higher-order protein interactions, highlighting the complementary biological information contained in the multi-scale interactome. Based on those results, we further investigate how those multi-scale protein interactions can be used as either prior knowledge or auxiliary data to develop machine learning algorithms. First, we design a neural network using the multi-scale organization of proteins in a cell into biological pathways as prior knowledge and train it to predict a patient's diagnosis based on transcriptomics data. From the trained models, we develop a strategy to extract biomedical knowledge pertaining to the diseases investigated. Second, we propose a general framework based on Non-negative Matrix Factorization to integrate the multi-scale protein interactome with multi-omics data.
We show that our approach outperforms existing methods and provides biomedical insights and relevant hypotheses for specific cancer types.

    Deep Hypernetworks for Learning from Dynamic Multimodal Data

    Thesis (Ph.D.) -- Seoul National University Graduate School: Department of Electrical and Computer Engineering, 2015. 2. 장병탁 (Byoung-Tak Zhang).
    Recent advances in information and communication technology have led to an explosive increase in data. In particular, unlike traditional data, which are structured and unimodal, recent data generated in dynamic environments are characterized by high dimensionality, multimodality, and lack of structure, as well as huge scale. Learning from non-stationary multimodal data is essential for solving many difficult problems in artificial intelligence. However, despite many successful reports, existing machine learning methods have mainly focused on solving practical problems represented by large-scale but static databases, such as image classification, tagging, and retrieval. A hypernetwork is a probabilistic graphical model representing an empirical distribution, using a hypergraph structure that is a large collection of hyperedges encoding the associations among variables. This representation makes the model suitable for characterizing the complex relationships between features with a population of building blocks. However, since a hypernetwork is represented by a huge combinatorial feature space, the model requires a large number of hyperedges to handle multimodal large-scale data and thus faces a scalability problem. In this dissertation, we propose a deep architecture of hypernetworks, i.e., deep hypernetworks, to deal with the scalability issue in learning from multimodal data with non-stationary properties, such as videos. Deep hypernetworks handle these issues through abstraction at multiple levels using a hierarchy of multiple hypergraphs. We use a stochastic method based on Monte-Carlo simulation, a graph MC, for efficiently constructing hypergraphs representing the empirical distribution of the observed data.
The structure of a deep hypernetwork continuously changes as learning proceeds, and this flexibility contrasts with other deep learning models. The proposed model learns incrementally from the data, thus handling non-stationary properties such as concept drift. The abstract representations in the learned models serve as multimodal knowledge about the data, which is used for content-aware crossmodal transformation, including vision-language conversion. We view vision-language conversion as machine translation, and thus formulate vision-language translation in terms of statistical machine translation. Since knowledge of the video stories is used for translation, we call this story-aware vision-language translation. We evaluate deep hypernetworks on large-scale vision-language multimodal data, including benchmark datasets and cartoon video series. The experimental results show that deep hypernetworks effectively represent visual-linguistic information abstracted at multiple levels of the data contents as well as the associations between vision and language. We explain how the introduction of a hierarchy deals with the scalability and non-stationary properties. In addition, we present story-aware vision-language translation on cartoon videos by generating scene images from sentences and descriptive subtitles from scene images.
Furthermore, we discuss the implications of our model for lifelong learning and directions for improvement toward human-level artificial intelligence.

Contents:
1 Introduction: 1.1 Background and Motivation; 1.2 Problems to be Addressed; 1.3 The Proposed Approach and its Contribution; 1.4 Organization of the Dissertation
2 Related Work: 2.1 Multimodal Learning; 2.2 Models for Learning from Multimodal Data (2.2.1 Topic Model-Based Multimodal Learning; 2.2.2 Deep Network-based Multimodal Learning); 2.3 Higher-Order Graphical Models (2.3.1 Hypernetwork Models; 2.3.2 Bayesian Evolutionary Learning of Hypernetworks)
3 Multimodal Hypernetworks for Text-to-Image Retrievals: 3.1 Overview; 3.2 Hypernetworks for Multimodal Associations (3.2.1 Multimodal Hypernetworks; 3.2.2 Incremental Learning of Multimodal Hypernetworks); 3.3 Text-to-Image Crossmodal Inference (3.3.1 Representation of Textual-Visual Data; 3.3.2 Text-to-Image Query Expansion); 3.4 Text-to-Image Retrieval via Multimodal Hypernetworks (3.4.1 Data and Experimental Settings; 3.4.2 Text-to-Image Retrieval Performance; 3.4.3 Incremental Learning for Text-to-Image Retrieval); 3.5 Summary
4 Deep Hypernetworks for Multimodal Concept Learning from Cartoon Videos: 4.1 Overview; 4.2 Visual-Linguistic Concept Representation of Cartoon Videos; 4.3 Deep Hypernetworks for Modeling Visual-Linguistic Concepts (4.3.1 Sparse Population Coding; 4.3.2 Deep Hypernetworks for Concept Hierarchies; 4.3.3 Implication of Deep Hypernetworks on Cognitive Modeling); 4.4 Learning of Deep Hypernetworks (4.4.1 Problem Space of Deep Hypernetworks; 4.4.2 Graph Monte-Carlo Simulation; 4.4.3 Learning of Concept Layers; 4.4.4 Incremental Concept Construction); 4.5 Incremental Concept Construction from Cartoon Videos (4.5.1 Data Description and Parameter Setup; 4.5.2 Concept Representation and Development; 4.5.3 Character Classification via Concept Learning; 4.5.4 Vision-Language Conversion via Concept Learning); 4.6 Summary
5 Story-aware Vision-Language Translation using Deep Concept Hierarchies: 5.1 Overview; 5.2 Vision-Language Conversion as a Machine Translation (5.2.1 Statistical Machine Translation; 5.2.2 Vision-Language Translation); 5.3 Story-aware Vision-Language Translation using Deep Concept Hierarchies (5.3.1 Story-aware Vision-Language Translation; 5.3.2 Vision-to-Language Translation; 5.3.3 Language-to-Vision Translation); 5.4 Story-aware Vision-Language Translation on Cartoon Videos (5.4.1 Data and Experimental Setting; 5.4.2 Scene-to-Sentence Generation; 5.4.3 Sentence-to-Scene Generation; 5.4.4 Visual-Linguistic Story Summarization of Cartoon Videos); 5.5 Summary
6 Concluding Remarks: 6.1 Summary of the Dissertation; 6.2 Directions for Further Research
Bibliography
Abstract in Korean

    AI in drug discovery and its clinical relevance

    The COVID-19 pandemic has emphasized the need for novel drug discovery processes. However, the journey from conceptualizing a drug to its eventual implementation in clinical settings is a long, complex, and expensive process, with many potential points of failure. Over the past decade, a vast growth in medical information has coincided with advances in computational hardware (cloud computing, GPUs, and TPUs) and the rise of deep learning. Medical data generated from large molecular screening profiles, personal health or pathology records, and public health organizations could benefit from analysis by Artificial Intelligence (AI) approaches to speed up the drug discovery pipeline and prevent failures in it. We present applications of AI at various stages of drug discovery pipelines, including the inherently computational approaches of de novo design and prediction of a drug's likely properties. Open-source databases and AI-based software tools that facilitate drug design are discussed along with their associated problems of molecule representation, data collection, complexity, labeling, and disparities among labels. We also explore how contemporary AI methods, such as graph neural networks, reinforcement learning, and generative models, along with structure-based methods (i.e., molecular dynamics simulations and molecular docking), can contribute to drug discovery applications and the analysis of drug responses. Finally, recent developments and investments in AI-based start-up companies for biotechnology and drug design, and their current progress, hopes and promotions, are discussed in this article.
    Published in: Heliyon. License: https://creativecommons.org/licenses/by/4.0/ See article on publisher's website: https://doi.org/10.1016/j.heliyon.2023.e17575

    Systematic Review on Missing Data Imputation Techniques with Machine Learning Algorithms for Healthcare

    Missing data is one of the most common issues encountered in the data cleaning process, especially when dealing with medical datasets. A real collected dataset is prone to be incomplete, inconsistent, noisy and redundant due to potential reasons such as human errors, instrumental failures, and adverse death. Therefore, to deal accurately with incomplete data, a sophisticated algorithm is needed to impute the missing values. Many machine learning algorithms have been applied to impute missing data with plausible values. Among these, the KNN algorithm has been widely adopted for imputing missing data due to its robustness and simplicity, and it is a promising method that may outperform other machine learning methods. This paper provides a comprehensive review of different imputation techniques used to replace missing data. The goal of the review is to bring specific attention to potential improvements to existing methods and to provide readers with a better grasp of trends in imputation techniques.
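
The KNN imputation idea named in the abstract above can be sketched in a few lines. This is a minimal illustration, not code from the reviewed paper: the function name `knn_impute`, the use of `None` for missing values, and the distance (root mean squared difference over columns observed in both rows) are assumptions chosen for simplicity.

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column over the k nearest rows."""
    def distance(a, b):
        # Compare only columns observed in both rows; rows that share no
        # observed column are treated as infinitely far apart.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows where column j is observed.
            donors = [(distance(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not None]
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed

# Hypothetical toy data: the second row is missing its second feature.
data = [[1.0, 2.0], [1.1, None], [1.2, 2.2], [9.0, 9.0]]
filled = knn_impute(data, k=2)
```

With `k=2` the missing value is filled from the two rows closest in the first feature, so the distant row `[9.0, 9.0]` does not contribute. In practice features are usually scaled before computing distances, which this sketch omits.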

    Pacific Symposium on Biocomputing 2023

    The Pacific Symposium on Biocomputing (PSB) 2023 is an international, multidisciplinary conference for the presentation and discussion of current research in the theory and application of computational methods in problems of biological significance. Presentations are rigorously peer reviewed and are published in an archival proceedings volume. PSB 2023 will be held on January 3-7, 2023 in Kohala Coast, Hawaii. Tutorials and workshops will be offered prior to the start of the conference. PSB 2023 will bring together top researchers from the US, the Asian Pacific nations, and around the world to exchange research results and address open issues in all aspects of computational biology. It is a forum for the presentation of work in databases, algorithms, interfaces, visualization, modeling, and other computational methods, as applied to biological problems, with emphasis on applications in data-rich areas of molecular biology. The PSB has been designed to be responsive to the need for critical mass in sub-disciplines within biocomputing. For that reason, it is the only meeting whose sessions are defined dynamically each year in response to specific proposals. PSB sessions are organized by leaders of research in biocomputing's 'hot topics.' In this way, the meeting provides an early forum for serious examination of emerging methods and approaches in this rapidly changing field.

    Explainable clinical decision support system: opening black-box meta-learner algorithm expert's based

    Mathematical optimization methods are the basic mathematical tools of artificial intelligence theory. In machine learning and deep learning, the examples from which algorithms learn (training data) are used by sophisticated cost functions, which can be solved in closed form or through approximations. The interpretability of the models used, and their relative transparency as opposed to the opacity of black boxes, is related to how the algorithm learns, which occurs through the optimization and minimization of the errors the machine makes in the learning process. In particular, the present work introduces a new method for determining the weights in an ensemble model, supervised and unsupervised, based on the well-known Analytic Hierarchy Process (AHP). This method rests on the idea that behind the choice among the different possible algorithms for a machine learning problem there is an expert who controls the decision-making process. The expert assigns a complexity score to each algorithm (based on the concept of the complexity-interpretability trade-off), which determines the weight with which each model contributes to the training and prediction phases. In addition, different methods are presented to evaluate the performance of these algorithms and to explain how each feature in the model contributes to the prediction of the outputs. The interpretability techniques used in machine learning are also combined with the AHP-based method in the context of clinical decision support systems, in order to make the (black-box) algorithms and their results interpretable and explainable. This allows clinical decision-makers to take controlled decisions, in line with the "right to explanation" introduced by the legislator, since decision-makers bear civil and legal responsibility for choices in the clinical field that are based on systems using artificial intelligence.
Equally central is the interaction between the expert who controls the algorithm construction process and the domain expert, in this case the clinical one. Three applications on real data are implemented with methods known in the literature and with those proposed in this work: one concerns cervical cancer, another diabetes, and the last focuses on a specific pathology developed by HIV-infected individuals. All applications are supported by plots, tables and explanations of the results, implemented through Python libraries. The main case study of this thesis, regarding HIV-infected individuals, concerns an unsupervised ensemble-type problem in which a series of clustering algorithms is applied to a set of features; their outputs are in turn used as a set of meta-features to provide a set of labels for each given cluster. The meta-features and labels obtained by choosing the best algorithm are used to train a logistic regression meta-learner, which is then analyzed with explainability methods to quantify the contribution of each algorithm in the training phase. The use of logistic regression as the meta-learner classifier is motivated by the appreciable results it provides and by the easy explainability of its estimated coefficients.
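
As a rough picture of how AHP can turn an expert's judgments into ensemble weights, the sketch below derives weights from a pairwise comparison matrix with the row geometric mean method, a common approximation to AHP's principal eigenvector, and then averages model outputs with those weights. It is not the thesis's procedure: the comparison values, the three hypothetical models, and the function names are assumptions.

```python
import math

def ahp_weights(pairwise):
    """Priority weights from a reciprocal pairwise comparison matrix,
    using the row geometric mean approximation."""
    gm = [math.prod(row) ** (1.0 / len(row)) for row in pairwise]
    total = sum(gm)
    return [g / total for g in gm]

def weighted_vote(probabilities, weights):
    """Weighted average of per-model predicted probabilities."""
    return sum(p * w for p, w in zip(probabilities, weights))

# Hypothetical expert judgments for three models: entry [i][j] says how
# strongly model i is preferred over model j (reciprocal matrix).
comparisons = [
    [1.0, 2.0, 6.0],
    [1 / 2, 1.0, 3.0],
    [1 / 6, 1 / 3, 1.0],
]
weights = ahp_weights(comparisons)

# Hypothetical predicted probabilities from the three models for one patient.
ensemble_probability = weighted_vote([0.9, 0.5, 0.1], weights)
```

This matrix is perfectly consistent, so the weights come out as 0.6, 0.3 and 0.1 and the ensemble probability as 0.7. With inconsistent judgments the geometric mean and eigenvector methods can differ slightly, and AHP normally adds a consistency check that is omitted here.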

    Simulation Intelligence: Towards a New Generation of Scientific Methods

    The original "Seven Motifs" set forth a roadmap of essential methods for the field of scientific computing, where a motif is an algorithmic method that captures a pattern of computation and data movement. We present the "Nine Motifs of Simulation Intelligence", a roadmap for the development and integration of the essential algorithms necessary for a merger of scientific computing, scientific simulation, and artificial intelligence. We call this merger simulation intelligence (SI), for short. We argue the motifs of simulation intelligence are interconnected and interdependent, much like the components within the layers of an operating system. Using this metaphor, we explore the nature of each layer of the simulation intelligence operating system stack (SI-stack) and the motifs therein: (1) Multi-physics and multi-scale modeling; (2) Surrogate modeling and emulation; (3) Simulation-based inference; (4) Causal modeling and inference; (5) Agent-based modeling; (6) Probabilistic programming; (7) Differentiable programming; (8) Open-ended optimization; (9) Machine programming. We believe coordinated efforts between motifs offer immense opportunity to accelerate scientific discovery, from solving inverse problems in synthetic biology and climate science, to directing nuclear energy experiments and predicting emergent behavior in socioeconomic settings. We elaborate on each layer of the SI-stack, detailing the state-of-the-art methods, presenting examples to highlight challenges and opportunities, and advocating for specific ways to advance the motifs and the synergies from their combinations. Advancing and integrating these technologies can enable a robust and efficient hypothesis-simulation-analysis type of scientific method, which we introduce with several use-cases for human-machine teaming and automated science.