77 research outputs found

    Cancer classification in the genomic era: five contemporary problems

    Abstract Classification is an everyday instinct as well as a full-fledged scientific discipline. Throughout the history of medicine, disease classification has been central to how we develop knowledge, make diagnoses, and assign treatment. Here, we discuss the classification of cancer and the process of categorizing cancer subtypes based on their observed clinical and biological features. Traditionally, cancer nomenclature is primarily based on organ location, e.g., “lung cancer” designates a tumor originating in lung structures. Within each organ-specific major type, finer subgroups can be defined based on patient age, cell type, histological grades, and sometimes molecular markers, e.g., hormonal receptor status in breast cancer or microsatellite instability in colorectal cancer. In the past 15+ years, high-throughput technologies have generated rich new data regarding somatic variations in DNA, RNA, protein, or epigenomic features for many cancers. These data, collected for increasingly large tumor cohorts, have provided not only new insights into the biological diversity of human cancers but also exciting opportunities to discover previously unrecognized cancer subtypes. Meanwhile, the unprecedented volume and complexity of these data pose significant challenges for biostatisticians, cancer biologists, and clinicians alike. Here, we review five related issues that represent contemporary problems in cancer taxonomy and interpretation. (1) How many cancer subtypes are there? (2) How can we evaluate the robustness of a new classification system? (3) How are classification systems affected by intratumor heterogeneity and tumor evolution? (4) How should we interpret cancer subtypes? (5) Can multiple classification systems co-exist? While related issues have existed for a long time, we will focus on those aspects that have been magnified by the recent influx of complex multi-omics data. 
Exploration of these problems is essential for data-driven refinement of cancer classification and the successful application of these concepts in precision medicine.
    http://deepblue.lib.umich.edu/bitstream/2027.42/134599/1/40246_2015_Article_49.pd

    Evolutionary machine learning of higher-order relationships in genomic sequence analysis

    Thesis (Ph.D.) -- Seoul National University Graduate School: Interdisciplinary Program in Bioinformatics, February 2014. Byoung-Tak Zhang (장병탁).
    One of the basic research goals in life science is to understand the complex relationships between biological factors and phenotypes, and to identify the various factors affecting the phenotype. In particular, genomic sequences play a significant role in determining phenotypes such as gene expression and susceptibility to disease, so studying the fundamental information stored in the genome is essential to understanding biological processes. Previous genomic sequence analyses mainly focused on identifying a single associated factor or pairwise relationships with significant effects. The recent development of high-throughput technologies has made it possible to identify causal factors through genome-wide analysis. However, discovering higher-order interactions among multiple factors remains a challenge because it involves huge search spaces and high computational costs. In this dissertation, we develop effective methods for identifying the higher-order relationships of sequence elements affecting the phenotype by combining statistical learning with evolutionary computation. The methods are applied to finding associated combinatorial factors and dysfunctional modules in various genome-wide sequence analysis problems. First, we present statistical learning-based methods to detect co-regulatory sequence motifs and to investigate the combinatorial effects of DNA methylation on downstream gene expression. Next, to examine sequence datasets with a huge number of attributes across the human genome, we apply evolutionary computation approaches. Our methods search the problem feature space using machine learning techniques applied to training datasets within the evolutionary computation process, and are able to find good candidate solutions in computationally expensive optimization problems. The experimental results show that the approaches are useful for finding higher-order relationships associated with disease using genomic and epigenomic datasets. In conclusion, our studies provide practical methods for analyzing complex interactions among sequence elements in genomic and epigenomic studies.

    Table of contents:
    Abstract
    1 Introduction
      1.1 Motivation
      1.2 Approaches
      1.3 Organization of the dissertation
    2 Genome biology and computational analysis
      2.1 Fundamentals of genome biology
        2.1.1 DNA, gene, chromosomes and cell biology
        2.1.2 Gene expression and regulation
        2.1.3 Genomics
        2.1.4 Epigenomics
      2.2 Evolutionary machine learning
        2.2.1 Machine learning and evolutionary computation
        2.2.2 Evolutionary computation in biology
    3 Identifying co-regulatory sequence motifs
      3.1 Background
      3.2 Methods
        3.2.1 Investigation of the relationship between regulatory sequence motifs and expression profiles
        3.2.2 Preparation of the gene expression datasets
        3.2.3 Preparation of the gene sequence datasets
        3.2.4 Measurement of the effect of motif combinations
      3.3 Results
        3.3.1 Identification of the relationship between gene expression and known motifs
        3.3.2 Identification of cell cycle-related motifs
        3.3.3 Combinational effects of regulatory motifs
      3.4 Discussion
    4 Investigation of combinatorial effects of DNA methylation
      4.1 Background
      4.2 Materials and methods
        4.2.1 Data
        4.2.2 Profiling of DNA methylation patterns
        4.2.3 Identifying differentially methylated/expressed genes by information theoretic analysis
        4.2.4 Identifying downregulated genes in each subtype for integrative analysis
        4.2.5 Correlation between DNA methylation and gene expression
        4.2.6 Combinatorial effects of DNA methylation in various genomic regions
        4.2.7 Analysis of transcription factor binding regions possibly blocked by DNA methylation
      4.3 Results
        4.3.1 DNA methylation in 30 ICBP cell lines
        4.3.2 Information theoretic analysis of phenotype-differentially methylated and expressed genes
        4.3.3 Integrated analysis of DNA methylation and gene expression
        4.3.4 Investigation of the combinatorial effects of DNA methylation in various regions on downstream gene expression levels
        4.3.5 Integrative analysis of transcription factors, DNA methylation and gene expression
      4.4 Discussion
    5 Detecting multiple SNP interaction via evolutionary learning
      5.1 Background
      5.2 Materials and methods
        5.2.1 Identifying higher-order interaction of SNPs
        5.2.2 Algorithm Description
        5.2.3 Dataset
      5.3 Results
        5.3.1 Identifying interaction between features in simulation data
        5.3.2 Identifying higher-order SNP interactions in Korean population
      5.4 Discussion
    6 Identifying DNA methylation modules by probabilistic evolutionary learning
      6.1 Background
      6.2 Methods
        6.2.1 Evolutionary learning procedure to identify a set of DNA methylation sites associated to disease
        6.2.2 Learning dependency graph
        6.2.3 Fitness evaluation in population
        6.2.4 Dataset
      6.3 Results
        6.3.1 DNA methylation modules associated to breast cancer
        6.3.2 Modules associated to colorectal cancer using high-throughput sequencing data
      6.4 Discussion
    7 Conclusion
    Bibliography
    Abstract (in Korean)
    Docto

    Cell Type-specific Analysis of Human Interactome and Transcriptome

    Cells are the fundamental building blocks of complex tissues in higher-order organisms. These cells take different forms and shapes to perform a broad range of functions. What makes a cell uniquely suited to perform a task, however, is not well understood; neither is the defining characteristic that groups similar cells together to constitute a cell type. Even for known cell types, the underlying pathways that mediate cell type-specific functionality are not readily available. These functions, in turn, contribute to cell type-specific susceptibility in various disorders.