
    Inference of gene regulatory networks from genome-wide knockout fitness data

    Motivation: Genome-wide fitness is an emerging type of high-throughput biological data. It is generated for individual organisms by creating libraries of knockouts, subjecting them to broad ranges of environmental conditions, and measuring the resulting clone-specific fitnesses. Because fitness is an organism-scale measure of gene regulatory network behaviour, it may offer advantages over individual gene expression when phenotypic and functional features are of primary interest. Previous work has shown that genome-wide fitness data can uncover gene regulatory interactions that are novel relative to the results of more conventional gene expression analysis. Yet, to date, few algorithms have been proposed for systematically using genome-wide mutant fitness data for gene regulatory network inference. Results: In this article, we describe a model and propose an inference algorithm that uses fitness data from knockout libraries to identify the underlying gene regulatory networks. Unlike most prior methods, the presented approach captures not only the structural but also the dynamical and non-linear nature of the biomolecular systems involved. A state-space model with a non-linear basis is used to describe gene regulatory network dynamics, and the network structure is then elucidated by estimating the unknown model parameters. An unscented Kalman filter is used to cope with the non-linearities introduced in the model; it also enables the algorithm to run online for practical use. We demonstrate that the algorithm provides satisfactory results for both synthetic data and empirical measurements of the GAL network in the yeast Saccharomyces cerevisiae and the TyrR-LiuR network in the bacterium Shewanella oneidensis.
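    The abstract names the modelling ingredients (a state-space model with a non-linear basis, parameter estimation, and an unscented Kalman filter that can run online) but not the equations. The Python sketch below illustrates the general technique under stated assumptions: a hypothetical model x_{k+1} = x_k + dt·(W·tanh(x_k) − γ·x_k), where the unknown interaction matrix W is appended to the state so that each filter step updates both the expression state and the network structure, and an observation model that simply reads out x with noise. The tanh basis, the observation model, and the names sigma_points, unscented_transform, ukf_step, and make_grn_model are illustrative assumptions, not the authors' formulation.

```python
import numpy as np

def sigma_points(mean, cov, alpha=1.0, beta=2.0, kappa=0.0):
    """Scaled sigma points (Van der Merwe form) with mean and covariance weights."""
    n = mean.size
    lam = alpha**2 * (n + kappa) - n
    root = np.linalg.cholesky((n + lam) * cov)   # columns are the sigma-point offsets
    pts = np.vstack([mean, mean + root.T, mean - root.T])  # shape (2n+1, n)
    wm = np.full(2 * n + 1, 0.5 / (n + lam))
    wc = wm.copy()
    wm[0] = lam / (n + lam)
    wc[0] = lam / (n + lam) + (1.0 - alpha**2 + beta)
    return pts, wm, wc

def unscented_transform(f, mean, cov, **kw):
    """Propagate a Gaussian (mean, cov) through f using sigma points."""
    pts, wm, wc = sigma_points(mean, cov, **kw)
    ys = np.array([f(p) for p in pts])
    y_mean = wm @ ys
    d = ys - y_mean
    y_cov = (wc[:, None] * d).T @ d
    return y_mean, y_cov, pts, ys, wm, wc

def ukf_step(z, P, y, f, h, Q, R):
    """One predict/update cycle of an unscented Kalman filter."""
    # Predict: push the state estimate through the (non-linear) dynamics.
    z_pred, P_pred, *_ = unscented_transform(f, z, P)
    P_pred = P_pred + Q
    # Update: compare the predicted measurement with the observed one.
    y_pred, S, pts, ys, wm, wc = unscented_transform(h, z_pred, P_pred)
    S = S + R
    Pzy = (wc[:, None] * (pts - z_pred)).T @ (ys - y_pred)  # state-measurement cross-covariance
    K = Pzy @ np.linalg.inv(S)                               # Kalman gain
    return z_pred + K @ (y - y_pred), P_pred - K @ S @ K.T

def make_grn_model(n_genes, dt=0.1, decay=1.0):
    """Hypothetical GRN model; augmented state z = [x, vec(W)]."""
    def f(z):
        x = z[:n_genes]
        W = z[n_genes:].reshape(n_genes, n_genes)
        x_next = x + dt * (W @ np.tanh(x) - decay * x)
        return np.concatenate([x_next, z[n_genes:]])  # parameters modelled as constant
    def h(z):
        return z[:n_genes]                             # observe the gene states directly
    return f, h
```

To use it, start from a prior mean z (for example zeros) and covariance P, then call ukf_step once per incoming measurement vector y; after the last step, z[n_genes:].reshape(n_genes, n_genes) holds the current estimate of W, whose large-magnitude entries suggest candidate regulatory edges. Treating parameters as constants with a small process noise Q on them is a common design choice for joint state-parameter estimation.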

    Informatics systems using network approaches to rank genes from RNA-Seq data

    Thesis (Ph.D.), Seoul National University Graduate School, College of Natural Sciences, Interdisciplinary Program in Bioinformatics, August 2019. Advisor: Sun Kim.
Transcriptomic analysis, the measurement of transcripts on a genome scale, is now routinely performed at high resolution. Because transcriptome data usually cover a large number of genes, it is difficult for researchers to identify the genes relevant to their research goals without additional analysis. Analysis of transcriptome data is often performed using heterogeneous resources such as biological networks, annotated gene information, and published literature. However, the relationships among these heterogeneous resources are often too complicated to reveal which genes are relevant to the experimental design. Powerful computational methods should therefore be coupled with these resources to effectively determine and explain the key genes relevant to specific research goals. In my doctoral study, I developed three bioinformatics systems that use network approaches to analyze transcriptome data and rank genes relevant to the experimental design. The first study developed a bioinformatics system for analyzing RNA-Seq data from gene knockout (KO) mice, where the number of samples is small. Here, the main objectives were to investigate how the KO gene affects the expression of other genes and to identify the key genes that contribute significantly to the phenotypic difference. To address these questions, I developed a gene prioritization system that exploits the characteristics of RNA-Seq data.
The system prioritizes genes by removing less informative differentially expressed genes (DEGs) using a gene regulatory network (GRN) and biological pathways. Next, it filters out genes that may differ because of genetic differences between samples, using single nucleotide variant (SNV) information. This study demonstrated that integrating networks and SNV information improved the performance of gene prioritization. The second study developed a gene prioritization system that allows the user to specify the context of the experiment. It was motivated by the observation that currently available analysis methods for transcriptome data do not fully consider the experimental design of gene KO studies. I therefore envisaged that users would prefer an analysis method that takes the characteristics of KO experiments into account and can be guided by the context of the researcher who designed and performed the biological experiment. To this end, I developed CLIP-GENE, a web service for condition-specific, context-laid integrative analysis that prioritizes genes in mouse transcription factor (TF) KO experiments. CLIP-GENE removes less informative DEGs using a GRN, discards genes that may vary between samples using SNV information, and ranks genes related to the user's context using text mining together with the shortest protein-protein interaction (PPI) path from the KO gene to each target gene. The last study developed an informatics system for comparing multiple RNA-Seq experiments using Venn diagrams. In general, RNA-Seq experiments compare samples between control and treated groups, producing a set of DEGs. Each region of a Venn diagram (a subset of DEGs) generally contains a large number of genes, which complicates identifying the important and relevant genes.
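The first two studies both narrow candidates in stages (DEG, then GRN, then pathway, then SNV, as the thesis outline lists the filters), and CLIP-GENE then ranks what remains by user context and PPI distance from the KO gene. A minimal Python sketch of that flow, assuming hypothetical cut-offs (|log2FC| ≥ 1, adjusted p ≤ 0.05), "reachable from the KO gene in the GRN" as the GRN criterion, and a lexicographic ranking by context score and then shortest PPI path length; these rules and all function names are illustrative, not the thesis's actual criteria.

```python
from collections import deque

def bfs_distances(graph, source):
    """Unweighted shortest-path lengths from source; graph maps node -> iterable of neighbours."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def filter_candidates(deg_stats, ko_gene, grn, pathway_genes, snv_genes,
                      lfc_cutoff=1.0, padj_cutoff=0.05):
    """Apply DEG -> GRN -> pathway -> SNV filters in sequence (illustrative rules only)."""
    # 1. DEG filter: deg_stats maps gene -> (log2 fold change, adjusted p-value).
    candidates = {g for g, (lfc, padj) in deg_stats.items()
                  if abs(lfc) >= lfc_cutoff and padj <= padj_cutoff}
    # 2. GRN filter: keep DEGs reachable from the KO gene along regulatory edges.
    reachable = bfs_distances(grn, ko_gene)
    candidates = {g for g in candidates if g in reachable}
    # 3. Pathway filter: keep DEGs annotated to at least one pathway of interest.
    candidates &= set(pathway_genes)
    # 4. SNV filter: snv_genes are genes carrying SNVs that differ between samples.
    return candidates - set(snv_genes)

def rank_by_context(candidates, ko_gene, ppi, context_scores):
    """Sort by text-mining context score (high first), then by PPI distance from the KO gene."""
    dist = bfs_distances(ppi, ko_gene)
    return sorted(candidates,
                  key=lambda g: (-context_scores.get(g, 0.0), dist.get(g, float("inf")), g))
```

Keeping each filter as a separate set operation makes it easy to report how many candidates each stage removes; a real system would replace the cut-offs and the reachability rule with whatever criteria the study specifies.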
Moreover, simply comparing DEG lists from different experiments can be misleading, because some of the lists may have been measured against different controls. To address these issues, I developed Venn-diaNet, an analysis framework that integrates Venn diagrams and network propagation to prioritize genes in experiments that have multiple DEG lists. We demonstrated that Venn-diaNet reproduces research findings reported in the original papers by comparing two, three, and eight biological experiments measured under different conditions. I believe that Venn-diaNet can be very useful for researchers choosing genes for follow-up studies. In summary, my doctoral study aimed to develop computational tools that prioritize genes from transcriptome data. To achieve this goal, I combined network approaches with multiple heterogeneous resources in a single computational environment. All three informatics systems are deployed as software packages or web tools to give researchers convenient access, eliminating the need to install or learn any additional software packages.
Table of contents:
Abstract
Chapter 1 Introduction
  1.1 Challenges of analyzing RNA-seq data
    1.1.1 Excessive amount of databases and analysis methods
    1.1.2 Knowledge bias that prioritizes less relevant genes
    1.1.3 Complicated experiment designs
  1.2 My approach to address the challenges for the analysis of RNA-seq data
  1.3 Background
    1.3.1 Differentially expressed genes
    1.3.2 Gene prioritization
  1.4 Outline of the thesis
Chapter 2 A filtering strategy that combines GRN and SNV information to enhance gene prioritization in mouse KO studies with a small number of samples
  2.1 Background
  2.2 Methods
    2.2.1 First filter: DEG
    2.2.2 Second filter: GRN
    2.2.3 Third filter: Biological Pathway
    2.2.4 Final filter: SNV
  2.3 Results and Discussion
  2.4 Discussion
Chapter 3 An integrated data-fusion and text-mining strategy to prioritize context-laid genes in mouse TF KO experiments
  3.1 Background
  3.2 Methods
    3.2.1 Selection of initial candidate genes
    3.2.2 Prioritizing genes with the user context and PPI
  3.3 Results and Discussion
    3.3.1 Performance with the best context
    3.3.2 Performance with the worst context
  3.4 Discussion
Chapter 4 Integrating Venn diagrams into the network-based strategy for comparing multiple biological experiments
  4.1 Background
  4.2 Methods
    4.2.1 Taking input data
    4.2.2 Generating Venn diagram of DEG sets
    4.2.3 Network propagation and gene ranking
  4.3 Results and Discussion
    4.3.1 Venn-diaNet for two experiments
    4.3.2 Venn-diaNet for three experiments
    4.3.3 Venn-diaNet for eight experiments
    4.3.4 Venn-diaNet performance with different PPI networks
  4.4 Discussion
Chapter 5 Conclusion
Bibliography
Korean abstract
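The Venn-diaNet study summarized in this abstract combines two steps: splitting several DEG lists into Venn regions, then ranking genes by network propagation from the genes of a chosen region. A minimal Python sketch, assuming random walk with restart as the propagation method and a small dense adjacency matrix standing in for the PPI network; venn_regions, random_walk_with_restart, rank_from_region, and the restart value 0.3 are hypothetical, and this is not the published Venn-diaNet scoring procedure.

```python
import numpy as np

def venn_regions(deg_sets):
    """Map each membership pattern (one bool per experiment) to the genes in exactly that region."""
    names = list(deg_sets)
    regions = {}
    for gene in set().union(*deg_sets.values()):
        key = tuple(gene in deg_sets[name] for name in names)
        regions.setdefault(key, set()).add(gene)
    return regions

def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 with a column-normalised adjacency W."""
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # isolated nodes keep a zero column
    W = adj / col_sums                     # divides column j by its degree
    p0 = np.asarray(seeds, dtype=float)
    if p0.sum() == 0:
        raise ValueError("at least one seed gene is required")
    p0 = p0 / p0.sum()
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:  # L1 change below tolerance: converged
            return p_next
        p = p_next
    return p

def rank_from_region(adj, genes, region_genes, restart=0.3):
    """Rank all network genes by propagated score, seeding the walk with one Venn region."""
    seeds = [1.0 if g in region_genes else 0.0 for g in genes]
    scores = random_walk_with_restart(adj, seeds, restart)
    order = np.argsort(-scores, kind="stable")
    return [genes[i] for i in order]
```

Seeding from a single region rather than from a whole DEG list is one way to respect the abstract's point that regions built from different controls should not be compared as plain gene lists; how Venn-diaNet actually weights or combines regions is not stated here.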

    Inferring biological networks from genome-wide transcriptional and fitness data

    In the last 15 years, the increased use of high-throughput biology techniques such as genome-wide gene expression profiling, fitness profiling and protein interactomics has led to the generation of an extraordinary amount of data. The abundance of such diverse data has proven to be an essential foundation for understanding the complexities of molecular mechanisms and underlying pathways within a biological system. This thesis demonstrates the capabilities and applications of using biological networks to extract biological information from the wealth of data available for the yeast species Saccharomyces cerevisiae and Schizosaccharomyces pombe. This study marks the first time a mutual-information-based network inference approach has been applied to a set of specific genome-wide expression and fitness compendia. In particular, this work has generated hypotheses in S. pombe that have led to a deeper understanding of the relationship between ribosomal proteins and energy metabolism, a recently discovered pathway termed riboneogenesis. Experimental validation of this hypothesis has led to new theories on the role of energy metabolism enzymes in controlling ribosome biogenesis in S. pombe, including the novel finding that fructose-1,6-bisphosphatase (FBP1) may have roles in both gluconeogenesis and riboneogenesis. This thesis also demonstrates how the use of multi-level data allows for comprehensive insight into the nuclear functions of the S. pombe nonsense-mediated mRNA decay protein UPF1. This study provides substantial evidence for the role of UPF1 in DNA replication. The applicability of fitness data for identifying targets of metal and metalloid toxicity in S. cerevisiae has also been investigated.
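    The thesis applies a mutual-information-based network inference approach to expression and fitness compendia, but the abstract does not specify the estimator or how edges are selected. A minimal relevance-network sketch in Python, assuming equal-width discretization, a plug-in MI estimate, and a fixed threshold; discretize, mutual_information, mi_network, and their default values are hypothetical, and background-corrected variants such as CLR or pruning schemes such as ARACNE are common refinements not shown here.

```python
import numpy as np

def discretize(profile, n_bins):
    """Equal-width binning of one profile into integer labels 0..n_bins-1."""
    edges = np.linspace(profile.min(), profile.max(), n_bins + 1)[1:-1]  # interior edges only
    return np.digitize(profile, edges)

def mutual_information(a, b):
    """Plug-in mutual information (nats) between two integer-labelled vectors."""
    joint = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(joint, (a, b), 1)            # contingency table of label pairs
    joint /= joint.sum()
    expected = joint.sum(axis=1, keepdims=True) @ joint.sum(axis=0, keepdims=True)
    nz = joint > 0                         # skip empty cells (0 * log 0 = 0)
    return float((joint[nz] * np.log(joint[nz] / expected[nz])).sum())

def mi_network(data, n_bins=3, threshold=0.1):
    """data: genes x conditions; returns the MI matrix and a boolean adjacency above threshold."""
    labels = [discretize(row, n_bins) for row in np.asarray(data, dtype=float)]
    n = len(labels)
    mi = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            mi[i, j] = mi[j, i] = mutual_information(labels[i], labels[j])
    return mi, mi > threshold
```

Rows can hold expression profiles, fitness profiles, or both, as long as every row is measured over the same set of conditions. With few conditions the plug-in estimate is biased upward, so in practice the threshold is often chosen against a permutation-based null rather than fixed in advance.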