7 research outputs found

Inference of gene regulatory networks from genome-wide knockout fitness data

Author: Arkin Adam P.
Samoilov Michael S.
Wang Liming
Wang Xiaodong
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2012

Motivation: Genome-wide fitness is an emerging type of high-throughput biological data generated for individual organisms by creating libraries of knockouts, subjecting them to broad ranges of environmental conditions, and measuring the resulting clone-specific fitnesses. Since fitness is an organism-scale measure of gene regulatory network behaviour, it may offer certain advantages when insights into such phenotypical and functional features are of primary interest over individual gene expression. Previous works have shown that genome-wide fitness data can be used to uncover novel gene regulatory interactions, when compared with results of more conventional gene expression analysis. Yet, to date, few algorithms have been proposed for systematically using genome-wide mutant fitness data for gene regulatory network inference. Results: In this article, we describe a model and propose an inference algorithm for using fitness data from knockout libraries to identify underlying gene regulatory networks. Unlike most prior methods, the presented approach captures not only the structural but also the dynamical and non-linear nature of the biomolecular systems involved. A state–space model with a non-linear basis is used for dynamically describing gene regulatory networks. Network structure is then elucidated by estimating unknown model parameters. An unscented Kalman filter is used to cope with the non-linearities introduced in the model, which also enables the algorithm to run in on-line mode for practical use. Here, we demonstrate that the algorithm provides satisfying results for both synthetic data and empirical measurements of the GAL network in the yeast Saccharomyces cerevisiae and the TyrR–LiuR network in the bacterium Shewanella oneidensis.

Crossref

Columbia University Academic Commons

PubMed Central

eScholarship - University of California

Multiple Linear Regression for Reconstruction of Gene Regulatory Networks in Solving Cascade Error Problems

Author: Faridah Hani Mohamed Salleh
Shereena M. Arif
Suhaila Zainudin
Publication venue: 'Hindawi Limited'

Crossref

An informatics system using network approaches for ranking genes in RNA-Seq data

Author: Benjamin Hur
Publication venue: Seoul National University Graduate School
Publication date: 01/08/2019

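Thesis (Ph.D.) -- Seoul National University Graduate School: College of Natural Sciences, Interdisciplinary Program in Bioinformatics, 2019. 8. Sun Kim.
RNA-seq technology has made it possible to analyze genome-scale transcriptomes at high resolution, but because the number of genes appearing in transcriptome data is generally large, it is difficult to identify genes relevant to a research goal without additional analysis. Transcriptome data analysis therefore often draws on different resources such as biological networks, gene information databases, and the literature. However, the relationships among these resources are heterogeneous, which makes it difficult to connect and interpret them directly and to understand concretely which genes are relevant to the experimental goal. Powerful computational methods that effectively integrate these heterogeneous resources are therefore needed to determine and explain the key genes relevant to a specific research goal. In this thesis, three bioinformatics systems were developed that analyze transcriptome data with network-based approaches to find genes relevant to the experimental goal. The first study developed an informatics system that exploits the characteristics of RNA-Seq data to find important genes in gene knockout (KO) mouse experiments with a small number of samples. The system uses gene regulatory network (GRN) and pathway information to remove less significant differentially expressed genes (DEGs), and uses single nucleotide variant (SNV) information to remove genes that may differ because of genetic differences between samples. This study showed that integrating network and SNV information can significantly reduce the number of candidate genes. The second study developed CLIP-GENE, a gene ranking system that can reflect the user's experimental goal. CLIP-GENE is an integrated analysis web service for ranking genes in mouse transcription factor KO experiments. To rank candidate genes, CLIP-GENE uses GRN and SNV information to remove candidate genes that differ between individual samples and are less significant, and uses text-mining techniques and protein-protein interaction network information to rank genes related to the user's experimental goal. The last study developed an information system that uses Venn diagrams to compare multiple RNA-Seq experiments. RNA-Seq experiments generally produce DEGs by comparing treated and control samples and analyze the differences between samples through Venn diagrams. However, each region of a Venn diagram contains a varying proportion of DEGs, and the DEGs in a given region may come from different treated (or control) groups, so simply comparing differences between gene lists is not appropriate. To solve this problem, Venn-diaNet was developed, an integrated analysis framework that uses Venn diagrams and network propagation. Venn-diaNet is an information system that can rank genes in experiments with multiple DEG lists. We showed that Venn-diaNet can reproduce the research findings reported in the original papers by comparing biological experiments under different conditions. In summary, this thesis combined network-based analysis with various resources to develop information systems that can rank genes from transcriptome data, and built and distributed them as web tools or software packages with user-friendly UIs for the convenience of other researchers.
Transcriptomic analysis, the measurement of transcripts on the genome scale, is now routinely performed in high resolution. 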
Since the number of genes obtained in the transcriptome data is usually large, it is difficult for researchers to identify genes that are relevant to their research goals, without additional analysis. Analysis of transcriptome data is often performed utilizing heterogeneous resources such as biological networks, annotated gene information, and published literature. However, the relationship among heterogeneous resources is often too complicated to decipher which genes are relevant to the experimental design. Therefore, powerful computational methods should be coupled with these heterogeneous resources in order to effectively determine and illustrate key genes that are relevant to specific research goals. In my doctoral study, I have developed three bioinformatics systems that use network approaches to analyze transcriptome data and rank genes that are relevant to the experimental design. The first study was conducted to develop a bioinformatics system that could be used to analyze RNA-Seq data of gene knockout (KO) mice, where the sample number is small. In this case, the main objectives were to investigate how the KO gene affects the expression of other genes and identify the key genes that contribute significantly to the phenotypic difference. To address these questions, I developed a gene prioritization system that utilizes the characteristics of RNA-Seq data. The system prioritizes genes by removing the less informative differentially expressed genes (DEGs) using gene regulatory network (GRN) and biological pathways. Next, it filters out genes that might be different due to genetic differences between samples using single nucleotide variant (SNV) information. Consequently, this study demonstrated that the integration of networks and SNV information was able to increase the performance of gene prioritization. The second study was conducted to develop a gene prioritization system that allows the user to specify the context of the experiment. 
This study was inspired by the fact that the currently available analysis methods for transcriptome data do not fully consider the experimental design of gene KO studies. Therefore, I envisaged that users would prefer an analysis method that took into consideration the characteristics of the KO experiments and could be guided by the context of the researcher who has designed and performed the biological experiment. Therefore, I developed CLIP-GENE, a web service of the condition-specific context-laid integrative analysis for prioritizing genes in mouse TF KO experiments. CLIP-GENE prioritizes genes of KO experiments by removing the less informative DEGs using GRN, discards genes that might have sample variance, using SNV information, and ranks genes that are related to the user's context using the text-mining technique, as well as considering the shortest path of protein-protein interaction (PPI) from the KO gene to the target genes. The last study was conducted to develop an informative system that could be used to compare multiple RNA-Seq experiments using Venn diagrams. In general, RNA-Seq experiments are performed to compare samples between control and treated groups, producing a set of DEGs. Each region in a Venn diagram (a subset of DEGs) generally contains a large number of genes that could complicate the determination of the important and relevant genes. Moreover, simply comparing the list of DEGs from different experiments could be misleading because some of the DEG lists may have been measured using different controls. To address these issues, Venn-diaNet was developed, an analysis framework that integrates Venn diagram and network propagation to prioritize genes for experiments that have multiple DEG lists. We demonstrated that Venn-diaNet was able to reproduce research findings reported in the original papers by comparing two, three, and eight biological experiments measured in different conditions. 
I believe that Venn-diaNet can be very useful for researchers to determine genes for their follow-up studies. In summary, my doctoral study aimed to develop computational tools that can prioritize genes from transcriptome data. To achieve this goal, I combined network approaches with multiple heterogeneous resources in a single computational environment. All three informatics systems are deployed as software packages or web tools to support convenient access for researchers, eliminating the need for installation or learning any additional software packages.
Abstract
Chapter 1 Introduction
1.1 Challenges of analyzing RNA-seq data
1.1.1 Excessive amount of databases and analysis methods
1.1.2 Knowledge bias that prioritizes less relevant genes
1.1.3 Complicated experiment designs
1.2 My approach to address the challenges for the analysis of RNA-seq data
1.3 Background
1.3.1 Differentially expressed genes
1.3.2 Gene prioritization
1.4 Outline of the thesis
Chapter 2 A filtering strategy that combines GRN, SNV information to enhance the gene prioritization in mouse KO studies with small number of samples
2.1 Background
2.2 Methods
2.2.1 First filter: DEG
2.2.2 Second filter: GRN
2.2.3 Third filter: Biological Pathway
2.2.4 Final filter: SNV
2.3 Results and Discussion
2.4 Discussion
Chapter 3 An integration of data-fusion and text-mining strategy to prioritize context-laid genes in mouse TF KO experiments
3.1 Background
3.2 Methods
3.2.1 Selection of initial candidate genes
3.2.2 Prioritizing genes with the user context and PPI
3.3 Results and Discussion
3.3.1 Performance with the best context
3.3.2 Performance with the worst context
3.4 Discussion
Chapter 4 Integrating Venn diagram to the network-based strategy for comparing multiple biological experiments
4.1 Background
4.2 Methods
4.2.1 Taking input data
4.2.2 Generating Venn diagram of DEG sets
4.2.3 Network propagation and gene ranking
4.3 Results and Discussion
4.3.1 Venn-diaNet for two experiments
4.3.2 Venn-diaNet for three experiments
4.3.3 Venn-diaNet for eight experiments
4.3.4 Venn-diaNet performance with different PPI network
4.4 Discussion
Chapter 5 Conclusion
Bibliography
Abstract (Korean)
Docto

SNU Open Repository and Archive

Inference of gene regulatory networks from genome-wide knockout fitness data.

Author: Wang Liming
Publication date: 15/05/2020

Ezid

Inference of gene regulatory networks from genome-wide knockout fitness data.

Author: Arkin Adam P
Samoilov Michael S
Wang Liming
Wang Xiaodong
Publication venue: eScholarship, University of California
Publication date: 01/02/2013

Motivation: Genome-wide fitness is an emerging type of high-throughput biological data generated for individual organisms by creating libraries of knockouts, subjecting them to broad ranges of environmental conditions, and measuring the resulting clone-specific fitnesses. Since fitness is an organism-scale measure of gene regulatory network behaviour, it may offer certain advantages when insights into such phenotypical and functional features are of primary interest over individual gene expression. Previous works have shown that genome-wide fitness data can be used to uncover novel gene regulatory interactions, when compared with results of more conventional gene expression analysis. Yet, to date, few algorithms have been proposed for systematically using genome-wide mutant fitness data for gene regulatory network inference. Results: In this article, we describe a model and propose an inference algorithm for using fitness data from knockout libraries to identify underlying gene regulatory networks. Unlike most prior methods, the presented approach captures not only the structural but also the dynamical and non-linear nature of the biomolecular systems involved. A state-space model with a non-linear basis is used for dynamically describing gene regulatory networks. Network structure is then elucidated by estimating unknown model parameters. An unscented Kalman filter is used to cope with the non-linearities introduced in the model, which also enables the algorithm to run in on-line mode for practical use. Here, we demonstrate that the algorithm provides satisfying results for both synthetic data and empirical measurements of the GAL network in the yeast Saccharomyces cerevisiae and the TyrR-LiuR network in the bacterium Shewanella oneidensis. Availability: MATLAB code and datasets are available to download at http://www.duke.edu/∼lw174/Fitness.zip and http://genomics.lbl.gov/supplemental/fitness-bioinf

eScholarship - University of California

Inference of gene regulatory networks from genome-wide knockout fitness data

Author: A. P. Arkin
L. Wang
M. S. Samoilov
X. Wang
Publication venue: 'Oxford University Press (OUP)'

Crossref

Inferring biological networks from genome-wide transcriptional and fitness data

Author: Varsally Wazeer Mohammad
Publication date: 01/07/2014

In the last 15 years, the increased use of high-throughput biology techniques such as genome-wide gene expression profiling, fitness profiling and protein interactomics has led to the generation of an extraordinary amount of data. The abundance of such diverse data has proven to be an essential foundation for understanding the complexities of molecular mechanisms and underlying pathways within a biological system. This thesis demonstrates the capabilities and applications of using biological networks to extrapolate biological information from the wealth of data available in the yeast species Saccharomyces cerevisiae and Schizosaccharomyces pombe. This study marks the first time a mutual-information-based network inference approach has been applied to a set of specific genome-wide expression and fitness compendia. In particular, this work has generated hypotheses in S. pombe that have led to a deeper understanding of the relationship between ribosomal proteins and energy metabolism, a recently discovered pathway termed riboneogenesis. Experimental validation of this hypothesis has led to new theories on the role of energy metabolism enzymes in controlling ribosome biogenesis in S. pombe, including the novel finding that fructose-1,6-bisphosphatase (FBP1) may have roles in both gluconeogenesis and riboneogenesis. This thesis also demonstrates how the use of multi-level data allows for comprehensive insight into nuclear functions of the S. pombe nonsense-mediated mRNA decay protein, UPF1. This study provides substantial evidence demonstrating the role of UPF1 in DNA replication. The applicability of fitness data in identifying targets of metal and metalloid toxicity in S. cerevisiae has also been investigated.

University of Birmingham Research Archive, E-theses Repository