Cancer gene prioritization by integrative analysis of mRNA expression and DNA copy number data: a comparative review
A variety of genome-wide profiling techniques are available to probe
complementary aspects of genome structure and function. Integrative analysis of
heterogeneous data sources can reveal higher-level interactions that cannot be
detected based on individual observations. A standard integration task in
cancer studies is to identify altered genomic regions that induce changes in
the expression of the associated genes based on joint analysis of genome-wide
gene expression and copy number profiling measurements. In this review, we
provide a comparison among various modeling procedures for integrating
genome-wide profiling data of gene copy number and transcriptional alterations
and highlight common approaches to genomic data integration. A transparent
benchmarking procedure is introduced to quantitatively compare the cancer gene
prioritization performance of the alternative methods. The benchmarking
algorithms and data sets are available at http://intcomp.r-forge.r-project.org
Comment: PDF file including supplementary material. 9 pages. Preprin
SPOT: a web-based tool for using biological databases to prioritize SNPs after a genome-wide association study
SPOT (http://spot.cgsmd.isi.edu), the SNP prioritization online tool, is a web site for integrating biological databases into the prioritization of single nucleotide polymorphisms (SNPs) for further study after a genome-wide association study (GWAS). Typically, the next step after a GWAS is to genotype the top signals in an independent replication sample. Investigators will often incorporate information from biological databases so that biologically relevant SNPs, such as those in genes related to the phenotype or with potentially non-neutral effects on gene expression such as splice sites, are given higher priority. We recently introduced the genomic information network (GIN) method for systematically implementing this kind of strategy. The SPOT web site allows users to upload a list of SNPs and GWAS P-values and returns a prioritized list of SNPs using the GIN method. Users can specify candidate genes or genomic regions with custom levels of prioritization. The results can be downloaded or viewed in the browser, where users can interactively explore the details of each SNP, including graphical representations of the GIN method. For investigators interested in incorporating biological databases into a post-GWAS SNP selection strategy, the SPOT web tool is an easily implemented and flexible solution.
Allele-specific NKX2-5 binding underlies multiple genetic associations with human electrocardiographic traits.
The cardiac transcription factor (TF) gene NKX2-5 has been associated with electrocardiographic (EKG) traits through genome-wide association studies (GWASs), but the extent to which differential binding of NKX2-5 at common regulatory variants contributes to these traits has not yet been studied. We analyzed transcriptomic and epigenomic data from induced pluripotent stem cell-derived cardiomyocytes from seven related individuals, and identified ~2,000 single-nucleotide variants associated with allele-specific effects (ASE-SNVs) on NKX2-5 binding. NKX2-5 ASE-SNVs were enriched for altered TF motifs, for heart-specific expression quantitative trait loci and for EKG GWAS signals. Using fine-mapping combined with epigenomic data from induced pluripotent stem cell-derived cardiomyocytes, we prioritized candidate causal variants for EKG traits, many of which were NKX2-5 ASE-SNVs. Experimentally characterizing two NKX2-5 ASE-SNVs (rs3807989 and rs590041) showed that they modulate the expression of target genes via differential protein binding in cardiac cells, indicating that they are functional variants underlying EKG GWAS signals. Our results show that differential NKX2-5 binding at numerous regulatory variants across the genome contributes to EKG phenotypes.
GeneTIER: prioritization of candidate disease genes using tissue-specific gene expression profiles.
Motivation: In attempts to determine the genetic causes of human disease, researchers are often faced with a large number of candidate genes. Linkage studies can point to a genomic region containing hundreds of genes, while the high-throughput sequencing approach will often identify a great number of non-synonymous genetic variants. Since systematic experimental verification of each such candidate gene is not feasible, a method is needed to decide which genes are worth investigating further. Computational gene prioritization presents itself as a solution to this problem, systematically analyzing and sorting each gene from the most to least likely to be the disease-causing gene, in a fraction of the time it would take a researcher to perform such queries manually. Results: Here we present GeneTIER (Gene TIssue Expression Ranker), a new web-based application for candidate gene prioritization. GeneTIER replaces knowledge-based inference traditionally used in candidate disease gene prioritization applications with experimental data from tissue-specific gene expression datasets and thus largely overcomes the bias towards the better characterized genes/diseases that commonly afflicts other methods. We show that our approach is capable of accurate candidate gene prioritization and illustrate its strengths and weaknesses using case study examples. Availability and Implementation: Freely available on the web at http://dna.leeds.ac.uk/GeneTIER/
Informatics systems using network approaches to rank genes from RNA-Seq data
Thesis (Ph.D.) -- Seoul National University Graduate School: College of Natural Sciences, Interdisciplinary Program in Bioinformatics, 2019. 8. Sun Kim.
RNA-seq technology has made it possible to analyze genome-scale transcriptomes at high resolution, but because the number of genes appearing in transcriptome data is generally large, it is difficult to identify the genes related to a research goal without additional analysis. Transcriptome data are therefore often analyzed with the help of different resources such as biological networks, gene information databases, and the literature. However, the relationships among these resources are heterogeneous, which makes them hard to connect and interpret directly and makes it hard to understand concretely which genes are related to the experimental goal. Powerful computational methods that integrate these heterogeneous resources are therefore needed to determine and explain the key genes related to a specific research goal. In this thesis, three bioinformatics systems were developed that use network-based approaches to analyze transcriptome data and find the genes related to the experimental goal.
The first study developed an informatics system that uses the characteristics of RNA-Seq data to find important genes in gene knockout (KO) mouse experiments with a small number of samples. The system uses gene regulatory network (GRN) and pathway information to remove less informative differentially expressed genes (DEGs), and uses single nucleotide variant (SNV) information to remove genes that may differ because of genetic differences between samples. This study showed that integrating network and SNV information can significantly reduce the number of candidate genes.
The second study developed CLIP-GENE, a gene ranking system that can reflect the user's experimental goal. CLIP-GENE is an integrative analysis web service for ranking genes in mouse transcription factor KO experiments. To rank candidate genes, CLIP-GENE uses GRN and SNV information to remove candidates that are less meaningful or that differ between sample individuals, and then uses text mining and protein-protein interaction network information to rank the genes related to the user's experimental goal.
The last study developed an information system that uses Venn diagrams to compare multiple RNA-Seq experiments. RNA-Seq experiments generally compare treated and control samples to produce DEGs, and differences between samples are then analyzed with a Venn diagram. However, each region of a Venn diagram contains a varying proportion of the DEGs, and the DEGs in a given region may come from different comparison (or control) groups, so simply comparing the differences between gene lists is not appropriate. To solve this problem, Venn-diaNet was developed, an integrative analysis framework that uses Venn diagrams and network propagation. Venn-diaNet is an information system that can rank the genes of experiments with multiple DEG lists. We showed that Venn-diaNet can reproduce the findings reported in the original papers by comparing biological experiments performed under different conditions.
In summary, this thesis combined network-based analysis with a variety of resources to develop information systems that can rank genes from transcriptome data, and built and distributed them as web tools or software packages with a friendly UI for the convenience of other researchers.
Transcriptomic analysis, the measurement of transcripts on the genome scale, is now routinely performed at high resolution. Since the number of genes obtained in the transcriptome data is usually large, it is difficult for researchers to identify genes that are relevant to their research goals without additional analysis. Analysis of transcriptome data is often performed utilizing heterogeneous resources such as biological networks, annotated gene information, and published literature. However, the relationship among heterogeneous resources is often too complicated to decipher which genes are relevant to the experimental design. Therefore, powerful computational methods should be coupled with these heterogeneous resources in order to effectively determine and illustrate key genes that are relevant to specific research goals. In my doctoral study, I have developed three bioinformatics systems that use network approaches to analyze transcriptome data and rank genes that are relevant to the experimental design.
The first study was conducted to develop a bioinformatics system that could be used to analyze RNA-Seq data of gene knockout (KO) mice, where the sample number is small. In this case, the main objectives were to investigate how the KO gene affects the expression of other genes and identify the key genes that contribute significantly to the phenotypic difference. To address these questions, I developed a gene prioritization system that utilizes the characteristics of RNA-Seq data. The system prioritizes genes by removing the less informative differentially expressed genes (DEGs) using gene regulatory network (GRN) and biological pathways. Next, it filters out genes that might be different due to genetic differences between samples using single nucleotide variant (SNV) information. Consequently, this study demonstrated that the integration of networks and SNV information was able to increase the performance of gene prioritization.
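The filtering cascade described in this paragraph can be illustrated with a minimal sketch. It assumes each filter reduces to simple set membership (a DEG is kept if it appears among the GRN targets and in a relevant pathway, and is dropped if it carries a sample-specific SNV); the abstract does not specify the actual criteria, and the function name, argument names, and gene sets below are hypothetical toy values.

```python
def prioritize(degs, grn_targets, pathway_genes, snv_genes):
    """Apply the filters in order and return the surviving genes, sorted.

    Assumption: every filter is modeled as plain set membership. The
    thesis abstract does not state the real filtering rules.
    """
    kept = set(degs)
    kept &= set(grn_targets)    # GRN filter: keep DEGs found in the regulatory network
    kept &= set(pathway_genes)  # pathway filter: keep DEGs annotated to a pathway
    kept -= set(snv_genes)      # SNV filter: drop genes that differ between samples
    return sorted(kept)


# Hypothetical toy inputs, for illustration only.
degs = {"Gata4", "Myc", "Tbx5", "Actb", "Nppa"}
grn_targets = {"Gata4", "Tbx5", "Nppa", "Hand2"}
pathway_genes = {"Gata4", "Tbx5", "Nppa", "Myc"}
snv_genes = {"Tbx5"}

candidates = prioritize(degs, grn_targets, pathway_genes, snv_genes)
print(candidates)
```

Modeling the filters as set operations keeps the order of the steps explicit and makes it easy to see how many candidates each filter removes.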
The second study was conducted to develop a gene prioritization system that allows the user to specify the context of the experiment. This study was inspired by the fact that the currently available analysis methods for transcriptome data do not fully consider the experimental design of gene KO studies. Therefore, I envisaged that users would prefer an analysis method that took into consideration the characteristics of the KO experiments and could be guided by the context of the researcher who has designed and performed the biological experiment. Therefore, I developed CLIP-GENE, a web service of the condition-specific context-laid integrative analysis for prioritizing genes in mouse TF KO experiments. CLIP-GENE prioritizes genes of KO experiments by removing the less informative DEGs using GRN, discards genes that might have sample variance, using SNV information, and ranks genes that are related to the user's context using the text-mining technique, as well as considering the shortest path of protein-protein interaction (PPI) from the KO gene to the target genes.
The last study was conducted to develop an informative system that could be used to compare multiple RNA-Seq experiments using Venn diagrams. In general, RNA-Seq experiments are performed to compare samples between control and treated groups, producing a set of DEGs. Each region in a Venn diagram (a subset of DEGs) generally contains a large number of genes that could complicate the determination of the important and relevant genes. Moreover, simply comparing the list of DEGs from different experiments could be misleading because some of the DEG lists may have been measured using different controls. To address these issues, Venn-diaNet was developed, an analysis framework that integrates Venn diagram and network propagation to prioritize genes for experiments that have multiple DEG lists. We demonstrated that Venn-diaNet was able to reproduce research findings reported in the original papers by comparing two, three, and eight biological experiments measured in different conditions. I believe that Venn-diaNet can be very useful for researchers to determine genes for their follow-up studies.
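As a rough illustration of combining a Venn region with network propagation, the sketch below seeds a random walk with restart from the genes in one Venn region and ranks all network genes by the resulting score. Random walk with restart is one common form of network propagation; the abstract does not say which variant Venn-diaNet uses, and the function name, restart probability, network, and DEG lists here are hypothetical.

```python
def propagate(adjacency, seeds, restart=0.5, iterations=50):
    """Random walk with restart on an undirected network.

    adjacency: dict mapping each gene to a list of its neighbors
               (assumed symmetric). seeds: non-empty set of genes
               present in the network.
    """
    nodes = sorted(adjacency)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(start)
    for _ in range(iterations):
        updated = {}
        for n in nodes:
            # Each neighbor m spreads its score evenly over its own neighbors.
            incoming = sum(score[m] / len(adjacency[m]) for m in adjacency[n])
            updated[n] = (1 - restart) * incoming + restart * start[n]
        score = updated
    return score


# Hypothetical toy network (a chain A-B-C-D) and two DEG lists.
network = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
deg_list_1 = {"A", "B"}
deg_list_2 = {"A", "C"}

region = deg_list_1 & deg_list_2  # Venn region shared by both experiments
scores = propagate(network, region)
ranking = sorted(scores, key=lambda gene: -scores[gene])
print(ranking)
```

Choosing a different region (for example `deg_list_1 - deg_list_2`) changes only the seed set, so the same propagation step can rank genes for any region of the diagram.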
In summary, my doctoral study aimed to develop computational tools that can prioritize genes from transcriptome data. To achieve this goal, I combined network approaches with multiple heterogeneous resources in a single computational environment. All three informatics systems are deployed as software packages or web tools to support convenient access to researchers, eliminating the need for installation or learning any additional software packages.
Abstract
Chapter 1 Introduction
1.1 Challenges of analyzing RNA-seq data
1.1.1 Excessive amount of databases and analysis methods
1.1.2 Knowledge bias that prioritizes less relevant genes
1.1.3 Complicated experiment designs
1.2 My approach to address the challenges for the analysis of RNA-seq data
1.3 Background
1.3.1 Differentially expressed genes
1.3.2 Gene prioritization
1.4 Outline of the thesis
Chapter 2 A filtering strategy that combines GRN, SNV information to enhances the gene prioritization in mouse KO studies with small number of samples
2.1 Background
2.2 Methods
2.2.1 First filter: DEG
2.2.2 Second filter: GRN
2.2.3 Third filter: Biological Pathway
2.2.4 Final filter: SNV
2.3 Results and Discussion
2.4 Discussion
Chapter 3 An integration of data-fusion and text-mining strategy to prioritize context-laid genes in mouse TF KO experiments
3.1 Background
3.2 Methods
3.2.1 Selection of initial candidate genes
3.2.2 Prioritizing genes with the user context and PPI
3.3 Results and Discussion
3.3.1 Performance with the best context
3.3.2 Performance with the worst context
3.4 Discussion
Chapter 4 Integrating Venn diagram to the network-based strategy for comparing multiple biological experiments
4.1 Background
4.2 Methods
4.2.1 Taking input data
4.2.2 Generating Venn diagram of DEG sets
4.2.3 Network propagation and gene ranking
4.3 Results and Discussion
4.3.1 Venn-diaNet for two experiments
4.3.2 Venn-diaNet for three experiments
4.3.3 Venn-diaNet for eight experiments
4.3.4 Venn-diaNet performance with different PPI network
4.4 Discussion
Chapter 5 Conclusion
Bibliography
Abstract (Korean)
Docto
PINTA: a web server for network-based gene prioritization from expression data
PINTA (available at http://www.esat.kuleuven.be/pinta/; this web site is free and open to all users and there is no login requirement) is a web resource for the prioritization of candidate genes based on the differential expression of their neighborhood in a genome-wide protein-protein interaction network. Our strategy is meant for biological and medical researchers aiming at identifying novel disease genes using disease specific expression data. PINTA supports both candidate gene prioritization (starting from a user defined set of candidate genes) as well as genome-wide gene prioritization and is available for five species (human, mouse, rat, worm and yeast). As input data, PINTA only requires disease specific expression data, whereas various platforms (e.g. Affymetrix) are supported. As a result, PINTA computes a gene ranking and presents the results as a table that can easily be browsed and downloaded by the user.
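The idea of scoring a candidate by the differential expression of its network neighborhood can be sketched as follows. This is only an illustration under a simplifying assumption: each candidate is scored by the mean absolute log fold change of its direct interaction partners. The abstract does not describe PINTA's actual scoring function, and all names and values below are hypothetical.

```python
def neighborhood_scores(network, abs_log_fc, candidates):
    """Score each candidate by the mean |log fold change| of its direct neighbors.

    Assumption: a one-step neighborhood and a plain mean; genes without
    expression data contribute 0.0, and candidates without neighbors score 0.0.
    """
    scores = {}
    for gene in candidates:
        neighbors = network.get(gene, [])
        values = [abs_log_fc.get(n, 0.0) for n in neighbors]
        scores[gene] = sum(values) / len(values) if values else 0.0
    return scores


# Hypothetical toy interaction network and expression changes.
ppi = {"G1": ["X", "Y"], "G2": ["Z"], "G3": []}
abs_log_fc = {"X": 2.0, "Y": 1.0, "Z": 0.5}

pinta_like = neighborhood_scores(ppi, abs_log_fc, ["G1", "G2", "G3"])
ranked = sorted(pinta_like, key=lambda gene: -pinta_like[gene])
print(ranked)
```

Note that a candidate's own expression change is not used here, which is what lets a neighborhood-based score highlight genes that are not themselves differentially expressed.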
plantiSMASH: automated identification, annotation and expression analysis of plant biosynthetic gene clusters
Plant specialized metabolites are chemically highly diverse, play key roles in host-microbe interactions, have important nutritional value in crops and are frequently applied as medicines. It has recently become clear that plant biosynthetic pathway-encoding genes are sometimes densely clustered in specific genomic loci: Biosynthetic gene clusters (BGCs). Here, we introduce plantiSMASH, a versatile online analysis platform that automates the identification of candidate plant BGCs. Moreover, it allows integration of transcriptomic data to prioritize candidate BGCs based on the coexpression patterns of predicted biosynthetic enzyme-coding genes, and facilitates comparative genomic analysis to study the evolutionary conservation of each cluster. Applied on 48 high-quality plant genomes, plantiSMASH identifies a rich diversity of candidate plant BGCs. These results will guide further experimental exploration of the nature and dynamics of gene clustering in plant metabolism. Moreover, spurred by the continuing decrease in costs of plant genome sequencing, they will allow genome mining technologies to be applied to plant natural product discovery.
ICSNPathway: identify candidate causal SNPs and pathways from genome-wide association study by one analytical framework
Genome-wide association studies (GWAS) are widely used to identify genes involved in human complex diseases or other traits. One key challenge in GWAS data interpretation is to identify causal SNPs and provide strong evidence of how they affect the trait. Currently, research focuses on identifying candidate causal variants from the most significant SNPs of a GWAS, while support for biological mechanisms as represented by pathways is lacking. Although pathway-based analysis (PBA) has been designed to identify disease-related pathways by analyzing the full list of SNPs from a GWAS, it does not emphasize interpreting causal SNPs. To our knowledge, no web server has so far been available to address this challenge within one analytical framework. ICSNPathway was developed to identify candidate causal SNPs and their corresponding candidate causal pathways from GWAS by integrating linkage disequilibrium (LD) analysis, functional SNP annotation and PBA. ICSNPathway provides a feasible way to bridge the gap between GWAS and the study of disease mechanisms by generating hypotheses of the form SNP → gene → pathway(s). The ICSNPathway server is freely available at http://icsnpathway.psych.ac.cn/