5,376 research outputs found

Cancer gene prioritization by integrative analysis of mRNA expression and DNA copy number data: a comparative review

Author: Akavia
Andrews
Baasiri
Chin
Dai
De Bie
Futreal
H.-U. Klein
Haverty
Hawkins
Hyman
Johnson
Kao
L. Lahti
M. Dugas
M. Schafer
McLendon
Menezes
Mullighan
Mullighan
Myllykangas
Olshen
Ortiz-Estevez
Phillips
Qin
S. Bicciato
Solvang
Soneson
Stranger
van Wieringen
van Wieringen
Publication venue: Oxford University Press (OUP)
Publication date: 20/11/2011

A variety of genome-wide profiling techniques are available to probe complementary aspects of genome structure and function. Integrative analysis of heterogeneous data sources can reveal higher-level interactions that cannot be detected based on individual observations. A standard integration task in cancer studies is to identify altered genomic regions that induce changes in the expression of the associated genes based on joint analysis of genome-wide gene expression and copy number profiling measurements. In this review, we provide a comparison among various modeling procedures for integrating genome-wide profiling data of gene copy number and transcriptional alterations and highlight common approaches to genomic data integration. A transparent benchmarking procedure is introduced to quantitatively compare the cancer gene prioritization performance of the alternative methods. The benchmarking algorithms and data sets are available at http://intcomp.r-forge.r-project.org

Comment: PDF file including supplementary material. 9 pages. Preprin
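The integration task described in this abstract can be illustrated with a minimal sketch. It assumes the simplest possible baseline, a per-gene Pearson correlation between copy number and expression across samples, and is not one of the methods benchmarked in the review. The function names and toy values are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no association signal
    return sxy / sqrt(sxx * syy)

def rank_genes(copy_number, expression):
    """Rank genes by how strongly expression follows copy number."""
    shared = [g for g in copy_number if g in expression]
    scores = {g: pearson(copy_number[g], expression[g]) for g in shared}
    return sorted(shared, key=lambda g: scores[g], reverse=True)

# Hypothetical toy data: one value per sample, same sample order in both tables.
copy_number = {"GENE_A": [2, 2, 3, 4], "GENE_B": [2, 3, 2, 4]}
expression = {"GENE_A": [1.0, 1.1, 2.0, 3.1], "GENE_B": [2.0, 1.0, 2.1, 1.9]}
ranking = rank_genes(copy_number, expression)
```

Here GENE_A ranks first because its expression rises with its copy number, while GENE_B shows no such trend.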

arXiv.org e-Print Archive

Crossref

PubMed Central

Wageningen University & Research Publications

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia

SPOT: a web-based tool for using biological databases to prioritize SNPs after a genome-wide association study

Author: Adzhubei
Berrettini
Cantor
Caporaso
Cartegni
Chanock
Chanock
Clayton
Dickson
E. Deelman
Frazer
Furberg
G. Mehta
Goren
Hampe
Howard
Hung
J. A. Tischfield
J. P. Rice
J. Quan
Kumar
Lis
Liu
Manolio
Monigatti
Ng
Nicolae
P. Thomas
Pillai
Portugal
R. Bolze
Ramensky
S. F. Saccone
Saccone
Sandelin
Thorgeirsson
Thorisson
Wacholder
Wang
Yuan
Yue
Publication venue: Oxford University Press
Publication date: 01/01/2010

SPOT (http://spot.cgsmd.isi.edu), the SNP prioritization online tool, is a web site for integrating biological databases into the prioritization of single nucleotide polymorphisms (SNPs) for further study after a genome-wide association study (GWAS). Typically, the next step after a GWAS is to genotype the top signals in an independent replication sample. Investigators will often incorporate information from biological databases so that biologically relevant SNPs, such as those in genes related to the phenotype or with potentially non-neutral effects on gene expression such as splice sites, are given higher priority. We recently introduced the genomic information network (GIN) method for systematically implementing this kind of strategy. The SPOT web site allows users to upload a list of SNPs and GWAS P-values and returns a prioritized list of SNPs using the GIN method. Users can specify candidate genes or genomic regions with custom levels of prioritization. The results can be downloaded or viewed in the browser, where users can interactively explore the details of each SNP, including graphical representations of the GIN method. For investigators interested in incorporating biological databases into a post-GWAS SNP selection strategy, the SPOT web tool is an easily implemented and flexible solution.

Crossref

PubMed Central

Digital Commons@Becker

Allele-specific NKX2-5 binding underlies multiple genetic associations with human electrocardiographic traits.

Author: Benaglio Paola
D'Antonio Matteo
D'Antonio-Chronowska Agnieszka
DeBoever Christopher
Donovan Margaret KR
Drees Frauke
Frazer Kelly A
Gaulton Kyle J
Li He
Ma Wubin
Matsui Hiroko
Rosenfeld Michael G
Singhal Sanghamitra
Smith Erin N
Sotoodehnia Nona
van Setten Jessica
Yang Feng
Young Greenwald William W
Publication venue: eScholarship, University of California
Publication date: 01/10/2019

The cardiac transcription factor (TF) gene NKX2-5 has been associated with electrocardiographic (EKG) traits through genome-wide association studies (GWASs), but the extent to which differential binding of NKX2-5 at common regulatory variants contributes to these traits has not yet been studied. We analyzed transcriptomic and epigenomic data from induced pluripotent stem cell-derived cardiomyocytes from seven related individuals, and identified ~2,000 single-nucleotide variants associated with allele-specific effects (ASE-SNVs) on NKX2-5 binding. NKX2-5 ASE-SNVs were enriched for altered TF motifs, for heart-specific expression quantitative trait loci and for EKG GWAS signals. Using fine-mapping combined with epigenomic data from induced pluripotent stem cell-derived cardiomyocytes, we prioritized candidate causal variants for EKG traits, many of which were NKX2-5 ASE-SNVs. Experimentally characterizing two NKX2-5 ASE-SNVs (rs3807989 and rs590041) showed that they modulate the expression of target genes via differential protein binding in cardiac cells, indicating that they are functional variants underlying EKG GWAS signals. Our results show that differential NKX2-5 binding at numerous regulatory variants across the genome contributes to EKG phenotypes

eScholarship - University of California

GeneTIER: prioritization of candidate disease genes using tissue-specific gene expression profiles.

Author: Antanaviciute A
Bonthron DT
Carr IM
Crinnion LA
Daly C
Markham AF
Watson CM
Publication venue: Oxford University Press (OUP)
Publication date: 09/04/2015

Motivation: In attempts to determine the genetic causes of human disease, researchers are often faced with a large number of candidate genes. Linkage studies can point to a genomic region containing hundreds of genes, while the high-throughput sequencing approach will often identify a great number of non-synonymous genetic variants. Since systematic experimental verification of each such candidate gene is not feasible, a method is needed to decide which genes are worth investigating further. Computational gene prioritization presents itself as a solution to this problem, systematically analyzing and sorting each gene from the most to least likely to be the disease-causing gene, in a fraction of the time it would take a researcher to perform such queries manually.

Results: Here we present GeneTIER (Gene TIssue Expression Ranker), a new web-based application for candidate gene prioritization. GeneTIER replaces knowledge-based inference traditionally used in candidate disease gene prioritization applications with experimental data from tissue-specific gene expression datasets and thus largely overcomes the bias towards the better characterized genes/diseases that commonly afflicts other methods. We show that our approach is capable of accurate candidate gene prioritization and illustrate its strengths and weaknesses using case study examples.

Availability and Implementation: Freely available on the web at http://dna.leeds.ac.uk/GeneTIER/

Contact: [email protected]

PubMed Central

White Rose Research Online

An informatics system using a network approach for ranking genes from RNA-Seq data

Author: Benjamin Hur
Publication venue: Graduate School, Seoul National University
Publication date: 01/08/2019

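Thesis (Ph.D.) -- Graduate School, Seoul National University: College of Natural Sciences, Interdisciplinary Program in Bioinformatics, August 2019. Kim Sun (김선).

RNA-seq technology has made it possible to analyze the transcriptome at genome scale and in high resolution, but because the number of genes appearing in transcriptome data is usually large, it is difficult to identify the genes relevant to a research goal without additional analysis. Transcriptome data analysis therefore often draws on disparate resources such as biological networks, gene information databases, and the literature. However, the relationships among these resources are heterogeneous, which makes it hard to connect and interpret them directly and to understand concretely which genes are relevant to the experimental goal. Powerful computational methods that integrate these heterogeneous resources are therefore needed to determine and explain the key genes relevant to a specific research goal. In this thesis, three bioinformatics systems were developed that use network-based approaches to analyze transcriptome data and find genes relevant to the experimental goal. The first study developed an informatics system that exploits the characteristics of RNA-Seq data to find important genes in gene knockout (KO) mouse experiments with a small number of samples. The system uses a gene regulatory network (GRN) and pathway information to remove the less significant differentially expressed genes (DEGs), and uses single nucleotide variant (SNV) information to remove genes that may differ because of genetic differences between samples. This study showed that integrating network and SNV information can significantly reduce the number of candidate genes. The second study developed CLIP-GENE, a gene ranking system that can reflect the user's experimental goal. CLIP-GENE is an integrative analysis web service for ranking genes in mouse transcription factor KO experiments. To rank candidate genes, CLIP-GENE uses GRN and SNV information to remove candidate genes that are less significant or that differ between individual samples, and uses text mining and protein-protein interaction network information to rank the genes related to the user's experimental goal. The last study developed an informatics system that uses Venn diagrams to compare multiple RNA-Seq experiments. RNA-Seq experiments generally produce DEGs by comparing treated and control samples, and differences between samples are then analyzed through Venn diagrams. However, each region of a Venn diagram contains a varying proportion of the DEGs, and the DEGs in a given region come from different treatment (or control) groups, so simply comparing the differences between gene lists is not appropriate. To solve this problem, Venn-diaNet, an integrative analysis framework that uses Venn diagrams and network propagation, was developed. Venn-diaNet is an informatics system that can rank genes in experiments with multiple DEG lists. We showed that Venn-diaNet can reproduce the findings reported in the original papers by comparing biological experiments performed under different conditions. In summary, this thesis combined network-based analysis with various resources to develop informatics systems that rank genes from transcriptome data, and built and distributed them as web tools or software packages with a friendly UI for the convenience of other researchers.

Transcriptomic analysis, the measurement of transcripts on the genome scale, is now routinely performed in high resolution.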
Since the number of genes obtained in the transcriptome data is usually large, it is difficult for researchers to identify genes that are relevant to their research goals, without additional analysis. Analysis of transcriptome data is often performed utilizing heterogeneous resources such as biological networks, annotated gene information, and published literature. However, the relationship among heterogeneous resources is often too complicated to decipher which genes are relevant to the experimental design. Therefore, powerful computational methods should be coupled with these heterogeneous resources in order to effectively determine and illustrate key genes that are relevant to specific research goals. In my doctoral study, I have developed three bioinformatics systems that use network approaches to analyze transcriptome data and rank genes that are relevant to the experimental design. The first study was conducted to develop a bioinformatics system that could be used to analyze RNA-Seq data of gene knockout (KO) mice, where the sample number is small. In this case, the main objectives were to investigate how the KO gene affects the expression of other genes and identify the key genes that contribute significantly to the phenotypic difference. To address these questions, I developed a gene prioritization system that utilizes the characteristics of RNA-Seq data. The system prioritizes genes by removing the less informative differentially expressed genes (DEGs) using gene regulatory network (GRN) and biological pathways. Next, it filters out genes that might be different due to genetic differences between samples using single nucleotide variant (SNV) information. Consequently, this study demonstrated that the integration of networks and SNV information was able to increase the performance of gene prioritization. The second study was conducted to develop a gene prioritization system that allows the user to specify the context of the experiment. 
This study was inspired by the fact that the currently available analysis methods for transcriptome data do not fully consider the experimental design of gene KO studies. Therefore, I envisaged that users would prefer an analysis method that took into consideration the characteristics of the KO experiments and could be guided by the context of the researcher who has designed and performed the biological experiment. Therefore, I developed CLIP-GENE, a web service of the condition-specific context-laid integrative analysis for prioritizing genes in mouse TF KO experiments. CLIP-GENE prioritizes genes of KO experiments by removing the less informative DEGs using GRN, discards genes that might have sample variance, using SNV information, and ranks genes that are related to the user's context using the text-mining technique, as well as considering the shortest path of protein-protein interaction (PPI) from the KO gene to the target genes. The last study was conducted to develop an informative system that could be used to compare multiple RNA-Seq experiments using Venn diagrams. In general, RNA-Seq experiments are performed to compare samples between control and treated groups, producing a set of DEGs. Each region in a Venn diagram (a subset of DEGs) generally contains a large number of genes that could complicate the determination of the important and relevant genes. Moreover, simply comparing the list of DEGs from different experiments could be misleading because some of the DEG lists may have been measured using different controls. To address these issues, Venn-diaNet was developed, an analysis framework that integrates Venn diagram and network propagation to prioritize genes for experiments that have multiple DEG lists. We demonstrated that Venn-diaNet was able to reproduce research findings reported in the original papers by comparing two, three, and eight biological experiments measured in different conditions. 
I believe that Venn-diaNet can be very useful for researchers to determine genes for their follow-up studies. In summary, my doctoral study aimed to develop computational tools that can prioritize genes from transcriptome data. To achieve this goal, I combined network approaches with multiple heterogeneous resources in a single computational environment. All three informatics systems are deployed as software packages or web tools to support convenient access to researchers, eliminating the need for installation or learning any additional software packages.

Abstract
Chapter 1 Introduction
1.1 Challenges of analyzing RNA-seq data
1.1.1 Excessive amount of databases and analysis methods
1.1.2 Knowledge bias that prioritizes less relevant genes
1.1.3 Complicated experiment designs
1.2 My approach to address the challenges for the analysis of RNA-seq data
1.3 Background
1.3.1 Differentially expressed genes
1.3.2 Gene prioritization
1.4 Outline of the thesis
Chapter 2 A filtering strategy that combines GRN, SNV information to enhances the gene prioritization in mouse KO studies with small number of samples
2.1 Background
2.2 Methods
2.2.1 First filter: DEG
2.2.2 Second filter: GRN
2.2.3 Third filter: Biological Pathway
2.2.4 Final filter: SNV
2.3 Results and Discussion
2.4 Discussion
Chapter 3 An integration of data-fusion and text-mining strategy to prioritize context-laid genes in mouse TF KO experiments
3.1 Background
3.2 Methods
3.2.1 Selection of initial candidate genes
3.2.2 Prioritizing genes with the user context and PPI
3.3 Results and Discussion
3.3.1 Performance with the best context
3.3.2 Performance with the worst context
3.4 Discussion
Chapter 4 Integrating Venn diagram to the network-based strategy for comparing multiple biological experiments
4.1 Background
4.2 Methods
4.2.1 Taking input data
4.2.2 Generating Venn diagram of DEG sets
4.2.3 Network propagation and gene ranking
4.3 Results and Discussion
4.3.1 Venn-diaNet for two experiments
4.3.2 Venn-diaNet for three experiments
4.3.3 Venn-diaNet for eight experiments
4.3.4 Venn-diaNet performance with different PPI network
4.4 Discussion
Chapter 5 Conclusion
Bibliography
Abstract (in Korean)
Docto
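The network propagation step mentioned in this thesis abstract can be sketched as a random walk with restart over a protein-protein interaction graph, seeded with differentially expressed genes. This is a generic illustration under assumed parameters. The function name, the restart probability and the toy network are hypothetical, and the thesis may use a different formulation.

```python
def propagate(adjacency, seeds, restart=0.5, iterations=100):
    """Random walk with restart on an undirected graph.

    adjacency maps each node to a list of its neighbors; every neighbor
    must itself be a key, and each node needs at least one neighbor so
    that no probability mass is lost.
    """
    nodes = sorted(adjacency)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(start)
    for _ in range(iterations):
        # Restart term: part of the mass always returns to the seeds.
        updated = {n: restart * start[n] for n in nodes}
        # Walk term: the rest is split evenly among each node's neighbors.
        for n in nodes:
            share = (1.0 - restart) * score[n] / len(adjacency[n])
            for neighbor in adjacency[n]:
                updated[neighbor] += share
        score = updated
    return score

# Hypothetical toy network: a chain A - B - C - D with A as the only seed.
network = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = propagate(network, seeds={"A"})
ranked = sorted(scores, key=scores.get, reverse=True)
```

Genes closer to the seed receive higher scores, so the ranking reflects network proximity to the differentially expressed genes rather than membership in a gene list alone.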

SNU Open Repository and Archive

PINTA: a web server for network-based gene prioritization from expression data

Author: Arner
Baldi
Chen
D. Nitsch
Franke
Hristovski
Hutz
Irizarry
J. K. Vogt
J. P. Goncalves
Kohler
L.-C. Tranchevent
Nitsch
Radivojac
S. C. Madeira
Seelow
Y. Moreau
Yu
Publication venue: Oxford University Press
Publication date: 01/01/2011

PINTA (available at http://www.esat.kuleuven.be/pinta/; this web site is free and open to all users and there is no login requirement) is a web resource for the prioritization of candidate genes based on the differential expression of their neighborhood in a genome-wide protein–protein interaction network. Our strategy is meant for biological and medical researchers aiming at identifying novel disease genes using disease specific expression data. PINTA supports both candidate gene prioritization (starting from a user defined set of candidate genes) as well as genome-wide gene prioritization and is available for five species (human, mouse, rat, worm and yeast). As input data, PINTA only requires disease specific expression data, whereas various platforms (e.g. Affymetrix) are supported. As a result, PINTA computes a gene ranking and presents the results as a table that can easily be browsed and downloaded by the user
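The idea of scoring a candidate gene by the differential expression of its network neighborhood can be sketched as follows. This is a deliberately simplified assumption: it uses direct neighbors only and averages absolute log fold changes, which is not necessarily how PINTA computes its scores. All names and values are hypothetical.

```python
def neighborhood_score(gene, ppi, log_fold_change):
    """Mean absolute log fold change over the direct neighbors of a gene."""
    values = [abs(log_fold_change[n])
              for n in ppi.get(gene, ())
              if n in log_fold_change]
    return sum(values) / len(values) if values else 0.0

def prioritize(candidates, ppi, log_fold_change):
    """Sort candidates from most to least disturbed neighborhood."""
    return sorted(candidates,
                  key=lambda g: neighborhood_score(g, ppi, log_fold_change),
                  reverse=True)

# Hypothetical toy inputs: neighbor lists and disease-vs-control log fold changes.
ppi = {"G1": ["N1", "N2"], "G2": ["N3"], "G3": []}
log_fold_change = {"N1": 2.0, "N2": -1.0, "N3": 0.5}
ranking = prioritize(["G3", "G2", "G1"], ppi, log_fold_change)
```

Note that the candidate's own expression change is not used, so a gene can rank highly even if it is not itself differentially expressed.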

Crossref

PubMed Central

Online Research Database In Technology

plantiSMASH: automated identification, annotation and expression analysis of plant biosynthetic gene clusters

Author: Blin Kai
Kautsar Satria A.
Medema Marnix H.
Osbourn Anne
Suarez Duran Hernando G.
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2017

Plant specialized metabolites are chemically highly diverse, play key roles in host-microbe interactions, have important nutritional value in crops and are frequently applied as medicines. It has recently become clear that plant biosynthetic pathway-encoding genes are sometimes densely clustered in specific genomic loci: biosynthetic gene clusters (BGCs). Here, we introduce plantiSMASH, a versatile online analysis platform that automates the identification of candidate plant BGCs. Moreover, it allows integration of transcriptomic data to prioritize candidate BGCs based on the coexpression patterns of predicted biosynthetic enzyme-coding genes, and facilitates comparative genomic analysis to study the evolutionary conservation of each cluster. Applied on 48 high-quality plant genomes, plantiSMASH identifies a rich diversity of candidate plant BGCs. These results will guide further experimental exploration of the nature and dynamics of gene clustering in plant metabolism. Moreover, spurred by the continuing decrease in costs of plant genome sequencing, they will allow genome mining technologies to be applied to plant natural product discovery.

Wageningen University & Research Publications

Online Research Database In Technology

ICSNPathway: identify candidate causal SNPs and pathways from genome-wide association study by one analytical framework

Author: Adzhubei
Altshuler
Ashburner
Cantor
Gregersen
J. Wang
K. Zhang
Kochi
Kumar
L. Guo
L. Zhang
McCarthy
Musone
Reiner
S. Chang
S. Cui
Schadt
Stahl
Wang
Yuan
Yue
Publication venue: Oxford University Press
Publication date: 01/01/2011

Genome-wide association studies (GWAS) are widely used to identify genes involved in complex human diseases and other traits. One key challenge in GWAS data interpretation is to identify causal SNPs and provide solid evidence of how they affect the trait. Currently, research focuses on identifying candidate causal variants among the most significant SNPs of a GWAS, while there is a lack of support for biological mechanisms as represented by pathways. Although pathway-based analysis (PBA) has been designed to identify disease-related pathways by analyzing the full list of SNPs from a GWAS, it does not emphasize interpreting causal SNPs. To our knowledge, no web server is so far available that addresses this challenge in GWAS data interpretation within one analytical framework. ICSNPathway was developed to identify candidate causal SNPs and their corresponding candidate causal pathways from GWAS by integrating linkage disequilibrium (LD) analysis, functional SNP annotation and PBA. ICSNPathway provides a feasible solution to bridge the gap between GWAS and disease mechanism studies by generating hypotheses of the form SNP → gene → pathway(s). The ICSNPathway server is freely available at http://icsnpathway.psych.ac.cn/

CiteSeerX

Crossref

PubMed Central

Institute of Psychology, Chinese Academy of Sciences