10,553 research outputs found

RNA-Seq optimization with eQTL gold standards.

Author: Arking Dan E
Ashar Foram N
Bader Joel S
Ellis Shannon E
Gupta Simone
West Andrew B
Publication venue: eScholarship, University of California
Publication date: 01/01/2013

Background: RNA-Sequencing (RNA-Seq) experiments have been optimized for library preparation, mapping, and gene expression estimation. These methods, however, have revealed weaknesses in the next stages of analysis of differential expression, with results sensitive to systematic sample stratification or, in more extreme cases, to outliers. Further, a method to assess normalization and adjustment measures imposed on the data is lacking. Results: To address these issues, we utilize previously published eQTLs as a novel gold standard at the center of a framework that integrates DNA genotypes and RNA-Seq data to optimize analysis and aid in the understanding of genetic variation and gene expression. After detecting sample contamination and sequencing outliers in RNA-Seq data, a set of previously published brain eQTLs was used to determine if sample outlier removal was appropriate. Improved replication of known eQTLs supported removal of these samples in downstream analyses. eQTL replication was further employed to assess normalization methods, covariate inclusion, and gene annotation. This method was validated in an independent RNA-Seq blood data set from the GTEx project and a tissue-appropriate set of eQTLs. eQTL replication in both data sets highlights the necessity of accounting for unknown covariates in RNA-Seq data analysis. Conclusion: As each RNA-Seq experiment is unique with its own experiment-specific limitations, we offer an easily implementable method that uses the replication of known eQTLs to guide each step in one's data analysis pipeline. In the two data sets presented herein, we highlight not only the necessity of careful outlier detection but also the need to account for unknown covariates in RNA-Seq experiments.

Springer - Publisher Connector

PubMed Central

eScholarship - University of California

Robust identification of local adaptation from allele frequencies

Author: Graham Coop
Huxley
Jones
Lewontin
Limborg
Poncet
Reynolds
Robertson
Torsten Günther
Publication date: 13/09/2012

Comparing allele frequencies among populations that differ in environment has long been a tool for detecting loci involved in local adaptation. However, such analyses are complicated by an imperfect knowledge of population allele frequencies and neutral correlations of allele frequencies among populations due to shared population history and gene flow. Here we develop a set of methods to robustly test for unusual allele frequency patterns, and correlations between environmental variables and allele frequencies while accounting for these complications based on a Bayesian model previously implemented in the software Bayenv. Using this model, we calculate a set of 'standardized allele frequencies' that allows investigators to apply tests of their choice to multiple populations, while accounting for sampling and covariance due to population history. We illustrate this first by showing that these standardized frequencies can be used to calculate powerful tests to detect non-parametric correlations with environmental variables, which are also less prone to spurious results due to outlier populations. We then demonstrate how these standardized allele frequencies can be used to construct a test to detect SNPs that deviate strongly from neutral population structure. This test is conceptually related to FST but should be more powerful as we account for population history. We also extend the model to next-generation sequencing of population pools, which is a cost-efficient way to estimate population allele frequencies but implies an additional level of sampling noise. The utility of these methods is demonstrated in simulations and by re-analyzing human SNP data from the HGDP populations. An implementation of our method will be available from http://gcbias.org. Comment: 27 pages, 7 figures

arXiv.org e-Print Archive

Crossref

PubMed Central

eScholarship - University of California

Exome resequencing and GWAS for growth, ecophysiology, and chemical and metabolomic composition of wood of Populus trichocarpa.

Author: Davis Mark F
Famula Randi
Fiehn Oliver
Guerra Fernando P
Holliday Jason
Neale David B
Richards James H
Shuren Richard
Stanton Brian J
Suren Haktan
Sykes Robert
Publication venue: eScholarship, University of California
Publication date: 01/11/2019

Background: Populus trichocarpa is an important forest tree species for the generation of lignocellulosic ethanol. Understanding the genomic basis of biomass production and chemical composition of wood is fundamental in supporting genetic improvement programs. Considerable variation has been observed in this species for complex traits related to growth, phenology, ecophysiology and wood chemistry. Those traits are influenced by both polygenic control and environmental effects, and their genome architecture and regulation are only partially understood. Genome-wide association studies (GWAS) represent an approach to advance that aim using thousands of single nucleotide polymorphisms (SNPs). Genotyping using exome capture methodologies represents an efficient approach to identify specific functional regions of genomes underlying phenotypic variation. Results: We identified 813 K SNPs, which were utilized for genotyping 461 P. trichocarpa clones, representing 101 provenances collected from Oregon and Washington, and established in California. A GWAS performed on 20 traits, considering single SNP-marker tests, identified a variable number of significant SNPs (p-value < 6.1479E-8) in association with diameter, height, leaf carbon and nitrogen contents, and δ15N. The number of significant SNPs ranged from 2 to 220 per trait. Additionally, multiple-marker analyses by sliding-window tests detected between 6 and 192 significant windows for the analyzed traits. The significant SNPs resided within genes that encode proteins belonging to different functional classes such as protein synthesis, energy/metabolism and DNA/RNA metabolism, among others. Conclusions: SNP markers within genes associated with traits of importance for biomass production were detected. They help characterize the genomic architecture of P. trichocarpa biomass, which is required to support the development and application of marker breeding technologies.
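
The significance cutoff quoted in the abstract above (p-value < 6.1479E-8) is numerically close to a Bonferroni correction at alpha = 0.05 over roughly 813 K tests. The abstract does not say how the threshold was derived, so the following is only an illustrative check under that assumption; the function names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff under a Bonferroni correction."""
    return alpha / n_tests


def implied_number_of_tests(alpha, threshold):
    """Number of tests for which alpha / n equals the given cutoff."""
    return alpha / threshold


# Assumption: alpha = 0.05 (not stated in the abstract).
print(bonferroni_threshold(0.05, 813_000))
print(implied_number_of_tests(0.05, 6.1479e-8))
```

Under the assumed alpha, the implied number of tests falls between 813,000 and 814,000, which is in line with the reported SNP count.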

eScholarship - University of California

Evaluation of polygenic determinants of non-alcoholic fatty liver disease (NAFLD) by a candidate genes resequencing strategy

Author: Angelico F
Angeloni A
Arca M
Bailetti D
Baratta F
Belardinilli F
Ceci F
D'Erasmo L
De Masi B
Del Ben M
Di Costanzo A
Giannini G
Girelli G
Montali A
Pastori D
Polimeni L
Sponziello M
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

NAFLD is a polygenic condition, but the individual and cumulative contribution of identified genes remains to be established. To gain additional insight into the genetic architecture of NAFLD, the GWAS-identified GCKR, PPP1R3B, NCAN, LYPLAL1 and TM6SF2 genes were resequenced by next-generation sequencing in a cohort of 218 NAFLD subjects and 227 controls, in whom PNPLA3 rs738409 and MBOAT7 rs641738 genotypes were also obtained. A total of 168 sequence variants were detected and 47 were annotated as functional. When all functional variants within each gene were considered, only those in TM6SF2 accumulated in NAFLD subjects compared to controls (P = 0.04). Among individual variants, rs1260326 in GCKR and rs641738 in MBOAT7 (recessive), rs58542926 in TM6SF2 and rs738409 in PNPLA3 (dominant) emerged as associated with NAFLD, with PNPLA3 rs738409 being the strongest predictor (OR 3.12, 95% CI, 1.8-5.5, P 0.28 was associated with a 3-fold increased risk of NAFLD. Interestingly, rs61756425 in PPP1R3B and rs641738 in MBOAT7 genes were predictors of NAFLD severity. Overall, TM6SF2, GCKR, PNPLA3 and MBOAT7 were confirmed to be associated with NAFLD, and a score based on these genes was highly predictive of this condition. In addition, PPP1R3B and MBOAT7 might influence NAFLD severity.

Archivio della ricerca- Università di Roma La Sapienza

Evaluation of gene-based family-based methods to detect novel genes associated with familial late onset Alzheimer disease

Author: Budde John
Cruchaga Carlos
Del-Aguila Jorge L
Deming Yuetiva
Fernandez Maria V
Goate Alison M
Harari Oscar
Ibanez Laura
Morris John C
NCRAD
NIA-LOAD family study group
Norton Joanne
Publication venue: Digital Commons@Becker
Publication date: 01/01/2018

Digital Commons@Becker

Frontiers - Publisher Connector

Genome-wide association study in two cohorts from a multi-generational mouse advanced intercross line highlights the difficulty of replication due to study-specific heterogeneity

Author: Cheng Riyan
Chitre Apurva S
Gonzales Natalia M
Palmer Abraham A
Sokoloff Greta
St Pierre Celine L
Zhou Xinzhu
Zou Jennifer
Publication venue: Digital Commons@Becker
Publication date: 01/03/2020

There has been extensive discussion of the Replication Crisis in many fields, including genome-wide association studies

Directory of Open Access Journals

Digital Commons@Becker

eScholarship - University of California

The Decay of Disease Association with Declining Linkage Disequilibrium: A Fine Mapping Theorem

Author: Bansal Naveen K.
Farazi Manzur R.
He Max M.
Hebbring Scott J.
Li Xiang
Maadooliat Mehdi
Schrodi Steven J.
Upadhya Jibal
Ye Zhan
Publication venue: e-Publications@Marquette
Publication date: 01/01/2016

Several important and fundamental aspects of disease genetics models have yet to be described. One such property is the relationship of disease association statistics at a marker site closely linked to a disease-causing site. A complete description of this two-locus system is of particular importance to experimental efforts to fine map association signals for complex diseases. Here, we present a simple relationship between disease association statistics and the decline of linkage disequilibrium from a causal site. Specifically, the ratio of Chi-square disease association statistics at a marker site and causal site is equivalent to the standard measure of pairwise linkage disequilibrium, r². A complete derivation of this relationship from a general disease model is shown. Quite interestingly, this relationship holds across all modes of inheritance. Extensive Monte Carlo simulations using a disease genetics model applied to chromosomes subjected to a standard model of recombination are employed to better understand the variation around this fine mapping theorem due to sampling effects. We also use this relationship to provide a framework for estimating properties of a non-interrogated causal site using data at closely linked markers. Lastly, we apply this way of examining association data from high-density genotyping in a large, publicly available data set investigating extreme BMI. We anticipate that understanding the patterns of disease association decay with declining linkage disequilibrium from a causal site will enable more powerful fine mapping methods and provide new avenues for identifying causal sites/genes from fine-mapping studies.
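
The relationship stated in the abstract above (marker Chi-square divided by causal-site Chi-square equals r²) can be illustrated with a small sketch. It uses the standard definition of r² from haplotype and allele frequencies; the function names and example numbers are hypothetical, and the code ignores the sampling variation that the authors study by simulation.

```python
def ld_r_squared(p_ab, p_a, p_b):
    """Pairwise LD r^2 from the frequency of the haplotype carrying both
    alleles (p_ab) and the two allele frequencies (p_a, p_b)."""
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def expected_marker_chi2(chi2_causal, r2):
    """Marker-site statistic implied by the stated ratio chi2_marker / chi2_causal = r^2."""
    return r2 * chi2_causal


# Hypothetical inputs: a causal-site statistic of 30 and a linked marker.
r2 = ld_r_squared(p_ab=0.4, p_a=0.5, p_b=0.5)
print(r2, expected_marker_chi2(30.0, r2))
```

Read in the other direction, the same ratio suggests how a statistic observed at a marker could be scaled by 1/r² to estimate the statistic at an untyped causal site, which is the kind of use the abstract describes.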

epublications@Marquette

Frontiers - Publisher Connector

PubMed Central

Kevlar: A Mapping-Free Framework for Accurate Discovery of De Novo Variants.

Author: Brown C Titus
Hormozdiari Fereydoun
Standage Daniel S
Publication venue: eScholarship, University of California
Publication date: 01/08/2019

De novo genetic variants are an important source of causative variation in complex genetic disorders. Many methods for variant discovery rely on mapping reads to a reference genome, detecting numerous inherited variants irrelevant to the phenotype of interest. To distinguish between inherited and de novo variation, sequencing of families (parents and siblings) is commonly pursued. However, standard mapping-based approaches tend to have a high false-discovery rate for de novo variant prediction. Kevlar is a mapping-free method for de novo variant discovery, based on direct comparison of sequences between related individuals. Kevlar identifies high-abundance k-mers unique to the individual of interest. Reads containing these k-mers are partitioned into disjoint sets by shared k-mer content for variant calling, and preliminary variant predictions are sorted using a probabilistic score. We evaluated Kevlar on simulated and real datasets, demonstrating its ability to detect both de novo single-nucleotide variants and indels with high accuracy
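
The core idea described above, finding high-abundance k-mers present in the individual of interest but absent from relatives, can be sketched in a few lines. This is a toy illustration with exact in-memory counting, not Kevlar's actual implementation or API; the function names, the abundance threshold, and the example reads are all hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer across a collection of read sequences."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def novel_kmers(proband_reads, relative_read_sets, k, min_abundance):
    """k-mers seen at least min_abundance times in the proband
    and never seen in any relative's reads."""
    proband = count_kmers(proband_reads, k)
    relatives = [count_kmers(reads, k) for reads in relative_read_sets]
    return {
        kmer
        for kmer, n in proband.items()
        if n >= min_abundance and all(kmer not in rel for rel in relatives)
    }


# Hypothetical reads: the proband carries a T>A change relative to both parents.
parents = [["ACGTACGT"], ["ACGTACGT"]]
proband = ["ACGAACGT", "ACGAACGT"]
print(sorted(novel_kmers(proband, parents, k=4, min_abundance=2)))
```

Requiring a minimum abundance is one simple way to keep k-mers that arise from isolated sequencing errors out of the candidate set; reads containing the surviving k-mers would then be grouped for variant calling, as the abstract outlines.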

eScholarship - University of California

The Genomic HyperBrowser: inferential genomics at the sequence level

Author: Clancy Trevor
Ferkingstad Egil
Frigessi Arnoldo
Glad Ingrid K.
Gundersen Sveinung
Holden Lars
Holden Marit
Hovig Eivind
Johansen Morten
Liestøl Knut
Nygaard Vegard
Rydbeck Halfdan
Sandve Geir K.
Tøstesen Eivind
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

The immense increase in the generation of genomic scale data poses an unmet analytical challenge, due to a lack of established methodology with the required flexibility and power. We propose a first principled approach to statistical analysis of sequence-level genomic information. We provide a growing collection of generic biological investigations that query pairwise relations between tracks, represented as mathematical objects, along the genome. The Genomic HyperBrowser implements the approach and is available at http://hyperbrowser.uio.no

arXiv.org e-Print Archive

Springer - Publisher Connector

PubMed Central

NORA - Norwegian Open Research Archives