284 research outputs found
Semiparametric Estimation of Spectral Density With Irregular Observations
We propose a semiparametric method to estimate spectral densities of isotropic Gaussian processes with scattered data. The spectral density function (Fourier transform of the covariance function) is modeled as a linear combination of B-splines up to a cutoff frequency and, from this point, a truncated algebraic tail. We calculate an analytic expression for the covariance function and tackle several numerical issues that arise when calculating the likelihood. The parameters are estimated by maximizing the likelihood using the simulated annealing method. Our method directly estimates the tail behavior of the spectral density, which has the biggest impact on interpolation properties. The use of the likelihood in parameter estimation takes fully into account the correlations between observations. We compare our method with a kernel method proposed by Hall et al. (1994) and a parametric method using the Matern model. Simulation results show that our method outperforms the other two by several criteria. Application to rainfall data shows that our method outperforms the kernel method
Polygenic transcriptome risk scores (PTRS) can improve portability of polygenic risk scores across ancestries
Background: Polygenic risk scores (PRS) are valuable to translate the results of genome-wide association studies (GWAS) into clinical practice. To date, most GWAS have been based on individuals of European-ancestry leading to poor performance in populations of non-European ancestry. Results: We introduce the polygenic transcriptome risk score (PTRS), which is based on predicted transcript levels (rather than SNPs), and explore the portability of PTRS across populations using UK Biobank data. Conclusions: We show that PTRS has a significantly higher portability (Wilcoxon p=0.013) in the African-descent samples where the loss of performance is most acute with better performance than PRS when used in combination
Recommended from our members
Mixed Effects Modeling of Proliferation Rates in Cell-Based Models: Consequence for Pharmacogenomics and Cancer
The International HapMap project has made publicly available extensive genotypic data on a number of lymphoblastoid cell lines (LCLs). Building on this resource, many research groups have generated a large amount of phenotypic data on these cell lines to facilitate genetic studies of disease risk or drug response. However, one problem that may reduce the usefulness of these resources is the biological noise inherent to cellular phenotypes. We developed a novel method, termed Mixed Effects Model Averaging (MEM), which pools data from multiple sources and generates an intrinsic cellular growth rate phenotype. This intrinsic growth rate was estimated for each of over 500 HapMap cell lines. We then examined the association of this intrinsic growth rate with gene expression levels and found that almost 30% (2,967 out of 10,748) of the genes tested were significant with FDR less than 10%. We probed further to demonstrate evidence of a genetic effect on intrinsic growth rate by determining a significant enrichment in growth-associated genes among genes targeted by top growth-associated SNPs (as eQTLs). The estimated intrinsic growth rate as well as the strength of the association with genetic variants and gene expression traits are made publicly available through a cell-based pharmacogenomics database, PACdb. This resource should enable researchers to explore the mediating effects of proliferation rate on other phenotypes.</p
Recommended from our members
ExprTarget: An Integrative Approach to Predicting Human MicroRNA Targets
Variation in gene expression has been observed in natural populations and associated with complex traits or phenotypes such as disease susceptibility and drug response. Gene expression itself is controlled by various genetic and non-genetic factors. The binding of a class of small RNA molecules, microRNAs (miRNAs), to mRNA transcript targets has recently been demonstrated to be an important mechanism of gene regulation. Because individual miRNAs may regulate the expression of multiple gene targets, a comprehensive and reliable catalogue of miRNA-regulated targets is critical to understanding gene regulatory networks. Though experimental approaches have been used to identify many miRNA targets, due to cost and efficiency, current miRNA target identification still relies largely on computational algorithms that aim to take advantage of different biochemical/thermodynamic properties of the sequences of miRNAs and their gene targets. A novel approach, ExprTarget, therefore, is proposed here to integrate some of the most frequently invoked methods (miRanda, PicTar, TargetScan) as well as the genome-wide HapMap miRNA and mRNA expression datasets generated in our laboratory. To our knowledge, this dataset constitutes the first miRNA expression profiling in the HapMap lymphoblastoid cell lines. We conducted diagnostic tests of the existing computational solutions using the experimentally supported targets in TarBase as gold standard. To gain insight into the biases that arise from such an analysis, we investigated the effect of the choice of gold standard on the evaluation of the various computational tools. We analyzed the performance of ExprTarget using both ROC curve analysis and cross-validation. We show that ExprTarget greatly improves miRNA target prediction relative to the individual prediction algorithms in terms of sensitivity and specificity. We also developed an online database, ExprTargetDB, of human miRNA targets predicted by our approach that integrates gene expression profiling into a broader framework involving important features of miRNA target site predictions.</p
Genetic Architecture of Gene Expression Traits Across Diverse Populations
For many complex traits, gene regulation is likely to play a crucial mechanistic role. How the genetic architectures of complex traits vary between populations and subsequent effects on genetic prediction are not well understood, in part due to the historical paucity of GWAS in populations of non-European ancestry. We used data from the MESA (Multi-Ethnic Study of Atherosclerosis) cohort to characterize the genetic architecture of gene expression within and between diverse populations. Genotype and monocyte gene expression were available in individuals with African American (AFA, n = 233), Hispanic (HIS, n = 352), and European (CAU, n = 578) ancestry. We performed expression quantitative trait loci (eQTL) mapping in each population and show genetic correlation of gene expression depends on shared ancestry proportions. Using elastic net modeling with cross validation to optimize genotypic predictors of gene expression in each population, we show the genetic architecture of gene expression for most predictable genes is sparse. We found the best predicted gene in each population, TACSTD2 in AFA and CHURC1 in CAU and HIS, had similar prediction performance across populations with R2 \u3e 0.8 in each population. However, we identified a subset of genes that are well-predicted in one population, but poorly predicted in another. We show these differences in predictive performance are due to allele frequency differences between populations. Using genotype weights trained in MESA to predict gene expression in independent populations showed that a training set with ancestry similar to the test set is better at predicting gene expression in test populations, demonstrating an urgent need for diverse population sampling in genomics. Our predictive models and performance statistics in diverse cohorts are made publicly available for use in transcriptome mapping methods at https://github.com/WheelerLab/DivPop
Imputing Gene Expression in Uncollected Tissues Within and Beyond GTEx
Gene expression and its regulation can vary substantially across tissue types. In order to generate knowledge about gene expression in human tissues, the Genotype-Tissue Expression (GTEx) program has collected transcriptome data in a wide variety of tissue types from post-mortem donors. However, many tissue types are difficult to access and are not collected in every GTEx individual. Furthermore, in non-GTEx studies, the accessibility of certain tissue types greatly limits the feasibility and scale of studies of multi-tissue expression. In this work, we developed multi-tissue imputation methods to impute gene expression in uncollected or inaccessible tissues. Via simulation studies, we showed that the proposed methods outperform existing imputation methods in multi-tissue expression imputation and that incorporating imputed expression data can improve power to detect phenotype-expression correlations. By analyzing data from nine selected tissue types in the GTEx pilot project, we demonstrated that harnessing expression quantitative trait loci (eQTLs) and tissue-tissue expression-level correlations can aid imputation of transcriptome data from uncollected GTEx tissues. More importantly, we showed that by using GTEx data as a reference, one can impute expression levels in inaccessible tissues in non-GTEx expression studies
Transcriptome Prediction Performance Across Machine Learning Models and Diverse Ancestries
Transcriptome prediction methods such as PrediXcan and FUSION have become popular in complex trait mapping. Most transcriptome prediction models have been trained in European populations using methods that make parametric linear assumptions like the elastic net (EN). To potentially further optimize imputation performance of gene expression across global populations, we built transcriptome prediction models using both linear and non-linear machine learning (ML) algorithms and evaluated their performance in comparison to EN. We trained models using genotype and blood monocyte transcriptome data from the Multi-Ethnic Study of Atherosclerosis (MESA) comprising individuals of African, Hispanic, and European ancestries and tested them using genotype and whole-blood transcriptome data from the Modeling the Epidemiology Transition Study (METS) comprising individuals of African ancestries. We show that the prediction performance is highest when the training and the testing population share similar ancestries regardless of the prediction algorithm used. While EN generally outperformed random forest (RF), support vector regression (SVR), and K nearest neighbor (KNN), we found that RF outperformed EN for some genes, particularly between disparate ancestries, suggesting potential robustness and reduced variability of RF imputation performance across global populations. When applied to a high-density lipoprotein (HDL) phenotype, we show including RF prediction models in PrediXcan revealed potential gene associations missed by EN models. Therefore, by integrating other ML modeling into PrediXcan and diversifying our training populations to include more global ancestries, we may uncover new genes associated with complex traits
- …