15 research outputs found
Genetic effects on gene expression across human tissues
Characterization of the molecular function of the human genome and its variation across individuals is essential for identifying the cellular mechanisms that underlie human genetic traits and diseases. The Genotype-Tissue Expression (GTEx) project aims to characterize variation in gene expression levels across individuals and diverse tissues of the human body, many of which are not easily accessible. Here we describe genetic effects on gene expression levels across 44 human tissues. We find that local genetic variation affects gene expression levels for the majority of genes, and we further identify inter-chromosomal genetic effects for 93 genes and 112 loci. On the basis of the identified genetic effects, we characterize patterns of tissue specificity, compare local and distal effects, and evaluate the functional properties of the genetic effects. We also demonstrate that multi-tissue, multi-individual data can be used to identify genes and pathways affected by human disease-associated variation, enabling a mechanistic interpretation of gene regulation and the genetic basis of disease.
RNAget: an API to securely retrieve RNA quantifications
Large-scale sharing of genomic quantification data requires standardized access interfaces. In this Global Alliance for Genomics and Health project, we developed RNAget, an API for secure access to genomic quantification data in matrix form. RNAget provides for slicing matrices to extract desired subsets of data and is applicable to all expression matrix-format data, including RNA sequencing and microarrays. Further, it generalizes to quantification matrices of other sequence-based genomics such as ATAC-seq and ChIP-seq.
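The idea of slicing a quantification matrix can be illustrated with a minimal sketch. This is not the RNAget API: the function name `slice_matrix`, its parameters, and the feature and sample labels are hypothetical, chosen only to show what extracting a subset of rows (features) and columns (samples) from an expression matrix means.

```python
def slice_matrix(features, samples, values, want_features=None, want_samples=None):
    """Return the sub-matrix restricted to the requested features and samples.

    features: row labels, samples: column labels, values: list of rows.
    A filter left as None keeps every row (or column).
    Hypothetical helper for illustration; not part of RNAget.
    """
    row_idx = [i for i, f in enumerate(features)
               if want_features is None or f in want_features]
    col_idx = [j for j, s in enumerate(samples)
               if want_samples is None or s in want_samples]
    sub_values = [[values[i][j] for j in col_idx] for i in row_idx]
    return ([features[i] for i in row_idx],
            [samples[j] for j in col_idx],
            sub_values)


# Toy expression matrix: 3 features (rows) by 3 samples (columns).
features = ["geneA", "geneB", "geneC"]
samples = ["sample1", "sample2", "sample3"]
values = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
]

sub_features, sub_samples, sub_values = slice_matrix(
    features, samples, values,
    want_features={"geneA", "geneC"},
    want_samples={"sample2"},
)
```

The slice keeps the original row and column order of the matrix, so the result does not depend on the order in which labels are requested.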
Manifest-based DRS import: A practical solution for cross-DCC dataset analysis to empower translational discovery using Kids First and GTEx data
A key challenge in data discovery is the coordination and assembly of datasets from across Common Fund Data Ecosystem (CFDE) Data Coordination Centers (DCC) in an easy to use and meaningful manner to accelerate usage by researchers. We have implemented a manifest-based import on our CAVATICA platform for a user to create a cross-Common Fund dataset cohort and combine the results with their own data in order to accelerate platform-based discovery and clinical translation. We propose the required field: drs_uri followed by these optional fields: file_name, study_registration, study_id, participant_id, specimen_id, experimental_strategy, file_format and fhir_document_reference. The study_registration is the external source of the study_id (e.g. dbGaP). The study_id, participant_id and specimen_id fields are unique identifiers that can be used to retrieve more information. The experimental_strategy and file_format fields are based on the Genomics Data Commons definitions. The fhir_document_reference points to the FHIR Document Reference, if metadata is available on a FHIR server. This process provides an efficient method to import a list of DRS URIs along with relevant metadata. In this use case, a manifest is created from the Common Fund Data Ecosystem portal with GTEx and Kids First (KF) neuroblastoma RNA sequencing assays and brought into a collaborative CAVATICA workspace. The data authorization aspect is managed by CAVATICA. For KF and GTEx datasets which have controlled access, the user’s dbGaP access authorizations are checked and the data becomes accessible only if the user has proper authorization. Authorized users can choose to run their own pipelines or use a KF standard pipeline to harmonize and analyze the combined data set. 
This use case demonstrates how a user can easily search for and generate a cohort across a federated DCC resource framework followed by DRS-based import into CAVATICA collaborative workspace for democratized access and translational knowledge mining.
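The manifest layout described above (one required field, `drs_uri`, plus the listed optional fields) can be sketched as a small validator. This is only an illustration under stated assumptions: the abstract does not say how the manifest is serialized, so CSV is assumed here; the function name `parse_manifest`, the example URIs, and all identifiers in the sample rows are hypothetical; and the check that a URI starts with `drs://` reflects the DRS URI scheme rather than anything stated in the abstract.

```python
import csv
import io

# Field names as listed in the abstract.
REQUIRED_FIELDS = ["drs_uri"]
OPTIONAL_FIELDS = [
    "file_name", "study_registration", "study_id", "participant_id",
    "specimen_id", "experimental_strategy", "file_format",
    "fhir_document_reference",
]


def parse_manifest(text):
    """Parse a CSV manifest (assumed format) and check its fields."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []

    missing = [f for f in REQUIRED_FIELDS if f not in header]
    if missing:
        raise ValueError(f"missing required field(s): {missing}")

    allowed = set(REQUIRED_FIELDS) | set(OPTIONAL_FIELDS)
    unknown = [f for f in header if f not in allowed]
    if unknown:
        raise ValueError(f"unknown field(s): {unknown}")

    rows = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        uri = (row["drs_uri"] or "").strip()
        if not uri.startswith("drs://"):
            raise ValueError(f"line {line_no}: invalid drs_uri {uri!r}")
        rows.append(row)
    return rows


# Hypothetical manifest; the host and object IDs are placeholders.
example = """drs_uri,file_name,study_registration,study_id,file_format
drs://example.org/object-0001,sample_a.bam,dbGaP,study-0001,bam
drs://example.org/object-0002,sample_b.bam,dbGaP,study-0002,bam
"""

manifest_rows = parse_manifest(example)
```

Rejecting unknown columns is a design choice of this sketch; a more permissive importer could instead ignore fields it does not recognize.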
FAIRshake: toolkit to evaluate the findability, accessibility, interoperability, and reusability of research digital resources
As more datasets, tools, workflows, APIs, and other digital resources are produced by the research community, it is becoming increasingly difficult to harmonize and organize these efforts for maximal synergistic integrated utilization. The Findable, Accessible, Interoperable, and Reusable (FAIR) guiding principles have prompted many stakeholders to consider strategies for tackling this challenge by making these digital resources follow common standards and best practices so that they can become more integrated and organized. Faced with the question of how to make digital resources more FAIR, it has become imperative to measure what it means to be FAIR. The diversity of resources, communities, and stakeholders have different goals and use cases and this makes assessment of FAIRness particularly challenging. To begin resolving this challenge, the FAIRshake toolkit was developed to enable the establishment of community-driven FAIR metrics and rubrics paired with manual, semi- and fully-automated FAIR assessment capabilities. The FAIRshake toolkit contains a database that lists registered digital resources, with their associated metrics, rubrics, and assessments. The FAIRshake toolkit also has a browser extension and a bookmarklet that enables viewing and submitting assessments from any website. The FAIR assessment results are visualized as an insignia that can be viewed on the FAIRshake website, or embedded within hosting websites. Using FAIRshake, a variety of bioinformatics tools, datasets listed on dbGaP, APIs registered in SmartAPI, workflows in Dockstore, and other biomedical digital resources were manually and automatically assessed for FAIRness. In each case, the assessments revealed room for improvement, which prompted enhancements that significantly upgraded FAIRness scores of several digital resources.
Effect of predicted protein-truncating genetic variants on the human transcriptome
Accurate prediction of the functional effect of genetic variation is critical for clinical genome interpretation. We systematically characterized the transcriptome effects of protein-truncating variants, a class of variants expected to have profound effects on gene function, using data from the Genotype-Tissue Expression (GTEx) and Geuvadis projects. We quantitated tissue-specific and positional effects on nonsense-mediated transcript decay and present an improved predictive model for this decay. We directly measured the effect of variants both proximal and distal to splice junctions. Furthermore, we found that robustness to heterozygous gene inactivation is not due to dosage compensation. Our results illustrate the value of transcriptome data in the functional interpretation of genetic variants.
Landscape of X chromosome inactivation across human tissues
X chromosome inactivation (XCI) silences transcription from one of the two X chromosomes in female mammalian cells to balance expression dosage between XX females and XY males. XCI is, however, incomplete in humans: up to one-third of X-chromosomal genes are expressed from both the active and inactive X chromosomes (Xa and Xi, respectively) in female cells, with the degree of 'escape' from inactivation varying between genes and individuals. The extent to which XCI is shared between cells and tissues remains poorly characterized, as does the degree to which incomplete XCI manifests as detectable sex differences in gene expression and phenotypic traits. Here we describe a systematic survey of XCI, integrating over 5,500 transcriptomes from 449 individuals spanning 29 tissues from GTEx (v6p release) and 940 single-cell transcriptomes, combined with genomic sequence data. We show that XCI at 683 X-chromosomal genes is generally uniform across human tissues, but identify examples of heterogeneity between tissues, individuals and cells. We show that incomplete XCI affects at least 23% of X-chromosomal genes, identify seven genes that escape XCI with support from multiple lines of evidence and demonstrate that escape from XCI results in sex biases in gene expression, establishing incomplete XCI as a mechanism that is likely to introduce phenotypic diversity. Overall, this updated catalogue of XCI across human tissues helps to increase our understanding of the extent and impact of the incompleteness in the maintenance of XCI.
Identifying cis-mediators for trans-eQTLs across many human tissues using genomic mediation analysis
The impact of inherited genetic variation on gene expression in humans is well-established. The majority of known expression quantitative trait loci (eQTLs) impact expression of local genes (cis-eQTLs). More research is needed to identify effects of genetic variation on distant genes (trans-eQTLs) and understand their biological mechanisms. One common trans-eQTLs mechanism is "mediation" by a local (cis) transcript. Thus, mediation analysis can be applied to genome-wide SNP and expression data in order to identify transcripts that are "cis-mediators" of trans-eQTLs, including those "cis-hubs" involved in regulation of many trans-genes. Identifying such mediators helps us understand regulatory networks and suggests biological mechanisms underlying trans-eQTLs, both of which are relevant for understanding susceptibility to complex diseases. The multitissue expression data from the Genotype-Tissue Expression (GTEx) program provides a unique opportunity to study cis-mediation across human tissue types. However, the presence of complex hidden confounding effects in biological systems can make mediation analyses challenging and prone to confounding bias, particularly when conducted among diverse samples. To address this problem, we propose a new method: Genomic Mediation analysis with Adaptive Confounding adjustment (GMAC). It enables the search of a very large pool of variables, and adaptively selects potential confounding variables for each mediation test. Analyses of simulated data and GTEx data demonstrate that the adaptive selection of confounders by GMAC improves the power and precision of mediation analysis. Application of GMAC to GTEx data provides new insights into the observed patterns of cis-hubs and trans-eQTL regulation across tissue types.
Co-expression networks reveal the tissue-specific regulation of transcription and splicing
Gene co-expression networks capture biologically important patterns in gene expression data, enabling functional analyses of genes, discovery of biomarkers, and interpretation of genetic variants. Most network analyses to date have been limited to assessing correlation between total gene expression levels in a single tissue or small sets of tissues. Here, we built networks that additionally capture the regulation of relative isoform abundance and splicing, along with tissue-specific connections unique to each of a diverse set of tissues. We used the Genotype-Tissue Expression (GTEx) project v6 RNA sequencing data across 50 tissues and 449 individuals. First, we developed a framework called Transcriptome-Wide Networks (TWNs) for combining total expression and relative isoform levels into a single sparse network, capturing the interplay between the regulation of splicing and transcription. We built TWNs for 16 tissues and found that hubs in these networks were strongly enriched for splicing and RNA binding genes, demonstrating their utility in unraveling regulation of splicing in the human transcriptome. Next, we used a Bayesian biclustering model that identifies network edges unique to a single tissue to reconstruct Tissue-Specific Networks (TSNs) for 26 distinct tissues and 10 groups of related tissues. Finally, we found genetic variants associated with pairs of adjacent nodes in our networks, supporting the estimated network structures and identifying 20 genetic variants with distant regulatory impact on transcription and splicing. Our networks provide an improved understanding of the complex relationships of the human transcriptome across tissues.
Dynamic landscape and regulation of RNA editing in mammals
Adenosine-to-inosine (A-to-I) RNA editing is a conserved post-transcriptional mechanism mediated by ADAR enzymes that diversifies the transcriptome by altering selected nucleotides in RNA molecules. Although many editing sites have recently been discovered, the extent to which most sites are edited and how the editing is regulated in different biological contexts are not fully understood. Here we report dynamic spatiotemporal patterns and new regulators of RNA editing, discovered through an extensive profiling of A-to-I RNA editing in 8,551 human samples (representing 53 body sites from 552 individuals) from the Genotype-Tissue Expression (GTEx) project and in hundreds of other primate and mouse samples. We show that editing levels in non-repetitive coding regions vary more between tissues than editing levels in repetitive regions. Globally, ADAR1 is the primary editor of repetitive sites and ADAR2 is the primary editor of non-repetitive coding sites, whereas the catalytically inactive ADAR3 predominantly acts as an inhibitor of editing. Cross-species analysis of RNA editing in several tissues revealed that species, rather than tissue type, is the primary determinant of editing levels, suggesting stronger cis-directed regulation of RNA editing for most sites, although the small set of conserved coding sites is under stronger trans-regulation. In addition, we curated an extensive set of ADAR1 and ADAR2 targets and showed that many editing sites display distinct tissue-specific regulation by the ADAR enzymes in vivo. Further analysis of the GTEx data revealed several potential regulators of editing, such as AIMP2, which reduces editing in muscles by enhancing the degradation of the ADAR proteins. Collectively, our work provides insights into the complex cis- and trans-regulation of A-to-I editing.