79 research outputs found
Comparison of sequencing-based methods to profile DNA methylation and identification of monoallelic epigenetic modifications.
Analysis of DNA methylation patterns relies increasingly on sequencing-based profiling methods. The four most frequently used sequencing-based technologies are the bisulfite-based methods MethylC-seq and reduced representation bisulfite sequencing (RRBS), and the enrichment-based techniques methylated DNA immunoprecipitation sequencing (MeDIP-seq) and methylated DNA binding domain sequencing (MBD-seq). We applied all four methods to biological replicates of human embryonic stem cells to assess their genome-wide CpG coverage, resolution, cost, concordance and the influence of CpG density and genomic context. The methylation levels assessed by the two bisulfite methods were concordant (their difference did not exceed a given threshold) for 82% of CpGs and 99% of the non-CpG cytosines. Using binary methylation calls, the two enrichment methods were 99% concordant and regions assessed by all four methods were 97% concordant. We combined MeDIP-seq with methylation-sensitive restriction enzyme (MRE-seq) sequencing for comprehensive methylome coverage at lower cost. This, along with RNA-seq and ChIP-seq of the ES cells, enabled us to detect regions with allele-specific epigenetic states, identifying most known imprinted regions and new loci with monoallelic epigenetic marks and monoallelic expression.
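The concordance definition used above (two methods agree at a site when their methylation levels differ by no more than a threshold) can be illustrated with a minimal sketch. The function name, the site identifiers, the example values and the threshold of 0.25 are all hypothetical; the abstract does not state the threshold actually used.

```python
def concordance(levels_a, levels_b, threshold=0.25):
    """Fraction of shared sites whose methylation levels differ by at most `threshold`.

    `threshold` is an illustrative assumption, not the value used in the study.
    """
    shared = levels_a.keys() & levels_b.keys()
    if not shared:
        raise ValueError("the two methods share no sites")
    agree = sum(
        1 for site in shared
        if abs(levels_a[site] - levels_b[site]) <= threshold
    )
    return agree / len(shared)


# Hypothetical per-site methylation levels (0 = unmethylated, 1 = fully methylated).
methylc = {"chr1:100": 0.90, "chr1:150": 0.10, "chr1:300": 0.55, "chr2:40": 0.80}
rrbs = {"chr1:100": 0.85, "chr1:150": 0.50, "chr1:300": 0.60, "chr3:7": 0.20}

# Only sites covered by both methods are compared.
print(f"{concordance(methylc, rrbs):.2f}")
```

Restricting the comparison to shared sites matters because the methods differ in genome-wide coverage, so a site missing from one method says nothing about agreement.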
Rare loss of function variants in candidate genes and risk of colorectal cancer
Although ~25% of colorectal cancer or polyp (CRC/P) cases show familial aggregation, current germline genetic testing identifies a causal genotype in the 16 major genes associated with high penetrance CRC/P in only 20% of these cases. As there are likely other genes underlying heritable CRC/P, we evaluated the association of variation at novel loci with CRC/P. We evaluated 158 a priori selected candidate genes by comparing the number of rare potentially disruptive variants (PDVs) found in 84 CRC/P cases without an identified CRC/P risk-associated variant and 2440 controls. We repeated this analysis using an additional 73 CRC/P cases. We also compared the frequency of PDVs in select genes among CRC/P cases with two publicly available data sets. We found a significant enrichment of PDVs in cases vs. controls: 20% of cases vs. 11.5% of controls with ≥ 1 PDV (OR = 1.9, p = 0.01) in the original set of cases. Among the second cohort of CRC/P cases, 18% had a PDV, significantly different from 11.5% (p = 0.02). Logistic regression, adjusting for ancestry and multiple testing, indicated association between CRC/P and PDVs in NTHL1 (p = 0.0001), BRCA2 (p = 0.01) and BRIP1 (p = 0.04). However, there was no significant difference in the frequency of PDVs at each of these genes between all 157 CRC/P cases and two publicly available data sets. These results suggest an increased presence of PDVs in CRC/P cases and support further investigation of the association of NTHL1, BRCA2 and BRIP1 variation with CRC/P.
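As a rough check on the reported odds ratio, the sketch below recomputes an unadjusted OR from the two rounded proportions given in the abstract (20% of cases and 11.5% of controls carrying at least one PDV). The function name is hypothetical, and the published estimate was presumably derived from exact counts, which the abstract does not give.

```python
def odds_ratio(p_cases, p_controls):
    """Unadjusted odds ratio from the carrier proportions in cases and controls."""
    odds_cases = p_cases / (1 - p_cases)
    odds_controls = p_controls / (1 - p_controls)
    return odds_cases / odds_controls


# Rounded proportions as stated in the abstract.
or_estimate = odds_ratio(0.20, 0.115)
print(round(or_estimate, 1))  # 1.9
```

The result agrees with the reported OR = 1.9 to one decimal place.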
Actionable, Pathogenic Incidental Findings in 1,000 Participants’ Exomes
The incorporation of genomics into medicine is stimulating interest in the return of incidental findings (IFs) from exome and genome sequencing. However, no large-scale study has yet estimated the number of expected actionable findings per individual; therefore, we classified actionable pathogenic single-nucleotide variants in 500 European- and 500 African-descent participants randomly selected from the National Heart, Lung, and Blood Institute Exome Sequencing Project. The 1,000 individuals were screened for variants in 114 genes selected by an expert panel for their association with medically actionable genetic conditions possibly undiagnosed in adults. Among the 1,000 participants, 585 instances of 239 unique variants were identified as disease causing in the Human Gene Mutation Database (HGMD). The primary literature supporting the variants’ pathogenicity was reviewed. Of the identified IFs, only 16 unique autosomal-dominant variants in 17 individuals were assessed to be pathogenic or likely pathogenic, and one participant had two pathogenic variants for an autosomal-recessive disease. Furthermore, one pathogenic and four likely pathogenic variants not listed as disease causing in HGMD were identified. These data can provide an estimate of the frequency (∼3.4% for European descent and ∼1.2% for African descent) of the high-penetrance actionable pathogenic or likely pathogenic variants in adults. The 23 participants with pathogenic or likely pathogenic variants were disproportionately of European (17) versus African (6) descent. The process of classifying these variants underscores the need for a more comprehensive and diverse centralized resource to provide curated information on pathogenicity for clinical use to minimize health disparities in genomic medicine.
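The frequency estimates quoted above follow directly from the counts in the abstract: 17 of 500 European-descent and 6 of 500 African-descent participants carried a pathogenic or likely pathogenic variant. The short sketch below reproduces that arithmetic; the variable names are illustrative only.

```python
# Counts as stated in the abstract.
cohort_size = {"European": 500, "African": 500}
carriers = {"European": 17, "African": 6}

# Per-group frequency of participants with an actionable variant.
frequency = {group: carriers[group] / cohort_size[group] for group in cohort_size}

for group, value in frequency.items():
    print(f"{group} descent: {value:.1%}")

# The two groups together account for the 23 participants mentioned.
print(sum(carriers.values()))
```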
Camtrap DP: an open standard for the FAIR exchange and archiving of camera trap data
Camera trapping has revolutionized wildlife ecology and conservation by providing automated data acquisition, leading to the accumulation of massive amounts of camera trap data worldwide. Although management and processing of camera trap-derived Big Data are becoming increasingly solvable with the help of scalable cyber-infrastructures, harmonization and exchange of the data remain limited, hindering its full potential. There is currently no widely accepted standard for exchanging camera trap data. The only existing proposal, “Camera Trap Metadata Standard” (CTMS), has several technical shortcomings and limited adoption. We present a new data exchange format, the Camera Trap Data Package (Camtrap DP), designed to allow users to easily exchange, harmonize and archive camera trap data at local to global scales. Camtrap DP structures camera trap data in a simple yet flexible data model consisting of three tables (Deployments, Media and Observations) that supports a wide range of camera deployment designs, classification techniques (e.g., human and AI, media-based and event-based) and analytical use cases, from compiling species occurrence data through distribution, occupancy and activity modeling to density estimation. The format further achieves interoperability by building upon existing standards, Frictionless Data Package in particular, which is supported by a suite of open software tools to read and validate data. Camtrap DP is the consensus of a long, in-depth, consultation and outreach process with standard and software developers, the main existing camera trap data management platforms, major players in the field of camera trapping and the Global Biodiversity Information Facility (GBIF). Under the umbrella of the Biodiversity Information Standards (TDWG), Camtrap DP has been developed openly, collaboratively and with version control from the start. 
We encourage camera trapping users and developers to join the discussion and contribute to the further development and adoption of this standard. Keywords: biodiversity data, camera traps, data exchange, data sharing, information standards.
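To make the three-table model more concrete, here is a minimal sketch of how Deployments, Media and Observations might be linked and summarized into species occurrence counts. It uses only the Python standard library; the column names, identifiers and records are assumptions made for illustration and should not be read as the normative Camtrap DP schema.

```python
import csv
import io
from collections import Counter

# Hypothetical, heavily simplified tables; real Camtrap DP tables have more fields.
DEPLOYMENTS = """deploymentID,locationName
dep1,forest_edge
dep2,river_bank
"""
MEDIA = """mediaID,deploymentID,timestamp
m1,dep1,2020-05-01T06:00:00Z
m2,dep1,2020-05-01T06:05:00Z
m3,dep2,2020-05-02T21:30:00Z
"""
OBSERVATIONS = """observationID,deploymentID,mediaID,scientificName
o1,dep1,m1,Vulpes vulpes
o2,dep1,m2,Vulpes vulpes
o3,dep2,m3,Meles meles
"""


def read_table(text):
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


deployments = {row["deploymentID"]: row for row in read_table(DEPLOYMENTS)}
media = {row["mediaID"]: row for row in read_table(MEDIA)}
observations = read_table(OBSERVATIONS)

# Basic referential check: each observation points at a known deployment,
# and its media file belongs to that same deployment.
for obs in observations:
    assert obs["deploymentID"] in deployments
    assert media[obs["mediaID"]]["deploymentID"] == obs["deploymentID"]

# Species occurrence counts per location, one of the use cases named above.
occurrences = Counter(
    (deployments[obs["deploymentID"]]["locationName"], obs["scientificName"])
    for obs in observations
)
for (location, species), count in sorted(occurrences.items()):
    print(location, species, count)
```

In practice the tables would be read and validated with Frictionless tooling against the published schemas rather than with hand-written checks like these.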
Characterization of the Contradictory Chromatin Signatures at the 3′ Exons of Zinc Finger Genes
The H3K9me3 histone modification is often found at promoter regions, where it functions to repress transcription. However, we have previously shown that 3′ exons of zinc finger genes (ZNFs) are marked by high levels of H3K9me3. We have now further investigated this unusual location for H3K9me3 in ZNF genes. Neither bioinformatic nor experimental approaches support the hypothesis that the 3′ exons of ZNFs are promoters. We further characterized the histone modifications at the 3′ ZNF exons and found that these regions also contain H3K36me3, a mark of transcriptional elongation. A genome-wide analysis of ChIP-seq data revealed that ZNFs constitute the majority of genes that have high levels of both H3K9me3 and H3K36me3. These results suggested the possibility that the ZNF genes may be imprinted, with one allele transcribed and one allele repressed. To test the hypothesis that the contradictory modifications are due to imprinting, we used a SNP analysis of RNA-seq data to demonstrate that both alleles of certain ZNF genes having H3K9me3 and H3K36me3 are transcribed. We next analyzed isolated ZNF 3′ exons using stably integrated episomes. We found that although the H3K36me3 mark was lost when the 3′ ZNF exon was removed from its natural genomic location, the isolated ZNF 3′ exons retained the H3K9me3 mark. Thus, the H3K9me3 mark at ZNF 3′ exons does not impede transcription and it is regulated independently of the H3K36me3 mark. Finally, we demonstrate a strong relationship between the number of tandemly repeated domains in the 3′ exons and the H3K9me3 mark. We suggest that the H3K9me3 at ZNF 3′ exons may function to protect the genome from inappropriate recombination rather than to regulate transcription.
Genomic, Pathway Network, and Immunologic Features Distinguishing Squamous Carcinomas
This integrated, multiplatform PanCancer Atlas study co-mapped and identified distinguishing molecular features of squamous cell carcinomas (SCCs) from five sites associated with smoking
Pan-Cancer Analysis of lncRNA Regulation Supports Their Targeting of Cancer Genes in Each Tumor Context
Long noncoding RNAs (lncRNAs) are commonly dysregulated in tumors, but only a handful are known to play pathophysiological roles in cancer. We inferred lncRNAs that dysregulate cancer pathways, oncogenes, and tumor suppressors (cancer genes) by modeling their effects on the activity of transcription factors, RNA-binding proteins, and microRNAs in 5,185 TCGA tumors and 1,019 ENCODE assays. Our predictions included hundreds of candidate onco- and tumor-suppressor lncRNAs (cancer lncRNAs) whose somatic alterations account for the dysregulation of dozens of cancer genes and pathways in each of 14 tumor contexts. To demonstrate proof of concept, we showed that perturbations targeting OIP5-AS1 (an inferred tumor suppressor) and TUG1 and WT1-AS (inferred onco-lncRNAs) dysregulated cancer genes and altered proliferation of breast and gynecologic cancer cells. Our analysis indicates that, although most lncRNAs are dysregulated in a tumor-specific manner, some, including OIP5-AS1, TUG1, NEAT1, MEG3, and TSIX, synergistically dysregulate cancer pathways in multiple tumor contexts.
Pan-cancer Alterations of the MYC Oncogene and Its Proximal Network across the Cancer Genome Atlas
Although the MYC oncogene has been implicated in cancer, a systematic assessment of alterations of MYC, related transcription factors, and co-regulatory proteins, forming the proximal MYC network (PMN), across human cancers is lacking. Using computational approaches, we define genomic and proteomic features associated with MYC and the PMN across the 33 cancers of The Cancer Genome Atlas. Pan-cancer, 28% of all samples had at least one of the MYC paralogs amplified. In contrast, the MYC antagonists MGA and MNT were the most frequently mutated or deleted members, proposing a role as tumor suppressors. MYC alterations were mutually exclusive with PIK3CA, PTEN, APC, or BRAF alterations, suggesting that MYC is a distinct oncogenic driver. Expression analysis revealed MYC-associated pathways in tumor subtypes, such as immune response and growth factor signaling; chromatin, translation, and DNA replication/repair were conserved pan-cancer. This analysis reveals insights into MYC biology and is a reference for biomarkers and therapeutics for cancers with alterations of MYC or the PMN.
Spatial Organization and Molecular Correlation of Tumor-Infiltrating Lymphocytes Using Deep Learning on Pathology Images
Beyond sample curation and basic pathologic characterization, the digitized H&E-stained images of TCGA samples remain underutilized. To highlight this resource, we present mappings of tumor-infiltrating lymphocytes (TILs) based on H&E images from 13 TCGA tumor types. These TIL maps are derived through computational staining using a convolutional neural network trained to classify patches of images. Affinity propagation revealed local spatial structure in TIL patterns and correlation with overall survival. TIL map structural patterns were grouped using standard histopathological parameters. These patterns are enriched in particular T cell subpopulations derived from molecular measures. TIL densities and spatial structure were differentially enriched among tumor types, immune subtypes, and tumor molecular subtypes, implying that spatial infiltrate state could reflect particular tumor cell aberration states. Obtaining spatial lymphocytic patterns linked to the rich genomic characterization of TCGA samples demonstrates one use for the TCGA image archives with insights into the tumor-immune microenvironment.
- …