NovoGraph: Human genome graph construction from multiple long-read de novo assemblies [version 2; referees: 2 approved]
Genome graphs are emerging as an important novel approach to the analysis of high-throughput human sequencing data. By explicitly representing genetic variants and alternative haplotypes in a mappable data structure, they can enable the improved analysis of structurally variable and hyperpolymorphic regions of the genome. In most existing approaches, graphs are constructed from variant call sets derived from short-read sequencing. As long-read sequencing becomes more cost-effective and enables de novo assembly for increasing numbers of whole genomes, a method for the direct construction of a genome graph from sets of assembled human genomes would be desirable. Such assembly-based genome graphs would encompass the wide spectrum of genetic variation accessible to long-read-based de novo assembly, including large structural variants and divergent haplotypes. Here we present NovoGraph, a method for the construction of a human genome graph directly from a set of de novo assemblies. NovoGraph constructs a genome-wide multiple sequence alignment of all input contigs and creates a graph by merging the input sequences at positions that are both homologous and sequence-identical. NovoGraph outputs resulting graphs in VCF format that can be loaded into third-party genome graph toolkits. To demonstrate NovoGraph, we construct a genome graph with 23,478,835 variant sites and 30,582,795 variant alleles from de novo assemblies of seven ethnically diverse human genomes (AK1, CHM1, CHM13, HG003, HG004, HX1, NA19240). Initial evaluations show that mapping against the constructed graph reduces the average mismatch rate of reads from sample NA12878 by approximately 0.2%, albeit at a slightly increased rate of reads that remain unmapped
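The merging step described in the abstract above can be illustrated with a toy sketch. This is not NovoGraph's implementation: the function name `variant_sites`, the dictionary layout, and the three short sequences are hypothetical, and the alignment is assumed to be already computed. Columns where all aligned rows agree would collapse into a single shared graph node; columns where they differ become variant sites with one allele per distinct character.

```python
from typing import Dict, List, Tuple


def variant_sites(msa: Dict[str, str]) -> List[Tuple[int, List[str]]]:
    """Return (column index, sorted distinct alleles) for every alignment
    column in which the aligned rows do not all carry the same character."""
    rows = list(msa.values())
    if not rows:
        return []
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all aligned rows must have the same length")

    sites = []
    for col in range(length):
        alleles = sorted({row[col] for row in rows})
        if len(alleles) > 1:  # rows disagree here, so the graph branches
            sites.append((col, alleles))
    return sites


# Hypothetical toy alignment; '-' marks a gap introduced by the aligner.
toy_msa = {
    "ref":     "ACGT-ACGT",
    "contigA": "ACGTTACGT",
    "contigB": "ACCT-ACGT",
}
sites = variant_sites(toy_msa)
for column, alleles in sites:
    print(column, alleles)
```

In this toy input the rows disagree at a substitution column and at an insertion column; every other column would be shared by all paths through the graph.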
Exome Sequencing Implicates Ancestry-Related Mendelian Variation at SYNE1 in Childhood-Onset Essential Hypertension
Childhood-onset essential hypertension (COEH) is an uncommon form of hypertension that manifests in childhood or adolescence and, in the United States, disproportionately affects children of African ancestry. The etiology of COEH is unknown, but its childhood onset, low prevalence, high heritability, and skewed ancestral demography suggest the potential to identify rare genetic variation segregating in a Mendelian manner among affected individuals and thereby implicate genes important to disease pathogenesis. However, no COEH genes have been reported to date. Here, we identify recessive segregation of rare and putatively damaging missense variation in the spectrin domain of spectrin repeat containing nuclear envelope protein 1 (SYNE1), a cardiovascular candidate gene, in 3 of 16 families with early-onset COEH without an antecedent family history. By leveraging exome sequence data from an additional 48 COEH families, 1,700 in-house trios, and publicly available data sets, we demonstrate that compound heterozygous SYNE1 variation in these COEH individuals occurred more often than expected by chance and that this class of biallelic rare variation was significantly enriched among individuals of African genetic ancestry. Using in vitro shRNA knockdown of SYNE1, we show that reduced SYNE1 expression resulted in a substantial decrease in the elasticity of smooth muscle vascular cells that could be rescued by pharmacological inhibition of the downstream RhoA/Rho-associated protein kinase pathway. These results provide insights into the molecular genetics and underlying pathophysiology of COEH and suggest a role for precision therapeutics in the future
Epigenome-wide association studies identify novel DNA methylation sites associated with PTSD: A meta-analysis of 23 military and civilian cohorts
BACKGROUND: The occurrence of post-traumatic stress disorder (PTSD) following a traumatic event is associated with biological differences that can represent the susceptibility to PTSD, the impact of trauma, or the sequelae of PTSD itself. These effects include differences in DNA methylation (DNAm), an important form of epigenetic gene regulation, at multiple CpG loci across the genome. Moreover, these effects can be shared or specific to both central and peripheral tissues. Here, we aim to identify blood DNAm differences associated with PTSD and characterize the underlying biological mechanisms by examining the extent to which they mirror associations across multiple brain regions. METHODS: As the Psychiatric Genomics Consortium (PGC) PTSD Epigenetics Workgroup, we conducted the largest cross-sectional meta-analysis of epigenome-wide association studies (EWASs) of PTSD to date, involving 5077 participants (2156 PTSD cases and 2921 trauma-exposed controls) from 23 civilian and military studies. PTSD diagnosis assessments were harmonized following the standardized guidelines established by the PGC-PTSD Workgroup. DNAm was assayed from blood using either Illumina HumanMethylation450 or MethylationEPIC (850K) BeadChips. A common QC pipeline was applied. Within each cohort, DNA methylation was regressed on PTSD, sex (if applicable), age, blood cell proportions, and ancestry. An inverse variance-weighted meta-analysis was performed. We conducted replication analyses in tissue from multiple brain regions, neuronal nuclei, and a cellular model of prolonged stress. RESULTS: We identified 11 CpG sites associated with PTSD in the overall meta-analysis (1.44e-09 < p < 5.30e-08), as well as 14 associated in analyses of specific strata (military vs civilian cohort, sex, and ancestry), including CpGs in AHRR and CDC42BPB. Many of these loci exhibit blood-brain correlation in methylation levels and cross-tissue associations with PTSD in multiple brain regions. Methylation at most CpGs correlated with their annotated gene expression levels. CONCLUSIONS: This study identifies 11 PTSD-associated CpGs and leverages data from postmortem brain samples, GWAS, and genome-wide expression data to interpret the biology underlying these associations and prioritize genes whose regulation differs in those with PTSD.
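The inverse variance-weighted meta-analysis named in the METHODS can be sketched for a single CpG as follows. This is a generic fixed-effect estimator, not the consortium's pipeline; the function name `ivw_meta` and the per-cohort effect sizes and standard errors are made up for illustration.

```python
import math
from typing import Sequence, Tuple


def ivw_meta(betas: Sequence[float], ses: Sequence[float]) -> Tuple[float, float, float, float]:
    """Fixed-effect inverse variance-weighted pooling of per-cohort estimates.

    Returns (pooled beta, pooled standard error, z score, two-sided p value).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("need one standard error per effect estimate")
    weights = [1.0 / se ** 2 for se in ses]      # weight = 1 / variance
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))       # two-sided normal tail
    return beta, se, z, p


# Hypothetical per-cohort regression results for one CpG (not real data).
cohort_betas = [0.1, 0.3]
cohort_ses = [0.1, 0.1]
pooled_beta, pooled_se, z_score, p_value = ivw_meta(cohort_betas, cohort_ses)
print(pooled_beta, pooled_se, z_score, p_value)
```

Cohorts with smaller standard errors receive larger weights, so precise estimates dominate the pooled effect.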
Optimal Diffeomorphic Matching in Biomedical Image Processing
We consider optimal matching of submanifolds such as curves and surfaces by a variational approach based on Hilbert spaces of diffeomorphic transformations. In an abstract setting, the optimal matching is formulated as a minimization problem involving actions of diffeomorphisms on regular Borel measures considered as supporting measures of the reference and the target submanifolds. The objective functional consists of two parts measuring the elastic energy of the dynamically deformed surfaces and the quality of the matching. To make the problem computationally accessible, we use reproducing kernel Hilbert spaces with radial kernels and weighted sums of Dirac measures which gives rise to diffeomorphic point matching and amounts to the solution of a finite dimensional minimization problem. We present a matching algorithm based on the first order necessary optimality conditions which include an initial-value problem for a dynamical system in the trajectories describing the deformation of the surfaces and a final-time problem associated with the adjoint equations. The performance of the algorithm is illustrated by numerical results for examples from medical image analysis
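For orientation, one common way to write a finite-dimensional diffeomorphic point matching problem of this kind is sketched below. The notation is assumed for illustration and is not taken from the paper: $x_i$ are points on the reference shape, $q_i(t)$ their trajectories, $\alpha_j(t)$ the momenta, $K$ a radial kernel (a Gaussian is one possible choice), $D$ a data-fit term comparing the deformed points with the target $y$, and $\lambda > 0$ a weight.

\[
\min_{\alpha}\; \frac{1}{2}\int_0^1 \sum_{i,j=1}^{N} \alpha_i(t)^{\top} K\big(q_i(t), q_j(t)\big)\, \alpha_j(t)\, dt \;+\; \lambda\, D\big(q(1), y\big)
\]
\[
\text{subject to}\quad \dot q_i(t) = \sum_{j=1}^{N} K\big(q_i(t), q_j(t)\big)\, \alpha_j(t), \qquad q_i(0) = x_i, \qquad i = 1, \dots, N,
\]
\[
\text{for example with}\quad K(x, y) = \exp\!\big(-\lVert x - y \rVert^{2} / \sigma^{2}\big)\, I .
\]

The first term plays the role of the deformation energy and the second measures the quality of the matching; the constraint is the initial-value problem for the trajectories, and the first order optimality conditions add adjoint equations with a condition at the final time $t = 1$.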
Diffeomorphic Matching and Dynamic Deformable Surfaces in 3D Medical Imaging
We consider optimal matching of submanifolds such as curves and surfaces by a variational approach based on Hilbert spaces of diffeomorphic transformations. In an abstract setting, the optimal matching is formulated as a minimization problem involving actions of diffeomorphisms on regular Borel measures considered as supporting measures of the reference and the target submanifolds. The objective functional consists of two parts measuring the elastic energy of the dynamically deformed surfaces and the quality of the matching. To make the problem computationally accessible, we use reproducing kernel Hilbert spaces with radial kernels and weighted sums of Dirac measures which gives rise to diffeomorphic point matching and amounts to the solution of a finite dimensional minimization problem. We present a matching algorithm based on the first order necessary optimality conditions which include an initial-value problem for a dynamical system in the trajectories describing the deformation of the surfaces and a final-time problem associated with the adjoint equations. The performance of the algorithm is illustrated by numerical results for examples from medical image analysis
Systems biology dissection of PTSD and MDD across brain regions, cell types, and blood
The molecular pathology of stress-related disorders remains elusive. Our brain multiregion, multiomic study of posttraumatic stress disorder (PTSD) and major depressive disorder (MDD) included the central nucleus of the amygdala, hippocampal dentate gyrus, and medial prefrontal cortex (mPFC). Genes and exons within the mPFC carried most disease signals replicated across two independent cohorts. Pathways pointed to immune function, neuronal and synaptic regulation, and stress hormones. Multiomic factor and gene network analyses provided the underlying genomic structure. Single nucleus RNA sequencing in dorsolateral PFC revealed dysregulated (stress-related) signals in neuronal and non-neuronal cell types. Analyses of brain-blood intersections in >50,000 UK Biobank participants were conducted along with fine-mapping of the results of PTSD and MDD genome-wide association studies to distinguish risk from disease processes. Our data suggest shared and distinct molecular pathology in both disorders and propose potential therapeutic targets and biomarkers