A framework for the detection of de novo mutations in family-based sequencing data
Germline mutation detection from human DNA sequence data is challenging due to the rarity of such events relative to the intrinsic error rates of sequencing technologies and the uneven coverage across the genome. We developed PhaseByTransmission (PBT) to identify de novo single nucleotide variants and short insertions and deletions (indels) from sequence data collected in parent-offspring trios. We compute the joint probability of the data given the genotype likelihoods in the individual family members, the known familial relationships and a prior probability for the mutation rate. Candidate de novo mutations (DNMs) are reported along with their posterior probability, providing a systematic way to prioritize them for validation. Our tool is integrated into the Genome Analysis Toolkit and can be used together with the ReadBackedPhasing module to infer the parental origin of DNMs based on phase-informative reads. Using simulated data, we show that PBT outperforms existing tools, especially in low coverage data and on the X chromosome. We further show that PBT displays high validation rates on empirical parent-offspring sequencing data for whole-exome data from 104 trios and X-chromosome data from 249 parent-offspring families. Finally, we demonstrate an association between father's age at conception and the number of DNMs in female offspring's X chromosome, consistent with previous reports.
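The posterior computation described above can be illustrated with a toy model. The sketch below is not PBT's implementation: it assumes a single biallelic site, Hardy-Weinberg priors on the parental genotypes with a hypothetical alternate-allele frequency q, and a simple model in which each transmitted allele mutates with probability mu. Genotype likelihoods are passed as three-element lists indexed by the number of alternate alleles (0, 1, 2), and all function names are hypothetical.

```python
from itertools import product


def hwe_prior(q):
    """Hardy-Weinberg prior for 0, 1 or 2 copies of the alternate allele (assumed)."""
    return [(1 - q) ** 2, 2 * q * (1 - q), q ** 2]


def transmit_alt(g, mu):
    """P(parent with genotype g transmits the alternate allele), where the
    transmitted allele flips (mutates) with probability mu."""
    p = g / 2
    return p * (1 - mu) + (1 - p) * mu


def child_prob(gc, gm, gf, mu):
    """P(child genotype gc | mother gm, father gf) under the toy mutation model."""
    pm, pf = transmit_alt(gm, mu), transmit_alt(gf, mu)
    return [(1 - pm) * (1 - pf), pm * (1 - pf) + (1 - pm) * pf, pm * pf][gc]


def dnm_posterior(lik_mother, lik_father, lik_child, q=0.001, mu=1e-8):
    """Posterior mass on trio genotype configurations that Mendelian
    inheritance alone (mu = 0) cannot produce, i.e. candidate de novo events."""
    prior = hwe_prior(q)
    total = dnm = 0.0
    for gm, gf, gc in product(range(3), repeat=3):
        # Joint probability: genotype priors x transmission x per-person likelihoods.
        joint = (prior[gm] * prior[gf] * child_prob(gc, gm, gf, mu)
                 * lik_mother[gm] * lik_father[gf] * lik_child[gc])
        total += joint
        if child_prob(gc, gm, gf, 0.0) == 0.0:
            dnm += joint
    return dnm / total
```

For example, dnm_posterior([1, 1e-10, 1e-20], [1, 1e-10, 1e-20], [1e-10, 1, 1e-10]) describes two confidently homozygous-reference parents and a confidently heterozygous child, and returns a posterior close to 1; the small mutation-rate prior is what keeps weaker evidence from producing confident calls.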
Clonal Hematopoiesis and Blood-Cancer Risk Inferred from Blood DNA Sequence
Background
Cancers arise from multiple acquired mutations, which presumably occur over many years. Early stages in cancer development might be present years before cancers become clinically apparent.
Methods
We analyzed data from whole-exome sequencing of DNA in peripheral-blood cells from 12,380 persons, unselected for cancer or hematologic phenotypes. We identified somatic mutations on the basis of unusual allelic fractions. We used data from Swedish national patient registers to follow health outcomes for 2 to 7 years after DNA sampling.
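One way to make "unusual allelic fractions" concrete is to ask how surprising a low variant allele fraction would be for an inherited heterozygous variant, whose alternate reads are expected to be roughly half of the total. The sketch below uses a one-sided binomial test against a fraction of 0.5; it is an illustration only, not the study's published pipeline, and the function names and the alpha cut-off are hypothetical.

```python
from math import comb


def het_lower_tail_p(alt_reads, depth):
    """One-sided binomial P(X <= alt_reads) for X ~ Binomial(depth, 0.5):
    how surprising a low allele fraction is for a germline heterozygote."""
    if not 0 <= alt_reads <= depth:
        raise ValueError("alt_reads must lie between 0 and depth")
    # Exact integer arithmetic, converted to a float only at the end.
    tail = sum(comb(depth, k) for k in range(alt_reads + 1))
    return tail / 2 ** depth


def looks_somatic(alt_reads, depth, alpha=1e-6):
    """Flag a variant whose allele fraction is implausibly low for a het.
    alpha is a hypothetical cut-off, not a value taken from the study."""
    return het_lower_tail_p(alt_reads, depth) < alpha
```

With 5 alternate reads out of 100 the variant is flagged, while 48 out of 100 is consistent with a germline heterozygote. A real caller would also need to model sequencing error and artifacts, which this sketch ignores.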
Results
Clonal hematopoiesis with somatic mutations was observed in 10% of persons older than 65 years of age but in only 1% of those younger than 50 years of age. Detectable clonal expansions most frequently involved somatic mutations in three genes (DNMT3A, ASXL1, and TET2) that have previously been implicated in hematologic cancers. Clonal hematopoiesis was a strong risk factor for subsequent hematologic cancer (hazard ratio, 12.9; 95% confidence interval, 5.8 to 28.7). Approximately 42% of hematologic cancers in this cohort arose in persons who had clonality at the time of DNA sampling, more than 6 months before a first diagnosis of cancer. Analysis of bone marrow–biopsy specimens obtained from two patients at the time of diagnosis of acute myeloid leukemia revealed that their cancers arose from the earlier clones.
Conclusions
Clonal hematopoiesis with somatic mutations is readily detected by means of DNA sequencing, is increasingly common as people age, and is associated with increased risks of hematologic cancer and death. A subset of the genes that are mutated in patients with myeloid cancers is frequently mutated in apparently healthy persons; these mutations may represent characteristic early events in the development of hematologic cancers. (Funded by the National Human Genome Research Institute and others.)
Funding: National Human Genome Research Institute (U.S.) (Grant U54 HG003067); National Human Genome Research Institute (U.S.) (Grant R01 HG006855); Stanley Center for Psychiatric Research; Alexander and Margaret Stewart Trust; National Institute of Mental Health (U.S.) (Grant R01 MH 077139); National Institute of Mental Health (U.S.) (Grant RC2 MH089905); Sylvan C. Herman Foundation
Tradeoff Between Stability and Multispecificity in the Design of Promiscuous Proteins
Natural proteins often take part in several highly specific protein-protein interactions. They are thus subject to multiple opposing forces during evolutionary selection. To be functional, such multispecific proteins need to be stable in complex with each interaction partner and, at the same time, to maintain affinity toward all partners. How is this multispecificity acquired through natural evolution? To answer this question, we study a prototypical multispecific protein, calmodulin (CaM), which has evolved to interact with hundreds of target proteins. Starting from high-resolution structures of sixteen CaM-target complexes, we employ state-of-the-art computational methods to predict a hundred CaM sequences best suited for interaction with each individual CaM target. Then, we design CaM sequences most compatible with each possible combination of two, three, and all sixteen targets simultaneously, producing almost 70,000 low-energy CaM sequences. By comparing these sequences and their energies, we gain insight into how nature has found a compromise between the need for favorable interaction energies and the need for multispecificity. We observe that designing for more partners simultaneously yields CaM sequences that better match natural sequence profiles, thus emphasizing the importance of such strategies in nature. Furthermore, we show that the CaM binding interface can be cleanly partitioned into positions that are critical for the affinity of all CaM-target complexes and those that are molded to provide interaction specificity. We reveal several basic categories of sequence-level tradeoffs that enable the compromise necessary for the promiscuity of this protein. We also thoroughly quantify the tradeoff between interaction energetics and multispecificity and find that facilitating seemingly competing interactions requires only a small deviation from optimal energies.
We conclude that multispecific proteins have been subjected to a rigorous optimization process that has fine-tuned their sequences for interactions with a precise set of targets, thus conferring their multiple cellular functions.
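The energy-versus-multispecificity tradeoff can be pictured with a toy calculation: given predicted binding energies of candidate sequences against several targets, pick the sequence with the lowest summed energy and measure how far it falls from the best single-target design for each target. This is a hypothetical illustration with made-up numbers, not the authors' design protocol, which used structure-based energy functions over sixteen CaM-target complexes.

```python
def best_compromise(energies):
    """energies maps a candidate sequence to its binding energies, one per
    target (lower is better). Returns the sequence with the lowest summed
    energy and its per-target penalty relative to the best design for
    that target alone."""
    n_targets = len(next(iter(energies.values())))
    best_per_target = [min(e[t] for e in energies.values()) for t in range(n_targets)]
    seq = min(energies, key=lambda s: sum(energies[s]))
    penalty = [energies[seq][t] - best_per_target[t] for t in range(n_targets)]
    return seq, penalty


# Hypothetical energies for three candidate sequences against two targets.
toy = {"A": [-10.0, -2.0], "B": [-3.0, -9.0], "C": [-8.0, -8.0]}
```

Here best_compromise(toy) returns ("C", [2.0, 1.0]): sequence C is not optimal for either target, but it pays only a small energetic penalty on each while binding both well, which mirrors in miniature the kind of compromise the abstract describes.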
Discovery and Statistical Genotyping of Copy-Number Variation from Whole-Exome Sequencing Depth
Sequencing of gene-coding regions (the exome) is increasingly used for studying human disease, for which copy-number variants (CNVs) are a critical genetic component. However, detecting copy number from exome sequencing is challenging because of the noncontiguous nature of the captured exons. This is compounded by the complex relationship between read depth and copy number; this results from biases in targeted genomic hybridization, sequence factors such as GC content, and batching of samples during collection and sequencing. We present a statistical tool (exome hidden Markov model [XHMM]) that uses principal-component analysis (PCA) to normalize exome read depth and a hidden Markov model (HMM) to discover exon-resolution CNV and genotype variation across samples. We evaluate performance on 90 schizophrenia trios and 1,017 case-control samples. XHMM detects a median of two rare
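The hidden Markov model step can be sketched as a Viterbi decoder over three copy-number states. This minimal version assumes the read depths have already been PCA-normalized and converted to per-exon z-scores; the Gaussian state means, the CNV start probability p_cnv and the mean CNV length in exons (mean_len) are assumed illustrative values, and the code is not XHMM's implementation.

```python
import math

STATES = ("DEL", "DIP", "DUP")


def gaussian_logpdf(x, mean, sd=1.0):
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def safe_log(p):
    return math.log(p) if p > 0 else float("-inf")


def viterbi_cnv(z, p_cnv=1e-4, mean_len=6, means=(-3.0, 0.0, 3.0)):
    """Most likely DEL/DIP/DUP path for a list of per-exon depth z-scores."""
    if not z:
        return []
    stay = 1 - 1 / mean_len  # CNVs last mean_len exons on average (assumed)
    trans = [
        [stay, 1 - stay, 0.0],          # from DEL
        [p_cnv, 1 - 2 * p_cnv, p_cnv],  # from DIP
        [0.0, 1 - stay, stay],          # from DUP
    ]
    log_t = [[safe_log(p) for p in row] for row in trans]
    start = [safe_log(p) for p in (p_cnv, 1 - 2 * p_cnv, p_cnv)]
    score = [start[s] + gaussian_logpdf(z[0], means[s]) for s in range(3)]
    back = []  # back[i][s]: best predecessor of state s at exon i + 1
    for x in z[1:]:
        pointers, new_score = [], []
        for s in range(3):
            r = max(range(3), key=lambda prev: score[prev] + log_t[prev][s])
            pointers.append(r)
            new_score.append(score[r] + log_t[r][s] + gaussian_logpdf(x, means[s]))
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(range(3), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return [STATES[s] for s in reversed(path)]
```

For z-scores [0.0, 0.1, -3.5, -4.0, -3.8, 0.2, -0.1] the decoder calls a three-exon deletion flanked by diploid exons: the strong depth signal outweighs the low prior probability of entering a CNV state.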
Arc requires PSD95 for assembly into postsynaptic complexes involved with neural dysfunction and intelligence
Arc is an activity-regulated neuronal protein, but little is known about its interactions, assembly into multiprotein complexes, and role in human disease and cognition. We applied an integrated proteomic and genetic strategy by targeting a tandem affinity purification (TAP) tag and Venus fluorescent protein into the endogenous Arc gene in mice. This allowed biochemical and proteomic characterization of native complexes in wild-type and knockout mice. We identified many Arc-interacting proteins, of which PSD95 was the most abundant. PSD95 was essential for Arc assembly into 1.5-MDa complexes and activity-dependent recruitment to excitatory synapses. Integrating human genetic data with proteomic data showed that Arc-PSD95 complexes are enriched in mutations associated with schizophrenia, intellectual disability, autism, and epilepsy, and in normal variants associated with intelligence. We propose that Arc-PSD95 postsynaptic complexes potentially affect human cognitive function.
zCall: a rare variant caller for array-based genotyping
Summary: zCall is a variant caller specifically designed for calling rare single-nucleotide polymorphisms from array-based genotyping data. This caller is implemented as a post-processing step after a default calling algorithm has been applied. The algorithm uses the intensity profile of the common allele homozygote cluster to define the location of the other two genotype clusters. We demonstrate improved detection of rare alleles when applying zCall to samples that have both Illumina Infinium HumanExome BeadChip and exome sequencing data available.
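The idea of placing genotype boundaries relative to the common homozygote cluster can be sketched with z-score thresholds: a cut-off sits z standard deviations above a cluster's mean intensity, and each sample is assigned by which side of the X and Y thresholds it falls on. This is a simplified illustration under stated assumptions, not zCall's code: here ty is derived from the Y intensities of the common (AA) cluster, while tx is simply passed in as a hypothetical value, since zCall estimates the sparsely populated cluster's parameters in its own way, which is not reproduced here. Function names are hypothetical.

```python
from statistics import mean, pstdev


def z_threshold(values, z):
    """Threshold placed z standard deviations above a cluster's mean intensity."""
    return mean(values) + z * pstdev(values)


def zcall_genotype(x, y, tx, ty):
    """Assign a genotype from normalized X/Y intensities using two thresholds."""
    if x > tx and y < ty:
        return "AA"
    if x < tx and y > ty:
        return "BB"
    if x > tx and y > ty:
        return "AB"
    return "NoCall"  # below both thresholds (or exactly on one)


# Hypothetical Y intensities of samples in the common AA cluster.
ty = z_threshold([0.05, 0.10, 0.15], z=7)
```

With these toy numbers ty is about 0.386, so a sample at (x=1.0, y=0.9) with tx=0.2 is called AB: its Y signal lies well outside the spread of the common homozygote cluster, which is how rare heterozygotes that a default clustering algorithm might miss become callable.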
Characterization of single gene copy number variants in schizophrenia
Background
Genetic studies of schizophrenia have implicated numerous risk loci including several copy number variants (CNVs) of large effect and hundreds of loci of small effect. In only a few cases has a specific gene been clearly identified. Rare CNVs affecting a single gene offer a potential avenue to discovering schizophrenia risk genes.
Methods
CNVs were generated from exome sequencing of 4,913 schizophrenia cases and 6,188 controls from Sweden. We integrated multiple CNV calling methods (XHMM and ExomeDepth) to expand our set of single-gene CNVs and leveraged two different approaches for validating these variants (qPCR and NanoString).
Results
We found a significant excess of all rare CNVs (deletions p=0.0004, duplications p=0.0006) and single-gene CNVs (deletions p=0.04, duplications p=0.03) in schizophrenia cases compared to controls. An expanded set of CNVs generated from integrating multiple approaches showed a significant burden of deletions in 11/21 gene sets previously implicated in schizophrenia and across all genes in those sets (p=0.008), although none of these tests survived correction for multiple testing. We performed an extensive validation of all deletions in the significant set of voltage-gated calcium channels among CNVs called from both exome sequencing and genotyping arrays. In total, 4 exonic, single-gene deletions validated in cases and none in controls (p=0.039), all of which were identified by exome sequencing.
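A case-control comparison of carrier counts such as "4 in cases, none in controls" is commonly assessed with a one-sided Fisher exact test, which asks how likely it is that at least that many of all carriers fall among the cases by chance. The sketch below computes this from the hypergeometric distribution. It is illustrative only: the abstract does not state which test produced p=0.039, and the sketch assumes each carrier individual has exactly one qualifying deletion.

```python
from math import comb


def fisher_one_sided(case_carriers, n_cases, control_carriers, n_controls):
    """One-sided Fisher exact test: probability that at least case_carriers
    of all carriers fall among the cases when n_cases individuals are drawn
    at random from the pooled sample (hypergeometric upper tail)."""
    n_total = n_cases + n_controls
    carriers = case_carriers + control_carriers
    denom = comb(n_total, n_cases)
    tail = sum(
        comb(carriers, k) * comb(n_total - carriers, n_cases - k)
        for k in range(case_carriers, min(carriers, n_cases) + 1)
    )
    return tail / denom  # exact integers, converted to float at the end
```

Calling fisher_one_sided(4, 4913, 0, 6188) with the cohort sizes from the Methods gives a p-value a little below 0.04, the same order as the reported value; any remaining difference would reflect whatever test and counts the authors actually used.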
Conclusions
These results point to the potential contribution of single-gene CNVs to schizophrenia, suggest that the utility of exome sequencing for CNV calling has yet to be maximized, and indicate that single-gene CNVs should be included in gene-focused studies that use other classes of variation.
A framework for the detection of de novo mutations in family-based sequencing data
Francioli LC, Cretu-Stancu M, Garimella KV, et al. A framework for the detection of de novo mutations in family-based sequencing data. European Journal of Human Genetics. 2016;25(2):227-233
Genome-wide Association Analysis Identifies 14 New Risk Loci for Schizophrenia
Schizophrenia is a heritable disorder with substantial public health impact. We conducted a multi-stage genome-wide association study (GWAS) for schizophrenia beginning with a Swedish national sample (5,001 cases, 6,243 controls) followed by meta-analysis with prior schizophrenia GWAS (8,832 cases, 12,067 controls) and finally by replication of SNPs in 168 genomic regions in independent samples (7,413 cases, 19,762 controls, and 581 trios). In total, 22 regions met genome-wide significance (14 novel and one previously implicated in bipolar disorder). The results strongly implicate calcium signaling in the etiology of schizophrenia, and include genome-wide significant results for CACNA1C and CACNB2, whose protein products interact. We estimate that ∼8,300 independent and predominantly common SNPs contribute to risk for schizophrenia and that these collectively account for most of its heritability. Common genetic variation plays an important role in the etiology of schizophrenia, and larger studies will allow more detailed understanding of this devastating disorder.
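Combining a discovery GWAS with prior studies is often done with a fixed-effect, inverse-variance-weighted meta-analysis of per-study effect sizes (for example log odds ratios). The sketch below shows that calculation together with the conventional genome-wide significance threshold of 5e-8. The abstract does not specify the meta-analysis method, so this is an illustrative assumption rather than the authors' procedure, and ivw_meta is a hypothetical name.

```python
import math

GENOME_WIDE = 5e-8  # conventional genome-wide significance threshold


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance meta-analysis of per-study effect sizes
    (e.g. log odds ratios) with their standard errors.
    Returns (pooled effect, pooled standard error, two-sided p-value)."""
    weights = [1 / se ** 2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return beta, se, p
```

Pooling two hypothetical studies that each report a log odds ratio of 0.1 with standard error 0.02 keeps the effect at 0.1 but shrinks the standard error by a factor of √2, pushing the p-value below 5e-8; this is why combining cohorts can carry a locus past genome-wide significance.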
The AURORA Study: A Longitudinal, Multimodal Library of Brain Biology and Function after Traumatic Stress Exposure
Adverse posttraumatic neuropsychiatric sequelae (APNS) are common among civilian trauma survivors and military veterans. These APNS, as traditionally classified, include posttraumatic stress, postconcussion syndrome, depression, and regional or widespread pain. Traditional classifications have come to hamper scientific progress because they artificially fragment APNS into siloed, syndromic diagnoses unmoored from discrete components of brain functioning and studied in isolation. These limitations in classification and ontology slow the discovery of pathophysiologic mechanisms, biobehavioral markers, risk prediction tools, and preventive/treatment interventions. Progress in overcoming these limitations has been challenging because it would require studies that both evaluate a broad spectrum of posttraumatic sequelae (to overcome fragmentation) and perform in-depth biobehavioral evaluation (to index sequelae to domains of brain function). This article summarizes the methods of the Advancing Understanding of RecOvery afteR traumA (AURORA) Study. AURORA conducts a large-scale (n = 5000 target sample) in-depth assessment of APNS development using a state-of-the-art battery of self-report, neurocognitive, physiologic, digital phenotyping, psychophysical, neuroimaging, and genomic assessments, beginning in the early aftermath of trauma and continuing for 1 year. The goals of AURORA are to achieve improved phenotypes, prediction tools, and understanding of molecular mechanisms to inform the future development and testing of preventive and treatment interventions.