Clustering of genes hit by <i>de novo</i> nonsynonymous substitutions.
<p>(A) We examined the network properties of whole sets of genes with nonsynonymous mutations implicated by recent exome-sequencing studies in autism (ASD), severe intellectual disability (ID), epilepsy or schizophrenia (S). We calculated the sum of link weights among the genes in a set and compared this sum to that calculated for randomized gene sets to assess the degree of functional clustering. (B and C) The implicated genes are significantly more strongly interconnected with each other by means of functional genomics data than random gene sets of the same size, but controlling for coding sequence (CDS) length considerably affects the p-values. The genes mutated in the same disease cluster most significantly in the integrated phenotypic-linkage network, while genes mutated in healthy controls do not cluster.</p>
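The comparison against randomized gene sets described in (A) can be sketched as a simple permutation test. This is a minimal illustration, not the authors' implementation: the function names, the dictionary-of-pairs representation of the network, and the add-one p-value are assumptions made here, and the random sets are matched on size only, not on CDS length.

```python
import random
from itertools import combinations


def link_weight_sum(genes, weights):
    """Sum the weights of all links whose two endpoints are both in `genes`.

    `weights` maps frozenset({gene_a, gene_b}) to a link weight
    (a hypothetical representation of the network).
    """
    return sum(weights.get(frozenset(pair), 0.0)
               for pair in combinations(sorted(genes), 2))


def clustering_p_value(gene_set, background, weights, n_random=1000, seed=0):
    """Empirical p-value for the observed link-weight sum of `gene_set`.

    Random gene sets of the same size are drawn from `background` (a list).
    Assumption: sets are matched on size only; CDS length is not controlled.
    """
    rng = random.Random(seed)
    observed = link_weight_sum(gene_set, weights)
    hits = 0
    for _ in range(n_random):
        sample = rng.sample(background, len(gene_set))
        if link_weight_sum(sample, weights) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_random + 1)
```

Controlling for CDS length, as the caption notes, would mean drawing each random gene from a pool of genes with a similar CDS length rather than from the whole background.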
Processing and comparison of functional genomics data.
<p>(A) Terms in a phenotype ontology have an information content (IC) which is inversely proportional to the number of genes annotated with them. The semantic similarity between any two terms equals the IC of their closest common ancestor term(s). (B) Gene–gene linkages derived from a data type are assessed and rescored according to the semantic similarity of the linked genes' mouse phenotype annotations. (C) The similarity in human phenotype annotations from the HPO is a benchmark against which all the data types can be compared, revealing their relative accuracy and coverage.</p>
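The IC and semantic-similarity definitions in (A) can be illustrated with a small sketch. It assumes the common formulation IC(t) = −log(fraction of annotated genes carrying t or one of its descendants) and takes the most informative common ancestor as the "closest" one. Both choices, along with the function names and the toy data structures, are assumptions made here; the caption does not specify the exact formula.

```python
import math


def ancestors(term, parents):
    """Return `term` plus every term reachable by following parent links."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(annotations, parents):
    """IC per term, assuming IC(t) = -log(genes with t or a descendant / all genes)."""
    genes_per_term = {}
    for gene, terms in annotations.items():
        for term in terms:
            # An annotation to a term also counts for all of its ancestors.
            for anc in ancestors(term, parents):
                genes_per_term.setdefault(anc, set()).add(gene)
    total = len(annotations)
    return {t: -math.log(len(g) / total) for t, g in genes_per_term.items()}


def term_similarity(t1, t2, ic, parents):
    """Similarity of two terms = IC of their most informative common ancestor."""
    common = ancestors(t1, parents) & ancestors(t2, parents)
    return max((ic.get(t, 0.0) for t in common), default=0.0)
```

Under these assumptions, two terms that share only the ontology root score zero, because every annotated gene carries the root and its IC is therefore zero.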
Coding sequence (CDS) lengths of genes with <i>de novo</i> variants.
<p>(A) ‘All genes’ denotes all translated human genes, ‘Siblings’ denotes genes with <i>de novo</i> mutations in non-autistic siblings of ASD cases published by O'Roak <i>et al.</i> and Sanders <i>et al.</i> Even the genes mutated in the healthy siblings are significantly longer than all coding genes (Mann–Whitney U test, P<2×10<sup>−16</sup>). The box plots depict the values between the 1<sup>st</sup> and 3<sup>rd</sup> quartile of a distribution, the 2<sup>nd</sup> quartile (thick band) represents the median. (B) Mutational burden strongly correlates with coding sequence length in the Exome Variant Server (Spearman's ρ = 0.710, P<2×10<sup>−16</sup>; <a href="http://evs.gs.washington.edu/EVS" target="_blank">http://evs.gs.washington.edu/EVS</a>). All nonsynonymous mutations were considered across all human chromosomes. (C) The median CDS length of a gene's connections correlates with its CDS length (Spearman's ρ = 0.508, P<2×10<sup>−16</sup>). We considered the strongest 100,000 links from the integrated phenotypic-linkage network.</p>
Model-based inference of turnover by functional class.
<p>Schematic summary of the fraction of constrained sequence that has been retained (saturated colours) or turned over (pastel colours) in the human lineage over time (X-axis, divergence time) and how it has been distributed across various categories of functional element. In addition to showing the reduced quantity of preserved constrained sequence with increasing divergence, we infer the reciprocal quantity of sequence that is assumed to have been gained over human lineage evolution. For consistency this approach requires mutually exclusive annotation sets, in contrast to those used in <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1004525#pgen-1004525-g003" target="_blank">Figure 3</a>, making the results not directly comparable. Overlaps between the major different annotations are shown in <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1004525#pgen.1004525.s010" target="_blank">Figure S10</a>.</p>
Estimated quantities of sequence constrained with respect to indels (α<sub>selIndel</sub>) between different species under different models.
<p>There is good agreement between the estimates inferred by NIM1 and NIM2, but the previous estimates from <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1004525#pgen.1004525-Meader1" target="_blank">[15]</a> are considerably higher, mainly owing to alignment artefacts.</p>
The overlap of constrained sequence with pan-mammalian conserved sequences.
<p>The proportions (A) and quantities (B) of constrained sequence at the present day for different types of biochemically annotated and un-annotated sequences, with and without PhastCons or GERP++ conserved elements, estimated using linear extrapolations (<a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1004525#pgen.1004525.s023" target="_blank">Text S6</a>, <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1004525#pgen.1004525.s024" target="_blank">Text S7</a>). NIM1 has power to detect functional lineage-specific constrained sequence: it detects significantly higher fractions of lineage-specific constrained sequence (defined as sequence identified by NIM1 but not annotated by PhastCons or GERP++ as being conserved across mammals) within three mutually exclusive classes of ENCODE biochemical annotations than within sequence lacking such annotation; see <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1004525#pgen.1004525.s023" target="_blank">Text S6</a> for details.</p>
8.2% of the Human Genome Is Constrained: Variation in Rates of Turnover across Functional Element Classes in the Human Lineage
<div><p>Ten years on from the finishing of the human reference genome sequence, it remains unclear what fraction of the human genome confers function, where this sequence resides, and how much is shared with other mammalian species. When addressing these questions, functional sequence has often been equated with pan-mammalian conserved sequence. However, functional elements that are short-lived, including those contributing to species-specific biology, will not leave a footprint of long-lasting negative selection. Here, we address these issues by identifying and characterising sequence that has been constrained with respect to insertions and deletions for pairs of eutherian genomes over a range of divergences. Within noncoding sequence, we find increasing amounts of mutually constrained sequence as species pairs become more closely related, indicating that noncoding constrained sequence turns over rapidly. We estimate that half of present-day noncoding constrained sequence has been gained or lost in approximately the last 130 million years (half-life in units of divergence time, <i>d<sub>1/2</sub></i> = 0.25–0.31). While enriched with ENCODE biochemical annotations, much of the short-lived constrained sequence we identify is not detected by models optimized for wider pan-mammalian conservation. Constrained DNase 1 hypersensitivity sites, promoters and untranslated regions have been more evolutionarily stable than long noncoding RNA loci, which have turned over especially rapidly. By contrast, protein coding sequence has been highly stable, with an estimated half-life of over a billion years (<i>d<sub>1/2</sub></i> = 2.1–5.0). From extrapolations we estimate that 8.2% (7.1–9.2%) of the human genome is presently subject to negative selection and thus is likely to be functional, while only 2.2% has maintained constraint in both human and mouse since these species diverged.
These results reveal that the evolutionary history of the human genome has been highly dynamic, particularly for its noncoding yet biologically functional fraction.</p></div>
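To make the half-life figures in the abstract concrete, the sketch below converts a half-life in units of divergence time into the fraction of constrained sequence retained at a given divergence. It assumes simple exponential decay, which is what a constant half-life implies; whether this matches the authors' fitted model in every detail is not confirmed by the abstract, and the function names are hypothetical.

```python
import math


def fraction_retained(divergence, half_life):
    """Fraction of constrained sequence still shared after `divergence`.

    Assumes exponential decay with a constant half-life; both arguments
    are in the same units of divergence time.
    """
    return 0.5 ** (divergence / half_life)


def decay_rate(half_life):
    """Equivalent exponential decay rate (hypothetical helper): ln(2) / half-life."""
    return math.log(2) / half_life


# Half-lives quoted in the abstract: noncoding 0.25-0.31, coding 2.1-5.0.
for label, d_half in [("noncoding", 0.25), ("noncoding", 0.31),
                      ("coding", 2.1), ("coding", 5.0)]:
    print(label, d_half, round(fraction_retained(0.25, d_half), 3))
```

By construction, the retained fraction is exactly one half when the divergence equals the half-life, so the same divergence that halves shared noncoding constraint leaves most coding constraint intact under this assumption.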
Evolutionary turnover of constrained sequence.
<p>A. Quantity of constrained sequence (α<sub>selIndel</sub>) estimated by NIM1 (blue bars) and NIM2 (red bars) plotted against ancestral repeat divergence for different pairs of eutherian genomes, with the simulated data (grey) shown under a non-turnover scenario. B. Coding sequence (blue squares) is seen to be broadly conserved, while constrained noncoding sequence (orange circles) shows a strong negative correlation between α<sub>selIndel</sub> and divergence, indicating rapid turnover.</p>
Constraint and turnover for different classes of human functional element.
<p>A. The total quantities of constrained sequence estimated for the present day by extrapolation for different element types. B. The estimated rate of turnover (b parameter) for different types of constrained element.</p>
Regulatory GO enrichments amongst ASD <i>dn</i> CNVs candidate genes.
<p>The set of candidate genes was defined as those CNV genes associated with <i>Abnormal Synaptic Transmission</i> mouse phenotypes (<a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003523#pgen-1003523-t001" target="_blank">Table 1</a>) and those CNV genes identified through direct protein-protein interactions (<a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003523#pgen-1003523-t002" target="_blank">Table 2</a>). Enrichments are given as the fold change over that expected by chance (see <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003523#s4" target="_blank">Materials and Methods</a>).</p>