23 research outputs found

    Comparing protein-coding and noncoding genic intolerance scores.

    <p>To enable a matched comparison, the estimates in this table are based on a set of 14,567 CCDS genes with assessable scores across the RVIS-CHGV, ncRVIS and ncGERP formulations. Both RVIS-CHGV and ncRVIS are based on the same population of 690 whole-genome sequenced samples from the CHGV.</p><p><sup>a</sup>HI = Haploinsufficiency. To obtain the presented levels of significance, we used a logistic regression model to regress the presence or absence of a gene within the corresponding gene list on each of the genic scores.</p><p>Joint Model: The AUC of a combined logistic regression model that uses all three features. Correlation plots for the pairs of scores are available in <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1005492#pgen.1005492.s001" target="_blank">S1 Fig</a>.</p>
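
The per-score regression described above can be illustrated with a minimal sketch: a one-predictor logistic regression of gene-list membership (0 or 1) on a genic score, fitted by plain gradient ascent. The function name `fit_logistic`, the learning rate and the step count are illustrative assumptions, not the authors' code, and the sketch returns only the fitted coefficients, not the significance levels reported in the table.

```python
import math

def fit_logistic(scores, in_list, lr=0.1, steps=5000):
    """Fit P(in_list = 1) = sigmoid(b0 + b1 * score) by gradient ascent.

    scores  : genic scores, one per gene
    in_list : 1 if the gene is in the gene list, else 0
    lr, steps are hypothetical tuning values chosen for small toy inputs.
    """
    b0, b1 = 0.0, 0.0
    n = len(scores)
    for _ in range(steps):
        g0 = g1 = 0.0
        for s, y in zip(scores, in_list):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * s)))
            g0 += y - p          # gradient of the log-likelihood w.r.t. b0
            g1 += (y - p) * s    # gradient of the log-likelihood w.r.t. b1
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1
```

The sign of `b1` indicates whether higher scores make list membership more or less likely; a production analysis would more likely use a statistics package that also reports standard errors.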

    Receiver operating characteristic (ROC) curves to measure the ability of RVIS-CHGV, ncRVIS, pcGERP, ncGERP, ncCADD, ncGWAVA scores and two joint models to discriminate genes reported among ClinGen’s dosage sensitivity map from the rest of the human genome.

    <p>Here, for a given score, all assessable genes were used. To obtain the presented levels of significance, we use a logistic regression model to regress the presence or absence of a gene among the ClinGen dosage sensitivity map list on each of the genic scores.</p>
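
As a rough illustration of what the area under a ROC curve measures, the sketch below computes the AUC of a single score directly as the fraction of (listed gene, unlisted gene) pairs in which the listed gene scores higher, with ties counted as one half. The function name `auc` and the toy inputs are assumptions for illustration; the sketch also assumes that higher scores should indicate list membership, which may need to be inverted for a given score.

```python
def auc(scores, in_list):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen listed gene (label 1)
    has a higher score than a randomly chosen unlisted gene (label 0).
    """
    pos = [s for s, y in zip(scores, in_list) if y == 1]
    neg = [s for s, y in zip(scores, in_list) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one listed and one unlisted gene")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic in the number of genes, which is acceptable for a sketch; a rank-based formulation gives the same value faster.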

    Overlaid histograms of ncGERP (blue) and pcGERP (red).

    <p>These data show that ncGERP and pcGERP form very different genome-wide distributions (medians: ncGERP -0.02 versus pcGERP 2.64). Moreover, pcGERP has a slightly platykurtic, left-skewed distribution (γ<sub>2</sub> = -0.10, γ<sub>1</sub> = -0.66), whereas ncGERP has a more leptokurtic, right-skewed distribution (γ<sub>2</sub> = 0.97, γ<sub>1</sub> = 0.96).</p>
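
For readers unfamiliar with the two shape statistics quoted above, here is a minimal sketch of how skewness and excess kurtosis can be computed from a list of scores. It assumes the simple population (biased) moment estimators; the source does not state which estimator was used, and the function name is hypothetical.

```python
def skew_and_excess_kurtosis(values):
    """Return (skewness, excess kurtosis) from central moments.

    Skewness < 0 means left-skewed, > 0 right-skewed.
    Excess kurtosis < 0 means platykurtic, > 0 leptokurtic.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # variance
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0             # 0 for a normal distribution
    return skewness, excess_kurtosis
```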

    A regression plot that shows the regression of noncoding polymorphisms (Y) on an estimate of the noncoding sequence mutability (X) (S1 Data).

    <p>Each dot represents the position of a gene in the regression plot and the corresponding regression line is provided. Annotations are made for the 5% extremes: red = 5% most intolerant, blue = 5% most tolerant.</p>
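
A minimal sketch of how such a plot could be annotated: fit an ordinary least-squares line of Y on X, take each gene's residual, and flag the lowest and highest 5%. It assumes that genes are ranked by their raw residual and that the most negative residuals (fewer polymorphisms than the mutability estimate predicts) mark the most intolerant genes; both are assumptions, as are the function names.

```python
def ols_residuals(x, y):
    """Residuals of a simple least-squares regression of y on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def flag_extremes(residuals, fraction=0.05):
    """Return (lowest, highest) index sets, each holding `fraction` of genes."""
    k = max(1, int(len(residuals) * fraction))   # always flag at least one gene
    order = sorted(range(len(residuals)), key=lambda i: residuals[i])
    return set(order[:k]), set(order[-k:])
```

Under the stated assumption, the first returned set would be drawn in red and the second in blue.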

    Recovery of Unknown TE

    <p>Although not found in Repbase, we believe that this ReAS TE is a valid reconstruction, because it has a BlastX match with 98% identity over 869 amino acids to a TE-related protein (gi|34896386|ref|NP_909537.1| Putative mutator like transposase) that is annotated in a GenBank clone.</p>

    TEs within Segmental Duplications

    <p>If the duplication is of sufficiently high copy number, it will be assembled as a “ReAS TE,” and we must then find the boundaries of the TEs within this assembled duplication. On the assumption that TEs have much higher copy numbers, TE boundaries can be identified by sudden changes in depth, accompanied by many partially aligned reads.</p>
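
The depth criterion can be sketched as follows: scan a per-position depth profile along the assembled duplication and report positions where depth changes by at least a chosen factor between neighbouring positions. The function name, the factor of 3 and the input format are hypothetical, and the sketch ignores the second signal mentioned above, the partially aligned reads.

```python
def depth_jumps(depth, min_ratio=3.0):
    """Indices i where depth[i] differs from depth[i - 1] by >= min_ratio-fold.

    depth     : read depth at each position of the assembled sequence
    min_ratio : hypothetical fold-change threshold for a "sudden" change
    """
    jumps = []
    for i in range(1, len(depth)):
        lo, hi = sorted((depth[i - 1], depth[i]))
        # Treat a depth of 0 as 1 so the fold change stays finite.
        if hi >= min_ratio * max(lo, 1):
            jumps.append(i)
    return jumps
```

On noisy data, the profile would normally be smoothed first, since a single-position comparison reacts to any isolated spike.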

    The ReAS Algorithm

    <p>We start by computing <i>K-</i>mer depth, which is the number of times that a <i>K-</i>mer appears in the shotgun data. Copy number refers to how often a <i>K-</i>mer appears in the assembled genome. Depth divided by copy number is the coverage. We seed the process using a randomly chosen high-depth <i>K-</i>mer. All shotgun reads containing this <i>K-</i>mer are retrieved and trimmed into 100-bp segments centered at that <i>K-</i>mer. When the sequence identity between them exceeds a preset threshold, they are assembled into an ICS using ClustalW. We perform an iterative extension by selecting high-depth <i>K-</i>mers at both ends of the ICS and repeating the above procedure. After all such extensions are done, clone-end pairing information is used to resolve ambiguous joins and to break misassemblies, but not to join fragmented assemblies. The final consensus is our ReAS TE.</p>
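
The first two steps, counting K-mer depth and collecting trimmed segments around a seed K-mer, can be sketched as below. This is an illustration under simplifying assumptions, not the ReAS implementation: it ignores reverse complements, uses only the first occurrence of the seed in each read, and expresses the segment size through a hypothetical `flank` parameter. The identity filter, ClustalW assembly and iterative extension are not shown.

```python
from collections import Counter

def kmer_depth(reads, k):
    """Number of times each K-mer appears across all shotgun reads."""
    depth = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            depth[read[i:i + k]] += 1
    return depth

def segments_around(reads, kmer, flank):
    """Trim every read containing `kmer` to a segment centered on it.

    flank : bases kept on each side of the K-mer (clipped at read ends)
    """
    segments = []
    for read in reads:
        pos = read.find(kmer)        # first occurrence only
        if pos == -1:
            continue                 # read does not contain the seed
        start = max(0, pos - flank)
        end = min(len(read), pos + len(kmer) + flank)
        segments.append(read[start:end])
    return segments
```

For example, with a 100-bp target segment and a K-mer of length `k`, `flank` would be roughly `(100 - k) // 2`.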

    Fragmentation due to Low <i>K-</i>mer Depth

    <p>SZ-43LTR is the LTR region from a TE that is found as one piece in Repbase, but is recovered by ReAS as two nonoverlapping pieces, with 98% and 97% nucleotide identity to the Repbase entry.</p>