Additional file 2: Table S1. of Regulatory and evolutionary signatures of sex-biased genes on both the X chromosome and the autosomes

Author: Jiangshan Shen (4569442)
Ting-You Wang (4569445)
Wanling Yang (50598)

Table S1. Linear regression of dN/dS (the amount of selection pressure) as a function of the following variables. Estimate is the coefficient of the covariate, Std.Error is the standard error of that estimate, t value is the test statistic of the estimate, and Pr(>|t|) is the p value of that covariate. XYpair indicates whether the gene is part of an XY homologous pair. XAR or XCR indicates whether the gene lies in the X added region or the X conserved region stratum of the X chromosome, escape status indicates whether the gene is classified as an escape gene, and gene expression is the average level of gene expression among 264 female samples.

Table S2. Linear regression of dN/dS (the amount of selection pressure) as a function of the following variables. Estimate is the coefficient of the covariate, Std.Error is the standard error of that estimate, t value is the test statistic of the estimate, and Pr(>|t|) is the p value of that covariate. Average gene expression is the average level of gene expression among 462 samples. Gene bias_female indicates whether the gene is classified as female-biased, and its coefficient is the change in dN/dS between female-biased genes and genes without sex bias. Gene bias_male refers similarly to genes classified as male-biased. Gene expression breadth is the number of tissues in which the gene is expressed.

Table S3. Sex-biased genes (sDEGs) from LCL data (587 genes) and GTEx data (1308 genes) and their relationship to replication timing data in different cell lines, based on Spearman's rho between replication timing values and the log2 fold change of gene expression between females and males.

Table S4(a). GSEA results for gene regions that are enriched in female-biased sDEGs.
Table S4(b). ToppFun results for gene regions that are enriched in male-biased sDEGs.
Table S5(a). Disease gene sets enriched in sex-biased genes, as found by ToppFun.
Table S5(b). GO terms enriched in sex-biased genes, as found by ToppFun.
Table S5(c). PubMed gene sets enriched in sex-biased genes, as found by ToppFun.
Table S5(d). Pathway analysis of sex-biased genes using gene sets from MSigDB C2, as found by ToppFun.
Table S5(e). Gene families enriched in sex-biased genes, as found by ToppFun.
Table S6(a). Domains enriched in female-biased genes, as found by ToppFun.
Table S6(b). KEGG pathways enriched in female-biased genes, as found by GSEA.
Table S7. Pathway analysis of sex-biased genes in GSEA, using custom gene sets from [26–28, 43].
Table S8(a). Gene families enriched in male-biased genes.
Table S8(b). MSigDB gene sets enriched in male-biased genes, as found by ToppFun.
Table S9. Pathways enriched in transcription factors enriched for female-biased genes.
(DOCX 49 kb)

The Francis Crick Institute

Predicting Mendelian Disease-Causing Non-Synonymous Single Nucleotide Variants in Exome Sequencing Studies

Author: Johnny S. H. Kwan (101970)
Miao-Xin Li (101964)
Pak C. Sham (58020)
Shu-Leong Ho (101984)
Su-Ying Bao (101975)
Wanling Yang (50598)
Yong-Qiang Song (101991)
Publication date: 01/01/2013

Exome sequencing is becoming a standard tool for mapping Mendelian disease-causing (or pathogenic) non-synonymous single nucleotide variants (nsSNVs). Minor allele frequency (MAF) filtering and functional prediction methods are commonly used to identify candidate pathogenic mutations in these studies. Combining multiple functional prediction methods may increase prediction accuracy. Here, we propose to use a logit model to combine multiple prediction methods and compute an unbiased probability of a rare variant being pathogenic. Also, for the first time we assess the power of seven prediction methods (including SIFT, PolyPhen2, CONDEL, and logit) to distinguish pathogenic nsSNVs from other rare variants, which reflects the situation after MAF filtering is done in exome-sequencing studies. We found that a logit model combining all or some of the original prediction methods outperforms the other methods examined, but is unable to discriminate between autosomal dominant and autosomal recessive disease mutations. Finally, based on the predictions of the logit model, we estimate that around 5% of an individual's rare nsSNVs are pathogenic and that an individual carries at least ∼22 pathogenic derived alleles, which, if made homozygous by consanguineous marriages, may lead to recessive diseases.

Directory of Open Access Journals

PubMed Central

The Francis Crick Institute

Mann–Whitney U test p values for the difference in prediction scores between autosomal dominant and autosomal recessive disease-causing mutations.

Author: Johnny S. H. Kwan (101970)
Miao-Xin Li (101964)
Pak C. Sham (58020)
Shu-Leong Ho (101984)
Su-Ying Bao (101975)
Wanling Yang (50598)
Yong-Qiang Song (101991)

Mann–Whitney U test p values for the difference in prediction scores between autosomal dominant and autosomal recessive disease-causing mutations.
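
As a rough illustration of the statistic behind such a comparison, the sketch below computes the Mann–Whitney U statistic for two groups of prediction scores by counting pairwise wins, with ties counted as one half. The function name and the score values are hypothetical and invented for illustration; they are not taken from the study.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return the U statistic for sample_a: the number of (a, b) pairs
    with a > b, counting each tie as one half."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Hypothetical prediction scores, invented for illustration only.
dominant_scores = [0.91, 0.85, 0.77, 0.60]
recessive_scores = [0.88, 0.70, 0.55]

u_dominant = mann_whitney_u(dominant_scores, recessive_scores)
u_recessive = mann_whitney_u(recessive_scores, dominant_scores)

# The two U values always sum to the number of pairs.
print(u_dominant, u_recessive, len(dominant_scores) * len(recessive_scores))
```

The sketch stops at the statistic. A p value would normally come from a library routine such as `scipy.stats.mannwhitneyu` rather than from hand-written code.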

The Francis Crick Institute

ROC and PR curves of prediction methods evaluated on the ExoVar dataset using a 10-fold cross-validation.

Author: Johnny S. H. Kwan (101970)
Miao-Xin Li (101964)
Pak C. Sham (58020)
Shu-Leong Ho (101984)
Su-Ying Bao (101975)
Wanling Yang (50598)
Yong-Qiang Song (101991)

(a) ROC and (b) PR. AUC is shown next to the name of each method.
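
For readers unfamiliar with how an AUC value of this kind is obtained, the following minimal sketch builds ROC points by sweeping a decision threshold over the scores and integrates them with the trapezoidal rule. The labels and scores are hypothetical, the helper names are illustrative, and cross-validation is left out for brevity.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by lowering the threshold through
    every distinct score. labels are 1 (pathogenic) or 0 (other)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under a piecewise-linear curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical labels and scores, invented for illustration only.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

roc_auc = auc(roc_points(labels, scores))
print(round(roc_auc, 3))
```

In a k-fold setting, the same calculation would be applied to scores predicted on the held-out folds.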

The Francis Crick Institute

ROC and PR curves of combining a subset of the five individual methods in a logit model evaluated on the ExoVar dataset using a 10-fold cross-validation.

Author: Johnny S. H. Kwan (101970)
Miao-Xin Li (101964)
Pak C. Sham (58020)
Shu-Leong Ho (101984)
Su-Ying Bao (101975)
Wanling Yang (50598)
Yong-Qiang Song (101991)

(a) ROC and (b) PR. AUC is shown next to the name of each method.

The Francis Crick Institute

The relationship between prior and posterior probabilities of a rare nsSNV being pathogenic, given the prediction scores from SIFT, PolyPhen2, and MutationTaster.

Author: Johnny S. H. Kwan (101970)
Miao-Xin Li (101964)
Pak C. Sham (58020)
Shu-Leong Ho (101984)
Su-Ying Bao (101975)
Wanling Yang (50598)
Yong-Qiang Song (101991)

The white dashed lines indicate the estimated range of the prior (5%). We assume that there is no difference in prediction scores from the three methods for the same variant. The α, β_SIFT, β_PolyPhen2 and β_MutationTaster from a selected sample evaluated in the ExoVar dataset are used in the calculation of the posteriors (see Eq. 2 and Eq. 3 in Materials and Methods, http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003143#s4) and take the values −3.53, 1.64, 1.48, and 2.47, respectively. The prior and the posterior correspond to the quantity P_disease in an individual genome in Eq. 3 and to P(Y = 1|X) in Eq. 2, respectively.
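
The coefficients quoted in this caption can be plugged into a logistic function to see how a combined probability might be obtained. The sketch below assumes that Eq. 2 has the standard logistic form P(Y = 1|X) = 1 / (1 + exp(−(α + Σ β_i x_i))) and that each method's score lies between 0 and 1. Neither assumption is stated in the caption. The prior adjustment of Eq. 3 is not reproduced, because its form is not given here.

```python
import math

# Coefficients quoted in the caption.
ALPHA = -3.53
BETA = {"SIFT": 1.64, "PolyPhen2": 1.48, "MutationTaster": 2.47}


def logit_probability(scores):
    """Standard logistic model (assumed form of Eq. 2): combine per-method
    scores into a single probability."""
    z = ALPHA + sum(BETA[method] * scores[method] for method in BETA)
    return 1.0 / (1.0 + math.exp(-z))


def same_score(value):
    """Give all three methods the same score, as the caption assumes."""
    return {method: value for method in BETA}


p_low = logit_probability(same_score(0.0))   # all methods score 0
p_high = logit_probability(same_score(1.0))  # all methods score 1
print(round(p_low, 3), round(p_high, 3))
```

Under these assumptions the combined probability rises monotonically with the shared score, because every coefficient is positive.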

The Francis Crick Institute

The numbers and proportion of nsSNVs removed by hard-filtering and functional prediction by the logit model in 3 Mendelian-disease patients with in-house exome sequencing data.

Author: Johnny S. H. Kwan (101970)
Miao-Xin Li (101964)
Pak C. Sham (58020)
Shu-Leong Ho (101984)
Su-Ying Bao (101975)
Wanling Yang (50598)
Yong-Qiang Song (101991)

a: Related cases with autosomal dominant spinocerebellar ataxia. b: Case with neonatal-onset Crohn's disease. c: nsSNVs for which prediction is unavailable due to missing scores.

The Francis Crick Institute

Theoretical inbreeding coefficient (F) and corresponding number of homozygous pathogenic variants in the children of various relationships, given that on average each individual carries 22 pathogenic derived alleles.

Author: Johnny S. H. Kwan (101970)
Miao-Xin Li (101964)
Pak C. Sham (58020)
Shu-Leong Ho (101984)
Su-Ying Bao (101975)
Wanling Yang (50598)
Yong-Qiang Song (101991)

Theoretical inbreeding coefficient (F) and corresponding number of homozygous pathogenic variants in the children of various relationships, given that on average each individual carries 22 pathogenic derived alleles.
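
The theoretical inbreeding coefficient for a given parental relationship can be derived with Wright's path-counting rule, F = Σ (1/2)^(n1 + n2 + 1), summed over the common ancestors of the two parents, where n1 and n2 are the numbers of generations from each parent to that ancestor. The sketch below assumes the common ancestors are not themselves inbred. The relationship list is illustrative and may differ from the rows of the table. The step from F to the number of homozygous pathogenic variants is not reproduced, because the formula used for the table is not shown here.

```python
from fractions import Fraction


def inbreeding_coefficient(paths):
    """Wright's path-counting rule for a child of related parents.

    paths holds one (n1, n2) pair per common ancestor: the number of
    generations from each parent up to that ancestor. Common ancestors
    are assumed not to be inbred themselves.
    """
    return sum(Fraction(1, 2) ** (n1 + n2 + 1) for n1, n2 in paths)


# Each relationship shares two common ancestors (a couple).
RELATIONSHIPS = {
    "full siblings": [(1, 1), (1, 1)],
    "uncle-niece": [(1, 2), (1, 2)],
    "first cousins": [(2, 2), (2, 2)],
    "second cousins": [(3, 3), (3, 3)],
}

for name, paths in RELATIONSHIPS.items():
    print(name, inbreeding_coefficient(paths))
```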

The Francis Crick Institute

ROC and PR curves of prediction methods evaluated on the DomRec dataset using a 3-fold cross-validation.

Author: Johnny S. H. Kwan (101970)
Miao-Xin Li (101964)
Pak C. Sham (58020)
Shu-Leong Ho (101984)
Su-Ying Bao (101975)
Wanling Yang (50598)
Yong-Qiang Song (101991)

(a) ROC and (b) PR. AUC is shown next to the name of each method.

The Francis Crick Institute

ROC and PR curves of prediction methods evaluated on the HumVar dataset using a 10-fold cross-validation.

Author: Johnny S. H. Kwan (101970)
Miao-Xin Li (101964)
Pak C. Sham (58020)
Shu-Leong Ho (101984)
Su-Ying Bao (101975)
Wanling Yang (50598)
Yong-Qiang Song (101991)

(a) ROC and (b) PR. AUC is shown next to the name of each method.

The Francis Crick Institute
