7 research outputs found

    Role of duplicate genes in determining the tissue-selectivity of hereditary diseases

    No full text
    <div><p>A longstanding puzzle in human genetics is what limits the clinical manifestation of hundreds of hereditary diseases to certain tissues, while their causal genes are expressed throughout the human body. A general conception is that tissue-selective disease phenotypes emerge when masking factors operate in unaffected tissues, but are specifically absent or insufficient in disease-manifesting tissues. Although this conception has critical impact on the understanding of disease manifestation, it was never challenged in a systematic manner across a variety of hereditary diseases and affected tissues. Here, we address this gap in our understanding via rigorous analysis of the susceptibility of over 30 tissues to 112 tissue-selective hereditary diseases. We focused on the roles of paralogs of causal genes, which are presumably capable of compensating for their aberration. We show for the first time at large-scale via quantitative analysis of omics datasets that, preferentially in the disease-manifesting tissues, paralogs are under-expressed relative to causal genes in more than half of the diseases. This was observed for several susceptible tissues and for causal genes with varying number of paralogs, suggesting that imbalanced expression of paralogs increases tissue susceptibility. While for many diseases this imbalance stemmed from up-regulation of the causal gene in the disease-manifesting tissue relative to other tissues, it was often combined with down-regulation of its paralog. Notably in roughly 20% of the cases, this imbalance stemmed only from significant down-regulation of the paralog. Thus, dosage relationships between paralogs appear as important, yet currently under-appreciated, modifiers of disease manifestation.</p></div>

    The expression of causal genes and their paralogs is highly imbalanced in their disease tissues compared to unaffected tissues.

    <p>(A) The ratio between the expression levels of causal genes and their paralogs across tissues. Each point represents a single pair; the ratio in the disease tissue appears in red, the median ratio in unaffected tissues appears in gray. Ratios in the disease tissue (DT) were significantly higher than in unaffected tissues (UAT) when all pairs were considered ('All pairs', Mann-Whitney test, p<10<sup>−15</sup>); when only causal genes with single paralogs were considered ('Single CGP', Mann-Whitney test, p = 5.8*10<sup>−3</sup>); and when causal genes with multiple paralogs were considered and ratio was computed against their combined expression ('Multiple CGP', Mann-Whitney test, p = 1.2*10<sup>−3</sup>). ** refers to p<10<sup>−3</sup> and *** to p<10<sup>−5</sup>. (B)—(E) The ratios between the expression levels of causal genes and their paralogs shown separately for causal genes sharing the same disease tissue. Each point represents the ratio observed in the disease tissue (red) and in an unaffected tissue (gray). The panels show genes causal for diseases manifesting in the brain (B), heart (C), muscle (D), and skin (E). In all panels, the median ratio is highest for pairs in their respective disease tissues. Moreover, except for brain, the difference in the distribution of ratios between disease tissue and unaffected tissues was statistically significant (brain p = 0.29, heart p = 3*10<sup>−4</sup>, muscle p = 2.45*10<sup>−5</sup>, and skin p = 7.6*10<sup>−6</sup>, Kolmogorov-Smirnov test). Additional disease tissues appear in <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1007327#pgen.1007327.s005" target="_blank">S2 Fig</a>. In contrast to causal genes and their paralogs, the general difference in expression patterns between paralogs is very small (<a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1007327#pgen.1007327.s008" target="_blank">S5 Fig</a>). 
(F) The median expression ratio of causal genes and their paralogs for different subsets of diseases. Each row in the balloon plot corresponds to the subset of diseases that manifest in the tissue designated on the left, and entries per row show the median expression ratio of the respective causal genes and their paralogs across the different tissues. The median expression ratios were normalized to the highest median expression ratio observed in that row (reflected by circle size and color, with maximum value of 1). For most disease subsets, the highest median expression ratios were obtained in the respective disease-manifesting tissue. For brevity, 26 tissues were shown; a view of all tissues appears in <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1007327#pgen.1007327.s006" target="_blank">S3 Fig</a>.</p>
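The row normalization described for panel (F) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the input layout (one dict per causal-gene and paralog pair, mapping tissue to expression), and the toy values are all assumptions made for the example.

```python
from statistics import median

def median_ratio_by_tissue(pairs):
    """Median causal/paralog expression ratio per tissue.

    `pairs` is a hypothetical layout: each item has 'causal' and 'paralog'
    dicts mapping tissue name -> expression level.
    """
    tissues = pairs[0]["causal"].keys()
    result = {}
    for tissue in tissues:
        ratios = [
            p["causal"][tissue] / p["paralog"][tissue]
            for p in pairs
            if p["paralog"][tissue] > 0  # skip pairs where the ratio is undefined
        ]
        if ratios:
            result[tissue] = median(ratios)
    return result

def normalize_to_row_max(row):
    """Scale a row of median ratios so that its largest value equals 1."""
    top = max(row.values())
    return {tissue: value / top for tissue, value in row.items()}

# Made-up expression values for two pairs whose disease tissue is the brain.
toy_pairs = [
    {"causal": {"brain": 8.0, "heart": 4.0, "skin": 2.0},
     "paralog": {"brain": 2.0, "heart": 4.0, "skin": 4.0}},
    {"causal": {"brain": 6.0, "heart": 3.0, "skin": 3.0},
     "paralog": {"brain": 3.0, "heart": 3.0, "skin": 6.0}},
]
row = normalize_to_row_max(median_ratio_by_tissue(toy_pairs))
```

In this toy row the brain entry is the maximum and therefore becomes 1, which is the pattern the caption reports for most disease subsets.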

    Specific examples for imbalanced expression of causal genes and their paralogs occurring preferentially in the disease tissues.

    <p>Each panel shows the expression levels of the causal gene (left), the paralog (middle) and their ratio (right) in every tissue. Colored points indicate the respective values in the disease tissue. In each example the paralog of the causal gene is under-expressed in the disease tissue. (A) CHMP1A and CHMP1B are two paralogous members of the CHMP family of proteins, with 55.6% sequence identity and significantly overlapping interactions. CHMP1A is causal for pontocerebellar hypoplasia type 8 that manifests in the brain, where CHMP1B is down-regulated (log<sub>2</sub>FC = -1, p = 2.47*10<sup>−4</sup>; FC denotes fold-change). (B) VRK1 and VRK2 are two paralogous members of the vaccinia-related kinase family of serine/threonine protein kinases, with 45% sequence identity and significant co-expression. VRK1 is causal for pontocerebellar hypoplasia type 1A that manifests in the brain, where VRK1 is not up-regulated (log<sub>2</sub>FC = -0.42, p = 4*10<sup>−3</sup>), but VRK2 is strongly down-regulated (log<sub>2</sub>FC = -2.5, p = 1.3*10<sup>−18</sup>). (C) LAMA2 and LAMA1 are two paralogous laminin alpha subunits with 45% sequence identity. LAMA2 is causal for congenital muscular dystrophy that manifests in skeletal muscle. LAMA2 is over-expressed in muscle (log<sub>2</sub>FC = 0.93, p = 4.4*10<sup>−10</sup>), whereas LAMA1 is expressed at a very low level (log<sub>2</sub>FC = -2.47, p = 2.86*10<sup>−9</sup>). (D) OPHN1 and ARHGAP42 are two paralogous Rho-GTPase-activating proteins with 51% sequence identity. OPHN1 is causal for X-linked mental retardation with cerebellar hypoplasia that manifests in the brain, where ARHGAP42 is strongly down-regulated (log<sub>2</sub>FC = -2.9, p = 1.04*10<sup>−12</sup>). (E) LDLR and VLDLR are two paralogs that cause distinct tissue-selective diseases and are down-regulated in each other’s disease tissue. 
LDLR is causal for familial hypercholesterolemia that manifests in the liver (left, marked by an arrow), where it is expressed at an intermediate level, while its paralog, VLDLR (48% sequence identity and significant overlap in protein-protein interactions), is down-regulated (log<sub>2</sub>FC = -2.5, p = 1.32*10<sup>−9</sup>). VLDLR, in turn, is causal for cerebellar hypoplasia and mental retardation. VLDLR is expressed at an intermediate level in the cerebellum (right, marked by an arrow), where LDLR is significantly under-expressed (log<sub>2</sub>FC = -0.86, p = 0.02).</p>
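The log<sub>2</sub>FC values quoted above compare a gene's expression in the disease tissue with its expression elsewhere. The sketch below shows one simple way such a value could be computed; it assumes a plain mean over the other tissues and a pseudocount of 1, neither of which is stated in the caption, and it does not reproduce the statistical testing behind the reported p-values. The function name and toy values are hypothetical.

```python
from math import log2
from statistics import mean

def log2_fold_change(expression, disease_tissue, pseudocount=1.0):
    """log2 of disease-tissue expression over mean expression in other tissues.

    `expression` maps tissue name -> expression level for one gene.
    The pseudocount (an assumption here) avoids division by zero.
    """
    others = [v for t, v in expression.items() if t != disease_tissue]
    return log2((expression[disease_tissue] + pseudocount)
                / (mean(others) + pseudocount))

# Made-up profile of a paralog that is low in the brain and uniform elsewhere.
toy_paralog = {"brain": 1.0, "heart": 7.0, "liver": 7.0, "muscle": 7.0}
fc = log2_fold_change(toy_paralog, "brain")
```

A negative value, as in this toy case, corresponds to down-regulation in the disease tissue.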

    The functional redundancy between causal genes and their paralogs is dosage-sensitive.

    <p>(A) A network model representing the functional redundancy between a causal gene and its paralog. In the healthy state (left), the causal gene (marked C) and its paralog (marked P) have redundant functions, represented as common interactors. In the aberrant state, the causal gene has limited functionality (dashed edges). In unaffected tissues (middle), the limited functionality is masked by the presence of its paralog. In the disease tissue (right), masking is reduced (thin edges) due to relatively low expression of the paralog, and the limited functionality is exposed. (B) Limited masking in the disease tissue caused by over-expression of the causal gene. Germline mutations in CAV3 lead to muscular dystrophy. In muscle (red arrow) CAV3 is expressed at its highest level whereas its paralog, CAV1, is expressed at a relatively low level. (C) Limited masking in a disease tissue may arise from under-expression of the paralog. Germline mutations in VRK1 cause pontocerebellar hypoplasia. In the disease tissue (cerebellum, red arrow), VRK1 is expressed at an intermediate level, whereas its paralog, VRK2, is significantly under-expressed. Expression data were downloaded from GTEx portal [<a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1007327#pgen.1007327.ref044" target="_blank">44</a>] (see <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1007327#sec007" target="_blank">Methods</a>). Gene expression levels were normalized to their maximum levels across tissues (26 tissues shown). Expression in disease tissues appears as bold color bars.</p>

    Evidence for functional overlap between causal genes and their paralogs.

    <p>(A) An overview of the manually curated dataset of 112 tissue-selective hereditary diseases (gray bars) and their causal genes (red bars) by their disease tissues. (B) The distributions of causal genes (red bars) and paralogs (blue bars) by number of expressing tissues, showing that most genes were expressed in all tissues (to avoid over-representation of the brain, only 36 tissues were considered). (C) The distribution of causal genes according to the number of their paralogs. Most causal genes have up to two paralogs. (D) The fraction of functionally overlapping pairs among causal-gene and paralog (CGP) pairs with similar sequence identity levels. Evidence for functional overlap included significant co-expression across tissues (Pearson correlation p<0.01, gray bars), significant overlap in physical interaction partners (Fisher exact test p<0.01, black bars), or both (striped bars). The fraction of functionally redundant pairs increased with their sequence identity. (E) The cumulative distributions of essential genes among causal genes (red) and protein-coding singleton genes (black) show that the causal genes in our dataset are significantly less essential than singleton genes (Kolmogorov-Smirnov test, p = 0.02). The X axis shows cellular growth in the presence of inactivating mutations, denoted CRISPR score, where negative values mark essential genes [<a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1007327#pgen.1007327.ref016" target="_blank">16</a>].</p>
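The interaction-partner overlap test named in panel (D) can be illustrated with a one-sided Fisher exact test, which is equivalent to the upper tail of a hypergeometric distribution. The sketch below is an assumption-laden illustration: the caption does not say whether the test was one- or two-sided or how the background set of proteins was defined, so `universe_size`, the function name, and the toy partner sets are all hypothetical.

```python
from math import comb

def overlap_p_value(partners_a, partners_b, universe_size):
    """One-sided Fisher exact p-value for the overlap of two partner sets.

    Returns the probability of seeing at least the observed number of shared
    partners if the partners of B were drawn at random from a background of
    `universe_size` proteins (an assumed parameter).
    """
    k = len(partners_a & partners_b)          # observed shared partners
    a, b = len(partners_a), len(partners_b)
    total = comb(universe_size, b)            # all ways to choose B's partners
    tail = sum(
        comb(a, i) * comb(universe_size - a, b - i)
        for i in range(k, min(a, b) + 1)      # overlaps at least as large as k
    )
    return tail / total

# Made-up partner sets over a background of 20 proteins.
causal_partners = {1, 2, 3, 4, 5}
paralog_partners = {3, 4, 5, 6, 7}
p = overlap_p_value(causal_partners, paralog_partners, universe_size=20)
```

Under the threshold given in the caption, a pair would count as having significantly overlapping partners when this p-value falls below 0.01.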

    Differential expression analysis reveals causes for imbalanced expression of causal genes and their paralogs in the disease tissue.

    <p>(A) Scenarios leading to imbalanced expression of causal genes and paralogs in the disease tissue. From left to right: a reference state; significant up-regulation of the causal gene in the disease tissue relative to its average expression in other tissues; significant down-regulation of the paralog in the disease tissue relative to its average expression in other tissues; significant up- and down-regulation of the causal gene and its paralog, respectively. (B) The ratio between the expression levels of causal genes and their paralogs visualized according to the differential expression of the causal gene and its paralog. Each point represents the ratio of a specific pair in their disease tissue (DT) and unaffected tissues (UAT). Colors indicate whether, relative to its average expression in other tissues, in the disease tissue the causal gene was significantly over-expressed (red), the paralog was significantly under-expressed (blue), both co-occurred (purple), or none occurred (gray). In most pairs, at least one member of the pair was differentially expressed in the disease tissue. (C) The ratios between causal genes and their paralogs for genes causal for heart diseases. Data for other disease tissues appear in <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1007327#pgen.1007327.s007" target="_blank">S4 Fig</a>. (D) A visualization of the ratios between causal genes and their paralogs for causal genes with a single paralog (single CGP) or multiple paralogs (multiple CGP). For multiple CGPs, the ratio was computed against the combined expression of paralogs. Blue indicates that at least one paralog was significantly under-expressed in the disease tissue relative to its average expression in other tissues. (E) The frequency of imbalanced scenarios among the 45 causal genes with a single paralog. 
29% of the causal genes were up-regulated in their disease tissues relative to other tissues (red, p<10<sup>−3</sup>), 20% of the paralogs were down-regulated in the disease tissue relative to other tissues (blue, p<10<sup>−3</sup>), and 7% of the causal genes had both (purple, p = 0.05). (F) The frequency of imbalanced scenarios among the 93 pairs of causal genes and disease tissues in our dataset. In 26% of the pairs, causal genes were up-regulated in their disease tissues relative to other tissues (red, p<10<sup>−3</sup>); in 19% of the pairs, causal genes had a down-regulated paralog in the disease tissue relative to other tissues (blue, p = 0.024); and in 24% of the pairs, causal genes had both (purple, p<10<sup>−3</sup>). * refers to p<0.05, ** to p<10<sup>−3</sup> and *** to p<10<sup>−5</sup>.</p>
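The four scenarios in panel (A) and the frequencies in panels (E) and (F) amount to classifying each pair by two flags and counting. A minimal sketch, assuming the differential-expression calls have already been made and are given as booleans; the labels, function names, and toy flags are hypothetical, and the significance testing of the frequencies is not reproduced here.

```python
from collections import Counter

def classify_pair(causal_up, paralog_down):
    """Assign a pair to one of the four scenarios in the disease tissue."""
    if causal_up and paralog_down:
        return "both"
    if causal_up:
        return "causal up-regulated"
    if paralog_down:
        return "paralog down-regulated"
    return "none"

def scenario_frequencies(flags):
    """Fraction of pairs per scenario.

    `flags` is a list of (causal_up, paralog_down) booleans, one per pair.
    """
    counts = Counter(classify_pair(c, p) for c, p in flags)
    return {label: n / len(flags) for label, n in counts.items()}

# Made-up differential-expression calls for four pairs.
toy_flags = [(True, False), (False, True), (True, True), (False, False)]
freqs = scenario_frequencies(toy_flags)
```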