12 research outputs found

    Identification and functional modelling of plausibly causative cis-regulatory variants in a highly-selected cohort with X-linked intellectual disability.

    Identifying causative variants in cis-regulatory elements (CRE) in neurodevelopmental disorders has proven challenging. We have used in vivo functional analyses to categorize rigorously filtered CRE variants in a clinical cohort that is plausibly enriched for causative CRE mutations: 48 unrelated males with a family history consistent with X-linked intellectual disability (XLID) in whom no detectable cause could be identified in the coding regions of the X chromosome (chrX). Targeted sequencing of all chrX CRE identified six rare variants in five affected individuals that altered conserved bases in CRE targeting known XLID genes and segregated appropriately in families. Two of these variants, FMR1CRE and TENM1CRE, showed consistent site- and stage-specific differences in enhancer function in the developing zebrafish brain in a dual-color fluorescent reporter assay. Mouse models were created for both variants. In male mice, Fmr1CRE induced alterations in neurodevelopmental Fmr1 expression, olfactory behavior and neurophysiological indicators of FMRP function. The absence of another likely causative variant on whole genome sequencing further supported FMR1CRE as the likely basis of the XLID in this family. Tenm1CRE mice showed no phenotypic anomalies. Following the release of gnomAD 2.1, reanalysis showed that TENM1CRE exceeded the maximum plausible population frequency of an XLID-causative allele. Assigning causative status to any ultra-rare CRE variant remains problematic and requires disease-relevant in vivo functional data from multiple sources. The sequential and bespoke nature of such analyses renders them time-consuming and challenging to scale for routine clinical use.

    Annotation et hiérarchisation de variants non-codants dans le contexte de maladies humaines

    Whole genome sequencing is increasingly used in patients with genetic diseases to diagnose the mutations responsible for the phenotype. However, for a large proportion of sequenced genomes, none of the genes associated with the phenotype have a coding mutation. In these cases, it is possible that a non-coding mutation, located in a cis-regulatory region, modifies the expression of a gene involved in the disease. Despite the existence of methods for annotating and predicting regulatory sequences on the basis of biochemical and epigenetic properties, it remains difficult to define objective criteria for effectively selecting candidate mutations from the millions of non-coding variants present in each patient. In addition, the target genes of these regulatory sequences are generally not known, making it difficult to associate a non-coding mutation with the patient's phenotype. I propose here a supervised learning strategy using random forests, adapted to complex and heterogeneous data sets, to classify and select non-coding mutations deregulating genes responsible for diseases. A notable innovation of my approach is to take into account association data between non-coding regions and target genes. In addition, I propose a method for extracting the biological rules identified by the model for each mutation evaluated, allowing an informed selection of candidate mutations. I discuss the functional properties identified by the learning model, based on examples of non-coding variations associated with Mendelian diseases. I also illustrate the potential of this method by analyzing 255,106 de novo variants identified by whole genome sequencing in 1,902 children with autism spectrum disorders, in whom no pathogenic coding mutations have been identified. This method thus makes it possible to prioritize mutations, the most promising of which become experimentally testable hypotheses to confirm their involvement in the development of the diseases in question. Thus, for whole genome sequencing projects of patient cohorts, a systematic application of our method would contribute to a better understanding of the mechanisms regulating gene expression, and to an improvement in patient diagnosis.

    Annotation and prioritization of non-coding variants in the context of human diseases

    Whole genome sequencing is increasingly used in patients with genetic diseases to diagnose the mutations responsible for the phenotype. However, for a large proportion of sequenced genomes, none of the genes associated with the phenotype have a coding mutation. In these cases, it is possible that a non-coding mutation, located in a cis-regulatory region, modifies the expression of a gene involved in the disease. Despite the existence of methods for annotating and predicting regulatory sequences on the basis of biochemical and epigenetic properties, it remains difficult to define objective criteria for effectively selecting candidate mutations from the millions of non-coding variants present in each patient. In addition, the target genes of these regulatory sequences are generally not known, making it difficult to associate a non-coding mutation with the patient's phenotype. I propose here a supervised learning strategy using random forests, adapted to complex and heterogeneous data sets, to classify and select non-coding mutations deregulating genes responsible for diseases. A notable innovation of my approach is to take into account association data between non-coding regions and target genes. In addition, I propose a method for extracting the biological rules identified by the model for each mutation evaluated, allowing an informed selection of candidate mutations. I discuss the functional properties identified by the learning model, based on examples of non-coding variations associated with Mendelian diseases. I also illustrate the potential of this method by analyzing 255,106 de novo variants identified by whole genome sequencing in 1,902 children with autism spectrum disorders, in whom no pathogenic coding mutations have been identified. This method thus makes it possible to prioritize mutations, the most promising of which become experimentally testable hypotheses to confirm their involvement in the development of the diseases in question. Thus, for whole genome sequencing projects of patient cohorts, a systematic application of our method would contribute to a better understanding of the mechanisms regulating gene expression, and to an improvement in patient diagnosis.

    Towards in silico CLIP-seq

    We present RBPNet, a novel deep learning method that predicts the CLIP-seq crosslink count distribution from RNA sequence at single-nucleotide resolution. By training on up to a million regions, RBPNet achieves high generalization on eCLIP, iCLIP and miCLIP assays, outperforming state-of-the-art classifiers. RBPNet performs bias correction by modeling the raw signal as a mixture of the protein-specific and background signal. Through model interrogation via Integrated Gradients, RBPNet identifies predictive sub-sequences that correspond to known and novel binding motifs and enables variant-impact scoring via in silico mutagenesis. Together, RBPNet improves imputation of protein-RNA interactions, as well as mechanistic interpretation of predictions.
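The variant-impact scoring mentioned in this abstract can be illustrated with a minimal sketch of in silico mutagenesis: every single-nucleotide substitution is scored as the change in a model's prediction relative to the reference sequence. This is not RBPNet's code or API. The function names, the scalar `predict` callable and the toy motif-counting model are all hypothetical stand-ins, and a real model of this kind would output a per-nucleotide count distribution rather than one number.

```python
from typing import Callable, Dict, Tuple

NUCLEOTIDES = "ACGU"


def in_silico_mutagenesis(
    sequence: str,
    predict: Callable[[str], float],
) -> Dict[Tuple[int, str], float]:
    """Score every single-nucleotide substitution of `sequence`.

    Returns a mapping (position, alternate base) -> predict(mutant) - predict(reference).
    """
    reference_score = predict(sequence)
    impacts: Dict[Tuple[int, str], float] = {}
    for position, ref_base in enumerate(sequence):
        for alt_base in NUCLEOTIDES:
            if alt_base == ref_base:
                continue  # skip the reference base itself
            mutant = sequence[:position] + alt_base + sequence[position + 1:]
            impacts[(position, alt_base)] = predict(mutant) - reference_score
    return impacts


def toy_motif_model(sequence: str) -> float:
    """Hypothetical stand-in for a trained model: counts an arbitrary motif."""
    motif = "UGCAUG"  # arbitrary example motif, assumed for illustration only
    windows = range(len(sequence) - len(motif) + 1)
    return float(sum(sequence[i:i + len(motif)] == motif for i in windows))


if __name__ == "__main__":
    impacts = in_silico_mutagenesis("AAUGCAUGAA", toy_motif_model)
    # Report the substitutions with the largest absolute predicted effect.
    for (pos, alt), delta in sorted(impacts.items(), key=lambda kv: -abs(kv[1]))[:5]:
        print(pos, alt, delta)
```

In this toy setting, substitutions inside the motif lower the score while substitutions outside it leave the score unchanged, which is the pattern such a scan is meant to reveal.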

    Classification of non-coding variants with high pathogenic impact

    Whole genome sequencing is increasingly used to diagnose medical conditions of genetic origin. While both coding and non-coding DNA variants contribute to a wide range of diseases, most patients who receive a WGS-based diagnosis today harbour a protein-coding mutation. Functional interpretation and prioritization of non-coding variants represents a persistent challenge, and disease-causing non-coding variants remain largely unidentified. Depending on the disease, WGS fails to identify a candidate variant in 20–80% of patients, severely limiting the usefulness of sequencing for personalised medicine. Here we present FINSURF, a machine-learning approach to predict the functional impact of non-coding variants in regulatory regions. FINSURF outperforms state-of-the-art methods, owing in particular to optimized selection of control variants during training. In addition to ranking candidate variants, FINSURF breaks down the score for each variant into contributions from individual annotations, facilitating the evaluation of their functional relevance. We applied FINSURF to a diverse set of 30 diseases with described causative non-coding mutations, and correctly identified the disease-causative non-coding variant within the ten top hits in 22 cases. FINSURF is implemented as an online server as well as custom browser tracks, and provides a quick and efficient solution to prioritize candidate non-coding variants in realistic clinical settings.

    Classification of non-coding variants with high pathogenic impact

    Whole genome sequencing is increasingly used to diagnose medical conditions of genetic origin. While both coding and non-coding DNA variants contribute to a wide range of diseases, most patients who receive a WGS-based diagnosis today harbour a protein-coding mutation. Functional interpretation and prioritization of non-coding variants represents a persistent challenge, and disease-causing non-coding variants remain largely unidentified. Depending on the disease, WGS fails to identify a candidate variant in 20-80% of patients, severely limiting the usefulness of sequencing for personalised medicine. Here we present FINSURF, a machine-learning approach to predict the functional impact of non-coding variants in regulatory regions. FINSURF outperforms state-of-the-art methods, owing to control optimisation during training. In addition to ranking candidate variants, FINSURF also delivers diagnostic information on functional consequences of mutations. We applied FINSURF to a diverse set of 30 diseases with described causative non-coding mutations, and correctly identified the disease-causative non-coding variant within the ten top hits in 22 cases. FINSURF is implemented as an online server as well as custom browser tracks, and provides a quick and efficient solution to prioritize candidate non-coding variants in realistic clinical settings.

    PCSK9 and lipoprotein (a) levels are two predictors of coronary artery calcification in asymptomatic patients with familial hypercholesterolemia

    Background and aims: We aimed to assess whether elevated PCSK9 and lipoprotein (a) [Lp(a)] levels associate with coronary artery calcification (CAC), a good marker of atherosclerosis burden, in asymptomatic familial hypercholesterolemia (FH). Methods: We selected 161 molecularly defined FH patients treated with stable doses of statins for more than a year. CAC was measured using the Agatston method and quantified as a categorical variable. Fasting plasma samples were collected and analyzed for lipids and lipoproteins. PCSK9 was measured by ELISA, and Lp(a) and apolipoprotein (a) concentrations by immunoturbidimetry and LC-MS/MS, respectively. Results: Circulating PCSK9 levels were significantly reduced in patients without CAC (n = 63), compared to those with CAC (n = 99). Patients with the highest CAC scores (above 100) had the highest levels of circulating PCSK9 and Lp(a). In multivariable regression analyses, the main predictors for a positive CAC score were age and sex, followed by circulating PCSK9 and Lp(a) levels. Conclusions: In statin-treated asymptomatic FH patients, elevated PCSK9 and Lp(a) levels are independently associated with the presence and severity of CAC, a good predictor of coronary artery disease.

    Stable Isotope Kinetic Study of ApoM (Apolipoprotein M)

    Objective—ApoM (apolipoprotein M) binds primarily to high-density lipoprotein before being exchanged with apoB (apolipoprotein B)-containing lipoproteins. Low-density lipoprotein (LDL) receptor-mediated clearance of apoB-containing particles could influence plasma apoM kinetics and decrease its antiatherogenic properties. In humans, we aimed to describe the interaction of apoM kinetics with other components of lipid metabolism to better define its potential benefit on atherosclerosis. Approach and Results—Fourteen male subjects received a primed infusion of ²H₃-leucine for 14 hours, and analyses were performed by liquid chromatography-tandem mass spectrometry from the hourly plasma samples. Fractional catabolic rates and production rates within lipoproteins were calculated using compartmental models. ApoM was found not only in high-density lipoprotein (59%) and LDL (4%) but also in a non-lipoprotein-related compartment (37%). The apoM distribution was heterogeneous within LDL and non-lipoprotein-related compartments according to plasma triglycerides (r=0.86; Pr range, 0.55-0.89; Pr=0.55; P=0.042). Significant correlations were found between triglycerides and production rates of LDL-apoM (r=0.73; P […]). Conclusions—In humans, LDL kinetics play a key role in apoM turnover. Plasma triglycerides act on both apoM and sphingosine-1-phosphate distributions between lipoproteins. These results confirmed that apoM could be bound to high-density lipoprotein after secretion and then quickly exchanged with a non-lipoprotein-related compartment and with LDL, to be slowly catabolized.

    VLDL (Very-Low-Density Lipoprotein)-Apo E (Apolipoprotein E) May Influence Lp(a) (Lipoprotein [a]) Synthesis or Assembly

    OBJECTIVE: To clarify the association between PCSK9 (proprotein convertase subtilisin/kexin type 9) and Lp(a) (lipoprotein [a]), we studied Lp(a) kinetics in patients with loss-of-function and gain-of-function PCSK9 mutations and in patients in whom extended-release niacin reduced Lp(a) and PCSK9 concentrations. APPROACH AND RESULTS: Six healthy controls, 9 heterozygous patients with familial hypercholesterolemia (5 with low-density lipoprotein receptor [LDLR] mutations and 4 with PCSK9 gain-of-function mutations) and 3 patients with heterozygous dominant-negative PCSK9 loss-of-function mutations were included in the preliminary study. Eight patients were enrolled in a second study assessing the effects of 2 g/day extended-release niacin. Apolipoprotein kinetics in VLDL (very-low-density lipoprotein), LDL (low-density lipoprotein), and Lp(a) were studied using stable isotope techniques. Plasma Lp(a) concentrations were increased in PCSK9-gain-of-function and familial hypercholesterolemia-LDLR groups compared with controls and PCSK9-loss-of-function groups (14±12 versus 5±4 mg/dL; P=0.04), but no change was observed in Lp(a) fractional catabolic rate. Subjects with PCSK9-loss-of-function mutations displayed reduced apoE (apolipoprotein E) concentrations associated with a VLDL-apoE absolute production rate reduction. Lp(a) and VLDL-apoE absolute production rates were correlated (r=0.50; P<0.05). ApoE-to-apolipoprotein (a) molar ratios in Lp(a) increased with plasma Lp(a) (r=0.96; P<0.001) but not with PCSK9 levels. Extended-release niacin-induced reductions in Lp(a) and VLDL-apoE absolute production rate were correlated (r=0.83; P=0.015). In contrast, PCSK9 reduction (−35%; P=0.008) was only correlated with that of VLDL-apoE absolute production rate (r=0.79; P=0.028). CONCLUSIONS: VLDL-apoE production could determine Lp(a) production and/or assembly. As PCSK9 inhibitors reduce plasma apoE and Lp(a) concentrations, apoE could be the link between PCSK9 and Lp(a).

    Screening for eukaryotic motifs in Legionella pneumophila reveals Smh1 as bacterial deacetylase of host histones

    Legionella pneumophila (L.p.) is a bacterial pathogen which is a common causative agent of pneumonia. In humans, it infects alveolar macrophages and transfers hundreds of virulence factors that interfere with cellular signalling pathways and the transcriptomic landscape to sustain its own replication. By this interaction, it has acquired eukaryote-like protein motifs by gene transfer events that partake in the pathogenicity of Legionella. In a computational screening approach for eukaryotic motifs in the transcriptome of Legionella, we identified the L.p. strain Corby protein ABQ55614 as a putative histone deacetylase and named it "suppressing modifier of histones 1" (Smh1). During infection, Smh1 translocated from the Legionella vacuole into the host cytosol. When expressed in human macrophage THP-1 cells, Smh1 was localized predominantly in the nucleus, led to broad histone H3 and H4 deacetylation, blunted expression of a large number of genes (e.g. IL-1β and IL-8), and fostered intracellular bacterial replication. L.p. with a Smh1 knockdown grew normally in media but showed a slight growth defect inside the host cell. Furthermore, Smh1 showed a very potent histone deacetylation activity in vitro, e.g. at H3K14, that could be inhibited by targeted mutation of the putative catalytic centre inferred by analogy with eukaryotic HDAC8, and with the deacetylase inhibitor trichostatin A. In summary, Smh1 displays functional homology with class I/II type HDACs. We identified Smh1 as a new Legionella virulence factor with a eukaryote-like histone-deacetylase activity that moderates host gene expression and might pave the way for further histone modifications. Legionella pneumophila (L.p.) is a prominent bacterial pathogen which is a common causative agent of pneumonia. In order to survive inside the host cell, the human macrophage, it profoundly interacts with host cell processes to advance its own replication. In this study, we identify a bacterial factor, Smh1, of previously unknown function as a host histone deacetylase. The activity of this factor in the host cell leads to attenuated gene expression and increased intracellular bacterial replication.