23 research outputs found

    Deep phenotyping: symptom annotation made simple with SAMS.

    Precision medicine needs precise phenotypes. The Human Phenotype Ontology (HPO) uses clinical signs instead of diagnoses and has become the standard annotation for patients' phenotypes when describing single gene disorders. Use of the HPO beyond human genetics is, however, still limited. With SAMS (Symptom Annotation Made Simple), we want to bring sign-based phenotyping to routine clinical care, to hospital patients as well as to outpatients. Our web-based application provides access to three widely used annotation systems: HPO, OMIM, Orphanet. Whilst data can be stored in our database, phenotypes can also be imported and exported as Global Alliance for Genomics and Health (GA4GH) Phenopackets without using the database. The web interface can easily be integrated into local databases, e.g. clinical information systems. SAMS lets users share their data with others, empowering patients to record their own signs and symptoms (or those of their children) and thus provide their doctors with additional information. We think that our approach will lead to better characterised patients, which is helpful not only for finding disease mutations but also for better understanding the pathophysiology of diseases and for recruiting patients for studies and clinical trials. SAMS is freely available at https://www.genecascade.org/SAMS/

    MutationTaster2021

    Here we present an update to MutationTaster, our DNA variant effect prediction tool. The new version uses a different prediction model and attains higher accuracy than its predecessor, especially for rare benign variants. In addition, we have integrated many sources of data that only became available after the last release (such as gnomAD and ExAC pLI scores) and changed the splice site prediction model. To more easily assess the relevance of detected known disease mutations to the clinical phenotype of the patient, MutationTaster now provides information on the diseases they cause. Further changes represent a major overhaul of the interfaces to increase user-friendliness, whilst many changes under the hood have been designed to accelerate the processing of uploaded VCF files. We also offer an API for the rapid automated query of smaller numbers of variants from within other software. MutationTaster2021 integrates our disease mutation search engine, MutationDistiller, to prioritise variants from VCF files using the patient's clinical phenotype. The new version is available at https://www.genecascade.org/MutationTaster2021/. This website is free and open to all users and there is no login requirement.

    RegEl corpus: identifying DNA regulatory elements in the scientific literature

    High-throughput technologies led to the generation of a wealth of data on regulatory DNA elements in the human genome. However, results from disease-driven studies are primarily shared in textual form as scientific articles. Information extraction (IE) algorithms allow this information to be (semi-)automatically accessed. Their development, however, is dependent on the availability of annotated corpora. Therefore, we introduce RegEl (Regulatory Elements), the first freely available corpus annotated with regulatory DNA elements comprising 305 PubMed abstracts for a total of 2690 sentences. We focus on enhancers, promoters and transcription factor binding sites. Three annotators worked in two stages, achieving an overall 0.73 F1 inter-annotator agreement and 0.46 for regulatory elements. Depending on the entity type, IE baselines reach F1-scores of 0.48–0.91 for entity detection and 0.71–0.88 for entity normalization. Next, we apply our entity detection models to the entire PubMed collection and extract co-occurrences of genes or diseases with regulatory elements. This generates large collections of regulatory elements associated with 137 870 unique genes and 7420 diseases, which we make openly available. Database URL: https://zenodo.org/record/6418451#.YqcLHvexVqg

    Photoactivation of titanium-oxo cluster [Ti6O6(OR)6(O2CtBu)6]: mechanism, photoactivated structures, and onward reactivity with O2 to a peroxide complex

    The molecular titanium-oxo cluster [Ti6O6(OiPr)6(O2CtBu)6] (1) can be photoactivated by UV light, resulting in a deeply coloured mixed-valent (photoreduced) Ti(III/IV) cluster, alongside alcohol and ketone (photooxidised) organic products. Mechanistic studies indicate that a two-electron (not free-radical) mechanism occurs in this process, which utilises the cluster structure to facilitate multielectron reactions. The photoreduced products [Ti6O6(OiPr)4(O2CtBu)6(sol)2], sol = iPrOH (2) or pyridine (3), can be isolated in good yield and are structurally characterized, each with two uniquely arranged, antiferromagnetically coupled d-electrons. 2 and 3 undergo onward oxidation under air, with 3 cleanly transforming into the peroxide complex [Ti6O6(OiPr)4(O2CtBu)6(py)(O2)] (5). 5 reacts with isopropanol to regenerate the initial cluster (1), completing a closed cycle and suggesting opportunities for the deployment of these easily made and tuneable clusters for sustainable photocatalytic processes using air and light. The redox reactivity described here is only possible in a cluster with multiple Ti sites, which can perform multi-electron processes and can adjust its shape to accommodate changes in electron density.

    Diatom DNA metabarcoding for ecological assessment: Comparison among bioinformatics pipelines used in six European countries reveals the need for standardization

    Ecological assessment of lakes and rivers using benthic diatom assemblages currently requires considerable taxonomic expertise to identify species using light microscopy. This traditional approach is also time-consuming. Diatom metabarcoding is a promising alternative and there is increasing interest in using this approach for routine assessment. However, until now, analysis protocols for diatom metabarcoding have been developed and optimised by research groups working in isolation. The diversity of existing bioinformatics methods highlights the need for an assessment of the performance and comparability of results of different methods. The aim of this study was to test the correspondence of outputs from six bioinformatics pipelines currently in use for diatom metabarcoding in different European countries. Raw sequence data from 29 biofilm samples were treated by each of the bioinformatics pipelines, five of them using the same curated reference database. The outputs of the pipelines were compared in terms of sequence unit assemblages, taxonomic assignment, biotic index score and ecological assessment outcomes. The last three components were also compared to outputs from traditional light microscopy, which is currently accepted for ecological assessment of phytobenthos, as required by the Water Framework Directive. We also tested the performance of the pipelines on the two DNA markers (rbcL and 18S-V4) that are currently used by the working groups participating in this study. The sequence unit assemblages produced by different pipelines showed significant differences in terms of assigned and unassigned read numbers and sequence unit numbers. When comparing the taxonomic assignments at genus and species level, correspondence of the taxonomic assemblages between pipelines was weak. Most discrepancies were linked to differential detection or quantification of taxa, despite the use of the same reference database. Subsequent calculation of biotic index scores also showed significant differences between approaches, which were reflected in the final ecological assessment. Use of the rbcL marker always resulted in better correlation among molecular datasets and also in results closer to those generated using traditional microscopy. This study shows that decisions made in pipeline design have implications for the dataset's structure and the taxonomic assemblage, which in turn may affect biotic index calculation and ecological assessment. There is a need to define best-practice bioinformatics parameters in order to ensure the best representation of diatom assemblages. Only the use of similar parameters will ensure the compatibility of data from different working groups. The future of diatom metabarcoding for ecological assessment may also lie in the development of new metrics using, for example, presence/absence instead of relative abundance data. © 2020 The Authors. Published by Elsevier B.V.

    Discovery of a non-canonical GRHL1 binding site using deep convolutional and recurrent neural networks

    Background: Transcription factors regulate gene expression by binding to transcription factor binding sites (TFBSs). Most models for predicting TFBSs are based on position weight matrices (PWMs), which require a specific motif to be present in the DNA sequence and do not consider interdependencies of nucleotides. Novel approaches such as Transcription Factor Flexible Models or recurrent neural networks consequently provide higher accuracies. However, it is unclear whether such approaches can uncover novel non-canonical, hitherto unexpected TFBSs relevant to human transcriptional regulation. Results: In this study, we trained a convolutional recurrent neural network with HT-SELEX data for GRHL1 binding and applied it to a set of GRHL1 binding sites obtained from ChIP-Seq experiments from human cells. We identified 46 non-canonical GRHL1 binding sites, which were not found by a conventional PWM approach. Unexpectedly, some of the newly predicted binding sequences lacked the CNNG core motif, so far considered obligatory for GRHL1 binding. Using isothermal titration calorimetry, we experimentally confirmed binding between the GRHL1-DNA binding domain and predicted GRHL1 binding sites, including a non-canonical GRHL1 binding site. Mutagenesis of individual nucleotides revealed a correlation between predicted binding strength and experimentally validated binding affinity across representative sequences. This correlation was observed neither with a PWM-based approach nor with another deep learning approach. Conclusions: Our results show that convolutional recurrent neural networks may uncover unanticipated binding sites and facilitate quantitative transcription factor binding predictions.

    Data to support: "Photoactivation of titanium-oxo cluster [Ti6O6(OR)6(O2CtBu)6]: mechanism, photoactivated structures, and onward reactivity with O2 to a peroxide complex"

    The molecular titanium-oxo cluster [Ti6O6(OiPr)6(O2CtBu)6] (1) can be photoactivated by UV light, resulting in a deeply coloured mixed-valent (photoreduced) Ti(III/IV) cluster, alongside alcohol and ketone (photooxidised) organic products. Mechanistic studies indicate that a two-electron (not free-radical) mechanism occurs in this process, which utilises the cluster structure to facilitate multielectron reactions. The photoreduced products [Ti6O6(OiPr)4(O2CtBu)6(sol)2], sol = iPrOH (2) or pyridine (3), can be isolated in good yield and are structurally characterized, each with two uniquely arranged, antiferromagnetically coupled d-electrons. 2 and 3 undergo onward oxidation under air, with 3 cleanly transforming into the peroxide complex [Ti6O6(OiPr)4(O2CtBu)6(py)(O2)] (5). 5 reacts with isopropanol to regenerate the initial cluster (1), completing a closed cycle and suggesting opportunities for the deployment of these easily made and tuneable clusters for sustainable photocatalytic processes using air and light. The redox reactivity described here is only possible in a cluster with multiple Ti sites, which can perform multi-electron processes and can adjust its shape to accommodate changes in electron density.
