7 research outputs found

    Machine learning approaches for the characterisation of biological systems

    Sielemann J. Machine learning approaches for the characterisation of biological systems. Bielefeld: Universität Bielefeld; 2023

    plASgraph - using graph neural networks to detect plasmid contigs from an assembly graph

    Sielemann J, Sielemann K, Brejová B, Vinař T, Chauve C. plASgraph - using graph neural networks to detect plasmid contigs from an assembly graph. bioRxiv. 2022

    Local DNA shape is a general principle of transcription factor binding specificity in Arabidopsis thaliana

    Sielemann J, Wulf D, Schmidt R, Bräutigam A. Local DNA shape is a general principle of transcription factor binding specificity in Arabidopsis thaliana. Nature Communications. 2021;12(1): 6549.
    Understanding gene expression will require understanding where regulatory factors bind genomic DNA. The frequently used sequence-based motifs of protein-DNA binding are not predictive, since a genome contains many more binding sites than are actually bound, and transcription factors of the same family share similar DNA-binding motifs. Traditionally, these motifs depict only sequence and neglect DNA shape. Since shape may contribute to binding non-linearly and combinatorially, machine learning approaches ought to predict transcription factor binding better. Here we show that a random forest machine learning approach, which incorporates the 3D shape of DNA, enhances binding prediction for all 216 tested Arabidopsis thaliana transcription factors and improves the resolution of differential binding by transcription factor family members that share the same binding motif. We observed that DNA shape features were individually weighted for each transcription factor, even if they shared the same binding sequence.
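    The abstract does not spell out the feature layout, so the following is only a minimal sketch, assuming a common setup in which each candidate binding site becomes one fixed-length vector that concatenates a one-hot encoding of the sequence with per-position DNA shape tracks (for example minor groove width), so that a random forest can weight every column individually. The function names `one_hot` and `site_features` and the shape values in the usage line are hypothetical placeholders, not output of any real shape-prediction tool.

```python
from typing import Dict, List

BASES = "ACGT"


def one_hot(seq: str) -> List[float]:
    """Flattened one-hot encoding: four values (A, C, G, T) per position."""
    vec: List[float] = []
    for base in seq.upper():
        if base not in BASES:
            raise ValueError(f"unexpected base {base!r}")
        vec.extend(1.0 if base == b else 0.0 for b in BASES)
    return vec


def site_features(seq: str, shape: Dict[str, List[float]]) -> List[float]:
    """Concatenate the sequence one-hot with shape tracks in sorted name order.

    Tracks may differ in length (e.g. parameters defined per base-pair step
    rather than per base); only their order must be the same for every site.
    """
    features = one_hot(seq)
    for name in sorted(shape):
        features.extend(float(v) for v in shape[name])
    return features


# Hypothetical shape values for illustration only (not real predictions).
vector = site_features("ACG", {"MGW": [5.0, 4.5, 5.2], "Roll": [1.0, -2.0]})
print(len(vector))  # 17: 12 one-hot values + 3 MGW + 2 Roll
```

    Sorting the track names keeps the column order identical across sites, which matters because a tree ensemble identifies features purely by column position.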

    Data_Sheet_1_plASgraph2: using graph neural networks to detect plasmid contigs from an assembly graph.PDF

    Identification of plasmids from sequencing data is an important and challenging problem related to antimicrobial resistance spread and other One-Health issues. We provide a new architecture for identifying plasmid contigs in fragmented genome assemblies built from short-read data. We employ graph neural networks (GNNs) and the assembly graph to propagate information from nearby nodes, which leads to more accurate classification, especially for short contigs that are difficult to classify based on sequence features or database searches alone. We trained plASgraph2 on a data set of samples from the ESKAPEE group of pathogens. plASgraph2 either outperforms or performs on par with a wide range of state-of-the-art methods on testing sets of independent ESKAPEE samples and samples from related pathogens. On the one hand, our study provides a new accurate and easy-to-use tool for contig classification in bacterial isolates; on the other hand, it serves as a proof of concept for the use of GNNs in genomics. Our software is available at https://github.com/cchauve/plasgraph2 and the training and testing data sets are available at https://github.com/fmfi-compbio/plasgraph2-datasets.
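    The central idea here, letting each contig borrow information from its neighbours in the assembly graph, can be sketched as one round of mean-aggregation message passing. This is a generic GraphSAGE-style layer in plain Python under stated assumptions, not the plASgraph2 architecture: the contig names, the two-dimensional feature vectors, and the weight matrices are hypothetical, and a real model would learn its weights and typically stack several layers.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product for small dense lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def mean_aggregation_layer(
    features: Dict[str, Vector],
    edges: List[Tuple[str, str]],
    w_self: Matrix,
    w_neigh: Matrix,
) -> Dict[str, Vector]:
    """One message-passing round: ReLU(W_self h_v + W_neigh * mean of neighbour h_u)."""
    # Treat assembly-graph links as undirected adjacency between contigs.
    neighbours: Dict[str, List[str]] = {node: [] for node in features}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)

    dim = len(next(iter(features.values())))
    updated: Dict[str, Vector] = {}
    for node, h in features.items():
        nbrs = neighbours[node]
        if nbrs:
            mean = [sum(features[n][i] for n in nbrs) / len(nbrs) for i in range(dim)]
        else:
            mean = [0.0] * dim  # isolated contig: no messages to aggregate
        combined = [s + t for s, t in zip(matvec(w_self, h), matvec(w_neigh, mean))]
        updated[node] = [max(0.0, c) for c in combined]  # ReLU
    return updated


# Hypothetical 2-dimensional contig features and identity weights.
identity = [[1.0, 0.0], [0.0, 1.0]]
contigs = {"a": [1.0, 0.0], "b": [0.0, 0.0], "c": [0.0, 1.0]}
out = mean_aggregation_layer(contigs, [("a", "b")], identity, identity)
print(out)
```

    In this toy graph, contig `b` has an all-zero feature vector of its own but receives a non-zero representation from its neighbour `a`, which mirrors the effect the abstract describes for short contigs that are hard to classify in isolation.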

    Distinct Myocardial Transcriptomic Profiles of Cardiomyopathies Stratified by the Mutant Genes

    Sielemann K, Elbeck Z, Gärtner A, et al. Distinct Myocardial Transcriptomic Profiles of Cardiomyopathies Stratified by the Mutant Genes. Genes. 2020;11(12): 1430.
    Cardiovascular diseases are the leading cause of morbidity and mortality worldwide, but the underlying molecular mechanisms remain poorly understood. Cardiomyopathies are primary diseases of the heart muscle and contribute to high rates of heart failure and sudden cardiac death. Here, we distinguished four different genetic cardiomyopathies based on gene expression signatures. In this study, RNA sequencing was used to identify gene expression signatures in myocardial tissue of cardiomyopathy patients in comparison to non-failing human hearts. Specifically, expression differences between patients with mutations in particular genes, namely LMNA (lamin A/C), RBM20 (RNA binding motif protein 20), TTN (titin) and PKP2 (plakophilin 2), were investigated. We identified genotype-specific differences in regulated pathways, Gene Ontology (GO) terms, and gene groups such as secreted or regulatory proteins and potential candidate drug targets, revealing specific molecular pathomechanisms for the four subtypes of genetic cardiomyopathies. Some regulated pathways are shared between patients with mutations in RBM20 and TTN, because the splicing factor RBM20 targets TTN, among other genes, leading to a similar response at the pathway level, even though many differentially expressed genes (DEGs) still differ between the two sample types. The myocardium of patients with mutations in LMNA is widely associated with upregulated genes and pathways involved in immune response, whereas mutations in PKP2 lead to downregulation of extracellular matrix genes. Our results contribute to a better understanding of the underlying molecular pathomechanisms, with the aim of enabling novel and better treatments for genetic cardiomyopathies.

    Transcription factors mediating regulation of photosynthesis

    Halpape W, Wulf D, Verwaaijen B, et al. Transcription factors mediating regulation of photosynthesis. bioRxiv. 2023
    Photosynthesis, by which plants convert carbon dioxide to sugars using the energy of light, is fundamental to life, as it forms the basis of nearly all food chains. Surprisingly, our knowledge of its transcriptional regulation remains incomplete. Efforts toward its agricultural optimization have mostly focused on post-translational regulatory processes [1–3], but photosynthesis is also regulated at the post-transcriptional [4] and transcriptional [5] levels. Plants with stacked transcription factor mutations remain photosynthetically active [5,6], and additional transcription factors have been difficult to identify, possibly due to redundancy [6] or lethality. Using a random forest (decision tree-based) machine learning approach for gene regulatory network inference [7], we determined ranked candidate transcription factors and validated all five tested transcription factors as controlling photosynthesis in vivo. Detailed analyses of previously published and newly identified transcription factors suggest that photosynthesis is transcriptionally regulated in a partitioned, non-hierarchical, interlooped network.
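    Tree-based gene regulatory network inference, in the spirit of methods such as GENIE3, decomposes the network into one regression problem per target gene: the target's expression is predicted from the expression of candidate regulators, and each regulator's importance in the fitted ensemble becomes the weight of a regulator-to-target edge. The sketch below is deliberately simplified and is not the pipeline used in the preprint: it bags depth-1 regression trees (stumps) and omits the per-split feature subsampling of a full random forest. The function names, gene names, and expression values are all hypothetical.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple


def sse(values: Sequence[float]) -> float:
    """Sum of squared deviations from the mean; 0.0 for an empty sequence."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_stump(
    columns: Dict[str, List[float]], target: List[float]
) -> Tuple[Optional[str], float]:
    """Find the single regulator/threshold split with the largest drop in SSE."""
    total = sse(target)
    best_reg: Optional[str] = None
    best_gain = 0.0
    for reg, xs in columns.items():
        # Candidate thresholds: every distinct value except the largest.
        for threshold in sorted(set(xs))[:-1]:
            left = [y for x, y in zip(xs, target) if x <= threshold]
            right = [y for x, y in zip(xs, target) if x > threshold]
            gain = total - sse(left) - sse(right)
            if gain > best_gain:  # strict ">" so earlier regulators win ties
                best_reg, best_gain = reg, gain
    return best_reg, best_gain


def rank_regulators(
    expression: Dict[str, List[float]],
    target_gene: str,
    regulators: List[str],
    n_trees: int = 50,
    seed: int = 0,
) -> Dict[str, float]:
    """Average SSE reduction per regulator over bagged stumps (higher = stronger edge)."""
    rng = random.Random(seed)
    n = len(expression[target_gene])
    importance = {reg: 0.0 for reg in regulators}
    for _ in range(n_trees):
        # Bootstrap: resample conditions (samples) with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        columns = {reg: [expression[reg][i] for i in idx] for reg in regulators}
        target = [expression[target_gene][i] for i in idx]
        reg, gain = best_stump(columns, target)
        if reg is not None:
            importance[reg] += gain / n_trees
    return importance


# Hypothetical data: GENE switches on when TF1 is high; TF2 is a shuffled decoy.
expression = {
    "TF1": [0, 1, 2, 3, 4, 5, 6, 7],
    "TF2": [3, 7, 1, 5, 0, 6, 2, 4],
    "GENE": [0, 0, 0, 0, 10, 10, 10, 10],
}
scores = rank_regulators(expression, "GENE", ["TF1", "TF2"], n_trees=20, seed=1)
print(sorted(scores, key=scores.get, reverse=True))
```

    Sorting the returned scores in descending order yields a ranked candidate list for one target gene; repeating the procedure for every target gives the full weighted edge list from which candidate regulators can be prioritised.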

    Distinct myocardial transcriptomic profiles of cardiomyopathies stratified by the mutant genes

    Cardiovascular diseases are the leading cause of morbidity and mortality worldwide, but the underlying molecular mechanisms remain poorly understood. Cardiomyopathies are primary diseases of the heart muscle and contribute to high rates of heart failure and sudden cardiac death. Here, we distinguished four different genetic cardiomyopathies based on gene expression signatures. In this study, RNA sequencing was used to identify gene expression signatures in myocardial tissue of cardiomyopathy patients in comparison to non-failing human hearts. Specifically, expression differences between patients with mutations in particular genes, namely LMNA (lamin A/C), RBM20 (RNA binding motif protein 20), TTN (titin) and PKP2 (plakophilin 2), were investigated. We identified genotype-specific differences in regulated pathways, Gene Ontology (GO) terms, and gene groups such as secreted or regulatory proteins and potential candidate drug targets, revealing specific molecular pathomechanisms for the four subtypes of genetic cardiomyopathies. Some regulated pathways are shared between patients with mutations in RBM20 and TTN, because the splicing factor RBM20 targets TTN, among other genes, leading to a similar response at the pathway level, even though many differentially expressed genes (DEGs) still differ between the two sample types. The myocardium of patients with mutations in LMNA is widely associated with upregulated genes and pathways involved in immune response, whereas mutations in PKP2 lead to downregulation of extracellular matrix genes. Our results contribute to a better understanding of the underlying molecular pathomechanisms, with the aim of enabling novel and better treatments for genetic cardiomyopathies.