92 research outputs found

    Literature-based priors for gene regulatory networks

    Motivation: The use of prior knowledge to improve gene regulatory network modelling has often been proposed. In this paper we present the first research on the massive incorporation of prior knowledge from literature for Bayesian network learning of gene networks. As the publication rate of scientific papers grows, updating online databases, which have been proposed as potential prior knowledge in past research, becomes increasingly challenging. The novelty of our approach lies in the use of gene-pair association scores that describe the overlap in the contexts in which the genes are mentioned, generated from a large database of scientific literature, harnessing the information contained in a huge number of documents into a simple, clear format. Results: We present a method to transform such literature-based gene association scores to network prior probabilities, and apply it to learn gene sub-networks for yeast, E. coli and human organisms. We also investigate the effect of weighting the influence of the prior knowledge. Our findings show that literature-based priors can improve both the number of true regulatory interactions present in the network and the accuracy of expression value prediction on genes, in comparison to a network learnt solely from expression data. Networks learnt with priors also show an improved biological interpretation, with identified subnetworks that coincide with known biological pathways.

    Rare disease research workflow using multilayer networks elucidates the molecular determinants of severity in Congenital Myasthenic Syndromes

    © The Author(s) 2024. Exploring the molecular basis of disease severity in rare disease scenarios is a challenging task given the limitations on data availability. Causative genes have been described for Congenital Myasthenic Syndromes (CMS), a group of diverse minority neuromuscular junction (NMJ) disorders; yet a molecular explanation for the phenotypic severity differences remains unclear. Here, we present a workflow to explore the functional relationships between CMS causal genes and altered genes from each patient, based on multilayer network community detection analysis of complementary biomedical information provided by relevant data sources, namely protein-protein interactions, pathways and metabolomics. Our results show that CMS severity can be ascribed to the personalized impairment of extracellular matrix components and postsynaptic modulators of acetylcholine receptor (AChR) clustering. This work showcases how coupling multilayer network analysis with personalized -omics information provides molecular explanations for the varying severity of rare diseases, paving the way for sorting out similar cases in other rare diseases.

    Drug prioritization using the semantic properties of a knowledge graph

    Compounds that are candidates for drug repurposing can be ranked by leveraging knowledge available in the biomedical literature and databases. This knowledge, spread across a variety of sources, can be integrated within a knowledge graph, which thereby comprehensively describes known relationships between biomedical concepts, such as drugs, diseases, genes, etc. Our work uses the semantic information between drug and disease concepts as features, which are extracted from an existing knowledge graph that integrates 200 different biological knowledge sources. RepoDB, a standard drug repurposing database which describes drug-disease combinations that were approved or that failed in clinical trials, is used to train a random forest classifier. The 10-times repeated 10-fold cross-validation performance of the classifier achieves a mean area under the receiver operating characteristic curve (AUC) of 92.2%. We apply the classifier to prioritize 21 preclinical drug repurposing candidates that have been suggested for Autosomal Dominant Polycystic Kidney Disease (ADPKD). Mozavaptan, a vasopressin V2 receptor antagonist, is predicted to be the drug most likely to be approved after a clinical trial, and belongs to the same drug class as tolvaptan, the only treatment for ADPKD that is currently approved. We conclude that semantic properties of concepts in a knowledge graph can be exploited to prioritize drug repurposing candidates for testing in clinical trials.

    Solve-RD: systematic pan-European data sharing and collaborative analysis to solve rare diseases

    For the first time in Europe, hundreds of rare disease (RD) experts team up to actively share and jointly analyse existing patient data. Solve-RD is a Horizon 2020-supported EU flagship project bringing together >300 clinicians, scientists, and patient representatives of 51 sites from 15 countries. Solve-RD is built upon a core group of four European Reference Networks (ERNs; ERN-ITHACA, ERN-RND, ERN-Euro NMD, ERN-GENTURIS) which annually see more than 270,000 RD patients with respective pathologies. The main ambition is to solve unsolved rare diseases for which a molecular cause is not yet known. This is achieved through an innovative clinical research environment that introduces novel ways to organise expertise and data. Two major approaches are being pursued: (i) massive data re-analysis of >19,000 unsolved rare disease patients and (ii) novel combined -omics approaches. The minimum requirement to be eligible for the analysis activities is an inconclusive exome that can be shared with controlled access. The first preliminary data re-analysis has already diagnosed 255 cases from 8,393 exome/genome datasets. This unprecedented degree of collaboration focused on sharing of data and expertise shall identify many new disease genes and enable diagnosis of many so far undiagnosed patients from all over Europe.

    Bias correction and Bayesian analysis of aggregate counts in SAGE libraries

    Background: Tag-based techniques, such as SAGE, are commonly used to sample the mRNA pool of an organism's transcriptome. Incomplete digestion during the tag formation process may allow for multiple tags to be generated from a given mRNA transcript. The probability of forming a tag varies with its relative location. As a result, the observed tag counts represent a biased sample of the actual transcript pool. In SAGE this bias can be avoided by ignoring all but the 3'-most tag, but doing so discards a large fraction of the observed data. Taking this bias into account should allow more of the available data to be used, leading to increased statistical power. Results: Three new hierarchical models, which directly embed a model for the variation in tag formation probability, are proposed and their associated Bayesian inference algorithms are developed. These models may be applied to libraries at both the tag and aggregate level. Simulation experiments and analysis of real data are used to contrast the accuracy of the various methods. The consequences of tag formation bias are discussed in the context of testing differential expression. A description is given as to how these algorithms can be applied in that context. Conclusions: Several Bayesian inference algorithms that account for tag formation effects are compared with the DPB algorithm, providing clear evidence of superior performance. The accuracy of inferences when using a particular non-informative prior is found to depend on the expression level of a given gene. The multivariate nature of the approach easily allows both univariate and joint tests of differential expression. Calculations demonstrate the potential for false positive and negative findings due to variation in tag formation probabilities across samples when testing for differential expression.

    Annotating Transcriptional Effects of Genetic Variants in Disease-Relevant Tissue: Transcriptome-Wide Allelic Imbalance in Osteoarthritic Cartilage

    Objective. Multiple single-nucleotide polymorphisms (SNPs) conferring susceptibility to osteoarthritis (OA) mark imbalanced expression of positional genes in articular cartilage, reflected by unequally expressed alleles among heterozygotes (allelic imbalance [AI]). We undertook this study to explore the articular cartilage transcriptome from OA patients for AI events to identify putative disease-driving genetic variation. Methods. AI was assessed in 42 preserved and 5 lesioned OA cartilage samples (from the Research Arthritis and Articular Cartilage study) for which RNA sequencing data were available. The count fraction of the alternative alleles among the alternative and reference alleles together (φ) was determined for heterozygous individuals. A meta-analysis was performed to generate a meta-φ and P value for each SNP with a false discovery rate (FDR) correction for multiple comparisons. To further validate AI events, we explored them as a function of multiple additional OA features. Results. We observed a total of 2,070 SNPs that consistently marked AI of 1,031 unique genes in articular cartilage. Of these genes, 49 were found to be significantly differentially expressed (fold change 2, FDR <0.05) between preserved and paired lesioned cartilage, and 18 had previously been reported to confer susceptibility to OA and/or related phenotypes. Moreover, we identified notable highly significant AI SNPs in the CRLF1, WWP2, and RPS3 genes that were related to multiple OA features. Conclusion. We present a framework and resulting data set for researchers in the OA research field to probe for disease-relevant genetic variation that affects gene expression in pivotal disease-affected tissue. This likely includes putative novel compelling OA risk genes such as CRLF1, WWP2, and RPS3.
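
The count fraction described in the Methods above can be illustrated with a minimal Python sketch. It assumes the fraction is simply alternative-allele reads divided by alternative plus reference reads for one heterozygous sample; the function name and the read counts are hypothetical, and the meta-analysis and FDR correction from the abstract are not reproduced here.

```python
def alt_allele_fraction(alt_count: int, ref_count: int) -> float:
    """Return alt / (alt + ref) for one SNP in one heterozygous sample."""
    total = alt_count + ref_count
    if total == 0:
        # Without any covering reads the fraction is undefined.
        raise ValueError("no reads cover this SNP")
    return alt_count / total


# Hypothetical (alt, ref) read counts for one SNP in three heterozygous samples.
example_counts = [(30, 70), (45, 55), (20, 80)]
fractions = [alt_allele_fraction(alt, ref) for alt, ref in example_counts]
```

Under this definition a value near 0.5 corresponds to balanced expression of the two alleles, and values further from 0.5 suggest allelic imbalance.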

    Single-nucleotide resolution analysis of the transcriptome structure of Clostridium beijerinckii NCIMB 8052 using RNA-Seq

    Background: Clostridium beijerinckii is an important solvent-producing microorganism. The genome of C. beijerinckii NCIMB 8052 has recently been sequenced. Although transcriptome structure is important in order to reveal the functional and regulatory architecture of the genome, the physical structure of the transcriptome for this strain, such as the operon linkages and transcript boundaries, is not well understood. Results: In this study, we conducted a single-nucleotide resolution analysis of the C. beijerinckii NCIMB 8052 transcriptome using high-throughput RNA-Seq technology. We identified the transcription start sites and operon structure throughout the genome. We confirmed the structure of important gene operons involved in metabolic pathways for acid and solvent production in C. beijerinckii 8052, including pta-ack, ptb-buk, hbd-etfA-etfB-crt (bcs) and ald-ctfA-ctfB-adc (sol) operons; we also defined important operons related to chemotaxis/motility, transcriptional regulation, stress response and fatty acid biosynthesis along with others. We discovered 20 previously non-annotated regions with significant transcriptional activities and 15 genes whose translation start codons were likely mis-annotated. As a consequence, the accuracy of existing genome annotation was significantly enhanced. Furthermore, we identified 78 putative silent genes and 177 putative housekeeping genes based on normalized transcription measurement with the sequence data. We also observed that more than 30% of pseudogenes had significant transcriptional activities during the fermentation process. Strong correlations exist between the expression values derived from RNA-Seq analysis and microarray data or qRT-PCR results. Conclusions: Transcriptome structural profiling in this research provided important supplemental information on the accuracy of genome annotation, and revealed additional gene functions and regulation in C. beijerinckii.

    Comprehensive Gene-Expression Survey Identifies Wif1 as a Modulator of Cardiomyocyte Differentiation

    During chicken cardiac development the proepicardium (PE) forms the epicardium (Epi), which contributes to several non-myocardial lineages within the heart. In contrast to Epi-explant cultures, PE explants can differentiate into a cardiomyocyte phenotype. By temporal microarray expression profiles of PE-explant cultures and maturing Epi cells, we identified genes specifically associated with differentiation towards either of these lineages and genes that are associated with the Epi-lineage restriction. We found a central role for Wnt signaling in the determination of the different cell lineages. Immunofluorescent staining after recombinant-protein incubation in PE-explant cultures indicated that the early upregulated Wnt inhibitory factor-1 (Wif1) stimulates cardiomyocyte differentiation in a similar manner as Wnt stimulation. Concordingly, in the mouse pluripotent embryogenic carcinoma cell line p19cl6, early and late Wif1 exposure enhances and attenuates differentiation, respectively. In ovo exposure of the HH12 chicken embryonic heart to Wif1 increases the Tbx18-positive cardiac progenitor pool. These data indicate that Wif1 enhances cardiomyogenesis.