31 research outputs found

    Applying negative rule mining to improve genome annotation

    Background: Unsupervised annotation of proteins by software pipelines suffers from very high error rates. Spurious functional assignments are usually caused by unwarranted homology-based transfer of information from existing database entries to the new target sequences. We have previously demonstrated that data mining in large sequence annotation databanks can help identify annotation items that are strongly associated with each other, and that exceptions from strong positive association rules often point to potential annotation errors. Here we investigate the applicability of negative association rule mining to revealing erroneously assigned annotation items. Results: Almost all exceptions from strong negative association rules are connected to at least one wrong attribute in the feature combination making up the rule. The fraction of annotation features flagged by this approach as suspicious is strongly enriched in errors and constitutes about 0.6% of the whole body of the similarity-transferred annotation in the PEDANT genome database. Positive rule mining does not identify two thirds of these errors. The approach based on exceptions from negative rules is much more specific than positive rule mining, but its coverage is significantly lower. Conclusion: Mining of both negative and positive association rules is a potent tool for finding significant trends in protein annotation and flagging doubtful features for further inspection.

    PhenomiR: a knowledgebase for microRNA expression in diseases and biological processes

    PhenomiR is a comprehensive database of 542 studies reporting deregulation of miRNAs, allowing large-scale statistical analysis of miRNA expression changes

    Identifying pathways modulating sleep duration: from genomics to transcriptomics

    Recognizing that insights into the modulation of sleep duration can emerge by exploring the functional relationships among genes, we used this strategy to explore the genome-wide association results for this trait. We detected two major signalling pathways (ion channels and the ERBB signalling family of tyrosine kinases) that could be replicated across independent GWA study meta-analyses. To investigate the significance of these pathways for sleep modulation, we performed transcriptome analyses of short-sleeping flies' heads (knockdown for the ABCC9 gene homolog; dSur). We found significant alterations in gene expression in the short-sleeping knockdowns versus control flies, which correspond to pathways associated with sleep duration in our human studies. Most notably, the expression of Rho and EGFR (members of the ERBB signalling pathway) genes was down- and up-regulated, respectively, consistent with the established role of these genes for sleep consolidation in Drosophila. Using a disease multifactorial interaction network, we showed that many of the genes of the pathways indicated to be relevant for sleep duration had functional evidence of their involvement with sleep regulation, circadian rhythms, insulin secretion, gluconeogenesis and lipogenesis

    The Mouse Functional Genome Database (MfunGD): functional annotation of proteins in the light of their cellular context

    MfunGD provides a resource for annotated mouse proteins and their occurrence in protein networks. Manual annotation concentrates on proteins which are found to interact physically with other proteins. Accordingly, manually curated information from a protein–protein interaction database (MPPI) and a database of mammalian protein complexes is interconnected with MfunGD. Protein function annotation is performed using the Functional Catalogue (FunCat) annotation scheme which is widely used for the analysis of protein networks. The dataset is also supplemented with information about the literature that was used in the annotation process as well as links to the SIMAP Fasta database, the Pedant protein analysis system and cross-references to external resources. Proteins that so far were not manually inspected are annotated automatically by a graphical probabilistic model and/or superparamagnetic clustering. The database is continuously expanding to include the rapidly growing amount of functional information about gene products from mouse. MfunGD is implemented in GenRE, a J2EE-based component-oriented multi-tier architecture following the separation of concerns principle

    The Negatome database: a reference set of non-interacting protein pairs

    The Negatome is a collection of protein and domain pairs that are unlikely to be engaged in direct physical interactions. The database currently contains experimentally supported non-interacting protein pairs derived from two distinct sources: by manual curation of literature and by analyzing protein complexes with known 3D structure. More stringent lists of non-interacting pairs were derived from these two datasets by excluding interactions detected by high-throughput approaches. Additionally, non-interacting protein domains have been derived from the stringent manual and structural data, respectively. The Negatome is much less biased toward functionally dissimilar proteins than the negative data derived by randomly selecting proteins from different cellular locations. It can be used to evaluate protein and domain interactions from new experiments and improve the training of interaction prediction algorithms. The Negatome database is available at http://mips.helmholtz-muenchen.de/proj/ppi/negatome

    CORUM: the comprehensive resource of mammalian protein complexes—2009

    CORUM is a database that provides a manually curated repository of experimentally characterized protein complexes from mammalian organisms, mainly human (64%), mouse (16%) and rat (12%). Protein complexes are key molecular entities that integrate multiple gene products to perform cellular functions. The new CORUM 2.0 release encompasses 2837 protein complexes offering the largest and most comprehensive publicly available dataset of mammalian protein complexes. The CORUM dataset is built from 3198 different genes, representing ∼16% of the protein coding genes in humans. Each protein complex is described by a protein complex name, subunit composition, function as well as the literature reference that characterizes the respective protein complex. Recent developments include mapping of functional annotation to Gene Ontology terms as well as cross-references to Entrez Gene identifiers. In addition, a ‘Phylogenetic Conservation’ analysis tool was implemented that analyses the potential occurrence of orthologous protein complex subunits in mammals and other selected groups of organisms. This allows one to predict the occurrence of protein complexes in different phylogenetic groups. CORUM is freely accessible at (http://mips.helmholtz-muenchen.de/genre/proj/corum/index.html)

    COVID19 Disease Map, a computational knowledge repository of virus-host interaction mechanisms.

    Funder: Bundesministerium für Bildung und Forschung (BMBF). We need to effectively combine the knowledge from surging literature with complex datasets to propose mechanistic models of SARS-CoV-2 infection, improving data interpretation and predicting key targets of intervention. Here, we describe a large-scale community effort to build an open access, interoperable and computable repository of COVID-19 molecular mechanisms. The COVID-19 Disease Map (C19DMap) is a graphical, interactive representation of disease-relevant molecular mechanisms linking many knowledge sources. Notably, it is a computational resource for graph-based analyses and disease modelling. To this end, we established a framework of tools, platforms and guidelines necessary for a multifaceted community of biocurators, domain experts, bioinformaticians and computational biologists. The diagrams of the C19DMap, curated from the literature, are integrated with relevant interaction and text mining databases. We demonstrate the application of network analysis and modelling approaches by concrete examples to highlight new testable hypotheses. This framework helps to find signatures of SARS-CoV-2 predisposition, treatment response or prioritisation of drug candidates. Such an approach may help deal with new waves of COVID-19 or similar pandemics in the long-term perspective

    Genetic Differences in the Immediate Transcriptome Response to Stress Predict Risk-Related Brain Function and Psychiatric Disorders

    Depression risk is exacerbated by genetic factors and stress exposure; however, the biological mechanisms through which these factors interact to confer depression risk are poorly understood. One putative biological mechanism implicates variability in the ability of cortisol, released in response to stress, to trigger a cascade of adaptive genomic and non-genomic processes through glucocorticoid receptor (GR) activation. Here, we demonstrate that common genetic variants in long-range enhancer elements modulate the immediate transcriptional response to GR activation in human blood cells. These functional genetic variants increase risk for depression and co-heritable psychiatric disorders. Moreover, these risk variants are associated with inappropriate amygdala reactivity, a transdiagnostic psychiatric endophenotype and an important stress hormone response trigger. Network modeling and animal experiments suggest that these genetic differences in GR-induced transcriptional activation may mediate the risk for depression and other psychiatric disorders by altering a network of functionally related stress-sensitive genes in blood and brain

    Mining sequence annotation databanks for association patterns

    Data and text mining, Vol. 21, Suppl. 3, 2005, pages iii49–iii57. doi:10.1093/bioinformatics/bti1206