Applying negative rule mining to improve genome annotation
Background: Unsupervised annotation of proteins by software pipelines suffers from very high error rates. Spurious functional assignments are usually caused by unwarranted homology-based transfer of information from existing database entries to new target sequences. We have previously demonstrated that data mining in large sequence annotation databanks can help identify annotation items that are strongly associated with each other, and that exceptions to strong positive association rules often point to potential annotation errors. Here we investigate whether negative association rule mining can reveal erroneously assigned annotation items.
Results: Almost all exceptions to strong negative association rules involve at least one wrong attribute in the feature combination that makes up the rule. The annotation features flagged as suspicious by this approach are strongly enriched in errors and constitute about 0.6% of all similarity-transferred annotation in the PEDANT genome database. Positive rule mining does not identify two thirds of these errors. The approach based on exceptions to negative rules is much more specific than positive rule mining, but its coverage is significantly lower.
Conclusion: Mining both negative and positive association rules is a potent tool for finding significant trends in protein annotation and for flagging doubtful features for further inspection.
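The abstract does not state how rule strength is measured. As a minimal sketch, assuming each protein is represented as a set of annotation items and that a negative rule "A implies not B" is strong when A and B co-occur far less often than their individual frequencies predict, flagging exceptions could look like the Python below. The function names, the observed/expected criterion and the thresholds are hypothetical choices for illustration, not the method applied to PEDANT.

```python
from collections import Counter
from itertools import combinations


def mine_negative_rules(annotations, min_item_count=3, min_expected=2.0, max_ratio=0.2):
    """Find item pairs (a, b) that co-occur far less often than independence predicts.

    annotations maps a protein ID to its list of annotation items (strings).
    A pair counts as a strong negative rule when its expected co-occurrence
    under independence is at least min_expected and its observed count is at
    most max_ratio times that expectation (an illustrative criterion only).
    """
    n = len(annotations)
    item_counts = Counter(item for items in annotations.values() for item in set(items))
    pair_counts = Counter()
    for items in annotations.values():
        # Pairs come from a sorted list, so every key (a, b) has a < b.
        for pair in combinations(sorted(set(items)), 2):
            pair_counts[pair] += 1

    frequent = sorted(i for i, c in item_counts.items() if c >= min_item_count)
    rules = []
    for a, b in combinations(frequent, 2):
        expected = item_counts[a] * item_counts[b] / n
        observed = pair_counts[(a, b)]  # Counter returns 0 for unseen pairs
        if expected >= min_expected and observed <= max_ratio * expected:
            rules.append((a, b, observed, expected))
    return rules


def flag_exceptions(annotations, rules):
    """Return, per protein, the negative rules its annotation violates."""
    flagged = {}
    for protein, items in annotations.items():
        present = set(items)
        for a, b, _observed, _expected in rules:
            if a in present and b in present:
                flagged.setdefault(protein, []).append((a, b))
    return flagged


# Toy data: 20 membrane transporters, 20 nuclear DNA-binding proteins,
# and one protein carrying a contradictory combination of annotations.
annotations = {f"M{i}": ["membrane", "transporter"] for i in range(20)}
annotations.update({f"N{i}": ["nucleus", "DNA-binding"] for i in range(20)})
annotations["X1"] = ["membrane", "DNA-binding"]

rules = mine_negative_rules(annotations)
flagged = flag_exceptions(annotations, rules)
print(flagged)  # {'X1': [('DNA-binding', 'membrane')]}
```

Only the single protein whose annotation violates a strong negative rule is flagged; in a real pipeline such hits would be candidates for manual inspection rather than automatic deletion.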
PhenomiR: a knowledgebase for microRNA expression in diseases and biological processes
PhenomiR is a comprehensive database of 542 studies reporting deregulation of miRNAs, allowing large-scale statistical analysis of miRNA expression changes.
Identifying pathways modulating sleep duration: from genomics to transcriptomics
Recognizing that insights into the modulation of sleep duration can emerge from exploring the functional relationships among genes, we used this strategy to explore the genome-wide association (GWA) results for this trait. We detected two major signalling pathways (ion channels and the ERBB signalling family of tyrosine kinases) that replicated across independent meta-analyses of GWA studies. To investigate the significance of these pathways for sleep modulation, we performed transcriptome analyses of the heads of short-sleeping flies (knockdowns of dSur, the homolog of the ABCC9 gene). We found significant alterations in gene expression in the short-sleeping knockdowns versus control flies, corresponding to pathways associated with sleep duration in our human studies. Most notably, the expression of Rho and EGFR (members of the ERBB signalling pathway) genes was down- and upregulated, respectively, consistent with the established role of these genes in sleep consolidation in Drosophila. Using a disease multifactorial interaction network, we showed that many of the genes in the pathways indicated as relevant for sleep duration had functional evidence of involvement in sleep regulation, circadian rhythms, insulin secretion, gluconeogenesis and lipogenesis.
The Mouse Functional Genome Database (MfunGD): functional annotation of proteins in the light of their cellular context
MfunGD provides a resource for annotated mouse proteins and their occurrence in protein networks. Manual annotation concentrates on proteins that are found to interact physically with other proteins. Accordingly, manually curated information from a protein–protein interaction database (MPPI) and a database of mammalian protein complexes is interconnected with MfunGD. Protein function annotation is performed using the Functional Catalogue (FunCat) annotation scheme, which is widely used for the analysis of protein networks. The dataset is also supplemented with information about the literature used in the annotation process, as well as links to the SIMAP Fasta database, the Pedant protein analysis system and cross-references to external resources. Proteins that have not yet been manually inspected are annotated automatically by a graphical probabilistic model and/or superparamagnetic clustering. The database is continuously expanding to include the rapidly growing amount of functional information about mouse gene products. MfunGD is implemented in GenRE, a J2EE-based, component-oriented multi-tier architecture following the separation-of-concerns principle.
The Negatome database: a reference set of non-interacting protein pairs
The Negatome is a collection of protein and domain pairs that are unlikely to engage in direct physical interactions. The database currently contains experimentally supported non-interacting protein pairs derived from two distinct sources: manual curation of the literature and analysis of protein complexes with known 3D structure. More stringent lists of non-interacting pairs were derived from these two datasets by excluding interactions detected by high-throughput approaches. Additionally, non-interacting protein domains were derived from each of the stringent manual and structural datasets. The Negatome is much less biased toward functionally dissimilar proteins than negative data derived by randomly selecting proteins from different cellular locations. It can be used to evaluate protein and domain interactions from new experiments and to improve the training of interaction prediction algorithms. The Negatome database is available at http://mips.helmholtz-muenchen.de/proj/ppi/negatome
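Deriving the stringent lists amounts to a set difference over unordered protein pairs, and checking new experimental results amounts to a membership test against that set. The sketch below assumes pairs are plain identifier tuples; the function names and protein IDs are hypothetical, and it does not reflect how the Negatome itself is stored or distributed.

```python
def as_pair(a, b):
    """Unordered protein pair: (A, B) and (B, A) map to the same key."""
    return frozenset((a, b))


def stringent_negatives(negative_pairs, high_throughput_pairs):
    """Drop any non-interacting pair that a high-throughput screen reported as interacting."""
    excluded = {as_pair(a, b) for a, b in high_throughput_pairs}
    return {as_pair(a, b) for a, b in negative_pairs} - excluded


def contradicting_interactions(new_interactions, negative_set):
    """Return newly reported interactions that fall inside the negative reference set."""
    return [(a, b) for a, b in new_interactions if as_pair(a, b) in negative_set]


# Hypothetical identifiers, for illustration only.
curated = [("P1", "P2"), ("P3", "P4"), ("P5", "P6")]
high_throughput = [("P4", "P3")]

negatome = stringent_negatives(curated, high_throughput)
print(len(negatome))  # 2
print(contradicting_interactions([("P2", "P1"), ("P1", "P7")], negatome))  # [('P2', 'P1')]
```

Using `frozenset` keys makes the comparison independent of the order in which a pair is listed, which matters when curated and high-throughput sources write the same pair differently.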
CORUM: the comprehensive resource of mammalian protein complexes—2009
CORUM is a database that provides a manually curated repository of experimentally characterized protein complexes from mammalian organisms, mainly human (64%), mouse (16%) and rat (12%). Protein complexes are key molecular entities that integrate multiple gene products to perform cellular functions. The new CORUM 2.0 release encompasses 2837 protein complexes, offering the largest and most comprehensive publicly available dataset of mammalian protein complexes. The CORUM dataset is built from 3198 different genes, representing ∼16% of the protein-coding genes in humans. Each protein complex is described by a name, its subunit composition, its function and the literature reference that characterizes it. Recent developments include mapping of functional annotation to Gene Ontology terms and cross-references to Entrez Gene identifiers. In addition, a ‘Phylogenetic Conservation’ analysis tool was implemented that analyses the potential occurrence of orthologous protein complex subunits in mammals and other selected groups of organisms. This allows the occurrence of protein complexes in different phylogenetic groups to be predicted. CORUM is freely accessible at http://mips.helmholtz-muenchen.de/genre/proj/corum/index.html
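The abstract does not describe how the ‘Phylogenetic Conservation’ tool scores complexes. One simple way to turn ortholog occurrence into a prediction, assumed here purely for illustration, is to compute the fraction of a complex's subunits that have an ortholog in each group and call the complex present above a cutoff. The function name, group names, gene IDs and the 0.75 cutoff are all hypothetical.

```python
def conservation_profile(subunits, orthologs_by_group, threshold=0.75):
    """Score how completely a complex's subunits are conserved in each group.

    subunits: gene IDs of the complex subunits in the reference organism.
    orthologs_by_group: maps a group name to the set of reference gene IDs
        that have an ortholog in that group.
    threshold: minimum fraction of subunits with orthologs for the complex
        to be predicted as present (an arbitrary illustrative cutoff).
    """
    if not subunits:
        raise ValueError("a complex needs at least one subunit")
    profile = {}
    for group, genes_with_ortholog in orthologs_by_group.items():
        found = sum(1 for s in subunits if s in genes_with_ortholog)
        fraction = found / len(subunits)
        profile[group] = (fraction, fraction >= threshold)
    return profile


# Hypothetical complex and ortholog sets, for illustration only.
complex_subunits = ["SUB1", "SUB2", "SUB3", "SUB4"]
orthologs = {
    "mammals": {"SUB1", "SUB2", "SUB3", "SUB4"},
    "insects": {"SUB1", "SUB2", "SUB3"},
    "fungi": {"SUB1"},
}
for group, (fraction, present) in conservation_profile(complex_subunits, orthologs).items():
    print(group, fraction, present)
```

A fractional score rather than an all-or-nothing rule keeps partially conserved complexes visible, which is useful when ortholog detection itself is incomplete.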
Extensive identification of genes involved in congenital and structural heart disorders and cardiomyopathy
Clinical presentation of congenital heart disease is heterogeneous, making identification of the disease-causing genes, their genetic pathways and their mechanisms of action challenging. By using in vivo electrocardiography, transthoracic echocardiography and microcomputed tomography imaging to screen 3,894 single-gene-null mouse lines for structural and functional cardiac abnormalities, here we identify 705 lines with cardiac arrhythmia, myocardial hypertrophy and/or ventricular dilation. Of the 705 genes affected in these lines, 486 have not previously been associated with cardiac dysfunction in humans, and some of them represent variants of unknown relevance (VUR). Mice with mutations in Casz1, Dnajc18, Pde4dip, Rnf38 or Tmem161b show developmental cardiac structural abnormalities, and the human orthologs of these genes are categorized as VUR. Using UK Biobank data, we validate the importance of the DNAJC18 gene for cardiac homeostasis by showing that its loss of function is associated with altered left ventricular systolic function. Our results identify hundreds of previously unappreciated genes with potential function in congenital heart disease and suggest a causal role for five VUR in congenital heart disease.
COVID19 Disease Map, a computational knowledge repository of virus-host interaction mechanisms.
Funder: Bundesministerium für Bildung und Forschung (BMBF)
We need to effectively combine the knowledge from the surging literature with complex datasets to propose mechanistic models of SARS-CoV-2 infection, improving data interpretation and predicting key targets of intervention. Here, we describe a large-scale community effort to build an open-access, interoperable and computable repository of COVID-19 molecular mechanisms. The COVID-19 Disease Map (C19DMap) is a graphical, interactive representation of disease-relevant molecular mechanisms linking many knowledge sources. Notably, it is also a computational resource for graph-based analyses and disease modelling. To this end, we established a framework of tools, platforms and guidelines necessary for a multifaceted community of biocurators, domain experts, bioinformaticians and computational biologists. The diagrams of the C19DMap, curated from the literature, are integrated with relevant interaction and text-mining databases. We demonstrate the application of network analysis and modelling approaches with concrete examples that highlight new testable hypotheses. This framework helps to find signatures of SARS-CoV-2 predisposition and treatment response, and to prioritise drug candidates. Such an approach may help deal with new waves of COVID-19 or similar pandemics in the long term.
