
    Transmembrane Protein Oxygen Content and Compartmentalization of Cells

    A recent report explored the oxygen content of transmembrane proteins over macroevolutionary timescales and observed a correlation between the geological time at which compartmentalized cells appeared and atmospheric oxygen concentration. The authors predicted, characterized and correlated differences in the structure and composition of transmembrane proteins from the three domains of life with atmospheric oxygen concentrations over geological time. They hypothesized that transmembrane proteins in ancient taxa selectively excluded oxygen and that, as this constraint relaxed with rising atmospheric oxygen levels, the size and number of communication-related transmembrane proteins increased. In summary, they concluded that compartmentalized and non-compartmentalized cells can be distinguished by how oxygen is partitioned at the proteome level, a conclusion derived from an analysis of 19 taxa. We extended their analysis to a larger sample of 309 eubacterial, 34 archaeal and 30 eukaryotic complete proteomes and observed that the two groups of cells cannot be completely separated on the basis of how oxygen is partitioned in their membrane proteins. In addition, the origin of compartmentalized cells is more likely to have been driven by an innovation in cell membrane composition that occurred 2700 million years ago and led to the evolution of endocytosis and exocytosis, rather than by the rise in atmospheric oxygen concentration.
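    The central quantity in this entry is the oxygen content of (transmembrane) proteins. The abstract does not define the metric, so the sketch below assumes one common choice, the mean number of side-chain oxygen atoms per residue, averaged over a set of sequences. The function names are hypothetical and this is not the authors' pipeline.

```python
# Side-chain oxygen atoms for each standard amino acid (one-letter code).
# Residues not listed (e.g. L, I, V, A, F, G) carry no side-chain oxygen.
# Backbone carbonyl oxygens are excluded on purpose: every residue has one,
# so they would not distinguish sequences of equal length.
SIDE_CHAIN_OXYGEN = {"D": 2, "E": 2, "S": 1, "T": 1, "Y": 1, "N": 1, "Q": 1}


def oxygen_per_residue(sequence: str) -> float:
    """Mean number of side-chain oxygen atoms per residue in one sequence."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(SIDE_CHAIN_OXYGEN.get(aa, 0) for aa in seq) / len(seq)


def proteome_mean_oxygen(sequences: list[str]) -> float:
    """Unweighted mean of oxygen_per_residue over a set of (e.g. transmembrane) sequences."""
    if not sequences:
        raise ValueError("no sequences")
    return sum(oxygen_per_residue(s) for s in sequences) / len(sequences)


if __name__ == "__main__":
    # A hydrophobic, helix-like stretch versus a polar, oxygen-rich stretch.
    print(oxygen_per_residue("LLIVAFLLG"))  # 0.0
    print(oxygen_per_residue("DESTY"))      # 1.4
```

    Comparing `proteome_mean_oxygen` for the transmembrane segments of different taxa is the kind of proteome-level partition the abstract refers to; a real analysis would first need a transmembrane-segment predictor, which is outside this sketch.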

    The chicken gene nomenclature committee report

    Comparative genomics is an essential component of the post-genomic era. The chicken genome is the first avian genome to be sequenced, and it will serve as a model for other avian species. Moreover, because of its unique evolutionary niche, the chicken genome can be used to understand the evolution of functional elements and gene regulation in mammalian species. However, comparative biology both within avian species and within amniotes is hampered by the difficulty of recognizing functional orthologs. This problem is compounded as different databases and sequence repositories proliferate and the names they assign to functional elements multiply along with them. Currently, a gene can be published under more than one name, and one name sometimes refers to unrelated genes. Standardized gene nomenclature is necessary to facilitate communication between scientists and genomic resources. It is also important that this nomenclature build on existing nomenclature efforts wherever possible, so that it truly facilitates cross-species studies. We report here the formation of the Chicken Gene Nomenclature Committee (CGNC), an international, centralized effort to provide standardized nomenclature for chicken genes. The CGNC works in conjunction with public resources such as NCBI and Ensembl and in consultation with the existing nomenclature committees for human and mouse. The CGNC will develop standardized nomenclature in consultation with the research community and relies on the community's support to ensure that the nomenclature facilitates comparative and genomic studies.

    AS-ALPS: a database for analyzing the effects of alternative splicing on protein structure, interaction and network in human and mouse

    We have constructed a database, AS-ALPS (alternative splicing-induced alteration of protein structure), which provides information useful for analyzing the effects of alternative splicing (AS) on protein structure, on interactions with other biomolecules and on protein interaction networks in human and mouse. Several AS events have been shown to contribute to the diversification of protein structure, which diversifies interaction partners or affinities and in turn contributes to the regulation of biomolecular networks. Most AS variants, however, are known only at the sequence level. It is important to determine the effects of AS on protein structure and interaction, and to identify candidate experimental targets relevant to network regulation by AS. For this purpose, the three-dimensional (3D) structures of proteins are a valuable source of information; however, they have not been fully exploited in other AS-related databases. AS-ALPS is the only AS-related database that describes the spatial relationships between protein regions altered by AS (‘AS regions’) and both the proteins’ hydrophobic cores and their sites of intermolecular interaction. This information makes it possible to infer whether each AS event affects protein structural stability, protein interactions, or both. AS-ALPS can be freely accessed at http://as-alps.nagahama-i-bio.ac.jp and http://genomenetwork.nig.ac.jp/as-alps/
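    The stability/interaction inference can be made concrete in a simplified form: if the AS region, the hydrophobic core and the interaction sites are all known on a common residue numbering, the check reduces to set overlap. The sketch assumes 1-based, inclusive residue numbers and uses direct residue overlap as a stand-in for AS-ALPS's spatial relationships; `ResidueAnnotation` and `assess_as_region` are hypothetical names, not part of AS-ALPS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResidueAnnotation:
    """Hypothetical per-protein annotation on a 1-based residue numbering."""
    core: frozenset        # residues in the hydrophobic core
    interface: frozenset   # residues at inter-molecular interaction sites


def assess_as_region(start, end, annotation):
    """Check whether an AS region (inclusive residue range) touches core or interface residues."""
    if start > end:
        raise ValueError("start must not exceed end")
    region = set(range(start, end + 1))
    core_hits = region & annotation.core
    interface_hits = region & annotation.interface
    return {
        "core_residues_altered": sorted(core_hits),
        "interface_residues_altered": sorted(interface_hits),
        "may_affect_stability": bool(core_hits),
        "may_affect_interaction": bool(interface_hits),
    }


if __name__ == "__main__":
    ann = ResidueAnnotation(core=frozenset({10, 11, 12}), interface=frozenset({50, 51}))
    print(assess_as_region(5, 10, ann))
```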

    OmicBrowse: a Flash-based high-performance graphics interface for genomic resources

    OmicBrowse is a genome browser designed as a scalable system for maintaining numerous genome annotation datasets. It is an open-source tool that controls each user's access to each dataset, so that multiple users can each have their own integrative view of both their unpublished and published datasets; this significantly reduces the maintenance costs of supplying each collaborator exclusively with their own private data. OmicBrowse supports DAS1 import and export of annotations to and from Internet servers worldwide. We also provide a data-download server, OmicDownload, that interactively selects datasets and filters the data in the selected datasets. Our OmicBrowse server has been freely available at http://omicspace.riken.jp/ since its launch in 2003. The OmicBrowse source code can be downloaded from http://sourceforge.net/projects/omicbrowse/

    MAGIA, a web-based tool for miRNA and Genes Integrated Analysis

    MAGIA (miRNA and genes integrated analysis) is a novel web tool for the integrative analysis of target predictions and of miRNA and gene expression data. MAGIA is divided into two parts. The query section allows the user to retrieve and browse updated miRNA target predictions computed with several different algorithms (PITA, miRanda and TargetScan) and Boolean combinations thereof. The analysis section comprises a multistep procedure for (i) direct integration of mRNA and miRNA expression data through different functional measures (parametric and non-parametric correlation indexes, a variational Bayesian model, mutual information and a meta-analysis approach based on P-value combination), (ii) construction of a bipartite regulatory network of the best putative miRNA-mRNA interactions and (iii) retrieval of information available in several public databases of genes, miRNAs and diseases, and via text mining of the scientific literature. MAGIA is freely available to academic users at http://gencomp.bio.unipd.it/magia
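    Two of the MAGIA operations are easy to show in isolation: the Boolean combination of target predictions from several algorithms, and the combination of P-values. In this sketch, predictions are assumed to be (miRNA, gene) pairs, and Fisher's method is used as one example of P-value combination; the abstract does not say which combination rule MAGIA uses, and the function names are hypothetical.

```python
import math


def combine_predictions(predictions, mode="and"):
    """Boolean combination of target predictions.

    predictions maps an algorithm name (e.g. "PITA") to an iterable of
    (miRNA, gene) pairs. mode "and" keeps pairs predicted by every
    algorithm; mode "or" keeps pairs predicted by at least one.
    """
    sets = [set(pairs) for pairs in predictions.values()]
    if not sets:
        return set()
    if mode == "and":
        return set.intersection(*sets)
    if mode == "or":
        return set.union(*sets)
    raise ValueError(f"unknown mode: {mode!r}")


def fisher_combined_pvalue(pvalues):
    """Fisher's method: combine independent P-values into one.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null hypothesis. For even degrees
    of freedom the upper tail has the closed form
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!, used here to avoid SciPy.
    """
    if not pvalues:
        raise ValueError("need at least one P-value")
    if any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("P-values must lie in (0, 1]")
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # X / 2
    term = 1.0
    series = 1.0
    for i in range(1, k):
        term *= half_x / i
        series += term
    return math.exp(-half_x) * series


if __name__ == "__main__":
    preds = {"PITA": [("miR-1", "GENE_A"), ("miR-1", "GENE_B")],
             "miRanda": [("miR-1", "GENE_A")]}
    print(combine_predictions(preds, "and"))
    print(fisher_combined_pvalue([0.05, 0.05]))
```

    Fisher's method assumes the combined tests are independent; correlated evidence would call for a different rule.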

    The strength of co-authorship in gene name disambiguation

    Background: A mention of a biomedical entity in articles and other free text is often ambiguous; for example, 13% of gene names (aliases) might refer to more than one gene. The task of Gene Symbol Disambiguation (GSD), a special case of Word Sense Disambiguation (WSD), is to assign a unique gene identifier to every gene name alias identified in biology-related articles. Supervised and unsupervised machine learning WSD techniques have been applied in the biomedical field with promising results. Here we use graph-based semi-supervised methods to examine how the GSD task can exploit a special feature of biological articles: their authors are known.
    Results: Our key hypothesis is that a biologist refers to a particular gene by a fixed gene alias, and that the same holds for that biologist's co-authors. To make use of co-authorship information, we built the inverse co-author graph on MEDLINE abstracts. Its nodes are articles, and two nodes are joined by an edge if and only if the two articles share an author. We introduce two methods for the GSD task that use graph-based distances between abstracts. Using only information obtained from the inverse co-author graph, a disambiguation decision can be made in 85% of cases with an extremely high precision of 99.5%. To attain full coverage, we incorporated the co-authorship information into two GSD systems; in experiments, our procedure achieved precision of 94.3%, 98.85%, 96.05% and 99.63% on the human, mouse, fly and yeast GSD evaluation sets, respectively.
    Conclusion: Based on these promising results, we suggest that co-authorship information and the circumstances of an article's release (such as the journal title and the year of publication) can be a crucial building block of any sophisticated similarity measure between biological articles. The methods introduced here should therefore also be useful for other biomedical natural language processing tasks, such as organism or target disease detection.
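    The inverse co-author graph is straightforward to build from author lists. The sketch below builds it and then applies one hypothetical decision rule: label a new article with the gene identifier found in the nearest already-disambiguated articles, and abstain when no labelled article is close enough or when the nearest ones disagree. The abstract does not spell out the authors' two distance-based methods, so this rule and all names here are illustrative assumptions.

```python
from itertools import combinations


def build_inverse_coauthor_graph(articles):
    """articles: dict mapping article id -> set of author names.

    Returns an adjacency dict: two articles are linked if and only if
    they share at least one author.
    """
    graph = {article: set() for article in articles}
    by_author = {}
    for article, authors in articles.items():
        for author in authors:
            by_author.setdefault(author, []).append(article)
    for shared in by_author.values():
        for a, b in combinations(shared, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def disambiguate(graph, query, labels, max_distance=2):
    """Return the gene id of the nearest labelled article(s), or None.

    labels maps already-disambiguated article ids to gene ids. The search
    expands breadth-first one hop at a time and abstains (returns None)
    if nothing labelled lies within max_distance hops, or if the labelled
    articles at the nearest distance disagree.
    """
    seen = {query}
    frontier = [query]
    for _ in range(max_distance):
        next_frontier = []
        for node in frontier:
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    next_frontier.append(neighbour)
        votes = {labels[n] for n in next_frontier if n in labels}
        if len(votes) == 1:
            return votes.pop()
        if len(votes) > 1:
            return None  # conflicting evidence at the same distance
        frontier = next_frontier
    return None
```

    Abstaining when evidence is missing or conflicting mirrors the trade-off in the abstract: the graph alone decides only part of the cases but with high precision, and the remaining cases are left to a conventional GSD system.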

    Michigan molecular interactions r2: from interacting proteins to pathways

    Molecular interaction data exist in a number of repositories, each with its own data format, molecule identifiers and information coverage. Michigan Molecular Interactions (MiMI) helps scientists search through this profusion of molecular interaction data. The original release of MiMI gathered data from well-known protein interaction databases and deep-merged this information while keeping track of provenance. Based on user feedback, MiMI has been completely redesigned. This article describes the resulting MiMI Release 2 (MiMIr2). New functionality includes extension from proteins to genes and pathways; identification of highlighted sentences in source publications; seamless two-way linkage with Cytoscape; query facilities based on MeSH/GO terms and other concepts; approximate graph matching to find relevant pathways; support for bulk querying; and an interface design driven by a user focus group. MiMI is part of the NIH's National Center for Integrative Biomedical Informatics (NCIBI) and is publicly available at http://mimi.ncibi.org
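    Merging while keeping track of provenance can be illustrated on the simplest case, interaction pairs reported by several sources. The sketch assumes that source-specific identifiers can be mapped to a common namespace through an optional `id_map`, treats interactions as undirected, and records which sources report each one. It is a simplification for illustration, not MiMI's actual merge procedure, and `merge_interactions` is a hypothetical name.

```python
def merge_interactions(sources, id_map=None):
    """Merge undirected interaction pairs from several sources.

    sources maps a source name to an iterable of (molecule_a, molecule_b)
    pairs. id_map optionally maps source-specific identifiers to a shared
    canonical identifier. Returns {frozenset of canonical ids: sorted list
    of sources reporting that interaction}, i.e. the merged data together
    with its provenance.
    """
    id_map = id_map or {}
    merged = {}
    for source, pairs in sources.items():
        for a, b in pairs:
            key = frozenset((id_map.get(a, a), id_map.get(b, b)))
            merged.setdefault(key, set()).add(source)
    return {key: sorted(names) for key, names in merged.items()}


if __name__ == "__main__":
    sources = {"DB1": [("P1", "P2")], "DB2": [("p1_alias", "P2"), ("P3", "P4")]}
    for pair, provenance in merge_interactions(sources, {"p1_alias": "P1"}).items():
        print(sorted(pair), provenance)
```

    Using a `frozenset` key makes (A, B) and (B, A) the same interaction, so a pair reported in either order by two databases collapses into one record with both sources listed.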

    EMAGE mouse embryo spatial gene expression database: 2010 update

    EMAGE (http://www.emouseatlas.org/emage) is a freely available online database of in situ gene expression patterns in the developing mouse embryo. Gene expression domains are extracted from raw images and spatially integrated into a set of standard 3D virtual mouse embryos at different stages of development, which allows the data to be interrogated by spatial methods. An anatomy ontology is also used to describe sites of expression, which allows the data to be queried using text-based methods. Here, we describe recent enhancements to EMAGE, including: the release of a completely redesigned website, which integrates many different search functions in HTML web pages, offers improved user feedback and can find similar expression patterns at the click of a button; back-end refactoring from an object-oriented to a relational architecture, allowing associated SQL access; and further access through standard formatted URLs and a Java API. We have also increased data coverage by sourcing from a greater selection of journals, and we have developed automated methods for spatial data annotation that are being applied to spatially incorporate the genome-wide (∼19 000 gene) ‘EURExpress’ dataset into EMAGE.
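    The abstract does not say how EMAGE measures similarity between expression patterns. As an assumption for illustration only: if each expression domain is represented as a set of voxel indices on the same standard embryo model, one simple measure is the Jaccard index (voxels shared by two patterns divided by all voxels covered by either). `jaccard` and `rank_similar` are hypothetical helpers, not part of EMAGE or its Java API.

```python
def jaccard(a, b):
    """Jaccard index of two voxel sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_similar(query, domains, top_k=5):
    """Rank stored expression domains by Jaccard similarity to a query domain.

    domains maps a gene (or assay) id to its set of voxel indices on the
    shared model; ties are broken alphabetically so the order is deterministic.
    """
    scored = [(gene, jaccard(query, voxels)) for gene, voxels in domains.items()]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


if __name__ == "__main__":
    domains = {"GeneA": {1, 2, 3}, "GeneB": {3, 4}, "GeneC": {7}}
    print(rank_similar({1, 2, 3}, domains, top_k=2))
```

    Mapping every pattern onto the same standard embryo is what makes a voxel-set comparison like this meaningful at all; patterns from different stages would need separate models.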

    Automatic Assignment of EC Numbers

    A wide range of research areas in molecular biology and medical biochemistry, such as drug design, metabolic network reconstruction and systems biology, require a reliable enzyme classification system. When researchers in these areas wish to refer unambiguously to an enzyme and its function, they use the EC number introduced by the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (IUBMB). However, every one of these applications depends critically on the consistency and reliability of the underlying data. We have developed tools for validating the EC number classification scheme. In this paper, we present validated data for 3788 enzymatic reactions covering 229 sub-subclasses of the EC classification system. We found over 80% agreement between our assignment and the EC classification. For 61 reactions (only 2.5%), we found that the assignment was inconsistent with the rules of the nomenclature committee; these reactions have to be transferred to other sub-subclasses. We demonstrate that our validation results can be used to initiate corrections and improvements to the EC number classification scheme.
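    Agreement at the sub-subclass level means that the first three fields of two EC numbers match. The sketch below computes that fraction for pairs of (computed, official) EC numbers. It assumes four-field numbers such as `1.1.1.1`; `sub_subclass` and `sub_subclass_agreement` are hypothetical helpers, not the authors' validation tools.

```python
import re

# Four dot-separated fields; the last may be a plain serial number,
# an "n"-prefixed preliminary number, or a "-" placeholder.
EC_PATTERN = re.compile(r"^(\d+)\.(\d+)\.(\d+)\.(n?\d+|-)$")


def sub_subclass(ec_number):
    """Return the class.subclass.sub-subclass prefix, e.g. '1.1.1' for '1.1.1.1'."""
    match = EC_PATTERN.match(ec_number.strip())
    if match is None:
        raise ValueError(f"not a four-field EC number: {ec_number!r}")
    return ".".join(match.group(1, 2, 3))


def sub_subclass_agreement(pairs):
    """Fraction of (computed, official) EC pairs that agree at sub-subclass level."""
    if not pairs:
        raise ValueError("no pairs to compare")
    agreeing = sum(sub_subclass(ours) == sub_subclass(official) for ours, official in pairs)
    return agreeing / len(pairs)


if __name__ == "__main__":
    pairs = [("1.1.1.1", "1.1.1.27"), ("2.7.1.1", "2.7.4.3")]
    print(sub_subclass_agreement(pairs))  # 0.5
```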

    Identifying hypothetical genetic influences on complex disease phenotypes

    Background: Statistical interactions between disease-associated loci of complex genetic diseases suggest that genes from these regions are involved in a common mechanism that impacts, or is impacted by, the disease. The computational problem we address is to discover relationships among genes from these interacting regions that may explain the observed statistical interaction and the role of these genes in the disease phenotype.
    Results: We describe a heuristic algorithm for generating hypothetical gene relationships from loci associated with a complex disease phenotype. This approach, called Prioritizing Disease Genes by Analysis of Common Elements (PDG-ACE), mines biomedical keywords from text descriptions of genes and uses them to relate genes located close to disease-associated loci. A keyword that is common to, and significantly over-represented in, the descriptions of a pair of genes may represent a preliminary hypothesis about the biological relationship between the genes and suggest the role the genes play in the disease phenotype.
    Conclusion: Our experiments show that the approach finds previously published relationships while not reporting relationships that do not exist. The results also indicate that the approach is robust to differences in keyword vocabulary. We outline a brief case study in which results from a recently published type 2 diabetes association study are used to identify potential hypotheses.
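    A pair-specific keyword score can be sketched with the hypergeometric distribution: if a keyword appears in the descriptions of K out of N genes, the probability that two genes drawn at random both contain it is C(K,2)/C(N,2), so a small value marks a shared keyword that is unusually specific to the pair. This scoring rule, the simple tokenizer and all function names are assumptions for illustration; the abstract does not give PDG-ACE's exact statistic.

```python
import re
from collections import Counter
from math import comb

# Hypothetical tokenizer: lower-case words of three or more characters.
TOKEN = re.compile(r"[a-z][a-z0-9-]{2,}")


def keywords(text):
    return set(TOKEN.findall(text.lower()))


def document_frequencies(descriptions):
    """Number of gene descriptions containing each keyword."""
    counts = Counter()
    for text in descriptions.values():
        counts.update(keywords(text))
    return counts


def shared_keyword_scores(gene_a, gene_b, descriptions, doc_freq):
    """Score keywords shared by two gene descriptions, most specific first.

    Score = C(K, 2) / C(N, 2): the hypergeometric probability that two genes
    drawn at random from the N described genes both contain a keyword found
    in K descriptions. Smaller scores mean the keyword is rarer overall.
    """
    n = len(descriptions)
    shared = keywords(descriptions[gene_a]) & keywords(descriptions[gene_b])
    scores = {kw: comb(doc_freq[kw], 2) / comb(n, 2) for kw in shared}
    return dict(sorted(scores.items(), key=lambda item: (item[1], item[0])))


if __name__ == "__main__":
    descriptions = {
        "G1": "insulin receptor signalling protein",
        "G2": "insulin secretion protein",
        "G3": "structural protein",
        "G4": "membrane protein transporter",
    }
    freq = document_frequencies(descriptions)
    print(shared_keyword_scores("G1", "G2", descriptions, freq))
```

    In this toy corpus, "insulin" ranks ahead of the ubiquitous "protein", which is the behaviour the abstract's over-representation criterion is after; a real corpus would also need stop-word removal and a multiple-testing threshold.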