114 research outputs found

    The SWISS-PROT protein sequence data bank and its supplement TrEMBL in 1999

    SWISS-PROT is a curated protein sequence database that strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy, and a high level of integration with other databases. Recent developments of the database include cross-references to additional databases, a variety of new documentation files, and improvements to TrEMBL, a computer-annotated supplement to SWISS-PROT. TrEMBL consists of entries in SWISS-PROT-like format derived from the translation of all coding sequences (CDS) in the EMBL nucleotide sequence database, except the CDS already included in SWISS-PROT. The URLs for SWISS-PROT on the WWW are: http://www.expasy.ch/sprot and http://www.ebi.ac.uk/spro
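    The derivation rule for TrEMBL (translate every EMBL CDS, skip those already in SWISS-PROT) can be sketched as follows. This is a minimal illustration under stated assumptions: CDS records are a plain dict keyed by hypothetical identifiers, the set of CDS already covered by SWISS-PROT is given, and translation uses the standard genetic code (NCBI table 1). The functions `translate` and `build_supplement` are hypothetical names, and the sketch ignores annotation and the SWISS-PROT-like entry format entirely.

```python
# Standard genetic code (NCBI translation table 1), codons ordered by bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon, stopping at the first stop codon."""
    protein = []
    # Only complete codons are read; a trailing partial codon is ignored.
    for pos in range(0, len(cds) - len(cds) % 3, 3):
        residue = CODON_TABLE.get(cds[pos:pos + 3].upper(), "X")  # X marks ambiguous codons
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def build_supplement(embl_cds: dict[str, str], swissprot_cds_ids: set[str]) -> dict[str, str]:
    """Hypothetical: translate every EMBL CDS not already represented in SWISS-PROT."""
    return {
        cds_id: translate(seq)
        for cds_id, seq in embl_cds.items()
        if cds_id not in swissprot_cds_ids
    }


if __name__ == "__main__":
    embl = {"CDS_1": "ATGGCCTAA", "CDS_2": "ATGTTTGGGTGA"}
    # CDS_1 is assumed to be covered by SWISS-PROT already, so only CDS_2 is translated.
    print(build_supplement(embl, swissprot_cds_ids={"CDS_1"}))  # {'CDS_2': 'MFG'}
```

    The set-difference step is what keeps the supplement from duplicating curated entries; everything else about a real entry is out of scope here.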

    CastorDB: a comprehensive knowledge base for Ricinus communis

    Background: Ricinus communis is an industrially important, non-edible oilseed crop native to tropical and subtropical regions of the world. Although the R. communis genome was assembled as a 4X draft by JCVI and is predicted to contain 31,221 proteins, the function of most of its genes remains to be elucidated. A large amount of information on different aspects of the biology of R. communis is available, but most of the data are scattered and not easily accessible. A comprehensive resource on castor, CastorDB, is therefore required to facilitate research on this important plant.
    Findings: CastorDB is a specialized and comprehensive database for the oilseed plant R. communis that integrates information from several diverse resources. As primary data, CastorDB contains gene and protein sequences, gene expression data, and Gene Ontology annotations of protein sequences obtained from a variety of repositories. In addition, computational analysis was used to predict cellular localization, domains, pathways, protein-protein interactions, sumoylation sites, and biochemical properties, and these predictions are included as derived data. The database has an intuitive user interface that prompts the user to explore the various information resources available for a given gene or protein.
    Conclusion: CastorDB provides a user-friendly, comprehensive resource on castor, with particular emphasis on its genome, transcriptome, and proteome and on protein domains, pathways, protein localization, the presence of sumoylation sites, expression data, and protein interaction partners.

    The Molecular Biology Database Collection: 2007 update

    The NAR online Molecular Biology Database Collection is a public resource that contains links to the databases described in this issue of Nucleic Acids Research and in previous NAR database issues, as well as a selection of other molecular biology databases that are freely available on the web and might be useful to molecular biologists. The 2007 update includes 968 databases, 110 more than the previous update. Many databases that have been described in earlier issues of NAR come with updated summaries, which reflect recent progress and, in some instances, an expanded scope of these databases. The complete database list and summaries are available online on the Nucleic Acids Research website.

    Beyond XML Query Languages

    A query language is essential if XML is to serve effectively as an exchange medium for large data sets. The design of query languages for XML is in its infancy, and the choice of a standard may be governed more by user acceptance than by any understanding of underlying principles. One would hope that expressive power, performance, and compatibility with other languages will be considered in choosing among alternatives, but it is likely that several contenders will coexist for some time. It is worth observing that, during the 20-year development of relational query languages, several competing languages were developed, and even today there are several relational query language standards. In spite of this, a great deal of technology was developed that was independent of the surface syntax of a query language. This included technology below the level of the language, such as efficient execution models, and work above it, such as techniques for view definition and maintenance, triggers, etc. At Penn we are working on some of these language-independent issues, which we summarize here. They include execution and data models to support XML and semistructured query languages; the use of schemas and constraints in optimizing XML query languages; and tools for extracting data from existing sources and presenting it as XML.
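    The last item, extracting data from an existing source and presenting it as XML so that it can then be queried, can be illustrated with a small sketch. It assumes hypothetical flat records and uses Python's standard `xml.etree.ElementTree`, whose limited XPath subset stands in for a query language here; it is not one of the XML query languages discussed above, and the element and attribute names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows extracted from an existing (e.g., relational) source.
ROWS = [
    {"id": "P1", "organism": "human", "name": "Kinase A"},
    {"id": "P2", "organism": "mouse", "name": "Kinase B"},
    {"id": "P3", "organism": "human", "name": "Transporter C"},
]


def rows_to_xml(rows: list[dict[str, str]]) -> ET.Element:
    """Present flat records as a small XML document (one <entry> per row)."""
    root = ET.Element("entries")
    for row in rows:
        entry = ET.SubElement(root, "entry", id=row["id"], organism=row["organism"])
        ET.SubElement(entry, "name").text = row["name"]
    return root


def names_for_organism(root: ET.Element, organism: str) -> list[str]:
    """Query the XML with ElementTree's XPath subset (an attribute predicate)."""
    return [e.findtext("name") for e in root.findall(f"./entry[@organism='{organism}']")]


if __name__ == "__main__":
    doc = rows_to_xml(ROWS)
    print(ET.tostring(doc, encoding="unicode"))
    print(names_for_organism(doc, "human"))  # ['Kinase A', 'Transporter C']
```

    The same query could be phrased in any surface syntax; what the sketch keeps fixed is the data model (a tree of elements with attributes), which is the kind of language-independent layer the summary refers to.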

    Replication and update of molecular biology databases in a grid environment

    PCSV, presented by V. Breton, to appear in the proceedings. Updating molecular biology databases is a growing burden on the biomedical research community. Because the grid makes it possible to share and replicate data, we propose a service that uses web services to automatically update biology databases from a single, changing reference. In this paper we report the components, the architecture, and the deployment of the update service on the French RUGBI grid infrastructure. RUGBI is a computing grid infrastructure for the bioinformatics community, based on existing middleware and technologies.
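    The core idea, keeping many replicas in step with one changing reference, can be sketched as a change-detection loop. The abstract does not describe the service's internals, so this is a hypothetical in-memory illustration: it assumes each replica stores a SHA-256 fingerprint of the release it holds, and `Replica` and `propagate_update` are invented names. No web services or grid middleware are involved.

```python
import hashlib
from dataclasses import dataclass


def digest(content: bytes) -> str:
    """Fingerprint a database release so replicas can detect changes cheaply."""
    return hashlib.sha256(content).hexdigest()


@dataclass
class Replica:
    site: str
    content: bytes = b""
    fingerprint: str = ""


def propagate_update(reference: bytes, replicas: list[Replica]) -> list[str]:
    """Copy the reference release to every replica whose fingerprint differs.

    Returns the sites that were updated.
    """
    ref_fp = digest(reference)
    updated = []
    for replica in replicas:
        if replica.fingerprint != ref_fp:
            # A real service would transfer files through the grid instead of assigning bytes.
            replica.content = reference
            replica.fingerprint = ref_fp
            updated.append(replica.site)
    return updated


if __name__ == "__main__":
    sites = [Replica("site-a"), Replica("site-b")]
    print(propagate_update(b"release 1", sites))  # ['site-a', 'site-b']: both were stale
    print(propagate_update(b"release 1", sites))  # []: nothing changed
    print(propagate_update(b"release 2", sites))  # ['site-a', 'site-b']: reference changed
```

    Comparing fingerprints rather than full contents keeps the check cheap, which matters when the reference is large and most polls find nothing new.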

    Neighborhood-Based Label Propagation in Large Protein Graphs

    Understanding protein function is one of the keys to understanding life at the molecular level. It is also important in several scenarios, including human disease and drug discovery. In this age of rapid and affordable biological sequencing, the number of sequences accumulating in databases is rising at an increasing rate. This presents many challenges for biologists and computer scientists alike. To make sense of this huge quantity of data, these sequences should be annotated with functional properties. UniProtKB consists of two components: (i) the UniProtKB/Swiss-Prot database, containing protein sequences with reliable information manually reviewed by expert biocurators, and (ii) the UniProtKB/TrEMBL database, which is used for storing and processing unreviewed sequences. For all proteins, the sequence is available along with a little additional information, such as the taxon and some structural domains. Pairwise similarity between proteins can be defined and computed from these attributes. Other important attributes, such as function and cellular localization, are present for proteins in Swiss-Prot but often missing for proteins in TrEMBL. The enormous number of protein sequences now in TrEMBL calls for rapid procedures to annotate them automatically. In this work, we present DistNBLP, a novel Distributed Neighborhood-Based Label Propagation approach for large-scale annotation of proteins. The functional annotations of reviewed proteins are used to predict those of non-reviewed proteins through label propagation on a graph representation of the protein database. DistNBLP is built on top of the "akka" toolkit for building resilient, distributed, message-driven applications.
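    The general technique, propagating the labels of reviewed proteins to unreviewed neighbors in a similarity graph, can be sketched on a single machine as below. This is not DistNBLP itself: it is not distributed, and the voting rule (a label is assigned when its share of the labeled neighbors' similarity weight reaches `threshold`), the parameter values, and the toy graph are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical similarity graph: node -> {neighbor: similarity weight in (0, 1]}.
Graph = dict[str, dict[str, float]]


def propagate_labels(
    graph: Graph,
    seed_labels: dict[str, set[str]],
    threshold: float = 0.5,
    max_rounds: int = 10,
) -> dict[str, set[str]]:
    """Neighborhood-based label propagation (single-machine sketch).

    Reviewed proteins keep their seed labels. In each round, every unlabeled
    protein receives each label whose share of the similarity weight of its
    already-labeled neighbors is at least `threshold`.
    """
    labels = {node: set(ls) for node, ls in seed_labels.items()}
    for _ in range(max_rounds):
        new_labels = {}
        for node, neighbors in graph.items():
            if node in labels:
                continue
            scores: dict[str, float] = defaultdict(float)
            total = 0.0
            for neighbor, weight in neighbors.items():
                if neighbor in labels:
                    total += weight
                    for label in labels[neighbor]:
                        scores[label] += weight
            if total > 0:
                chosen = {lab for lab, s in scores.items() if s / total >= threshold}
                if chosen:
                    new_labels[node] = chosen
        if not new_labels:
            break  # converged: no unlabeled node gained a label this round
        # Apply updates after the round so every node sees the same snapshot.
        labels.update(new_labels)
    return labels


if __name__ == "__main__":
    # Toy graph: SP* are reviewed (Swiss-Prot-like) proteins, TR* are unreviewed.
    graph = {
        "SP1": {"TR1": 0.9},
        "SP2": {"TR1": 0.3, "TR2": 0.8},
        "TR1": {"SP1": 0.9, "SP2": 0.3, "TR3": 0.7},
        "TR2": {"SP2": 0.8},
        "TR3": {"TR1": 0.7},
    }
    seeds = {"SP1": {"kinase"}, "SP2": {"membrane"}}
    print(propagate_labels(graph, seeds))
```

    TR3 has no reviewed neighbor, so it is labeled only in the second round, after TR1 has received a label. Iterating extends coverage to such proteins at the cost of letting early mistakes spread, which is why the threshold matters.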

    Protein ontology: Vocabulary for protein data

    The huge amount of protein structure data makes it difficult to create explanatory and predictive models that are consistent with such a volume of data. The difficulty increases when a large variety of heterogeneous approaches gather data from multiple perspectives. To facilitate computational processing of the data, it is especially critical to develop standardized, structured data representation formats for proteomics data. In this paper we describe a Protein Ontology Model for integrating protein databases and derive a structured vocabulary for a complete understanding of the process of protein synthesis. The proposed Protein Ontology Model provides biologists and scientists with a description of the sequence, structure, and functions of a protein, and also interprets the various factors that determine the final conformation of the protein structure. The Structured Vocabulary for Protein Data, which describes the Protein Ontology, is composed of type definitions for Protein Entry Details, Sequence and Structural Information of Proteins, Structural Domain Family of Protein, Cellular Function of Protein, Chemical Bonds present in the Protein, and External Constraints deciding the final protein conformation. The proposed Ontology Model will make proteins easier to predict and understand.
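    One way to picture how the six type definitions fit together is as a record type per category, composed into a single entry. The sketch below is hypothetical: the category names follow the abstract, but every field name, the 1-based residue convention, and the `validate` check are assumptions, not the published vocabulary.

```python
from dataclasses import dataclass, field


@dataclass
class EntryDetails:  # "Protein Entry Details"
    accession: str
    name: str
    organism: str


@dataclass
class SequenceAndStructure:  # "Sequence and Structural Information"
    sequence: str
    secondary_structure: str = ""  # hypothetical: one letter per residue, if known


@dataclass
class ChemicalBond:  # "Chemical Bonds present in the Protein"
    kind: str  # e.g., "disulfide"
    residue_a: int  # assumed 1-based positions in the sequence
    residue_b: int


@dataclass
class ProteinOntologyEntry:
    details: EntryDetails
    structure: SequenceAndStructure
    domain_families: list[str] = field(default_factory=list)  # "Structural Domain Family"
    cellular_functions: list[str] = field(default_factory=list)  # "Cellular Function"
    bonds: list[ChemicalBond] = field(default_factory=list)
    external_constraints: list[str] = field(default_factory=list)  # "External Constraints"

    def validate(self) -> None:
        """Check that every bond refers to residues inside the sequence."""
        length = len(self.structure.sequence)
        for bond in self.bonds:
            if not (1 <= bond.residue_a <= length and 1 <= bond.residue_b <= length):
                raise ValueError(f"bond {bond} lies outside sequence of length {length}")


if __name__ == "__main__":
    entry = ProteinOntologyEntry(
        EntryDetails("X1", "demo protein", "E. coli"),
        SequenceAndStructure("MCAC"),
        bonds=[ChemicalBond("disulfide", 2, 4)],
    )
    entry.validate()  # passes: both cysteines lie within the 4-residue sequence
    print(entry.details.name, len(entry.bonds))
```

    Typing the categories separately is what lets cross-category constraints, such as bonds referring to valid residues, be checked mechanically.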

    Finding the Core-Genes of Chloroplasts

    Due to recent advances in sequencing techniques, the number of available genomes is rising steadily, making large-scale genomic comparisons between sets of closely related species possible. An interesting question is which genes a collection of species has in common or, conversely, what is specific to a given species compared with others belonging to the same genus, family, etc. Investigating this problem means finding both the core and pan genomes of a collection of species, i.e., the genes common to all the species versus the set of all genes in all the species under consideration. However, obtaining trustworthy core and pan genomes is not an easy task: it requires a large amount of computation and a rigorous methodology. Surprisingly, to the best of our knowledge, methodologies for finding core and pan genomes have not been investigated in depth. This research work tries to fill this gap by focusing only on chloroplast genomes, whose reasonable sizes allow an in-depth study. To achieve this goal, a collection of 99 chloroplasts is considered in this article. Two methodologies have been investigated, based respectively on sequence similarities and on gene names taken from annotation tools. The obtained results will finally be evaluated in terms of biological relevance.
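    Once each genome is reduced to a set of gene identifiers, as in the name-based methodology, the definitions above become set operations: the core genome is the intersection, the pan genome is the union, and the genes specific to one species are its set minus the union of all the others. The sketch assumes consistent gene names across annotations; the function names and toy gene sets are illustrative only, not real annotations of any chloroplast.

```python
def core_and_pan(genomes: dict[str, set[str]]) -> tuple[set[str], set[str]]:
    """Core genome = genes shared by all species; pan genome = genes found in any."""
    gene_sets = list(genomes.values())
    if not gene_sets:
        return set(), set()
    core = set.intersection(*gene_sets)
    pan = set.union(*gene_sets)
    return core, pan


def specific_genes(genomes: dict[str, set[str]], species: str) -> set[str]:
    """Genes present in `species` and absent from every other species."""
    others = [genes for name, genes in genomes.items() if name != species]
    return genomes[species] - set().union(*others)


if __name__ == "__main__":
    # Toy gene sets (hypothetical): real chloroplast genomes carry ~100+ genes.
    genomes = {
        "A": {"psbA", "rbcL", "ndhF"},
        "B": {"psbA", "rbcL", "ycf1"},
        "C": {"psbA", "rbcL", "matK"},
    }
    core, pan = core_and_pan(genomes)
    print(sorted(core))  # ['psbA', 'rbcL']
    print(sorted(pan))  # ['matK', 'ndhF', 'psbA', 'rbcL', 'ycf1']
    print(specific_genes(genomes, "A"))  # {'ndhF'}
```

    The set operations are trivial; the hard part the abstract points to is deciding when two genes count as "the same", whether by name (sensitive to annotation inconsistencies) or by sequence similarity (sensitive to thresholds).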

    Small Cofactors May Assist Protein Emergence from RNA World: Clues from RNA-Protein Complexes

    It is now widely accepted that at an early stage in the evolution of life an RNA world arose, in which RNAs both served as the genetic material and catalyzed diverse biochemical reactions. Over time, proteins gradually replaced RNAs because of their superior catalytic properties. It is therefore important to investigate how primitive functional proteins emerged from the RNA world, which can shed light on the evolutionary pathway of life from the RNA world to the modern world. In this work, based on an analysis of RNA-protein complexes, we propose that the emergence of most primitive functional proteins was assisted by early primitive nucleotide cofactors, while only a minority were induced directly by RNAs. Furthermore, the present findings have significant implications for exploring the composition of primitive RNA, i.e., adenine bases as the principal building blocks.

    Ontology algebra for composition of protein data sources
