18 research outputs found

    Database resources of the National Center for Biotechnology Information

    In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups (COGs), Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Entrez Probe, GENSAT, Online Mendelian Inheritance in Man (OMIM), Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART) and the PubChem suite of small molecule databases. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov

    Hyperdimensional Analysis of Amino Acid Pair Distributions in Proteins

    Our manuscript presents a novel approach to protein structure analyses. We have organized an 8-dimensional data cube with protein 3D-structural information from 8706 high-resolution non-redundant protein-chains with the aim of identifying packing rules at the amino acid pair level. The cube contains information about amino acid type, solvent accessibility, spatial and sequence distance, secondary structure and sequence length. We are able to pose structural queries to the data cube using the program ProPack. The response is a 1, 2 or 3D graph. Whereas the response is of a statistical nature, the user can obtain an instant list of all PDB-structures where such a pair is found. The user may select a particular structure, which is displayed highlighting the pair in question. The user may pose millions of different queries and for each one will receive the answer in a few seconds. In order to demonstrate the capabilities of the data cube as well as the programs, we have selected well known structural features, disulphide bridges and salt bridges, where we illustrate how the queries are posed, and how answers are given. Motifs involving cysteines such as disulphide bridges, zinc-fingers and iron-sulfur clusters are clearly identified and differentiated. ProPack also reveals that whereas pairs of Lys residues virtually never appear in close spatial proximity, pairs of Arg are abundant and appear at close spatial distance, contrasting the belief that electrostatic repulsion would prevent this juxtaposition and that Arg-Lys is perceived as a conservative mutation. The presented programs can find and visualize novel packing preferences in protein structures allowing the user to unravel correlations between pairs of amino acids. The new tools allow the user to view statistical information and visualize instantly the structures that underpin the statistical information, which is far from trivial with most other software tools for protein structure analysis

    Database resources of the National Center for Biotechnology Information

    In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, Splign, Reference Sequence, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus, Entrez Probe, GENSAT, Online Mendelian Inheritance in Man, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool, Biosystems, Peptidome, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. All these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov

    The Princeton Protein Orthology Database (P-POD): A Comparative Genomics Analysis Tool for Biologists

    Many biological databases that provide comparative genomics information and tools are now available on the internet. While certainly quite useful, to our knowledge none of the existing databases combine results from multiple comparative genomics methods with manually curated information from the literature. Here we describe the Princeton Protein Orthology Database (P-POD, http://ortholog.princeton.edu), a user-friendly database system that allows users to find and visualize the phylogenetic relationships among predicted orthologs (based on the OrthoMCL method) to a query gene from any of eight eukaryotic organisms, and to see the orthologs in a wider evolutionary context (based on the Jaccard clustering method). In addition to the phylogenetic information, the database contains experimental results manually collected from the literature that can be compared to the computational analyses, as well as links to relevant human disease and gene information via the OMIM, model organism, and sequence databases. Our aim is for the P-POD resource to be extremely useful to typical experimental biologists wanting to learn more about the evolutionary context of their favorite genes. P-POD is based on the commonly used Generic Model Organism Database (GMOD) schema and can be downloaded in its entirety for installation on one's own system. Thus, bioinformaticians and software developers may also find P-POD useful because they can use the P-POD database infrastructure when developing their own comparative genomics resources and database tools

    Has classical gene position been practically reduced?

    One of the defining features of the classical gene was its position (a band in the chromosome). In molecular genetics, positions are defined instead as nucleotide numbers and there is no clear correspondence with its classical counterpart. However, the classical gene position did not simply disappear with the development of the molecular approach, but survived in the lab associated with different genetic practices. The survival of classical gene position would illustrate Waters’ view about the practical persistence of the genetic approach beyond reductionism and anti-reductionist claims. We show instead that at the level of laboratory practices there are also reductive processes, operating through the rise and fall of different techniques. Molecular markers made the concept of classical gene position practically dispensable, leading us to rethink whether it had any causal role or was just a mere heuristic

    CysView: protein classification based on cysteine pairing patterns

    CysView is a web-based application tool that identifies and classifies proteins according to their disulfide connectivity patterns. It accepts a dataset of annotated protein sequences in various formats and returns a graphical representation of cysteine pairing patterns. CysView displays cysteine patterns for those records in the data with disulfide annotations. It allows the viewing of records grouped by connectivity patterns. CysView's utility as an analysis tool was demonstrated by the rapid and correct classification of scorpion toxin entries from GenPept on the basis of their disulfide pairing patterns. It has proved useful for rapid detection of irrelevant and partial records, or those with incomplete annotations. CysView can be used to support distant homology between proteins. CysView is publicly available at http://research.i2r.a-star.edu.sg/CysView/
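
The idea of grouping records by disulfide connectivity pattern, as described in the abstract above, can be illustrated with a minimal sketch. This is not CysView's actual algorithm or input format: the function names, the record names and the bond positions are all hypothetical. It assumes each record is given as a list of bonded cysteine position pairs, renumbers the cysteines by their order in the sequence, and uses the resulting pairing string as the group key.

```python
from collections import defaultdict


def connectivity_pattern(bonds):
    """Turn bonded residue positions into an order-based pairing string.

    bonds: list of (pos_a, pos_b) tuples of bonded cysteine positions.
    Cysteines are renumbered 1..n by sequence order, so records with
    different spacing but the same topology get the same pattern.
    """
    positions = sorted(p for bond in bonds for p in bond)
    rank = {pos: i + 1 for i, pos in enumerate(positions)}
    pairs = sorted(
        (min(rank[a], rank[b]), max(rank[a], rank[b])) for a, b in bonds
    )
    return " ".join(f"{i}-{j}" for i, j in pairs)


def group_by_pattern(records):
    """Group record names by their connectivity pattern."""
    groups = defaultdict(list)
    for name, bonds in records.items():
        groups[connectivity_pattern(bonds)].append(name)
    return dict(groups)


# Hypothetical records; positions are invented for illustration only.
records = {
    "toxin_A": [(12, 63), (16, 36), (22, 46)],
    "toxin_B": [(10, 60), (14, 33), (20, 44)],
    "toxin_C": [(3, 20), (8, 30)],
}
groups = group_by_pattern(records)
```

Here `toxin_A` and `toxin_B` fall into the same group because their cysteines pair in the same order, even though the residue positions differ.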

    CysView: Protein classification based on cysteine pairing patterns. Nucleic Acids Res

    CysView is a web-based application tool that identifies and classifies proteins according to their disulfide connectivity patterns. It accepts a dataset of annotated protein sequences in various formats and returns a graphical representation of cysteine pairing patterns. CysView displays cysteine patterns for those records in the data with disulfide annotations. It allows the viewing of records grouped by connectivity patterns. CysView’s utility as an analysis tool was demonstrated by the rapid and correct classification of scorpion toxin entries from GenPept on the basis of their disulfide pairing patterns. It has proved useful for rapid detection of irrelevant and partial records, or those with incomplete annotations. CysView can be used to support distant homology between proteins. CysView is publicly available a

    Protein disulfide topology determination through the fusion of mass spectrometric analysis and sequence-based prediction using Dempster-Shafer theory

    Background: Disulfide bonds constitute one of the most important cross-linkages in proteins and significantly influence protein structure and function. At the state-of-the-art, various methodological frameworks have been proposed for identification of disulfide bonds. These include among others, mass spectrometry-based methods, sequence-based predictive approaches, as well as techniques like crystallography and NMR. Each of these frameworks has its advantages and disadvantages in terms of pre-requisites for applicability, throughput, and accuracy. Furthermore, the results from different methods may concur or conflict in parts. Results: In this paper, we propose a novel and theoretically rigorous framework for disulfide bond determination based on information fusion from different methods using an extended formulation of Dempster-Shafer theory. A key advantage of our approach is that it can automatically deal with concurring as well as conflicting evidence in a data-driven manner. Using the proposed framework, we have developed a method for disulfide bond determination that combines results from sequence-based prediction and mass spectrometric inference. This method leads to more accurate disulfide bond determination than any of the constituent methods taken individually. Furthermore, experiments indicate that the method improves the accuracy of bond identification as compared to leading extant methods at the state-of-the-art. Finally, the proposed framework is extensible in that results from any number of approaches can be incorporated. Results obtained using this framework can especially be useful in cases where the complexity of the bonding patterns coupled with specificities of the fragmentation pattern or limitations of computational models impair any single method to perform consistently across a diverse set of molecules.
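
To make the fusion idea concrete, the sketch below shows the classical Dempster rule of combination for two sources of evidence about a single cysteine pair. It is not the extended formulation the abstract refers to, whose details are not given here. The frame of discernment (`bonded`, `free`), the variable names and all mass values are hypothetical and chosen only for illustration.

```python
from itertools import product


def combine(m1, m2):
    """Classical Dempster rule of combination for two mass functions.

    Each mass function maps a frozenset of hypotheses to its mass;
    the masses of each function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Agreeing evidence: the product goes to the intersection.
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Disjoint hypotheses: the product counts as conflict.
            conflict += wa * wb
    if conflict >= 1.0 - 1e-12:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Renormalise by the non-conflicting mass.
    return {h: w / (1.0 - conflict) for h, w in combined.items()}


BONDED = frozenset({"bonded"})
FREE = frozenset({"free"})
EITHER = BONDED | FREE  # the whole frame, i.e. "don't know"

# Hypothetical masses, not taken from the paper.
ms_evidence = {BONDED: 0.6, EITHER: 0.4}
seq_evidence = {BONDED: 0.5, FREE: 0.3, EITHER: 0.2}

fused = combine(ms_evidence, seq_evidence)
```

With these made-up inputs, the conflicting mass is 0.6 × 0.3 = 0.18, and the fused belief in `bonded` is 0.62 / 0.82, higher than either source assigns on its own.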

    HCA and HML isolated from the red marine algae Hypnea cervicornis and Hypnea musciformis define a novel lectin family

    HCA and HML represent lectins isolated from the red marine algae Hypnea cervicornis and Hypnea musciformis, respectively. Hemagglutination inhibition assays suggest that HML binds GalNAc/Gal substituted with a neutral sugar through 1–3, 1–4, or 1–2 linkages in O-linked mucin-type glycans, and Fuc(α1–6)GlcNAc of N-linked glycoproteins. The specificity of HCA includes the epitopes recognized by HML, although the glycoproteins inhibited distinctly HML and HCA. The agglutinating activity of HCA was inhibited by GalNAc, highlighting the different fine sugar epitope-recognizing specificity of each algal lectin. The primary structures of HCA (9193±3 Da) and HML (9357±1 Da) were determined by Edman degradation and tandem mass spectrometry of the N-terminally blocked fragments. Both lectins consist of a mixture of a 90-residue polypeptide containing seven intrachain disulfide bonds and two disulfide-bonded subunits generated by cleavage at the bond T50–E51 (HCA) and R50–E51 (HML). The amino acid sequences of HCA and HML display 55% sequence identity (80% similarity) between themselves, but do not show discernible sequence and cysteine spacing pattern similarities with any other known protein structure, indicating that HCA and HML belong to a novel lectin family. Alignment of the amino acid sequence of the two lectins revealed the existence of internal domain duplication, with residues 1–47 and 48–90 corresponding to the N- and C-terminal domains, respectively. The six conserved cysteines in each domain may form three intrachain cysteine linkages, and the unique cysteine residues of the N-terminal (Cys46) and the C-terminal (Cys71) domains may form an intersubunit disulfide bond. This work was supported by Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Financiadora de Estudos e Projetos (FINEP), Fundação Cearense de Amparo à Pesquisa (FUNCAP), and grants CAPES/COFECUB 336/01, and BFU2004-01432/BMC from the Ministerio de Educación y Ciencia, Madrid (Spain). C.S.N. is the recipient of a fellowship from the Coordenação Aperfeiçoamento de Pessoal de Nivel Superior (CAPES). A.H.S. and B.S.C. are senior investigators of CNPq. Peer reviewed