Genome-wide subcellular localization of putative outer membrane and extracellular proteins in Leptospira interrogans serovar Lai genome using bioinformatics approaches
Abstract
Background
In bacterial pathogens, both cell-surface-exposed outer membrane proteins and proteins secreted into the extracellular environment play crucial roles in host-pathogen interaction and pathogenesis. Considerable effort has gone into identifying outer membrane (OM) and extracellular (EX) proteins produced by Leptospira interrogans, which may serve as novel targets for developing infection markers and leptospirosis vaccines.
Result
In this study we used a novel computational framework that combines several prediction methods with a deductive approach to identify putative OM and EX proteins encoded by the Leptospira interrogans genome. The framework consists of the following steps: (1) identifying proteins homologous to known proteins in subcellular localization databases derived from the "consensus vote" of computational predictions; (2) combining homology-based searches with structural information to enhance gene annotation and functional identification, and thereby infer specific structural characteristics and localizations; and (3) developing a dedicated classifier for cytoplasmic proteins (CP) and cytoplasmic membrane proteins (CM) using linear discriminant analysis (LDA). We identified 114 putative EX and 63 putative OM proteins. Of these, 41% are conserved or hypothetical proteins whose sequences and/or folding structures resemble those of known EX and OM proteins.
Conclusion
The overall results of the combined computational analysis are consistent with the available experimental evidence. This is the most extensive in silico identification of protein subcellular localization to date for the Leptospira interrogans serovar Lai genome. It may be useful for protein annotation, the discovery of novel genes, and understanding the biology of Leptospira.
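Step (3) above trains a linear discriminant to separate CP from CM proteins. The abstract does not say which per-protein features the classifier uses. The sketch below therefore assumes two hypothetical features: the fraction of hydrophobic residues and the number of predicted transmembrane helices. It uses toy training values and applies textbook two-class Fisher LDA in plain Python. It illustrates the technique only and is not the authors' classifier.

```python
from statistics import fmean


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [fmean(column) for column in zip(*rows)]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_lda(cm_rows, cp_rows, ridge=1e-6):
    """Fisher LDA: w = Sw^-1 (mu_CM - mu_CP), threshold at the projected midpoint."""
    mu_cm, mu_cp = mean_vector(cm_rows), mean_vector(cp_rows)
    d = len(mu_cm)
    sw = [[0.0] * d for _ in range(d)]  # pooled within-class scatter matrix
    for rows, mu in ((cm_rows, mu_cm), (cp_rows, mu_cp)):
        for x in rows:
            diff = [xi - mi for xi, mi in zip(x, mu)]
            for i in range(d):
                for j in range(d):
                    sw[i][j] += diff[i] * diff[j]
    for i in range(d):
        sw[i][i] += ridge  # small ridge keeps Sw invertible on tiny data sets
    w = solve(sw, [a - b for a, b in zip(mu_cm, mu_cp)])
    threshold = (dot(w, mu_cm) + dot(w, mu_cp)) / 2
    return w, threshold


def classify(model, features):
    """Label a protein 'CM' if its projection exceeds the threshold, else 'CP'."""
    w, threshold = model
    return "CM" if dot(w, features) > threshold else "CP"


# Hypothetical features: [hydrophobic fraction, predicted TM helices]
cm_train = [[0.45, 6], [0.50, 8], [0.48, 7]]
cp_train = [[0.30, 0], [0.28, 1], [0.33, 0]]
model = fit_lda(cm_train, cp_train)
print(classify(model, [0.47, 5]), classify(model, [0.29, 0]))
```

Because w is proportional to Sw^-1 applied to the mean difference, the CM mean always projects above the CP mean. The midpoint threshold is the simplest decision rule. A real classifier would also account for class priors.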
Strain Variation in the Transcriptome of the Dengue Fever Vector, Aedes aegypti
Studies of transcriptome dynamics provide a basis for understanding the functional elements of the genome and the complexity of gene regulation. The dengue vector mosquito, Aedes aegypti, adapts readily to diverse ecological conditions, is phenotypically polymorphic, and varies in its vectorial capacity for arboviruses. Previous genome sequencing revealed an abundance of repetitive DNA and transposable elements that can contribute to genome plasticity, and population genetic studies revealed varying degrees of genetic polymorphism worldwide. However, the extent of functional genetic polymorphism across strains is unknown. Here, the transcriptomes of three Ae. aegypti strains were compared: Chetumal (CTM), Rexville D-Puerto Rico (Rex-D) and Liverpool (LVP). CTM is more susceptible than Rex-D to infection by dengue virus serotype 2 (DENV2). A total of 4188 transcripts show no variation or only small variation (<2-fold) among sugar-fed samples of the three strains and between sugar- and blood-fed samples within each strain. These most likely correspond to genes encoding products necessary for vital functions. Transcripts enriched in blood-fed mosquitoes encode proteins associated with catalytic activities, molecular transport, the metabolism of lipids, carbohydrates and amino acids, blood digestion, and the progression of the gonotrophic cycle. Significant qualitative and quantitative differences were found in individual transcripts among strains, including differential representation of paralogous gene products. Most immunity-associated transcripts accumulated less after a bloodmeal. The results are discussed in relation to the different susceptibility of CTM and Rex-D mosquitoes to DENV2 infection.
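The "<2-fold" criterion above keeps a transcript only when its expression never differs by a factor of two or more across the compared samples. The sketch below applies that rule as a max/min ratio over all samples at once. This is a simplifying assumption: the study compares sugar-fed samples across strains and sugar- versus blood-fed samples within each strain separately. The transcript IDs and expression values are hypothetical.

```python
def stable_transcripts(expression, max_fold=2.0):
    """Return IDs whose highest/lowest expression ratio is below max_fold.

    expression maps a transcript ID to its expression values across samples.
    Transcripts with a zero value are skipped, since the ratio is undefined.
    """
    stable = []
    for transcript_id, values in expression.items():
        lowest, highest = min(values), max(values)
        if lowest > 0 and highest / lowest < max_fold:
            stable.append(transcript_id)
    return stable


# Hypothetical normalized expression values across three samples
expression = {"t1": [10, 12, 15], "t2": [5, 20, 8], "t3": [0, 1, 1]}
print(stable_transcripts(expression))
```

Here t1 passes (15/10 = 1.5), t2 fails (20/5 = 4), and t3 is skipped because one sample has zero expression.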
MycoBank gearing up for new horizons.
MycoBank, a registration system for fungi established in 2004 to capture all taxonomic novelties, acts as a coordination hub between repositories such as Index Fungorum and Fungal Names. Since January 2013, registration of fungal names has been a mandatory requirement for valid publication under the International Code of Nomenclature for algae, fungi and plants (ICN). This review explains the database innovations implemented over the past few years. It also discusses new features such as advanced queries, registration of typification events (MBT numbers for lecto-, epi- and neotypes), the multilingual database interface, the nomenclature discussion forum, the annotation system, and web services with links to third parties. MycoBank has also introduced novel identification services that link DNA sequence data to numerous related databases to enable intelligent search queries. Although MycoBank fills an important void in taxon registration, challenges remain: improving the links between taxonomic names and DNA data, and introducing a formal system for naming fungi known only from DNA sequence data. To further improve the quality of MycoBank data, remote access through Citrix software will now allow registered mycologists to act as MycoBank curators.
World Data Centre for Microorganisms: an information infrastructure to explore and utilize preserved microbial strains worldwide
The World Data Centre for Microorganisms (WDCM) was established 50 years ago as the data center of the World Federation for Culture Collections (WFCC) Microbial Resource Center (MIRCEN). WDCM aims to provide integrated information services, built on big data technology, to microbial resource centers and microbiologists all over the world. Here, we provide an overview of WDCM and all of its integrated services. Culture Collections Information Worldwide (CCINFO) provides metadata on 708 culture collections from 72 countries and regions. The Global Catalogue of Microorganisms (GCM) gathers strain catalogue information and provides a system for retrieving, analyzing, and visualizing data on microbial resources. Currently, GCM includes more than 368,000 strains from 103 culture collections in 43 countries and regions. Analyzer of Bioresource Citation (ABC) is a data mining tool that extracts strain-related publications, patents, nucleotide sequences and genome information from public data sources to form a knowledge base. Reference Strain Catalogue (RSC) maintains a database of strains listed in International Organization for Standardization (ISO) and other international or regional standards. RSC allocates a unique identifier to each strain recommended for use in diagnosis and quality control, and hence serves as a valuable cross-platform reference. WDCM provides free access to all these services at www.wdcm.org.
Funding: National High Technology Research and Development Program of China [2014AA021501, 2014AA021503, 2015AA020108]; International S&T Cooperation Program of China (ISTCP) [2015DFG32550]; Bureau of Science & Technology for Development of Chinese Academy of Sciences (Strategic bio-resources information center) and Field Cloud Project of Chinese Academy of Sciences [XXH12503-05-01].
Funding for open access charge: National High Technology Research and Development Program of China [2014AA021501, 2014AA021503, 2015AA020108]; International S&T Cooperation Program of China (ISTCP) [2015DFG32550]; Bureau of Science & Technology for Development of Chinese Academy of Sciences [Strategic bio-resources information center]; Field Cloud Project of Chinese Academy of Sciences [XXH12503-05-01]
d-Omix: a mixer of generic protein domain analysis tools
Domain combinations provide important clues to the roles of protein domains in protein function, interaction and evolution. We have developed a web server, d-Omix (a Mixer of Protein Domain Analysis Tools), intended as a unified platform to analyze, compare and visualize protein data sets in terms of their domain combinations. Given InterProScan files for the protein sets of interest, the server offers four domain analysis services. First, it constructs a protein phylogenetic tree from a distance matrix calculated from protein domain architectures (DAs), allowing comparison with a sequence-based tree. Second, it calculates and visualizes the versatility, abundance and co-presence of protein domains via a domain graph. Third, it compares the similarity of proteins based on DA alignment. Fourth, it builds a putative protein network from the domain–domain interactions in DOMINE. Users may select a variety of input data files and flexibly choose domain search tools (e.g. hmmpfam, superfamily) for a specific analysis. Results from d-Omix can be explored interactively and exported to various formats such as SVG, JPG, BMP and CSV. Users who have only protein sequences can prepare an InterProScan file using a service provided by the server. The d-Omix web server is freely available at http://www.biotec.or.th/isl/Domix
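The first d-Omix service builds a tree from a distance matrix over domain architectures. The abstract does not name the distance measure, so the sketch below assumes a Jaccard distance between the sets of domains in each architecture. This simplification ignores domain order and repeats, both of which a full DA comparison would take into account. The protein names and domain labels are hypothetical.

```python
from itertools import combinations


def jaccard_distance(arch_a, arch_b):
    """1 - |A & B| / |A | B| over the sets of domains in two architectures."""
    a, b = set(arch_a), set(arch_b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def distance_matrix(architectures):
    """Symmetric pairwise distance matrix as a nested dict keyed by protein name."""
    names = sorted(architectures)
    matrix = {n: {m: 0.0 for m in names} for n in names}
    for n, m in combinations(names, 2):
        d = jaccard_distance(architectures[n], architectures[m])
        matrix[n][m] = matrix[m][n] = d
    return matrix


# Hypothetical domain architectures (N- to C-terminal domain labels)
architectures = {
    "protA": ["Pkinase", "SH2"],
    "protB": ["Pkinase"],
    "protC": ["SH3"],
}
matrix = distance_matrix(architectures)
print(matrix["protA"]["protB"], matrix["protA"]["protC"])
```

A matrix like this could then be passed to a standard distance-based tree method, such as UPGMA or neighbor joining, and compared with a sequence-based tree.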
Global catalogue of microorganisms (GCM): a comprehensive database and information retrieval, analysis, and visualization system for microbial resources
Abstract
Background
Throughout the long history of industrial and academic research, many microbes have been isolated, characterized and (whenever possible) preserved in culture collections. With the steady accumulation of biodiversity observation data and microbial sequencing data, bioresource centers must function as data and information repositories serving academia, industry, and regulators on behalf of, and for, the general public. Hence, the World Data Centre for Microorganisms (WDCM) took on the responsibility of constructing an effective information environment. This environment would promote and sustain microbial research data activities and bridge the gaps currently present within and outside the microbiology communities.
Description
Strain catalogue information was collected from culture collections through online submission. We developed tools for the automatic extraction of strain numbers and species names from various sources, including GenBank, PubMed, and Swiss-Prot. These tools link strain catalogue information to the corresponding nucleotide and protein sequences, as well as to genome sequences and to references citing a particular strain. All of this information was processed and compiled into a comprehensive database of microbial resources named the Global Catalogue of Microorganisms (GCM). The current version of GCM contains information on over 273,933 strains, covering 43,436 bacterial, fungal and archaeal species from 52 collections in 25 countries and regions. A number of online analysis and statistical tools have been integrated, together with advanced search functions, which should greatly facilitate exploration of the content of GCM.
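The extraction tools described above must recognize strain numbers in free text such as GenBank records or abstracts. The GCM tools themselves are not described in detail. The sketch below is a hypothetical regular-expression approach with only a handful of collection acronyms. It assumes strain numbers take the form "ACRONYM NUMBER", with an optional space, colon or hyphen and dot-separated parts. A real extractor would load every registered collection acronym and handle many more numbering styles.

```python
import re

# Hypothetical subset of culture collection acronyms; a real tool would load
# the full list of registered collections.
COLLECTION_ACRONYMS = ("ATCC", "DSM", "JCM", "NBRC", "CGMCC", "KCTC")

# Acronym, optional separator (whitespace, colon or hyphen), then a number that
# may contain dot-separated parts, as in "CGMCC 1.2345".
STRAIN_PATTERN = re.compile(
    r"\b(" + "|".join(COLLECTION_ACRONYMS) + r")[\s:-]?(\d+(?:\.\d+)*)\b"
)


def extract_strain_numbers(text):
    """Return strain numbers found in text, normalized to 'ACRONYM NUMBER'."""
    return [f"{acronym} {number}" for acronym, number in STRAIN_PATTERN.findall(text)]


sample = "Escherichia coli ATCC 25922 and DSM-1103; see also CGMCC 1.2345."
print(extract_strain_numbers(sample))
```

Normalizing variants such as "ATCC25922" and "ATCC 25922" to one form is what allows records from different sources to be joined on the same strain.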
Conclusion
A comprehensive, dynamic database of microbial resources has been created. It reveals the resources preserved in culture collections, especially those whose informatics infrastructures are still under development. It should foster cumulative research and facilitate the work of microbiologists worldwide in both public and industrial research centres. This database is available from http://gcm.wfcc.info.
gcType: a high-quality type strain genome database for microbial phylogenetic and functional research
Taxonomic and functional research on microorganisms increasingly relies on genome-based data and methods. Global Catalogue of Type Strain (gcType) is the repository of the Global Catalogue of Microorganisms (GCM) 10K prokaryotic type strain sequencing project. It has published 1049 type strain genomes sequenced by the GCM 10K project; these strains are preserved in culture collections worldwide and have a validly published status. Additionally, gcType includes more than 12,000 publicly available type strain genome sequences from GenBank. These were incorporated using quality control criteria and standard data annotation pipelines to form a high-quality reference database. The database integrates type strain sequences with their phenotypic information to facilitate phenotypic and genotypic analyses. Multiple cross-genome search formats and interactive interfaces allow extensive exploration of the database's resources. In this study, we describe web-based data analysis pipelines for genomic analyses and genome-based taxonomy, which could serve as a one-stop platform for the identification of prokaryotic species. The number of published type strain genomes will continue to increase as the GCM 10K project expands its collaboration with culture collections worldwide. Data from this project are shared with the International Nucleotide Sequence Database Collaboration. Access to gcType is free at http://gctype.wdcm.org/
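One common way to identify a species from genomes is to compare a query genome against type strain genomes using average nucleotide identity (ANI). Values of roughly 95 to 96% are widely used as the species boundary. The abstract does not state which method or cutoff the gcType pipelines apply. The sketch below therefore assumes ANI values already computed by an external tool and a 95% cutoff. The type strain labels are hypothetical.

```python
# Assumed species-boundary cutoff; gcType's actual criterion may differ.
SPECIES_ANI_THRESHOLD = 95.0


def assign_species(ani_to_type_strains, threshold=SPECIES_ANI_THRESHOLD):
    """Return the closest type strain if its ANI passes the cutoff, else None.

    ani_to_type_strains maps type strain labels to ANI percentages between the
    query genome and each type strain genome (computed by an external tool).
    None means no type strain is close enough, e.g. a candidate novel species.
    """
    if not ani_to_type_strains:
        return None
    best_strain, best_ani = max(ani_to_type_strains.items(), key=lambda kv: kv[1])
    return best_strain if best_ani >= threshold else None


print(assign_species({"type_strain_A": 97.8, "type_strain_B": 91.2}))
print(assign_species({"type_strain_B": 91.2}))
```

The first call reports type_strain_A as the best match. The second returns None because no comparison reaches the cutoff.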
Open Babel: An open chemical toolbox
Background: A frequent problem in computational modeling is the interconversion of chemical structures between different formats. Standard interchange formats exist (for example, Chemical Markup Language) and de facto standards have arisen (for example, SMILES format). Even so, the need to interconvert formats is a continuing problem. This is due to the multitude of application areas for chemistry data, differences in the data stored by different formats (0D versus 3D, for example), and competition between software along with a lack of vendor-neutral formats. Results: We discuss, for the first time, Open Babel, an open-source chemical toolbox that speaks the many languages of chemical data. Open Babel version 2.3 interconverts over 110 formats. Representing such a wide variety of chemical and molecular data requires a library that implements a wide range of cheminformatics algorithms, from partial charge assignment and aromaticity detection to bond order perception and canonicalization. We detail the implementation of Open Babel, describe key advances in the 2.3 release, and outline a variety of uses in software products and scientific research, including applications far beyond simple format interconversion. Conclusions: Open Babel presents a solution to the proliferation of multiple chemical file formats. In addition, it provides a variety of useful utilities, from conformer searching and 2D depiction to filtering, batch conversion, and substructure and similarity searching. For developers, it can be used as a programming library to handle chemical data in areas such as organic chemistry, drug design, materials science, and computational chemistry. It is freely available under an open-source license.
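Interconverting many formats is tractable when every conversion passes through one in-memory molecule model. Writing a direct converter for every pair of N formats would need up to N(N-1) converters. A shared model needs only one reader and one writer per format. The toy sketch below illustrates that design with a hypothetical registry of a plain XYZ reader, an XYZ writer and a Hill-order molecular formula writer. It is not Open Babel's API or code.

```python
from collections import Counter


def read_xyz(text):
    """Parse XYZ text into the internal model: a list of (element, x, y, z)."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])  # line 1: atom count; line 2: comment
    atoms = []
    for line in lines[2:2 + n_atoms]:
        element, x, y, z = line.split()[:4]
        atoms.append((element, float(x), float(y), float(z)))
    return atoms


def write_xyz(atoms, comment=""):
    """Serialize the internal model back to XYZ text."""
    rows = [f"{e} {x:.4f} {y:.4f} {z:.4f}" for e, x, y, z in atoms]
    return "\n".join([str(len(atoms)), comment] + rows)


def write_formula(atoms):
    """Molecular formula in Hill order: C, then H, then others alphabetically."""
    counts = Counter(element for element, *_ in atoms)
    if "C" in counts:
        rest = sorted(e for e in counts if e not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        order = sorted(counts)  # without carbon, all elements alphabetically
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)


# Hypothetical registry: one reader/writer per format instead of per format pair
READERS = {"xyz": read_xyz}
WRITERS = {"xyz": write_xyz, "formula": write_formula}


def convert(text, in_format, out_format):
    return WRITERS[out_format](READERS[in_format](text))


water = """3
water
O 0.000 0.000 0.000
H 0.757 0.586 0.000
H -0.757 0.586 0.000
"""
print(convert(water, "xyz", "formula"))
```

Adding a new format in this design means registering one reader and/or one writer. It can then be converted to and from every format already registered.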