118 research outputs found

    G-InforBIO: integrated system for microbial genomics

    BACKGROUND: Genome databases contain diverse kinds of information, including gene annotations and nucleotide and amino acid sequences. It is not easy to integrate such information for genomic study. There are few tools for integrated analyses of genomic data; therefore, we developed software that enables users to handle, manipulate, and analyze genome data with a variety of sequence analysis programs. RESULTS: The G-InforBIO system is a novel tool for genome data management and sequence analysis. The system can import genome data, including annotations and sequences, encoded as eXtensible Markup Language documents or as formatted text documents, such as the flat files from the DNA Data Bank of Japan and GenBank. The genome database is constructed automatically after importing, and the database can be exported as documents formatted with eXtensible Markup Language or as tab-delimited text. Users can retrieve data from the database by keyword searches, edit annotation data of genes, and process data with G-InforBIO. In addition, information in the G-InforBIO database can be analyzed seamlessly with nine different software programs, including programs for clustering and homology analyses. CONCLUSION: The G-InforBIO system simplifies genome analyses by integrating several available software programs to allow efficient handling and manipulation of genome data. G-InforBIO is freely available from the download site

    Specialized microbial databases for inductive exploration of microbial genome sequences

    BACKGROUND: The enormous amount of genome sequence data calls for user-oriented databases to manage sequences and annotations. Queries must include search tools permitting function identification through exploration of related objects. METHODS: The GenoList package for collecting and mining microbial genome databases has been rewritten using MySQL as the database management system. Functions that were not available in MySQL, such as nested subqueries, have been implemented. RESULTS: Inductive reasoning in the study of genomes starts from "islands of knowledge", centered around genes with some known background. With this concept of "neighborhood" in mind, a modified version of the GenoList structure has been used for organizing sequence data from prokaryotic genomes of particular interest in China. GenoChore, a set of 17 specialized end-user-oriented microbial databases (including one instance of Microsporidia, Encephalitozoon cuniculi, a member of Eukarya), has been made publicly available. These databases allow the user to browse genome sequence and annotation data using standard queries. In addition, they provide a weekly update of searches against the world-wide protein sequence data libraries, allowing one to monitor annotation updates on genes of interest. Finally, they allow users to search for patterns in DNA or protein sequences, taking into account a clustering of genes into formal operons, and provide extra facilities to query sequences using predefined sequence patterns. CONCLUSION: This growing set of specialized microbial databases organizes data created by the first Chinese bacterial genome programs (ThermaList, Thermoanaerobacter tengcongensis; LeptoList, with two different genomes of Leptospira interrogans; and SepiList, Staphylococcus epidermidis) associated with related organisms for comparison

    Evaluation of DNA barcode libraries used in the UK and developing an action plan to fill priority gaps

    There are approximately 76,000 eukaryote species recognised in the UK, and while we know some of them in great detail, the majority of these species are poorly known, and hundreds of new species are discovered each year. DNA barcoding uses a short, standardised segment of an organism’s genome for identification by comparison to a reference library; however, the UK lags behind several countries in Europe and North America in that we lack trusted, reliable and openly accessible reference sequences for key UK taxa. This report is the first step in rectifying this by engaging diverse stakeholders to facilitate collaboration and coordination; providing robust stakeholder-based and independent assessment of the current state of reference libraries available for all known UK taxa; and prioritising key taxa. A survey was developed and shared with the UK research and end user community, receiving 80 responses from a wide range of stakeholders and covering the focal taxa / assemblages and habitats; the DNA reference libraries in use, their quality assurance and perceived coverage. A formal gap analysis of the public DNA data in major DNA reference libraries highlighted that an estimated 52% of UK species have publicly available DNA data of some sort; however, coverage in gene-specific reference libraries varies greatly (e.g. 2–52%), as does the associated quality assurance. Priority taxa highlighted by end users had coverage in reference libraries ranging from almost complete, in the case of known invasive non-native species, to significant coverage (71%) for taxa with conservation designations. However, these data also vary by kingdom and reference library, as does the associated quality assurance. If a strict requirement is applied that DNA data come from UK specimens and be held in UK repositories, for robust QC and QA, then the proportion of UK species with public DNA data in reference libraries falls to less than 4% in the largest reference library assessed (BOLD). 
While standard genes for DNA-based identification have essentially been established, more work is required to establish the priority taxa required for regulatory delivery in contrast to taxa that are surveyed in a non-regulatory framework. Several barriers to the development of barcode libraries were highlighted, the most relevant being sustained large-scale funding, expertise, capacity, laboratory skills and equipment, quality control and assurance, collecting logistics (e.g. permits and access) and communication. Significant opportunities identified include a large network of interested experts, several organisations with significant delivery capabilities, current large-scale projects and funding opportunities, emerging technologies and the economy of scale for DNA sequencing. Following a stakeholder workshop, we have outlined a concise action plan to provide reliable, open access reference sequences, linked to open access vouchers, identified by known experts, to facilitate UK academic and regulatory aims. This report is published by Natural England under the Open Government Licence - OGLv3.0 for public sector information. You are encouraged to use, and reuse, information subject to certain conditions. For details of the licence visit Copyright. Natural England photographs are only available for non-commercial purposes. If any other information such as maps or data cannot be used commercially this will be made clear within the report. ISBN 978-1-78354-671-8 © Natural England and other parties 2020 © Trustees of the Natural History Museum, Londo

    The European Bioinformatics Institute's data resources: towards systems biology

    Genomic and post-genomic biological research has provided fine-grain insights into the molecular processes of life, but also threatens to drown biomedical researchers in data. Moreover, as new high-throughput technologies are developed, the types of data that are gathered en masse are diversifying. The need to collect, store and curate all this information in ways that allow its efficient retrieval and exploitation is greater than ever. The European Bioinformatics Institute's (EBI's) databases and tools have evolved to meet the changing needs of molecular biologists: since we last wrote about our services in the 2003 issue of Nucleic Acids Research, we have launched new databases covering protein–protein interactions (IntAct), pathways (Reactome) and small molecules (ChEBI). Our existing core databases have continued to evolve to meet the changing needs of biomedical researchers, and we have developed new data-access tools that help biologists to move intuitively through the different data types, thereby helping them to put the parts together to understand biology at the systems level. The EBI's data resources are all available on our website at http://www.ebi.ac.uk

    Assembly and Automated Annotation of the Clostridium scatologenes Genome

    Clostridium scatologenes is an anaerobic bacterium that demonstrates some unusual metabolic traits such as the production of 3-methyl indole. The availability of genome level sequencing has lent itself to the exploration and elucidation of unique metabolic pathways in other organisms such as Clostridium botulinum. The Clostridium scatologenes genome, with an estimated length of 4.2 million bp, was sequenced by the Applied Biosystems SOLiD method and the Roche 454 pyrosequencing method. The resulting DNA sequences were combined and assembled with the Newbler Assembler program into 8267 contigs with an average length of 1250 bp. Comparison of the published subunits of the csd gene with the assembled contigs showed that one contig contained all three subunits. In addition, a gene with similarity to Clostridium carboxidivorans butyrate kinase was found next to the csd gene. An alignment of the contig and csd gene sequences identified three deletions in the contig within the 4066 bases of the alignment. This implies an error rate of about 0.07% in the sequencing itself, requiring more finishing. Even without finishing the genome assembly into a single contig, the contigs were annotated in the RAST pipeline, which predicted 2521 protein encoding genes (PEGs). The PEGs were classified by their metabolic function and compared to classified PEGs found in the closely related Clostridium species, Clostridium carboxidivorans and Clostridium ljungdahlii, which have similarly sized genomes. According to the RAST analysis, Clostridium scatologenes had 35% subsystem coverage of all known metabolic processes with its 2521 PEGs. This compares to 41% for Clostridium carboxidivorans with 4174 PEGs (29) and 42% for Clostridium ljungdahlii with 4184 PEGs (30), indicating that Clostridium scatologenes may still have more genes to be identified. The percentage of genes found in each metabolic subsystem was similar except for motility and chemotaxis. 
The contigs, on which the csd gene and tryptophan metabolizing genes lay, were examined to see if additional genes might support these metabolic pathways. Butyrate kinase was associated with the csd genes but no other associations were found for the two tryptophan metabolizing genes. The tryptophan biosynthesis operon genes were all found on one contig (contig 6771) and were syntenic with other bacterial species
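
The error-rate figure quoted in this abstract can be checked with a short calculation. The following is a minimal sketch, assuming the rate is simply the number of deletions divided by the number of aligned bases; the function name `per_base_error_rate` is hypothetical and does not come from the original work.

```python
def per_base_error_rate(errors: int, aligned_bases: int) -> float:
    """Return the error rate as a percentage of aligned bases.

    Assumption: every deletion counts as one error and every aligned
    base counts equally, which is the simplest reading of the abstract.
    """
    if aligned_bases <= 0:
        raise ValueError("aligned_bases must be positive")
    return 100.0 * errors / aligned_bases


# Figures quoted in the abstract: 3 deletions within 4066 aligned bases.
rate = per_base_error_rate(3, 4066)
print(f"{rate:.2f}%")  # prints "0.07%"
```

Three errors in 4066 bases is about 0.074%, which matches the "about 0.07%" stated in the abstract.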

    Retrieval and Representation of Nucleotide Sequence of Saccharomyces cerevisiae Cystathionine Gamma-Lyase (CYS3) Gene in Five Formats

    Educational programmes all over the world are facing increasing pressure to integrate information technology in the curriculum. Knowledge of bioinformatics is in its infancy in Nigeria; it is therefore imperative to develop and build the capacity for high-throughput determination and computational analysis of the nucleotide base sequences of the genomes of organisms. The present communication describes navigating the ENTREZ Web page and downloading sequences of the cystathionine gamma-lyase gene from Saccharomyces cerevisiae. The sequence is then represented in the five best known database formats, namely Plain, FASTA, EMBL, GCG and GenBank, thereby making it more visible and available for other research applications such as comparative genomic analysis, evolutionary studies, searching for and identification of regulatory elements and scanning for mutations. The present study highlights data retrieval and representation. Data retrieval is important as it provides the opportunity to engage in data mining for discovery, a convenient alternative to traditional wet laboratories, providing biological insights, and proficiency to access and use the vast repository of computational and web-based resources which are the most available information in the world today. Keywords: Nucleotide, Database, Genome, GenBank
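
To illustrate what representing a sequence in one of these formats involves, the following is a minimal sketch of FASTA output: a header line starting with `>` followed by the sequence wrapped to a fixed width. The function name `to_fasta`, the identifier `demo_seq` and the sequence itself are placeholders chosen for illustration; they are not the CYS3 record retrieved in the study.

```python
def to_fasta(identifier: str, sequence: str,
             description: str = "", width: int = 60) -> str:
    """Format a nucleotide sequence as a single FASTA record.

    Whitespace inside the sequence is removed and the letters are
    upper-cased before wrapping to `width` characters per line.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    header = f">{identifier} {description}".rstrip()
    cleaned = "".join(sequence.split()).upper()
    # Slice the cleaned sequence into fixed-width lines.
    lines = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return "\n".join([header, *lines]) + "\n"


# Placeholder sequence (80 bases), not real gene data.
record = to_fasta("demo_seq", "acgt" * 20, "placeholder sequence")
print(record, end="")
```

The other formats named in the abstract (EMBL, GCG, GenBank) carry additional annotation fields, so they need more than this header-plus-sequence layout.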