112 research outputs found

    DDBJ dealing with mass data produced by the second generation sequencer

    The DNA Data Bank of Japan (DDBJ) (http://www.ddbj.nig.ac.jp) collected and released 2 368 110 entries, or 1 415 106 598 bases, in the period from July 2007 to June 2008. The releases in this period include genome-scale data of Bombyx mori, Oryzias latipes, Drosophila and Lotus japonicus. In addition, from this year we collected and released trace archive data in collaboration with the National Center for Biotechnology Information (NCBI). The first release contains trace data for O. latipes and for bacterial metagenomes from the human gut. To keep pace with current progress in sequencing technology, we also accepted and released more than 100 million short reads of parasitic protozoa and their hosts that were produced with a Solexa sequencer.

    Archiving next generation sequencing data

    Next generation sequencing platforms are producing biological sequencing data in unprecedented amounts. The partners of the International Nucleotide Sequence Database Collaboration, which includes the National Center for Biotechnology Information (NCBI), the European Bioinformatics Institute (EBI), and the DNA Data Bank of Japan (DDBJ), have established the Sequence Read Archive (SRA) to provide the scientific community with an archival destination for next generation data sets. The SRA is now accessible at http://www.ncbi.nlm.nih.gov/Traces/sra from NCBI, at http://www.ebi.ac.uk/ena from EBI and at http://www.ddbj.nig.ac.jp/sub/trace_sra-e.html from DDBJ. Users of these resources can obtain data sets deposited in any of the three SRA instances. Links and submission instructions are provided.

    DDBJ launches a new archive database with analytical tools for next-generation sequence data

    The DNA Data Bank of Japan (DDBJ) (http://www.ddbj.nig.ac.jp) collected and released 1 701 110 entries/1 116 138 614 bases between July 2008 and June 2009. Highlighted data releases from DDBJ included the complete genome sequence of an endosymbiont within protist cells in the termite gut and Cap Analysis Gene Expression tags for human and mouse deposited from the Functional Annotation of the Mammalian cDNA consortium. In this period, we started a novel user announcement service using Really Simple Syndication (RSS) to deliver a list of data released from DDBJ on a daily basis. We also attempted a comprehensive visualization of DDBJ release data using a word cloud program. Moreover, a new archive for sequencing data from next-generation sequencers, the ‘DDBJ Read Archive’ (DRA), was launched. Concurrently, for read data registered in DRA, a semi-automatic annotation tool called the ‘DDBJ Read Annotation Pipeline’ was released as a preliminary step. The pipeline consists of two parts: basic analysis for reference genome mapping and de novo assembly, and high-level analysis of structural and functional annotations. These new services will aid users’ research and provide easier access to DDBJ databases.

    Improvements to services at the European Nucleotide Archive

    The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena) is Europe’s primary nucleotide sequence archival resource, safeguarding open nucleotide data access, engaging in worldwide collaborative data exchange and integrating with the scientific publication process. ENA has made significant contributions to the collaborative nucleotide archival arena as an active proponent of extending the traditional collaboration to cover capillary and next-generation sequencing information. We have continued to co-develop data and metadata representation formats with our collaborators for both data exchange and public data dissemination. In addition to the DDBJ/EMBL/GenBank feature table format, we share metadata formats for capillary and next-generation sequencing traces and are using and contributing to the NCBI SRA Toolkit for the long-term storage of next-generation sequence traces. During 2009, ENA significantly improved the sequence submission, search and access functionalities provided at EMBL–EBI. In this article, we briefly describe the content and scope of our archive and introduce major improvements to our services.

    Hyperlink Management System and ID Converter System: enabling maintenance-free hyperlinks among major biological databases

    The Hyperlink Management System (HMS) is a system for automatically updating and maintaining hyperlinks among major public databases in the life sciences. Every day we create correspondence tables of data IDs from major databases for human genes and proteins, and we provide a CGI program that returns correct and up-to-date URLs for the database entries that correspond to user-specified IDs. The HMS can deal with various IDs: accession numbers of the International Nucleotide Sequence Databases, HUGO Gene Symbols and IDs of UniProt, PDB, H-InvDB and others, and it can return URLs of various databases: H-InvDB, HUGO Gene Nomenclature Committee Database, NCBI Entrez Gene, UniProt, PDB and others. For example, 23 297 pages of the Locus view of H-InvDB are reachable by using HUGO Gene Symbols through the HMS. In addition to the CGI program, the HMS provides a Web page for finding and opening URLs of these databases. Although hyperlinking is an effective way of relating biological data among different databases, updating hyperlinks has been laborious. The HMS fully automates the job, enabling maintenance-free hyperlinks. We also developed the ID Converter System (ICS) for simply converting data IDs by using the correspondence tables in the HMS. The HMS and ICS are freely available at http://biodb.jp/
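    The table-driven lookup described in this abstract can be illustrated with a minimal sketch. It is not the HMS or ICS implementation: the table layout, the function names `convert_id` and `resolve_url`, and the `example.org` URL templates are all assumptions made for illustration, and the two rows are only sample data.

```python
from typing import Optional

# Hypothetical correspondence table: each row groups IDs that refer to the
# same human gene in different databases (sample rows only).
CORRESPONDENCE = [
    {"hgnc_symbol": "TP53", "uniprot": "P04637", "entrez_gene": "7157"},
    {"hgnc_symbol": "BRCA1", "uniprot": "P38398", "entrez_gene": "672"},
]

# Hypothetical URL templates; a real system would keep the current URL
# pattern of each target database here and update it in one place.
URL_TEMPLATES = {
    "uniprot": "https://example.org/uniprot/{id}",
    "entrez_gene": "https://example.org/gene/{id}",
}


def convert_id(value: str, source: str, target: str) -> Optional[str]:
    """Return the target-database ID matching `value`, or None if unknown."""
    for row in CORRESPONDENCE:
        if row.get(source) == value:
            return row.get(target)
    return None


def resolve_url(value: str, source: str, target: str) -> Optional[str]:
    """Convert the ID, then fill in the URL template of the target database."""
    target_id = convert_id(value, source, target)
    if target_id is None or target not in URL_TEMPLATES:
        return None
    return URL_TEMPLATES[target].format(id=target_id)


if __name__ == "__main__":
    print(convert_id("TP53", "hgnc_symbol", "uniprot"))
    print(resolve_url("BRCA1", "hgnc_symbol", "entrez_gene"))
```

    Because links are generated from the table at request time, a change in a target database's URL scheme or ID assignment only requires regenerating the table or template, not editing stored hyperlinks.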

    RESOPS: A Database for Analyzing the Correspondence of RNA Editing Sites to Protein Three-Dimensional Structures

    Transcripts from mitochondrial and chloroplast DNA of land plants often undergo cytidine to uridine conversion-type RNA editing events. RESOPS is a newly built database that specializes in displaying RNA editing sites of land plant organelles on protein three-dimensional (3D) structures to help elucidate the mechanisms of RNA editing for gene expression regulation. RESOPS contains the following information: unedited and edited cDNA sequences with notes for the target nucleotides of RNA editing, conceptual translation from the edited cDNA sequence in pseudo-UniProt format, a list of proteins under the influence of RNA editing, multiple amino acid sequence alignments of edited proteins, the location of amino acid residues coded by codons under the influence of RNA editing in protein 3D structures and the statistics of biased distributions of the edited residues with respect to protein structures. Most of the data processing procedures are automated; hence, it is easy to keep abreast of updated genome and protein 3D structural data. In the RESOPS database, we clarified that the locations of residues switched by RNA editing are significantly biased to protein structural cores. The integration of different types of data in the database also helps advance the understanding of RNA editing mechanisms. RESOPS is accessible at http://cib.cf.ocha.ac.jp/RNAEDITING/

    The DBCLS BioHackathon: standardization and interoperability for bioinformatics web services and workflows. The DBCLS BioHackathon Consortium*

    Web services have become a key technology for bioinformatics, since life science databases are globally decentralized and the exponential increase in the amount of available data demands efficient systems that do not need to transfer entire databases for every step of an analysis. However, various incompatibilities among database resources and analysis services make it difficult to connect and integrate these into interoperable workflows. To resolve this situation, we invited domain specialists from web service providers, client software developers, Open Bio* projects, the BioMoby project and researchers of emerging areas where a standard exchange data format is not well established, for an intensive collaboration entitled the BioHackathon 2008. The meeting was hosted by the Database Center for Life Science (DBCLS) and Computational Biology Research Center (CBRC) and was held in Tokyo from February 11th to 15th, 2008. In this report we highlight the work accomplished and the common issues that arose from this event, including the standardization of data exchange formats and services in the emerging fields of glycoinformatics, biological interaction networks, text mining, and phyloinformatics. In addition, common shared object development based on BioSQL, as well as technical challenges in large data management, asynchronous services, and security are discussed. As a result, we improved interoperability of web services in several fields; however, further cooperation among major database centers and continued collaborative efforts between service providers and software developers are still necessary for an effective advance in bioinformatics web service technologies.

    NBRP databases: databases of biological resources in Japan

    The National BioResource Project (NBRP) is a Japanese project that aims to establish a system for collecting, preserving and providing bioresources for use as experimental materials for life science research. It is promoted by 27 core resource facilities, each concerned with a particular group of organisms, and by one information center. The NBRP database is a product of this project. Thirty databases and an integrated database-retrieval system (BioResource World: BRW) have been created and made available through the NBRP home page (http://www.nbrp.jp). The 30 independent databases have individual features which directly reflect the data maintained by each resource facility. The BRW is designed for users who need to search across several resources without moving from one database to another. BRW provides access to a collection of 4.5 million records on bioresources including wild species, inbred lines, mutants, genetically engineered lines, DNA clones and so on. BRW supports summary browsing, keyword searching, and searching by DNA sequences or gene ontology. The results of searches provide links to online requests for distribution of research materials. A circulation system allows users to submit details of papers published on research conducted using NBRP resources.

    K-mer based methods for the identification of bacteria and plasmids

    The electronic version of the dissertation does not contain the publications. Microbes have roamed Earth for billions of years and can be found almost anywhere. They are present even on our skin and in our gut. However, some bacteria can be pathogenic and cause diseases. For instance, the Black Death, which killed millions during the Middle Ages, was caused by the bacterium Yersinia pestis. Nowadays, antibiotics protect us against the bacterial threat, but a new problem is looming: widespread antibiotic resistance. This is partly facilitated by plasmids, DNA sequences that are separate from the bacterial chromosome and can be readily passed from one bacterium to another. The general goal of this work was to develop methods for the identification of bacteria and plasmids from raw data produced by sequencing centers. We decided to use k-mer based analysis for this task. A k-mer is simply a short stretch of DNA with a length of k nucleotides. A long DNA sequence, such as a bacterial genome, can be divided into shorter k-mers and analyzed as a set of k-mers. This has the advantage of not being limited by read length: any read contains k-mers, and by analyzing these we can identify the contents of the sample. StrainSeeker is a bacterial identification program developed by our group. We developed a novel algorithm that predicts the location of an isolated bacterium on a user-provided phylogenetic tree. We also created a web server with a visual interface for users with limited bioinformatics experience. For plasmid detection, we assumed that the plasmid copy number is usually higher than that of the bacterial chromosome. This means that the average frequency of plasmid k-mers should also be higher than the frequency of chromosomal k-mers. We named the program PlasmidSeeker and tested it with real and simulated bacterial whole genome sequencing samples in which the real plasmid content was known. PlasmidSeeker detected all plasmids and accurately estimated their copy numbers. With our work, we have made a contribution to the field of computational microbiology and provided novel means for the analysis of bacterial samples.
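    The k-mer reasoning in this abstract can be made concrete with a small sketch. It is not the StrainSeeker or PlasmidSeeker code: the function names are hypothetical, the median is used here as one possible summary of k-mer frequency (the abstract speaks only of an average), and reverse complements, sequencing errors and k-mers shared between plasmid and chromosome are ignored for brevity.

```python
from collections import Counter
from statistics import median
from typing import Iterable, List


def kmers(seq: str, k: int) -> List[str]:
    """Split a sequence into all overlapping substrings of length k."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count how often each k-mer occurs across all reads of a sample."""
    counts: Counter = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def median_frequency(reference: str, counts: Counter, k: int) -> float:
    """Median sample frequency of the k-mers of a reference sequence.

    The reference must be at least k nucleotides long.
    """
    return median(counts[km] for km in kmers(reference, k))


def copy_number(plasmid: str, chromosome: str, counts: Counter, k: int) -> float:
    """Estimate plasmid copies per chromosome from k-mer frequencies."""
    chromosome_freq = median_frequency(chromosome, counts, k)
    if chromosome_freq == 0:
        raise ValueError("chromosome k-mers not found in the sample")
    return median_frequency(plasmid, counts, k) / chromosome_freq


if __name__ == "__main__":
    # Toy sequences, far shorter than real genomes, chosen to share no 4-mers.
    chromosome = "ACGTTGCA"
    plasmid = "GGATCC"
    reads = [chromosome] * 2 + [plasmid] * 6
    counts = count_kmers(reads, k=4)
    print(copy_number(plasmid, chromosome, counts, k=4))
```

    In the toy sample each chromosomal 4-mer is seen twice and each plasmid 4-mer six times, so the ratio of the two frequencies gives an estimate of three plasmid copies per chromosome.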