93 research outputs found

    DDBJ working on evaluation and classification of bacterial genes in INSDC

    DNA Data Bank of Japan (DDBJ) newly collected and released 12 927 184 entries or 13 787 688 598 bases in the period from July 2005 to June 2006. The released data contain honeybee expressed sequence tags (ESTs), re-examined and re-annotated complete genome data of Escherichia coli K-12 W3110, medaka WGS and human MGA. We also systematically evaluated and classified the genes in the complete bacterial genomes submitted to the International Nucleotide Sequence Database Collaboration (INSDC) that is composed of DDBJ, EMBL Bank and GenBank. The examination and classification selected 557 000 genes as reliable ones among all the bacterial genes predicted by us

    GenBank

    GenBank® is a comprehensive database that contains publicly available nucleotide sequences for more than 240 000 named organisms, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the web-based BankIt or standalone Sequin programs and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the EMBL Data Library in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through NCBI's retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI Homepage

    EMBL Nucleotide Sequence Database: developments in 2005

    The EMBL Nucleotide Sequence Database at the EMBL European Bioinformatics Institute, UK, offers a comprehensive set of publicly available nucleotide sequence and annotation, freely accessible to all. Maintained in collaboration with partners DDBJ and GenBank, coverage includes whole genome sequencing project data, directly submitted sequence, sequence recorded in support of patent applications and much more. The database continues to offer submission tools, data retrieval facilities and user support. In 2005, the volume of data offered has continued to grow exponentially. In addition to the newly presented data, the database encompasses a range of new data types generated by novel technologies, offers enhanced presentation and searchability of the data and has greater integration with other data resources offered at the EBI and elsewhere. In stride with these developing data types, the database has continued to develop submission and retrieval tools to maximise the information content of submitted data and to offer the simplest possible submission routes for data producers. New developments, the submission process, data retrieval and access to support are presented in this paper, along with links to sources of further information

    Patome: a database server for biological sequence annotation and analysis in issued patents and published patent applications

    With the advent of automated and high-throughput techniques, the number of patent applications containing biological sequences has been increasing rapidly. However, they have attracted relatively little attention compared to other sequence resources. We have built a database server called Patome, which contains biological sequence data disclosed in patents and published applications, as well as their analysis information. The analysis is divided into two steps. The first is an annotation step in which the disclosed sequences were annotated with the RefSeq database. The second is an association step where the sequences were linked to the Entrez Gene, OMIM and GO databases, and the results were saved as a gene–patent table. From the analysis, we found that 55% of human genes were associated with patenting. The gene–patent table can be used to identify whether a particular gene or disease is related to patenting. Patome is available at ; the information is updated bimonthly

    Data Sharing: How Much Doesn't Get Submitted to GenBank?

    Funding agencies and journals require researchers to deposit DNA sequences in public databases such as GenBank when the paper is published, but how often do authors fail to do so?

    Evaluation of DNA barcode libraries used in the UK and developing an action plan to fill priority gaps

    There are approximately 76,000 eukaryote species recognised in the UK, and while we know some of them in great detail, the majority of these species are poorly known, and hundreds of new species are discovered each year. DNA barcoding uses a short, standardised segment of an organism’s genome for identification by comparison to a reference library; however, the UK lags behind several countries in Europe and North America in that we lack trusted, reliable and openly accessible reference sequences for key UK taxa. This report is the first step in rectifying this by engaging diverse stakeholders to facilitate collaboration and coordination; providing robust stakeholder-based and independent assessment of the current state of reference libraries available for all known UK taxa; and prioritising key taxa. A survey was developed and shared with the UK research and end user community, receiving 80 responses from a wide range of stakeholders and covering the focal taxa / assemblages and habitats; the DNA reference libraries in use, their quality assurance and perceived coverage. A formal gap analysis of the public DNA data in major DNA reference libraries highlighted that an estimated 52% of UK species have publicly available DNA data of some sort; however, coverage in gene specific reference libraries varies greatly (eg 2 – 52%), as does the associated quality assurance. Priority taxa highlighted by end users had coverage in reference libraries ranging from almost complete, in the case of known invasive non-native species, to significant coverage (71%) for taxa with conservation designations. However, these data also vary by kingdom and reference library, as does the associated quality assurance. If taking a strict requirement of DNA data provided by UK specimens and held in UK repositories, for robust QC and QA, then the proportion of UK species with public DNA data in reference libraries falls to less than 4% in the largest reference library assessed (BOLD). 
While standard genes for DNA-based identification have essentially been established, more work is required to establish the priority taxa required for regulatory delivery in contrast to taxa that are surveyed in a non-regulatory framework. Several barriers to the development of barcode libraries were highlighted, the most relevant being sustained large scale funding, expertise, capacity, laboratory skills and equipment, quality control and assurance, collecting logistics (eg permits and access) and communication. Significant opportunities identified include a large network of interested experts, several organisations with significant delivery capabilities, current large-scale projects and funding opportunities, emerging technologies and the economy of scale for DNA sequencing. Following a stakeholder workshop, we have outlined a concise action plan to provide reliable, open access reference sequences, linked to open access vouchers, identified by known experts, to facilitate UK academic and regulatory aims. This report is published by Natural England under the Open Government Licence - OGLv3.0 for public sector information. You are encouraged to use, and reuse, information subject to certain conditions. For details of the licence visit Copyright. Natural England photographs are only available for non-commercial purposes. If any other information such as maps or data cannot be used commercially this will be made clear within the report. ISBN 978-1-78354-671-8 © Natural England and other parties 2020 © Trustees of the Natural History Museum, Londo

    The Genomes On Line Database (GOLD) in 2007: status of genomic and metagenomic projects and their associated metadata

    The Genomes On Line Database (GOLD) is a comprehensive resource that provides information on genome and metagenome projects worldwide. Complete and ongoing projects and their associated metadata can be accessed in GOLD through pre-computed lists and a search page. As of September 2007, GOLD contains information on more than 2900 sequencing projects, out of which 639 have been completed and their sequence data deposited in the public databases. GOLD continues to expand with the goal of providing metadata information related to the projects and the organisms/environments towards the ‘Minimum Information about a Genome Sequence’ (MIGS) guideline. GOLD is available at http://www.genomesonline.org and has a mirror site at the Institute of Molecular Biology and Biotechnology, Crete, Greece at http://gold.imbb.forth.gr