14 research outputs found

    Distributions of number of and average number of accession numbers cited per article over time.

    The graphs show (a) the number and (b) the average number of ENA, PDB and UniProt accession numbers cited per article, by publication year (in the OA-ePMC). Data from 2012 are excluded because the year is incomplete. In Figure 5(b) (http://www.plosone.org/article/info:doi/10.1371/journal.pone.0063184#pone-0063184-g005), the average for a given year and database is calculated using only articles that contain accession number citations. Text-mined results are used together with the publisher-annotated data to generate the graphs.
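
The averaging rule described above (only articles that cite at least one accession number count toward the denominator) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `per_year_stats`, the tuple layout, and the sample records are all made up, and it assumes each accession is counted once per article.

```python
from collections import defaultdict


def per_year_stats(citations):
    """Summarise accession citations per (year, database).

    citations: iterable of (article_id, year, database, accession) tuples.
    Returns {(year, database): (total, average)}, where the average is taken
    only over articles citing at least one accession for that database.
    """
    # Assumption: repeated mentions of one accession in an article count once.
    per_article = defaultdict(set)
    for article_id, year, database, accession in citations:
        per_article[(year, database, article_id)].add(accession)

    totals = defaultdict(int)
    article_counts = defaultdict(int)
    for (year, database, _article_id), accessions in per_article.items():
        totals[(year, database)] += len(accessions)
        article_counts[(year, database)] += 1

    return {key: (totals[key], totals[key] / article_counts[key]) for key in totals}


# Made-up records, for illustration only.
records = [
    ("A1", 2010, "PDB", "1ABC"),
    ("A1", 2010, "PDB", "2XYZ"),
    ("A2", 2010, "PDB", "1ABC"),
    ("A3", 2011, "ENA", "AB000001"),
]
stats = per_year_stats(records)
```

Articles with no accession citations never appear in `records`, so they cannot dilute the average, which matches the rule stated for Figure 5(b).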

    Venn diagrams showing the spread of accession numbers supplied in the article XML and annotated by the Whatizit ANA pipeline.

    (a) ENA (total: 160,112) (b) PDB (total: 39,972) (c) UniProt (total: 9,430) (d) all (total: 209,519). The results show that text mining substantially increases the number of accession numbers identified.

    Extraction patterns and contextual cues for databases.

    Patterns are separated by the “;” sign.
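
As a rough sketch of how a ";"-separated pattern field might be used together with contextual cues, the example below splits such a field into regular expressions and keeps a match only when a cue word appears shortly before it. The pattern and cue strings here are hypothetical stand-ins, not the ones listed in the table, and the 20-character window is an arbitrary assumption.

```python
import re

# Hypothetical table row: two ";"-separated patterns and two ";"-separated cues.
PATTERN_FIELD = r"[1-9][A-Z0-9]{3};[1-9][a-z0-9]{3}"
CUE_FIELD = "pdb;protein data bank"


def split_field(field):
    """Split a ';'-separated field into its non-empty, trimmed parts."""
    return [part.strip() for part in field.split(";") if part.strip()]


def find_accessions(text, pattern_field, cue_field, window=20):
    """Return matches that have a cue within `window` characters before them."""
    patterns = [re.compile(rf"\b(?:{p})\b") for p in split_field(pattern_field)]
    cues = [cue.lower() for cue in split_field(cue_field)]
    lowered = text.lower()
    hits = set()
    for pattern in patterns:
        for match in pattern.finditer(text):
            context = lowered[max(0, match.start() - window):match.start()]
            if any(cue in context for cue in cues):
                hits.add(match.group(0))
    return hits


TEXT = ("The crystal structure was deposited in 2010 and is available "
        "under PDB code 1ABC.")
found = find_accessions(TEXT, PATTERN_FIELD, CUE_FIELD)
```

In this made-up sentence the year "2010" matches the pattern but is dropped because no cue precedes it, which is the kind of false positive contextual cues are meant to filter out.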

    Database citations in articles relative to database size.

    * Total number of annotations = publisher-annotated + text-mined.
    ** This is the number of records in the curated component of UniProt.

    Comparison between article-to-database and database-to-citations.

    Venn diagrams show the overlap between article-to-database and database-to-article citations. (a) ENA (b) PDB (c) UniProt (d) all databases. Notably, for ENA and PDB, database citations from the literature significantly enrich the database-literature crosslinks supplied by the databases. For UniProt, the citations from the database to the literature dwarf the converse citations, mainly because, for certain proteomes, many thousands of UniProt records can link to a single article. Text-mined results are used together with the publisher-annotated data to generate the Venn diagrams.
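
The three regions of such a Venn diagram can be computed with plain set operations on (article, accession) pairs. The sketch below is illustrative only: the function name `compare_links` and all identifiers in the sample data are made up, and it assumes a cross-link is fully described by one article ID and one accession number.

```python
def compare_links(article_to_db, db_to_article):
    """Partition cross-links into the three regions of a two-set Venn diagram.

    Both arguments are iterables of (article_id, accession) pairs.
    """
    from_articles = set(article_to_db)
    from_databases = set(db_to_article)
    return {
        "article_only": from_articles - from_databases,
        "both": from_articles & from_databases,
        "database_only": from_databases - from_articles,
    }


# Made-up identifiers, for illustration only.
article_to_db = [("PMC1", "P12345"), ("PMC2", "1ABC")]
db_to_article = [
    ("PMC1", "P12345"),
    # Several database records pointing at one article, as described for UniProt.
    ("PMC3", "Q00001"),
    ("PMC3", "Q00002"),
    ("PMC3", "Q00003"),
]
regions = compare_links(article_to_db, db_to_article)
sizes = {name: len(pairs) for name, pairs in regions.items()}
```

Because each pair is counted separately, a single article linked from many records inflates the database-only region, which is the effect noted for UniProt.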

    The reciprocal citation relationships between articles and database records.


    Distribution of number of articles according to years in the OA-ePMC set.

    This figure shows the distribution of articles by publication year in the OA-ePMC set. Note that the apparent decrease in OA articles available in 2012 is due to an incomplete year (the dataset was frozen for this study in June 2012).

    Accession numbers in supplementary files

    This dataset contains database accession numbers automatically extracted from the supplementary files linked to open access full-text biomedical articles.

    Contributions and roles related to content as they correspond to identifier creation versus identifier reuse.

    The decision about whether to create a new identifier or reuse an existing one depends on the role you play in the creation, editing, and republishing of content; for certain roles (and when several roles apply) that decision is a judgement call. Asterisks mark cases in which the best course of action is often to correct or improve the original record in collaboration with the original source; the guidance about identifier creation versus reuse is meant to apply only when such collaboration is not practicable (and an alternate record is created). A given actor commonly has multiple roles along this spectrum; for instance, a given record in monarchinitiative.org may reflect a combination of (a) corrections Monarch staff made in collaboration with the original data source, (b) post-ingest curation by Monarch staff, and (c) expanded content integrated from multiple sources.