
    From XML to XML: The why and how of making the biodiversity literature accessible to researchers

    We present the ABLE document collection, which consists of a set of annotated volumes of the Bulletin of the British Museum (Natural History). These follow our work on automating the markup of scanned copies of the biodiversity literature, for the purpose of supporting working taxonomists. We consider an enhanced TEI XML markup language, which is used as an intermediate stage in translating from the initial XML obtained from Optical Character Recognition to the target taXMLit. The intermediate representation allows additional information from external sources, such as a taxonomic thesaurus, to be incorporated before the final translation into taXMLit.

    Extracting scientific articles from a large digital archive: BioStor and the Biodiversity Heritage Library

    Background: The Biodiversity Heritage Library (BHL) is a large digital archive of legacy biological literature, comprising over 31 million pages scanned from books, monographs, and journals. During the digitisation process basic metadata about the scanned items is recorded, but not article-level metadata. Given that the article is the standard unit of citation, this makes it difficult to locate cited literature in BHL. Adding the ability to easily find articles in BHL would greatly enhance the value of the archive. Description: A service was developed to locate articles in BHL based on matching article metadata to BHL metadata using approximate string matching, regular expressions, and string alignment. This article locating service is exposed as a standard OpenURL resolver on the BioStor web site http://biostor.org/openurl/. This resolver can be used on the web, or called by bibliographic tools that support OpenURL. Conclusions: BioStor provides tools for extracting, annotating, and visualising articles from the Biodiversity Heritage Library. BioStor is available from http://biostor.org
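    The abstract names approximate string matching as one of the techniques used to match article metadata to BHL metadata. As a rough illustration only, the Python sketch below scores a cited article title against candidate titles (for example, OCR text taken from BHL pages) using the standard library's `difflib.SequenceMatcher`. This is not BioStor's implementation: the function names, the normalisation step, and the 0.8 acceptance threshold are hypothetical choices made for this example.

```python
import re
from difflib import SequenceMatcher
from typing import Optional, Sequence, Tuple


def normalise(text: str) -> str:
    """Lower-case, replace punctuation with spaces and collapse whitespace (illustrative choice)."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def title_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two normalised titles."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def best_match(
    cited_title: str,
    candidates: Sequence[str],
    threshold: float = 0.8,  # hypothetical cut-off, not taken from BioStor
) -> Optional[Tuple[int, float]]:
    """Return (index, score) of the best candidate scoring >= threshold, or None."""
    best: Optional[Tuple[int, float]] = None
    for i, candidate in enumerate(candidates):
        score = title_similarity(cited_title, candidate)
        if score >= threshold and (best is None or score > best[1]):
            best = (i, score)
    return best


if __name__ == "__main__":
    # Hypothetical candidate titles, as they might appear in OCR output.
    candidates = [
        "Notes on Bornean birds",
        "On a new Species of Sorex, from Borneo.",
        "Index",
    ]
    print(best_match("On a new species of Sorex from Borneo", candidates))  # → (1, 1.0)
```

    Normalising case and punctuation before comparing lets minor typographic differences between a citation and the scanned text match exactly, while the similarity ratio tolerates small OCR errors such as a misread character.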

    From Pixels and Minds to the Mathematical Knowledge in a Digital Library

    Experience in setting up a workflow from scanned images of mathematical papers to a fully fledged mathematical library is described, using the Czech Digital Mathematics Library (DML-CZ) project as an example. An overview of the whole process is given, with a description of all main production steps. DML-CZ has recently been launched to the public with more than 100,000 digitized pages.

    The Wiltshire Wills Feasibility Study

    The Wiltshire and Swindon Record Office has nearly ninety thousand wills in its care. These records are neither adequately catalogued nor secured against loss by facsimile microfilm copies. With support from the Heritage Lottery Fund, the Record Office has begun to produce suitable finding aids for the material. Beginning with this feasibility study, the Record Office is developing a strategy to ensure that facsimiles are created, both to protect the collection against the risk of loss or damage and to improve public access.
    This feasibility study explores the different methodologies that can be used to assist the preservation and conservation of the collection and to improve public access to it. The study aims to produce a strategy that will enable the Record Office to create digital facsimiles of the Wills in its care for access purposes and also to create preservation-quality microfilms. The strategy seeks the most cost-effective and time-efficient approach to the problem and identifies ways to optimise the processes by drawing on the experience of other similar projects. This report provides a set of guidelines and recommendations to ensure the best use of the available resources, to provide the most robust preservation strategy, and to ensure that future access to the Wills as an information resource can be flexible (both local and remote) and sustainable.

    A review of the state of the art in Machine Learning on the Semantic Web: Technical Report CSTR-05-003


    HathiTrust Research Center: Computational Research on the HathiTrust Repository

    PIs (executive management team): Beth A. Plale, Indiana University (IU); Marshall Scott Poole, University of Illinois Urbana-Champaign (UIUC); Robert McDonald (IU); John Unsworth (UIUC). Senior investigators: Loretta Auvil (UIUC); Johan Bollen (IU); Randy Butler (UIUC); Dennis Cromwell (IU); Geoffrey Fox (IU); Eileen Julien (IU); Stacy Kowalczyk (IU); Danny Powell (UIUC); Beth Sandore (UIUC); Craig Stewart (IU); John Towns (UIUC); Carolyn Walters (IU); Michael Welge (UIUC); Eric Wernert (IU).

    Informatics and data mining tools and strategies for the Human Connectome Project

    The Human Connectome Project (HCP) is a major endeavor that will acquire and analyze connectivity data plus other neuroimaging, behavioral, and genetic data from 1,200 healthy adults. It will serve as a key resource for the neuroscience research community, enabling discoveries of how the brain is wired and how it functions in different individuals. To fulfill its potential, the HCP consortium is developing an informatics platform that will handle: 1) storage of primary and processed data, 2) systematic processing and analysis of the data, 3) open access data sharing, and 4) mining and exploration of the data. This informatics platform will include two primary components. ConnectomeDB will provide database services for storing and distributing the data, as well as data analysis pipelines. Connectome Workbench will provide visualization and exploration capabilities. The platform will be based on standard data formats and provide an open set of application programming interfaces (APIs) that will facilitate broad utilization of the data and integration of HCP services into a variety of external applications. Primary and processed data generated by the HCP will be openly shared with the scientific community, and the informatics platform will be available under an open source license. This paper describes the HCP informatics platform as currently envisioned and places it into the context of the overall HCP vision and agenda

    DML and RusDML – Virtual Library Initiatives for Covering All Mathematics Electronically

    With the rapid growth of electronic publishing, the idea arose of establishing global repositories that address three main strands of this enterprise: storing the electronic material currently available; pursuing projects to solve the archiving problem for this material, with the aim of preserving the content in readable form for future generations; and capturing the printed literature in digital form, with good access and search facilities for readers. Long-term availability of, and easy access to, published research articles in mathematics is a strong need for researchers working in mathematics. Hence, several pioneering projects have been established in this domain to address the problems mentioned above.

    Keeping Research Data Safe 2: Final Report

    The first Keeping Research Data Safe study, funded by JISC, made a major contribution to the understanding of long-term preservation costs for research data by developing a cost model and identifying cost variables for preserving research data in UK universities (Beagrie et al, 2008). However, it was completed over a very constrained timescale of four months, with little opportunity to follow up other major issues or sources of preservation cost information that it identified. It noted that digital preservation costs are notoriously difficult to address, in part because of the absence of good case studies and longitudinal information on digital preservation costs or cost variables. In January 2009, JISC issued an invitation to tender (ITT) for a study on the identification of long-lived digital datasets for the purposes of cost analysis. The aim of this work was to provide a larger body of material and evidence against which existing and future data preservation cost modelling exercises could be tested and validated. The proposal for the KRDS2 study was submitted in response by a consortium consisting of 4 partners involved in the original Keeping Research Data Safe study (Universities of Cambridge and Southampton, Charles Beagrie Ltd, and OCLC Research) and 4 new partners with significant data collections and interests in preservation costs (Archaeology Data Service, University of London Computer Centre, University of Oxford, and the UK Data Archive). A range of supplementary materials supporting this main report has been made available on the KRDS2 project website at http://www.beagrie.com/jisc.php. That website will be maintained and continuously updated with future work as a resource for KRDS users.