From XML to XML: The why and how of making the biodiversity literature accessible to researchers

Author: Dil Anton
King David
Lyal Chris
Morse David
Roberts David
Willis Alistair
Publication date: 01/01/2010

We present the ABLE document collection, which consists of a set of annotated volumes of the Bulletin of the British Museum (Natural History). These follow our work on automating the markup of scanned copies of the biodiversity literature, for the purpose of supporting working taxonomists. We consider an enhanced TEI XML markup language, which is used as an intermediate stage in translating from the initial XML obtained from Optical Character Recognition to the target taXMLit. The intermediate representation allows additional information from external sources, such as a taxonomic thesaurus, to be incorporated before the final translation into taXMLit.
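The abstract describes a two-stage translation in which an intermediate TEI representation is enriched from an external source before conversion to taXMLit. The following minimal sketch illustrates that idea with Python's standard `xml.etree.ElementTree`. It is not the ABLE implementation: the thesaurus and its identifiers, the `<name type="taxon">` input markup, and the output element `TaxonName` with attribute `ref` are all hypothetical placeholders and do not reproduce the real TEI or taXMLit schemas.

```python
import xml.etree.ElementTree as ET

# Hypothetical thesaurus: taxon name as printed -> stable identifier.
THESAURUS = {"Carabus auratus": "urn:example:taxon:1234"}


def enrich(tei_xml: str, thesaurus: dict) -> ET.Element:
    """Intermediate stage: attach thesaurus identifiers to taxon names.

    Assumes (hypothetically) that taxon names are marked up as
    <name type="taxon">...</name> in the intermediate XML.
    """
    root = ET.fromstring(tei_xml)
    for name in root.iter("name"):
        if name.get("type") == "taxon":
            key = thesaurus.get((name.text or "").strip())
            if key is not None:
                name.set("key", key)
    return root


def to_target(root: ET.Element) -> ET.Element:
    """Final stage: rename enriched elements into a target vocabulary.

    'TaxonName' and 'ref' are placeholders, not actual taXMLit markup.
    """
    # Materialise the list first because the loop renames the elements it visits.
    for name in list(root.iter("name")):
        if name.get("type") == "taxon":
            name.tag = "TaxonName"
            del name.attrib["type"]
            if "key" in name.attrib:
                name.set("ref", name.attrib.pop("key"))
    return root


doc = '<p>Found under stones: <name type="taxon">Carabus auratus</name>.</p>'
print(ET.tostring(to_target(enrich(doc, THESAURUS)), encoding="unicode"))
```

Keeping enrichment and final translation as separate functions mirrors the design choice in the abstract: external information is attached while the document is still in the intermediate form, so the last step only has to map vocabularies.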

CiteSeerX

Open Research Online (The Open University)

Improving search in scanned documents: Looking for OCR mismatches

Author: Dil Anton
King David
Lyal Chris
Morse David
Roberts Dave
Willis Alistair
Publication date: 01/09/2009

Open Research Online (The Open University)

Extracting scientific articles from a large digital archive: BioStor and the Biodiversity Heritage Library

Author: AC Koch
Anonymous
AP Raselimanana
CP Alexander
DG Feitelson
EJ van Nieukerken
EWL Holt
H Melville
IG Councill
International Commission on Zoological Nomenclature
JD Lynch
L von Ahn
LB Holthuis
NL Evenhuis
O Lambert
Q Wei
RD Cameron
RDM Page
RI Pocock
Roderic DM Page
S Lawrence
S Pilsk
TF Smith
V Henning
W Michaelsen
WE Schevill
X Lu
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: The Biodiversity Heritage Library (BHL) is a large digital archive of legacy biological literature, comprising over 31 million pages scanned from books, monographs, and journals. During the digitisation process, basic metadata about the scanned items is recorded, but article-level metadata is not. Given that the article is the standard unit of citation, this makes it difficult to locate cited literature in BHL. Adding the ability to easily find articles in BHL would greatly enhance the value of the archive. Description: A service was developed to locate articles in BHL by matching article metadata to BHL metadata using approximate string matching, regular expressions, and string alignment. This article-locating service is exposed as a standard OpenURL resolver on the BioStor web site http://biostor.org/openurl/. The resolver can be used on the web or called by bibliographic tools that support OpenURL. Conclusions: BioStor provides tools for extracting, annotating, and visualising articles from the Biodiversity Heritage Library. BioStor is available from http://biostor.org
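The abstract names approximate string matching and regular expressions as part of BioStor's article locator. The sketch below illustrates that combination in simplified form; it is not BioStor's code. The `v.12 (1883)` volume-label format, the 0.8 similarity threshold, and the sample candidate list are assumptions for illustration, `difflib.SequenceMatcher` stands in for whatever similarity measure BioStor actually uses, and the string-alignment step is not covered.

```python
import re
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

# Hypothetical BHL volume-label format such as "v.12 (1883)" (an assumption).
VOLUME_RE = re.compile(r"v\.\s*(\d+)")


def normalise(title: str) -> str:
    """Lower-case and collapse punctuation/whitespace so that minor
    formatting differences (e.g. '&' vs 'and') weigh less."""
    title = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(title.split())


def title_similarity(a: str, b: str) -> float:
    """Approximate string similarity in [0, 1] using difflib's ratio."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def volume_of(label: str) -> Optional[int]:
    """Extract the volume number from a label with a regular expression."""
    m = VOLUME_RE.search(label)
    return int(m.group(1)) if m else None


def best_match(journal: str, volume: int,
               candidates: List[Tuple[str, str]],
               threshold: float = 0.8) -> Optional[Tuple[str, str]]:
    """Return the (title, label) candidate with the right volume whose title
    is most similar to `journal`, or None if nothing reaches `threshold`."""
    best, best_score = None, threshold
    for title, label in candidates:
        if volume_of(label) != volume:
            continue  # exact filter on the volume number
        score = title_similarity(journal, title)
        if score >= best_score:
            best, best_score = (title, label), score
    return best


candidates = [
    ("Annals and Magazine of Natural History", "v.12 (1883)"),
    ("Annals & Magazine of Natural History", "v.13 (1884)"),
    ("Proceedings of the Zoological Society of London", "v.12 (1883)"),
]
print(best_match("Annals and magazine of natural history", 12, candidates))
```

Filtering on the volume first and only then scoring titles keeps the fuzzy comparison to a small candidate set, which is one plausible way to combine exact and approximate evidence.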

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Enlighten

From Pixels and Minds to the Mathematical Knowledge in a Digital Library

Author: Rákosník Jiří
Sojka Petr
Publication venue: Masaryk University Press
Publication date: 01/01/2008

Experience in setting up a workflow from scanned images of mathematical papers to a fully fledged mathematical library is described using the example of the Czech Digital Mathematics Library (DML-CZ) project. An overview of the whole process is given, with a description of all main production steps. DML-CZ has recently been launched to the public with more than 100,000 digitized pages.

Institute of Mathematics AS CR, v. v. i.

The Wiltshire Wills Feasibility Study

Author: Gow Ann
Ross Seamus
Publication date: 22/05/2000

The Wiltshire and Swindon Record Office has nearly ninety thousand wills in its care. These records are neither adequately catalogued nor secured against loss by facsimile microfilm copies. With support from the Heritage Lottery Fund, the Record Office has begun to produce suitable finding aids for the material. Beginning with this feasibility study, the Record Office is developing a strategy to ensure that facsimiles are created to protect the collection against risk of loss or damage and to improve public access.

This feasibility study explores the different methodologies that can be used to assist the preservation and conservation of the collection and improve public access to it. The study aims to produce a strategy that will enable the Record Office to create digital facsimiles of the Wills in its care for access purposes and also to create preservation-quality microfilms. The strategy seeks the most cost-effective and time-efficient approach to the problem and identifies ways to optimise the processes by drawing on the experience of other similar projects. This report provides a set of guidelines and recommendations to ensure the best use of the available resources, to provide the most robust preservation strategy, and to ensure that future access to the Wills as an information resource can be flexible (both local and remote) and sustainable.

Enlighten

A review of the state of the art in Machine Learning on the Semantic Web: Technical Report CSTR-05-003

Author: Price S
Publication venue: Department of Computer Science, University of Bristol
Publication date: 01/01/2004

Explore Bristol Research

HathiTrust Research Center: Computational Research on the HathiTrust Repository

Author: Plale Beth A. et al.
Publication date: 01/01/2012

PIs (executive management team): Beth A. Plale, Indiana University; Marshall Scott Poole, University of Illinois Urbana-Champaign; Robert McDonald (IU); John Unsworth (UIUC). Senior investigators: Loretta Auvil (UIUC); Johan Bollen (IU); Randy Butler (UIUC); Dennis Cromwell (IU); Geoffrey Fox (IU); Eileen Julien (IU); Stacy Kowalczyk (IU); Danny Powell (UIUC); Beth Sandore (UIUC); Craig Stewart (IU); John Towns (UIUC); Carolyn Walters (IU); Michael Welge (UIUC); Eric Wernert (IU).

IUScholarWorks (University of Indiana)

Informatics and data mining tools and strategies for the Human Connectome Project

Author: Curtiss Sandra W.
Glasser Matthew F.
Harwell John
Hodge Michael
Jenkinson Mark
Laumann Timothy
Marcus Daniel S.
Olsen Timothy
Prior Fred
Van Essen David C.
Publication venue: Digital Commons@Becker
Publication date: 01/01/2011

The Human Connectome Project (HCP) is a major endeavor that will acquire and analyze connectivity data plus other neuroimaging, behavioral, and genetic data from 1,200 healthy adults. It will serve as a key resource for the neuroscience research community, enabling discoveries of how the brain is wired and how it functions in different individuals. To fulfill its potential, the HCP consortium is developing an informatics platform that will handle: 1) storage of primary and processed data, 2) systematic processing and analysis of the data, 3) open access data sharing, and 4) mining and exploration of the data. This informatics platform will include two primary components. ConnectomeDB will provide database services for storing and distributing the data, as well as data analysis pipelines. Connectome Workbench will provide visualization and exploration capabilities. The platform will be based on standard data formats and provide an open set of application programming interfaces (APIs) that will facilitate broad utilization of the data and integration of HCP services into a variety of external applications. Primary and processed data generated by the HCP will be openly shared with the scientific community, and the informatics platform will be available under an open source license. This paper describes the HCP informatics platform as currently envisioned and places it into the context of the overall HCP vision and agenda.

Directory of Open Access Journals

Digital Commons@Becker

Frontiers - Publisher Connector

PubMed Central

DML and RusDML – Virtual Library Initiatives for Covering All Mathematics Electronically

Author: Wegner Bernd
Publication venue: Institute of Information Theories and Applications FOI ITHEA
Publication date: 01/01/2004

With the rapid growth of activity in electronic publishing, the idea arose of establishing global repositories that address three main strands of this enterprise: storing the electronic material currently available; pursuing projects to solve the archiving problem for this material, with the aim of preserving the content in readable form for future generations; and capturing the printed literature in digital form with good access and search facilities for readers. Long-term availability of, and easy access to, published research articles in mathematics is a strong need for researchers working with mathematics. Hence, several pioneering projects have been established in this domain to address the problems mentioned above.

Bulgarian Digital Mathematics Library at IMI-BAS

Keeping Research Data Safe 2: Final Report

Author: Beagrie N
Lavoie B
Woollard M
Publication venue: Joint Information Systems Committee (JISC)
Publication date: 01/01/2010

The first Keeping Research Data Safe study, funded by JISC, made a major contribution to the understanding of long-term preservation costs for research data by developing a cost model and identifying cost variables for preserving research data in UK universities (Beagrie et al., 2008). However, it was completed over a very constrained timescale of four months, with little opportunity to follow up other major issues or sources of preservation cost information that it identified. It noted that digital preservation costs are notoriously difficult to address, in part because of the absence of good case studies and longitudinal information on digital preservation costs or cost variables. In January 2009, JISC issued an invitation to tender (ITT) for a study on the identification of long-lived digital datasets for the purposes of cost analysis. The aim of this work was to provide a larger body of material and evidence against which existing and future data preservation cost modelling exercises could be tested and validated. The proposal for the KRDS2 study was submitted in response by a consortium consisting of four partners involved in the original Keeping Research Data Safe study (Universities of Cambridge and Southampton, Charles Beagrie Ltd, and OCLC Research) and four new partners with significant data collections and interests in preservation costs (Archaeology Data Service, University of London Computer Centre, University of Oxford, and the UK Data Archive). A range of supplementary materials supporting this main report has been made available on the KRDS2 project website at http://www.beagrie.com/jisc.php. That website will be maintained and continuously updated with future work as a resource for KRDS users.

University of Essex Research Repository