Finding scientific articles in a large digital archive: BioStor and the Biodiversity Heritage Library
The Biodiversity Heritage Library (BHL) is a large digital archive of legacy biological literature, comprising over 31 million pages scanned from books, monographs, and journals. During the digitisation process basic metadata about the scanned items is recorded, but not article-level metadata. Given that the article is the standard unit of citation, this makes it difficult to locate cited literature in BHL. Adding the ability to easily find articles in BHL would greatly enhance the value of the archive. A service was developed to locate articles in BHL based on matching article metadata to BHL metadata using approximate string matching, regular expressions, and string alignment. This article finding service is exposed as a standard OpenURL resolver on the BioStor web site http://biostor.org/openurl/. This resolver can be used on the web, or called by bibliographic tools that support OpenURL. BioStor provides tools for extracting, annotating, and visualising articles from the Biodiversity Heritage Library. BioStor is available from http://biostor.org/
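The approximate string matching mentioned in this abstract can be illustrated with a minimal sketch. It is not BioStor's actual implementation: it assumes titles are compared with Python's standard difflib after simple normalisation, and the function names and the 0.8 threshold are hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher


def title_similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] for two titles.

    Case and runs of whitespace are ignored (an illustrative
    normalisation, not one taken from the source).
    """
    def normalise(s: str) -> str:
        return " ".join(s.lower().split())

    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def best_match(cited_title, candidates, threshold=0.8):
    """Return (candidate, score) for the closest candidate title,
    or None if no candidate reaches the (hypothetical) threshold."""
    scored = [(c, title_similarity(cited_title, c)) for c in candidates]
    if not scored:
        return None
    best = max(scored, key=lambda pair: pair[1])
    return best if best[1] >= threshold else None


if __name__ == "__main__":
    archive_titles = [
        "Annals and magazine of natural history.",
        "Journal of Zoology",
    ]
    print(best_match("Annals and Magazine of Natural History", archive_titles))
```

A threshold trades recall for precision: a lower value tolerates more variation in cited titles but admits more false matches.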
Optimising metadata to make high-value content more accessible to Google users
Purpose: This paper shows how information in digital collections that have been catalogued using high-quality metadata can be retrieved more easily by users of search engines such as Google. Methodology/approach: The research and proposals described arose from an investigation into the observed phenomenon that pages from the Glasgow Digital Library (gdl.cdlr.strath.ac.uk) were regularly appearing near the top of Google search results shortly after publication, without any deliberate effort to achieve this. The reasons for this phenomenon are now well understood and are described in the second part of the paper. The first part provides context with a review of the impact of Google and a summary of recent initiatives by commercial publishers to make their content more visible to search engines. Findings/practical implications: The literature research provides firm evidence of a trend amongst publishers to ensure that their online content is indexed by Google, in recognition of its popularity with Internet users. The practical research demonstrates how search engine accessibility can be compatible with use of established collection management principles and high-quality metadata. Originality/value: The concept of data shoogling is introduced, involving some simple techniques for metadata optimisation. Details of its practical application are given, to illustrate how those working in academic, cultural and public-sector organisations could make their digital collections more easily accessible via search engines, without compromising any existing standards and practices
STELLAR (Semantic Technologies Enhancing the Lifecycle of Learning Resources): Jisc Final Report
[Project Summary]
As one of the earliest distance learning providers The Open University (OU) has a rich heritage of archived learning materials. An ever increasing amount of that is in digital form and is being deposited with the University Archive. This growth has been driven by digitisation activity from projects such as AVA (Access to Video Assets) and the Fedora-based Open University Digital Library, "a place to discover digital and digitised archival content from the OU Library, from videos and images to digitised documents". Other digital content is being captured from web archiving activities, such as work to preserve Moodle Virtual Learning Environment course websites. An evidence based understanding is required to inform digital preservation policies, curation strategy and investment in digital library development.
Following the Pre-enhancement, Enhancement and Post-enhancement methodology set out by Jisc, STELLAR adopted the model of a balanced scorecard to ascertain the value ascribed to the non-current learning materials. Four aspects were considered: Personal and professional perspectives of value; Value to the Higher Educational and academic communities; Value to internal processes and cultures; Financial perspectives of value. The outcomes of the survey indicated that stakeholders place a high value on the materials, and that they perceived them to have value in all areas evaluated.
Three OU courses were chosen from the digital library for the transformation stage. These materials were enhanced and transformed into RDF, a process that required more extensive metadata expertise and effort than was expected. Following enhancement the RDF was accessed through a tool called DiscOU, created by a member of the project team from the OU's Knowledge Media Institute. DiscOU uses both linked data and a semantic meaning engine to analyse the meaning of the text in a search query. This is matched against the meaning of the content derived from an index of the full-text of the digital library content.
In the final stage stakeholders were asked through a survey and series of workshops to use the DiscOU proof-of-concept tool to assess their perception of the value of this transformation. This has revealed that overall, academics and other stakeholders in the university do believe that the value of the selected materials was positively impacted by the application of semantic technologies
Proceedings ICPW'07: 2nd International Conference on the Pragmatic Web, 22-23 Oct. 2007, Tilburg: NL
Theory and Practice of Data Citation
Citations are the cornerstone of knowledge propagation and the primary means
of assessing the quality of research, as well as directing investments in
science. Science is increasingly becoming "data-intensive", where large volumes
of data are collected and analyzed to discover complex patterns through
simulations and experiments, and most scientific reference works have been
replaced by online curated datasets. Yet, given a dataset, there is no
quantitative, consistent and established way of knowing how it has been used
over time, who contributed to its curation, what results have been yielded or
what value it has.
The development of a theory and practice of data citation is fundamental for
considering data as first-class research objects with the same relevance and
centrality as traditional scientific products. Many works in recent years have
discussed data citation from different viewpoints: illustrating why data
citation is needed, defining the principles and outlining recommendations for
data citation systems, and providing computational methods for addressing
specific issues of data citation.
The current panorama is many-faceted and an overall view that brings together
diverse aspects of this topic is still missing. Therefore, this paper aims to
describe the lay of the land for data citation, both from the theoretical (the
why and what) and the practical (the how) angle.
Comment: 24 pages, 2 tables, pre-print accepted in Journal of the Association
for Information Science and Technology (JASIST), 201
Reviewing, indicating, and counting books for modern research evaluation systems
In this chapter, we focus on the specialists who have helped to improve the
conditions for book assessments in research evaluation exercises, with
empirically based data and insights supporting their greater integration. Our
review highlights the research carried out by four types of expert communities,
referred to as the monitors, the subject classifiers, the indexers and the
indicator constructionists. Many challenges lie ahead for scholars affiliated
with these communities, particularly the latter three. By acknowledging their
unique, yet interrelated roles, we show where the greatest potential is for
both quantitative and qualitative indicator advancements in book-inclusive
evaluation systems.
Comment: Forthcoming in Glanzel, W., Moed, H.F., Schmoch U., Thelwall, M.
(2018). Springer Handbook of Science and Technology Indicators. Springer. Some
corrections made in subsection 'Publisher prestige or quality