Developing an automatic metadata harvesting and generation system for a continuing education repository: a pilot study
The goal of this pilot study is to assess the accuracy and reliability of an automated metadata generation and harvesting system developed for a project repository which hosts continuing education resources for cataloging and metadata professionals. Using a web crawler developed for the repository, 500 educational web resources are selected as seed pages for metadata extraction, harvesting and generation. This paper summarizes the processes as well as the results of the study. The metadata harvesting system, combined with powerful article analysis and data generation tools such as Adlegant’s Article Analysis API, produces significant improvement in metadata generation.
Rich Tags: Cross-Repository Browsing
We present RichTags, a system for cross-site browsing and exploration of digital repositories. Categorical and faceted search across repositories is poorly supported, especially compared to the support of keyword search through internet search engines. We combine a variety of information retrieval techniques to determine categories of papers, to enable cross-repository browsing by category. The browsing and exploration of this metadata is achieved through a multi-faceted dynamic exploration interface. Social interaction features have also been added to enable cross-repository tagging, commenting and sharing of papers into groups. These social features are available via an API to enable future work to add plugins to pull comments back to the repositories
RIOJA (Repository Interface to Overlaid Journal Archives) project: final report
RIOJA (Repository Interface to Overlaid Journal Archives) was an 18-month partnership between UCL (University College London), Imperial College London, and the Universities of Glasgow, Cambridge and Cornell. The project was funded by the JISC (Joint Information Systems Committee, UK). The project team worked with the Astrophysics community to investigate aspects of overlay journals. For the purposes of the project, an overlay was defined as a quality-assured journal whose content is deposited to and resides in one or more open access repositories.
The project had both technical aims and supporting, non-technical aims. The primary technical deliverable from the project was a toolkit for the creation and maintenance of overlay journals. The toolkit supports the exchange of data between a repository and a piece of journal software. It supports functions such as author validation, metadata extraction from the source repository, and submission tracking. The toolkit is platform-neutral and could, in theory, be employed by any journal using content from any number of repositories, in any discipline. The project also implemented a demonstrator overlay applying the RIOJA toolkit to the arXiv subject repository, and a demonstrator implementation of the RIOJA tool for GNU EPrints.
Aside from creating the demonstrator and its underlying tools, the project aimed to assess the acceptability and feasibility of the overlay model. First, a large-scale survey of the Astrophysics community was undertaken. The survey collected data about research publishing practices within this community, and probed its reaction to the principle of overlay publishing. Second, the views of editors and publishers in this discipline were sought through interviews. These views were added to findings from the literature and summarised in a more general report on issues around the sustainability of an overlay journal.
A linked data-driven & service-oriented architecture for sharing educational resources
The two fundamental aims of managing educational resources are to enable resources to be reusable and interoperable and to enable Web-scale sharing of resources across learning communities. Currently, a variety of approaches have been proposed to expose and manage educational resources and their metadata on the Web. These are usually based on heterogeneous metadata standards and schemas, such as IEEE LOM or ADL SCORM, and diverse repository interfaces such as OAI-PMH or SQI. Also, there is still a lack of usage of controlled vocabularies and available data sets that could replace the widespread use of unstructured text for describing resources. On the other hand, the Linked Data approach has proven that it offers a set of successful principles that have the potential to alleviate the aforementioned issues. In this paper, we introduce an architecture and prototype which is fundamentally based on (a) Linked Data principles and (b) Service-orientation to resolve the integration issues for sharing educational resources
Eating your own dog food
As part of its project to develop a new research data management system the University of Lincoln is embracing development practices built around APIs - interfaces to the underlying data and functions of the system which are explicitly designed to make life easy for developers by being machine readable and programmatically accessible
Facilitating Wiki/Repository Communication with Metadata
4th International Conference on Open Repositories. This presentation was part of the session: Fedora User Group Presentations. Date: 2009-05-20 01:30 PM – 03:00 PM. The National Science Digital Library (NSDL) Materials Digital Library Pathway (MatDL) has implemented an information infrastructure to disseminate government-funded research results and to provide content as well as services to support the integration of research and education in materials. This paper describes how we are enabling two-way communication between a digital repository and open-source collaborative tools, such as wikis, to support users in materials research and education in the creation and re-use of compelling learning resources. A search results plug-in for MediaWiki has been developed to display relevant search results from the Fedora-based MatDL repository in the Soft Matter Wiki established and developed by MatDL and its partners. Wiki-to-repository information transfer has also been facilitated by mapping the metadata associated with resources originating in the wiki onto Dublin Core (DC) metadata elements and making the metadata and resources available in the repository. The Materials Digital Library Pathway (DUE-0532831) is supported by the National Science Foundation.
JISC Final Report: IncReASe (Increasing Repository Content through Automation and Services)
IncReASe (Increasing Repository Content through Automation and Services) was an eighteen-month project (subsequently extended to twenty months) to enhance White Rose Research Online (WRRO). WRRO is a shared repository of research outputs (primarily publications) from the Universities of Leeds, Sheffield and York; it runs on the EPrints open source repository platform. The repository was created in 2004 and had steady growth but, in common with many other similar repositories, had difficulty in achieving a “critical mass” of content and in becoming truly embedded within researchers’ workflows. The main aim of the IncReASe project was to assess ingestion routes into WRRO with a view to lowering barriers to deposit. We reviewed the feasibility of bulk import of pre-existing metadata and/or full-text research outputs, hoping this activity would have a positive knock-on effect on repository growth and embedding. Prior to the project, we had identified researchers’ reluctance to duplicate effort in metadata creation as a significant barrier to WRRO uptake; we investigated how WRRO might share data with internal and external IT systems. This work included a review of how WRRO, as an institutional based repository, might interact with the subject repository of the Economic and Social Research Council (ESRC). The project addressed four main areas: (i) researcher behaviour: we investigated researcher awareness, motivation and workflow through a survey of archiving activity on the university web sites, a questionnaire and discussions with researchers; (ii) bulk import: we imported data from local systems, including York’s submission data for the 2008 Research Assessment Exercise (RAE), and developed an import plug-in for use with the arXiv repository; (iii) interoperability: we looked at how WRRO might interact with university and departmental publication databases and ESRC’s repository; (iv) metadata: we assessed metadata issues raised by importing publication data from a variety of sources. A number of outputs from the project have been made available from the IncReASe project web site http://eprints.whiterose.ac.uk/increase/. The project highlighted the low levels of researcher awareness of WRRO - and of broader open access issues, including research funders’ deposit requirements. We designed some new publicity materials to start to address this. Departmental publication databases provided a useful jumping off point for advocacy and liaison; this activity was helpful in promoting awareness of WRRO. Bulk import proved time consuming – both in terms of adjusting EPrints plug-ins to incorporate different datasets and in the staff time required to improve publication metadata. A number of deposit scenarios were developed in the context of our work with ESRC; we concentrated on investigating how a local deposit of a research paper and attendant metadata in WRRO might be used to populate ESRC’s repository. This work improved our understanding of researcher workflows and of the SWORD protocol as a potential (if partial) solution to the single deposit, multiple destination model we wish to develop; we think the prospect of institutional repository / ESRC data sharing is now a step closer. IncReASe experienced some staff recruitment difficulties. It was also necessary to adapt the project to the changing IT landscape at the three partner institutions – in particular, the introduction of a centralised publication management system at the University of Leeds. Although these factors had some impact on deliverables, the aims and objectives of the project were largely achieved.
Submission of content to a digital object repository using a configurable workflow system
The prototype of a workflow system for the submission of content to a digital
object repository is presented here. It is based entirely on open-source
standard components and features a service-oriented architecture. The front-end
consists of Java Business Process Management (jBPM), JavaServer Faces (JSF),
and JavaServer Pages (JSP). A Fedora Repository and a MySQL database
management system serve as a back-end. The communication between front-end and
back-end uses a SOAP minimal binding stub. We describe the design principles
and the construction of the prototype and discuss the possibilities and
limitations of workflow creation by administrators. The code of the prototype is
open-source and can be retrieved in the project escipub at
http://sourceforge.ne
DataCite as a novel bibliometric source: Coverage, strengths and limitations
This paper explores the characteristics of DataCite to determine its
possibilities and potential as a new bibliometric data source to analyze the
scholarly production of open data. Open science and the increasing data sharing
requirements from governments, funding bodies, institutions and scientific
journals have led to a pressing demand for the development of data metrics. As a
very first step towards reliable data metrics, we need to better comprehend the
limitations and caveats of the information provided by sources of open data. In
this paper, we critically examine records downloaded from DataCite's OAI
API and elaborate a series of recommendations regarding the use of this source
for bibliometric analyses of open data. We highlight issues related to metadata
incompleteness, lack of standardization, and ambiguous definitions of several
fields. Despite these limitations, we emphasize DataCite's value and potential
to become one of the main sources for data metrics development. Comment: Paper accepted for publication in Journal of Informetrics