21,572 research outputs found

    Developing an automatic metadata harvesting and generation system for a continuing education repository: a pilot study

    The goal of this pilot study is to assess the accuracy and reliability of an automated metadata generation and harvesting system developed for a project repository that hosts continuing education resources for cataloging and metadata professionals. Using a web crawler developed for the repository, 500 educational web resources are selected as seed pages for metadata extraction, harvesting and generation. This paper summarizes the processes as well as the results of the study. The metadata harvesting system, combined with article analysis and data generation tools such as Adlegant’s Article Analysis API, produces a significant improvement in metadata generation.

    Rich Tags: Cross-Repository Browsing

    We present RichTags, a system for cross-site browsing and exploration of digital repositories. Categorical and faceted search across repositories is poorly supported, especially compared to the support of keyword search through internet search engines. We combine a variety of information retrieval techniques to determine categories of papers, to enable cross-repository browsing by category. The browsing and exploration of this metadata is achieved through a multi-faceted dynamic exploration interface. Social interaction features have also been added to enable cross-repository tagging, commenting and sharing of papers into groups. These social features are available via an API to enable future work to add plugins to pull comments back to the repositories

    RIOJA (Repository Interface to Overlaid Journal Archives) project: final report

    RIOJA (Repository Interface to Overlaid Journal Archives) was an 18-month partnership between UCL (University College London), Imperial College London, and the Universities of Glasgow, Cambridge and Cornell. The project was funded by the JISC (Joint Information Systems Committee, UK). The project team worked with the Astrophysics community to investigate aspects of overlay journals. For the purposes of the project, an overlay journal was defined as a quality-assured journal whose content is deposited to and resides in one or more open access repositories. The project had both technical aims and supporting, non-technical aims. The primary technical deliverable from the project was a toolkit for the creation and maintenance of overlay journals. The toolkit supports the exchange of data between a repository and a piece of journal software. It supports functions such as author validation, metadata extraction from the source repository, and submission tracking. The toolkit is platform-neutral and could, in theory, be employed by any journal using content from any number of repositories, in any discipline. The project also implemented a demonstrator overlay journal applying the RIOJA toolkit to the arXiv subject repository, and a demonstrator implementation of the RIOJA tool for GNU EPrints. Aside from creating the demonstrator and its underlying tools, the project aimed to assess the acceptability and feasibility of the overlay model. First, a large-scale survey of the Astrophysics community was undertaken. The survey collected data about research publishing practices within this community, and probed its reaction to the principle of overlay publishing. Second, the views of editors and publishers in this discipline were sought through interviews. These views were added to findings from the literature and summarised in a more general report on issues around the sustainability of an overlay journal.

    Eating your own dog food

    As part of its project to develop a new research data management system the University of Lincoln is embracing development practices built around APIs - interfaces to the underlying data and functions of the system which are explicitly designed to make life easy for developers by being machine readable and programmatically accessible

    Facilitating Wiki/Repository Communication with Metadata

    4th International Conference on Open Repositories. This presentation was part of the session: Fedora User Group Presentations. Date: 2009-05-20 01:30 PM – 03:00 PM. The National Science Digital Library (NSDL) Materials Digital Library Pathway (MatDL) has implemented an information infrastructure to disseminate government-funded research results and to provide content as well as services to support the integration of research and education in materials. This paper describes how we are enabling two-way communication between a digital repository and open-source collaborative tools, such as wikis, to support users in materials research and education in the creation and re-use of compelling learning resources. A search results plug-in for MediaWiki has been developed to display relevant search results from the Fedora-based MatDL repository in the Soft Matter Wiki established and developed by MatDL and its partners. Wiki-to-repository information transfer has also been facilitated by mapping the metadata associated with resources originating in the wiki onto Dublin Core (DC) metadata elements and making the metadata and resources available in the repository. The Materials Digital Library Pathway (DUE-0532831) is supported by the National Science Foundation.
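    The wiki-to-repository mapping described in this abstract can be illustrated with a minimal sketch. The abstract does not give the actual crosswalk, so the wiki field names in `WIKI_TO_DC` and the `wiki_to_dc` helper are hypothetical; only the Dublin Core and `oai_dc` namespace URIs are standard.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"

# Hypothetical crosswalk: wiki page field -> Dublin Core element name.
# The real MatDL mapping is not given in the abstract.
WIKI_TO_DC = {
    "page_title": "title",
    "last_editor": "contributor",
    "categories": "subject",
    "last_modified": "date",
    "page_url": "identifier",
}


def wiki_to_dc(page: dict) -> ET.Element:
    """Build an oai_dc record element from a dict of wiki page fields."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for field, dc_name in WIKI_TO_DC.items():
        value = page.get(field)
        if value is None:
            continue  # skip fields the wiki page does not provide
        # List-valued fields (e.g. categories) become repeated DC elements.
        values = value if isinstance(value, list) else [value]
        for v in values:
            ET.SubElement(root, f"{{{DC_NS}}}{dc_name}").text = str(v)
    return root


# Illustrative input only; not taken from the Soft Matter Wiki.
example_page = {
    "page_title": "Example page",
    "categories": ["Polymers", "Colloids"],
    "page_url": "https://wiki.example.org/Example_page",
}
record = wiki_to_dc(example_page)
```

    Serializing `record` with `ET.tostring(record, encoding="unicode")` yields XML that a repository ingest step could consume; how MatDL actually loads such records into Fedora is not stated in the abstract.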

    JISC Final Report: IncReASe (Increasing Repository Content through Automation and Services)

    IncReASe (Increasing Repository Content through Automation and Services) was an eighteen-month project (subsequently extended to twenty months) to enhance White Rose Research Online (WRRO). WRRO is a shared repository of research outputs (primarily publications) from the Universities of Leeds, Sheffield and York; it runs on the EPrints open source repository platform. The repository was created in 2004 and had steady growth but, in common with many other similar repositories, had difficulty in achieving a “critical mass” of content and in becoming truly embedded within researchers’ workflows. The main aim of the IncReASe project was to assess ingestion routes into WRRO with a view to lowering barriers to deposit. We reviewed the feasibility of bulk import of pre-existing metadata and/or full-text research outputs, hoping this activity would have a positive knock-on effect on repository growth and embedding. Prior to the project, we had identified researchers’ reluctance to duplicate effort in metadata creation as a significant barrier to WRRO uptake; we investigated how WRRO might share data with internal and external IT systems. This work included a review of how WRRO, as an institution-based repository, might interact with the subject repository of the Economic and Social Research Council (ESRC). The project addressed four main areas: (i) researcher behaviour: we investigated researcher awareness, motivation and workflow through a survey of archiving activity on the university web sites, a questionnaire and discussions with researchers; (ii) bulk import: we imported data from local systems, including York’s submission data for the 2008 Research Assessment Exercise (RAE), and developed an import plug-in for use with the arXiv repository; (iii) interoperability: we looked at how WRRO might interact with university and departmental publication databases and ESRC’s repository; (iv) metadata: we assessed metadata issues raised by importing publication data from a variety of sources. A number of outputs from the project have been made available from the IncReASe project web site http://eprints.whiterose.ac.uk/increase/. The project highlighted the low levels of researcher awareness of WRRO - and of broader open access issues, including research funders’ deposit requirements. We designed some new publicity materials to start to address this. Departmental publication databases provided a useful jumping-off point for advocacy and liaison; this activity was helpful in promoting awareness of WRRO. Bulk import proved time-consuming – both in terms of adjusting EPrints plug-ins to incorporate different datasets and in the staff time required to improve publication metadata. A number of deposit scenarios were developed in the context of our work with ESRC; we concentrated on investigating how a local deposit of a research paper and attendant metadata in WRRO might be used to populate ESRC’s repository. This work improved our understanding of researcher workflows and of the SWORD protocol as a potential (if partial) solution to the single deposit, multiple destination model we wish to develop; we think the prospect of institutional repository / ESRC data sharing is now a step closer. IncReASe experienced some staff recruitment difficulties. It was also necessary to adapt the project to the changing IT landscape at the three partner institutions – in particular, the introduction of a centralised publication management system at the University of Leeds. Although these factors had some impact on deliverables, the aims and objectives of the project were largely achieved.

    Submission of content to a digital object repository using a configurable workflow system

    The prototype of a workflow system for the submission of content to a digital object repository is presented here. It is based entirely on open-source standard components and features a service-oriented architecture. The front-end consists of Java Business Process Management (jBPM), Java Server Faces (JSF), and Java Server Pages (JSP). A Fedora Repository and a MySQL database management system serve as a back-end. The communication between front-end and back-end uses a SOAP minimal binding stub. We describe the design principles and the construction of the prototype and discuss the possibilities and limitations of workflow creation by administrators. The code of the prototype is open-source and can be retrieved in the project escipub at http://sourceforge.ne

    DataCite as a novel bibliometric source: Coverage, strengths and limitations

    This paper explores the characteristics of DataCite to determine its possibilities and potential as a new bibliometric data source to analyze the scholarly production of open data. Open science and the increasing data sharing requirements from governments, funding bodies, institutions and scientific journals have led to a pressing demand for the development of data metrics. As a very first step towards reliable data metrics, we need to better comprehend the limitations and caveats of the information provided by sources of open data. In this paper, we critically examine records downloaded from DataCite's OAI API and elaborate a series of recommendations regarding the use of this source for bibliometric analyses of open data. We highlight issues related to metadata incompleteness, lack of standardization, and ambiguous definitions of several fields. Despite these limitations, we emphasize DataCite's value and potential to become one of the main sources for data metrics development. Comment: Paper accepted for publication in Journal of Informetric
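    One way to quantify the metadata incompleteness mentioned in this abstract is to count, per field, how many harvested records carry a non-empty value. The sketch below assumes records arrive as an OAI-PMH `ListRecords` response in the `oai_dc` format; the abstract does not say which metadata format or method the authors used, and `field_completeness` and the sample XML are illustrative, not real DataCite data.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Standard OAI-PMH and Dublin Core namespace URIs.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def field_completeness(xml_text: str, fields: list) -> dict:
    """Return, per DC field, the share of records with a non-empty value."""
    root = ET.fromstring(xml_text)
    records = root.findall(".//oai:record", NS)
    filled = Counter()
    for rec in records:
        for name in fields:
            elements = rec.findall(f".//dc:{name}", NS)
            # A field counts as present only if some element has real text.
            if any((el.text or "").strip() for el in elements):
                filled[name] += 1
    total = len(records)
    return {name: (filled[name] / total if total else 0.0) for name in fields}


# Synthetic two-record response for illustration only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><metadata>
      <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                 xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:title>Example dataset A</dc:title>
        <dc:creator>Doe, J.</dc:creator>
      </oai_dc:dc>
    </metadata></record>
    <record><metadata>
      <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                 xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:title>Example dataset B</dc:title>
        <dc:creator></dc:creator>
      </oai_dc:dc>
    </metadata></record>
  </ListRecords>
</OAI-PMH>"""

completeness = field_completeness(SAMPLE, ["title", "creator", "rights"])
```

    On the synthetic sample, `completeness` is `{"title": 1.0, "creator": 0.5, "rights": 0.0}`: an empty `dc:creator` element is treated the same as a missing one, which is a design choice that a real analysis might want to report separately.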