
    Invest to Save: Report and Recommendations of the NSF-DELOS Working Group on Digital Archiving and Preservation

    Digital archiving and preservation are important areas for research and development, but there is no agreed-upon set of priorities or coherent plan for research in this area. Research projects in this area tend to be small and driven by particular institutional problems or concerns. As a consequence, proposed solutions from experimental projects and prototypes tend not to scale to millions of digital objects, nor do the results from disparate projects readily build on each other. It is also unclear whether it is worthwhile to seek general solutions or whether different strategies are needed for different types of digital objects and collections. The lack of coordination in both research and development means that researchers are reinventing the wheel in some areas while other areas are neglected. Digital archiving and preservation is an area that will benefit from an exercise in analysis, priority setting, and planning for future research. The working group (WG) aims to survey current research activities, identify gaps, and develop a white paper proposing future research directions in the area of digital preservation. Potential areas for research include repository architectures and interoperability among digital archives; automated tools for capture, ingest, and normalization of digital objects; and harmonization of preservation formats and metadata. There may also be opportunities for developing commercial products in the areas of mass storage systems, repositories and repository management systems, and data management software and tools.

    Examining the contributions of automatic speech transcriptions and metadata sources for searching spontaneous conversational speech

    Searching spontaneous speech can be enhanced by combining automatic speech transcriptions with semantically related metadata. An important question is what retrieval effectiveness can be expected from searching such transcriptions and the different sources of related metadata. The Cross-Language Speech Retrieval (CL-SR) track at recent CLEF workshops provides a spontaneous speech test collection with manually and automatically derived metadata fields. Using this collection, we investigate the comparative search effectiveness of the individual fields, comprising automatic transcriptions and the available metadata. A further important question is how transcriptions and metadata should be combined for the greatest benefit to search accuracy. We compare simple merging of the individual fields with the extended BM25 model for weighted field combination (BM25F). Results indicate that BM25F can improve search accuracy, but that it is currently important to set its parameters using a suitable training set.
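    To make the contrast between simple field merging and BM25F concrete, the following is a minimal Python sketch of the commonly cited BM25F formulation, in which per-field term frequencies are length-normalised, weighted, and summed into one pseudo-frequency before BM25 saturation is applied. It is not the authors' implementation: the field names (asr, keywords), the field weights, the b values, k1 = 1.2, the IDF variant, and the toy documents are all illustrative assumptions, and simple merging is modelled by concatenating the fields into a single hypothetical "all" field.

```python
import math

# Minimal BM25F sketch. Field names, weights, b values, k1 and the IDF
# variant below are illustrative assumptions, not the paper's settings.


def idf(df: int, n_docs: int) -> float:
    # Robertson/Sparck Jones IDF with +1 inside the log so it stays positive.
    return math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)


def bm25f_score(query, doc, collection, weights, b, k1=1.2):
    """Score one document against a query.

    doc and each collection entry map a field name to a list of tokens;
    weights and b map each field name to its weight and its
    length-normalisation parameter. doc is assumed to be in collection.
    """
    n_docs = len(collection)
    fields = list(weights)
    # Average length of each field across the collection.
    avg_len = {f: sum(len(d.get(f, [])) for d in collection) / n_docs for f in fields}
    score = 0.0
    for term in set(query):
        # Pseudo term frequency: weighted, length-normalised sum over fields.
        tf_tilde = 0.0
        for f in fields:
            tokens = doc.get(f, [])
            tf = tokens.count(term)
            if tf == 0:
                continue  # tf > 0 implies this field is non-empty, so avg_len[f] > 0
            norm = 1.0 - b[f] + b[f] * len(tokens) / avg_len[f]
            tf_tilde += weights[f] * tf / norm
        if tf_tilde == 0.0:
            continue
        # Document frequency: documents containing the term in any field.
        df = sum(1 for d in collection if any(term in d.get(f, []) for f in fields))
        # BM25 saturation is applied once, after the fields are combined.
        score += idf(df, n_docs) * tf_tilde / (k1 + tf_tilde)
    return score


def merge_fields(doc, fields):
    # Simple field merging: concatenate all fields into one "all" field.
    merged = []
    for f in fields:
        merged.extend(doc.get(f, []))
    return {"all": merged}


if __name__ == "__main__":
    # Hypothetical toy documents: an ASR transcript plus manual keywords.
    docs = [
        {"asr": ["we", "walked", "to", "the", "camp"], "keywords": ["deportation"]},
        {"asr": ["we", "walked", "to", "the", "river"], "keywords": ["camp"]},
        {"asr": ["my", "family", "lived", "in", "town"], "keywords": ["family"]},
    ]
    weights = {"asr": 1.0, "keywords": 3.0}
    b = {"asr": 0.75, "keywords": 0.5}
    query = ["camp"]

    fielded = [bm25f_score(query, d, docs, weights, b) for d in docs]
    merged_docs = [merge_fields(d, ["asr", "keywords"]) for d in docs]
    merged = [bm25f_score(query, d, merged_docs, {"all": 1.0}, {"all": 0.75}) for d in merged_docs]
    print("BM25F: ", [round(s, 3) for s in fielded])
    print("merged:", [round(s, 3) for s in merged])
```

    In this toy data the first two documents each mention "camp" once and have equal merged lengths, so simple merging scores them identically, while BM25F ranks the keyword match higher because that field carries the larger weight. The ranking therefore depends directly on how the weights and b values are chosen, which is consistent with the abstract's point that BM25F parameters need to be set on a suitable training set. Corpus statistics are recomputed on every call here for brevity; a real system would precompute them.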

    Achieving interoperability between the CARARE schema for monuments and sites and the Europeana Data Model

    Mapping between different data models in a data aggregation context always presents significant interoperability challenges. In this paper, we describe the challenges faced and solutions developed when mapping the CARARE schema designed for archaeological and architectural monuments and sites to the Europeana Data Model (EDM), a model based on Linked Data principles, for the purpose of integrating more than two million metadata records from national monument collections and databases across Europe into the Europeana digital library. Comment: The final version of this paper is openly published in the proceedings of the Dublin Core 2013 conference, see http://dcevents.dublincore.org/IntConf/dc-2013/paper/view/17

    Multimedia Provocations

    Potentially Polluting Marine Sites GeoDB: An S-100 Geospatial Database as an Effective Contribution to the Protection of the Marine Environment

    Potentially Polluting Marine Sites (PPMS) are objects on, or areas of, the seabed that may release pollution in the future. A rationale for, and design of, a geospatial database to inventory and manipulate PPMS is presented. Built as an S-100 Product Specification, it is specified through human-readable UML diagrams and implemented through machine-readable GML files, and it includes auxiliary information such as pollution-control resources and potentially vulnerable sites in order to support analyses of the core data. The design and some aspects of the implementation are presented, along with metadata requirements and structure, and a perspective on potential uses of the database.

    Technical alignment

    This essay discusses the importance of infrastructure and testing in helping digital preservation services demonstrate reliability, transparency, and accountability. It encourages practitioners to build a strong culture in which transparency and collaboration between technical frameworks are valued highly. It also argues for devising and applying agreed-upon metrics that will enable the systematic analysis of preservation infrastructure. The essay begins by defining technical infrastructure and testing in the digital preservation context, provides case studies that exemplify both progress and challenges for technical alignment in both areas, and concludes with suggestions for achieving greater degrees of technical alignment going forward.

    Evaluation of spoken document retrieval for historic speech collections

    The reuse of spoken word audio collections maintained by audiovisual archives is severely hindered by their generally limited accessibility. The CHoral project, which is part of the CATCH program funded by the Dutch Research Council, aims to provide users of speech archives with online, instead of on-location, access to relevant fragments, instead of full documents. To meet this goal, a spoken document retrieval framework is being developed. This paper presents the evaluation efforts undertaken so far to assess and improve various aspects of the framework. These efforts include (i) evaluation of the automatically generated textual representations of the spoken word documents that enable word-based search, (ii) the development of measures to estimate the quality of these textual representations for use in information retrieval, and (iii) studies to establish the potential user groups of the technology under development, together with the first versions of the user interface supporting online access to spoken word collections.

    Stewardship of the evolving scholarly record: from the invisible hand to conscious coordination

    The scholarly record is increasingly digital and networked, while at the same time expanding in both the volume and diversity of the material it contains. Its long-term future cannot be effectively secured with traditional stewardship models developed for print materials. This report describes the key features of future stewardship models adapted to the characteristics of a digital, networked scholarly record, and discusses some practical implications of implementing these models. Key highlights include: As the scholarly record continues to evolve, conscious coordination will become an important organizing principle for stewardship models. Past stewardship models were built on an "invisible hand" approach that relied on the uncoordinated, institution-scale efforts of individual academic libraries acting autonomously to maintain local collections; that approach is no longer suitable for collecting, organizing, making available, and preserving the outputs of scholarly inquiry. Future stewardship of the evolving scholarly record requires conscious coordination of context, commitments, specialization, and reciprocity. Conscious coordination calls for stewardship strategies that incorporate a broader awareness of the system-wide stewardship context; declarations of explicit commitments around portions of the local collection; formal divisions of labor within cooperative arrangements; and robust networks for reciprocal access. With conscious coordination, local stewardship efforts leverage scale by collecting more of less. Keys to conscious coordination include right-scaling consolidation, cooperation, and community mix. Reducing transaction costs and building trust facilitate conscious coordination. Incentives to participate in cooperative stewardship activities should be linked to broader institutional priorities. Stewardship strategies based on conscious coordination involve an acceleration of an already perceptible transition away from relatively autonomous local collections to ones built on networks of cooperation across many organizations, within and outside the traditional cultural heritage community.