3,220 research outputs found

    BlogForever D3.2: Interoperability Prospects

    Get PDF
    This report evaluates the interoperability prospects of the BlogForever platform. Therefore, existing interoperability models are reviewed, a Delphi study to identify crucial aspects for the interoperability of web archives and digital libraries is conducted, technical interoperability standards and protocols are reviewed regarding their relevance for BlogForever, a simple approach to consider interoperability in specific usage scenarios is proposed, and a tangible approach to develop a succession plan that would allow a reliable transfer of content from the current digital archive to other digital repositories is presented

    Digital Preservation Services : State of the Art Analysis

    Get PDF
    Research report funded by the DC-NET project.An overview of the state of the art in service provision for digital preservation and curation. Its focus is on the areas where bridging the gaps is needed between e-Infrastructures and efficient and forward-looking digital preservation services. Based on a desktop study and a rapid analysis of some 190 currently available tools and services for digital preservation, the deliverable provides a high-level view on the range of instruments currently on offer to support various functions within a preservation system.European Commission, FP7peer-reviewe

    Library and Information Science Education and eScience: The Current State of ALA Accredited MLS/MLIS Programs in Preparing Librarians and Information Professionals for eScience Needs

    Get PDF
    The purpose of this study is multifaceted: 1) to describe eScience research in acomprehensive way; 2) to help library and information specialists understand the realm of eScience research and the information needs of the community and demonstrate the importance of LIS professionals within the eScience domain; 3) and to explore the current state of curricular content of ALA accredited MLS/MLIS programs to understand the extent to which they prepare new professionals within eScience librarianship. The literature review focuses heavily on eScientists and other data-driven researchers’ information service needs in addition to demonstrating how and why librarians and information specialists can and should fulfill these service gaps and information needs within eScience research. By looking at the current curriculum of American Library Association (ALA) accredited MLS/MLIS programs, we can identify potential gaps in knowledge and where to improve in order to prepare and train new MLS/MLIS graduates to fulfill the needs of eScientists. This investigation is meant to be informative and can be used as a tool for LIS programs to assess their curriculums in comparison to the needs of eScience and other data-driven and networked research. Finally, this investigation will provide awareness and insight into the services needed to support a thriving eScience and data-driven research community to the LIS profession

    Digital forensics formats: seeking a digital preservation storage format for web archiving

    Get PDF
    In this paper we discuss archival storage formats from the point of view of digital curation and preservation. Considering established approaches to data management as our jumping off point, we selected seven format attributes which are core to the long term accessibility of digital materials. These we have labeled core preservation attributes. These attributes are then used as evaluation criteria to compare file formats belonging to five common categories: formats for archiving selected content (e.g. tar, WARC), disk image formats that capture data for recovery or installation (partimage, dd raw image), these two types combined with a selected compression algorithm (e.g. tar+gzip), formats that combine packing and compression (e.g. 7-zip), and forensic file formats for data analysis in criminal investigations (e.g. aff, Advanced Forensic File format). We present a general discussion of the file format landscape in terms of the attributes we discuss, and make a direct comparison between the three most promising archival formats: tar, WARC, and aff. We conclude by suggesting the next steps to take the research forward and to validate the observations we have made

    BlogForever: D2.5 Weblog Spam Filtering Report and Associated Methodology

    Get PDF
    This report is written as a first attempt to define the BlogForever spam detection strategy. It comprises a survey of weblog spam technology and approaches to their detection. While the report was written to help identify possible approaches to spam detection as a component within the BlogForver software, the discussion has been extended to include observations related to the historical, social and practical value of spam, and proposals of other ways of dealing with spam within the repository without necessarily removing them. It contains a general overview of spam types, ready-made anti-spam APIs available for weblogs, possible methods that have been suggested for preventing the introduction of spam into a blog, and research related to spam focusing on those that appear in the weblog context, concluding in a proposal for a spam detection workflow that might form the basis for the spam detection component of the BlogForever software

    BlogForever: D3.1 Preservation Strategy Report

    Get PDF
    This report describes preservation planning approaches and strategies recommended by the BlogForever project as a core component of a weblog repository design. More specifically, we start by discussing why we would want to preserve weblogs in the first place and what it is exactly that we are trying to preserve. We further present a review of past and present work and highlight why current practices in web archiving do not address the needs of weblog preservation adequately. We make three distinctive contributions in this volume: a) we propose transferable practical workflows for applying a combination of established metadata and repository standards in developing a weblog repository, b) we provide an automated approach to identifying significant properties of weblog content that uses the notion of communities and how this affects previous strategies, c) we propose a sustainability plan that draws upon community knowledge through innovative repository design

    Big Data Reference Architectures, a systematic literature review

    Get PDF
    Today, we live in a world that produces data at an unprecedented rate. The significant amount of data has raised lots of attention and many strive to harness the power of this new material. In the same direction, academics and practitioners have considered means through which they can incorporate datadriven functions and explore patterns that were otherwise unknown. This has led to a concept called Big Data. Big Data is a field that deals with data sets that are too large and complex for traditional approaches to handle. Technical matters are fundamentally critical, but what is even more necessary, is an architecture that supports the orchestration of Big Data systems; an image of the system providing with clear understanding of different elements and their interdependencies. Reference architectures aid in defining the body of system and its key components, relationships, behaviors, patterns and limitations. This study provides an in-depth review of Big Data Reference Architectures by applying a systematic literature review. The study demonstrates a synthesis of high-quality research to offer indications of new trends. The study contributes to the body of knowledge on the principles of Reference Architectures, the current state of Big Data Reference Architectures, and their limitations

    Web Archive Services Framework for Tighter Integration Between the Past and Present Web

    Get PDF
    Web archives have contained the cultural history of the web for many years, but they still have a limited capability for access. Most of the web archiving research has focused on crawling and preservation activities, with little focus on the delivery methods. The current access methods are tightly coupled with web archive infrastructure, hard to replicate or integrate with other web archives, and do not cover all the users\u27 needs. In this dissertation, we focus on the access methods for archived web data to enable users, third-party developers, researchers, and others to gain knowledge from the web archives. We build ArcSys, a new service framework that extracts, preserves, and exposes APIs for the web archive corpus. The dissertation introduces a novel categorization technique to divide the archived corpus into four levels. For each level, we will propose suitable services and APIs that enable both users and third-party developers to build new interfaces. The first level is the content level that extracts the content from the archived web data. We develop ArcContent to expose the web archive content processed through various filters. The second level is the metadata level; we extract the metadata from the archived web data and make it available to users. We implement two services, ArcLink for temporal web graph and ArcThumb for optimizing the thumbnail creation in the web archives. The third level is the URI level that focuses on using the URI HTTP redirection status to enhance the user query. Finally, the highest level in the web archiving service framework pyramid is the archive level. In this level, we define the web archive by the characteristics of its corpus and building Web Archive Profiles. The profiles are used by the Memento Aggregator for query optimization

    Proceedings of the 12th International Conference on Digital Preservation

    Get PDF
    The 12th International Conference on Digital Preservation (iPRES) was held on November 2-6, 2015 in Chapel Hill, North Carolina, USA. There were 327 delegates from 22 countries. The program included 12 long papers, 15 short papers, 33 posters, 3 demos, 6 workshops, 3 tutorials and 5 panels, as well as several interactive sessions and a Digital Preservation Showcase
    corecore