2,249 research outputs found

    ArchiveSpark: Efficient Web Archive Access, Extraction and Derivation

    Full text link
    Web archives are a valuable resource for researchers of various disciplines. However, to use them as a scholarly source, researchers require a tool that provides efficient access to Web archive data for extraction and derivation of smaller datasets. Besides efficient access we identify five other objectives based on practical researcher needs such as ease of use, extensibility and reusability. Towards these objectives we propose ArchiveSpark, a framework for efficient, distributed Web archive processing that builds a research corpus by working on existing and standardized data formats commonly held by Web archiving institutions. Performance optimizations in ArchiveSpark, facilitated by the use of a widely available metadata index, result in significant speed-ups of data processing. Our benchmarks show that ArchiveSpark is faster than alternative approaches without depending on any additional data stores while improving usability by seamlessly integrating queries and derivations with external tools.Comment: JCDL 2016, Newark, NJ, US

    XSIL: Extensible Scientific Interchange Language

    Get PDF
    We motivate and define the XSIL language as a flexible, hierarchical, extensible transport language for scientific data objects. The entire object may be represented in the file, or there may be metadata in the XSIL file, with a powerful, fault-tolerant linking mechanism to external data. The language is based on XML, and is designed not only for parsing and processing by machines, but also for presentation to humans through web browsers and web-database technology. There is a natural mapping between the elements of the XSIL language and the object model into which they are translated by the parser. As well as common objects (Parameter, Array, Time, Table), we have extended XSIL to include the IGWDFrame, used by gravitational-wave observatories

    IVOA Recommendation: VOTable Format Definition Version 1.3

    Full text link
    This document describes the structures making up the VOTable standard. The main part of this document describes the adopted part of the VOTable standard; it is followed by appendices presenting extensions which have been proposed and/or discussed, but which are not part of the standard

    IVOA Recommendation: Universal Worker Service Pattern Version 1.0

    Full text link
    The Universal Worker Service (UWS) pattern defines how to manage asynchronous execution of jobs on a service. Any application of the pattern defines a family of related services with a common service contract. Possible uses of the pattern are also described

    BlogForever: D3.1 Preservation Strategy Report

    Get PDF
    This report describes preservation planning approaches and strategies recommended by the BlogForever project as a core component of a weblog repository design. More specifically, we start by discussing why we would want to preserve weblogs in the first place and what it is exactly that we are trying to preserve. We further present a review of past and present work and highlight why current practices in web archiving do not address the needs of weblog preservation adequately. We make three distinctive contributions in this volume: a) we propose transferable practical workflows for applying a combination of established metadata and repository standards in developing a weblog repository, b) we provide an automated approach to identifying significant properties of weblog content that uses the notion of communities and how this affects previous strategies, c) we propose a sustainability plan that draws upon community knowledge through innovative repository design

    Firefox Extension to Add Contacts, Events, and View Addresses

    Get PDF
    Users of the Firefox browser have the ability to download plugins to manage their contacts. This usually involves typing or copying the details from some source to add contacts. Event and meeting invitations are sent by mail and are added to the user’s calendar once the user accepts the invitation. Users viewing address data on websites are limited to the mapping capabilities provided by the webpage viewed by the user. We developed a Firefox extension that allows the user to select portions of text with contact or event information and add it as a contact or an event in the calendar of their existing mail client application such as: Microsoft Outlook, Thunderbird, etc. The data is automatically parsed to pick up relevant information such as name, street address, phone number, and email address in case of contacts and street addresses and event dates in case of event. The extension also allows users to right click on a webpage that has a tabular display of addresses and view these addresses on a maps application such as Google Maps

    SWI-Prolog and the Web

    Get PDF
    Where Prolog is commonly seen as a component in a Web application that is either embedded or communicates using a proprietary protocol, we propose an architecture where Prolog communicates to other components in a Web application using the standard HTTP protocol. By avoiding embedding in external Web servers development and deployment become much easier. To support this architecture, in addition to the transfer protocol, we must also support parsing, representing and generating the key Web document types such as HTML, XML and RDF. This paper motivates the design decisions in the libraries and extensions to Prolog for handling Web documents and protocols. The design has been guided by the requirement to handle large documents efficiently. The described libraries support a wide range of Web applications ranging from HTML and XML documents to Semantic Web RDF processing. To appear in Theory and Practice of Logic Programming (TPLP)Comment: 31 pages, 24 figures and 2 tables. To appear in Theory and Practice of Logic Programming (TPLP
    • …
    corecore