
2,249 research outputs found

ArchiveSpark: Efficient Web Archive Access, Extraction and Derivation

Author: AlSum Ahmed
Brügger Niels
Gomes Daniel
Zaharia Matei
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 03/02/2017

Web archives are a valuable resource for researchers of various disciplines. However, to use them as a scholarly source, researchers require a tool that provides efficient access to Web archive data for extraction and derivation of smaller datasets. Besides efficient access we identify five other objectives based on practical researcher needs such as ease of use, extensibility and reusability. Towards these objectives we propose ArchiveSpark, a framework for efficient, distributed Web archive processing that builds a research corpus by working on existing and standardized data formats commonly held by Web archiving institutions. Performance optimizations in ArchiveSpark, facilitated by the use of a widely available metadata index, result in significant speed-ups of data processing. Our benchmarks show that ArchiveSpark is faster than alternative approaches without depending on any additional data stores while improving usability by seamlessly integrating queries and derivations with external tools. Comment: JCDL 2016, Newark, NJ, US

arXiv.org e-Print Archive

Crossref

XSIL: Extensible Scientific Interchange Language

Author: Blackburn Kent
Lazzarini Albert
Prince Thomas A.
Williams Roy
Publication venue: 'California Institute of Technology Library'
Publication date: 01/01/1999

We motivate and define the XSIL language as a flexible, hierarchical, extensible transport language for scientific data objects. The entire object may be represented in the file, or there may be metadata in the XSIL file, with a powerful, fault-tolerant linking mechanism to external data. The language is based on XML, and is designed not only for parsing and processing by machines, but also for presentation to humans through web browsers and web-database technology. There is a natural mapping between the elements of the XSIL language and the object model into which they are translated by the parser. As well as common objects (Parameter, Array, Time, Table), we have extended XSIL to include the IGWDFrame, used by gravitational-wave observatories.

Caltech Authors

IVOA Recommendation: VOTable Format Definition Version 1.3

Author: Davenhall Clive
Durand Daniel
Fernique Pierre
Giaretta David
Hanisch Robert
McGlynn Tom
Ochsenbein François
Szalay Alex
Taylor Mark B.
Wicenec Andreas
Williams Roy
Publication venue: 'Smithsonian Institution'
Publication date: 13/09/2016

This document describes the structures making up the VOTable standard. The main part of this document describes the adopted part of the VOTable standard; it is followed by appendices presenting extensions which have been proposed and/or discussed, but which are not part of the standard.

arXiv.org e-Print Archive

Crossref

IVOA Recommendation: Universal Worker Service Pattern Version 1.0

Author: Harrison Paul
Rixon Guy
Publication venue: 'Smithsonian Institution'
Publication date: 03/10/2011

The Universal Worker Service (UWS) pattern defines how to manage asynchronous execution of jobs on a service. Any application of the pattern defines a family of related services with a common service contract. Possible uses of the pattern are also described.

arXiv.org e-Print Archive

Crossref

BlogForever: D3.1 Preservation Strategy Report

Author: Arango-Docio Silvia
Banos Vangelis
Garcia Llopis Jaime
Kalb Hendrik
Kim Yunhyong
Pinsent Ed
Ross Seamus
Sleeman Patricia
Stepanyan Karen
Trochidis Illias
Publication venue: BlogForever
Publication date: 25/10/2013

This report describes preservation planning approaches and strategies recommended by the BlogForever project as a core component of a weblog repository design. More specifically, we start by discussing why we would want to preserve weblogs in the first place and what it is exactly that we are trying to preserve. We further present a review of past and present work and highlight why current practices in web archiving do not address the needs of weblog preservation adequately. We make three distinctive contributions in this volume: a) we propose transferable practical workflows for applying a combination of established metadata and repository standards in developing a weblog repository, b) we provide an automated approach to identifying significant properties of weblog content that uses the notion of communities and how this affects previous strategies, c) we propose a sustainability plan that draws upon community knowledge through innovative repository design.

Enlighten

Firefox Extension to Add Contacts, Events, and View Addresses

Author: Rao Vijay
Publication venue: SJSU ScholarWorks
Publication date: 01/01/2008

Users of the Firefox browser have the ability to download plugins to manage their contacts. This usually involves typing or copying the details from some source to add contacts. Event and meeting invitations are sent by mail and are added to the user’s calendar once the user accepts the invitation. Users viewing address data on websites are limited to the mapping capabilities provided by the webpage they are viewing. We developed a Firefox extension that allows the user to select portions of text with contact or event information and add it as a contact or a calendar event in their existing mail client application, such as Microsoft Outlook or Thunderbird. The data is automatically parsed to pick up relevant information: name, street address, phone number, and email address in the case of contacts, and street addresses and event dates in the case of events. The extension also allows users to right-click on a webpage that has a tabular display of addresses and view these addresses in a maps application such as Google Maps.

SJSU ScholarWorks

SWI-Prolog and the Web

Author: Gras
Huang Zhisheng
Mäkelä
Ramakrishnan
van der Meij Lourens
Wielemaker Jan
Publication date: 06/11/2007

Where Prolog is commonly seen as a component in a Web application that is either embedded or communicates using a proprietary protocol, we propose an architecture where Prolog communicates to other components in a Web application using the standard HTTP protocol. By avoiding embedding in external Web servers, development and deployment become much easier. To support this architecture, in addition to the transfer protocol, we must also support parsing, representing and generating the key Web document types such as HTML, XML and RDF. This paper motivates the design decisions in the libraries and extensions to Prolog for handling Web documents and protocols. The design has been guided by the requirement to handle large documents efficiently. The described libraries support a wide range of Web applications ranging from HTML and XML documents to Semantic Web RDF processing. To appear in Theory and Practice of Logic Programming (TPLP). Comment: 31 pages, 24 figures and 2 tables.

arXiv.org e-Print Archive

VU Research Portal

Crossref

International Migration, Integration and Social Cohesion online publications

UvA-DARE