6,875 research outputs found

    XQuery Streaming by Forest Transducers

    Full text link
    Streaming of XML transformations is a challenging task and only very few systems support streaming. Research approaches generally define custom fragments of XQuery and XPath that are amenable to streaming, and then design custom algorithms for each fragment. These languages have several shortcomings. Here we take a more principles approach to the problem of streaming XQuery-based transformations. We start with an elegant transducer model for which many static analysis problems are well-understood: the Macro Forest Transducer (MFT). We show that a large fragment of XQuery can be translated into MFTs --- indeed, a fragment of XQuery, that can express important features that are missing from other XQuery stream engines, such as GCX: our fragment of XQuery supports XPath predicates and let-statements. We then rely on a streaming execution engine for MFTs, one which uses a well-founded set of optimizations from functional programming, such as strictness analysis and deforestation. Our prototype achieves time and memory efficiency comparable to the fastest known engine for XQuery streaming, GCX. This is surprising because our engine relies on the OCaml built in garbage collector and does not use any specialized buffer management, while GCX's efficiency is due to clever and explicit buffer management.Comment: Full version of the paper in the Proceedings of the 30th IEEE International Conference on Data Engineering (ICDE 2014

    XSIL: Extensible Scientific Interchange Language

    Get PDF
    We motivate and define the XSIL language as a flexible, hierarchical, extensible transport language for scientific data objects. The entire object may be represented in the file, or there may be metadata in the XSIL file, with a powerful, fault-tolerant linking mechanism to external data. The language is based on XML, and is designed not only for parsing and processing by machines, but also for presentation to humans through web browsers and web-database technology. There is a natural mapping between the elements of the XSIL language and the object model into which they are translated by the parser. As well as common objects (Parameter, Array, Time, Table), we have extended XSIL to include the IGWDFrame, used by gravitational-wave observatories

    VXA: A Virtual Architecture for Durable Compressed Archives

    Full text link
    Data compression algorithms change frequently, and obsolete decoders do not always run on new hardware and operating systems, threatening the long-term usability of content archived using those algorithms. Re-encoding content into new formats is cumbersome, and highly undesirable when lossy compression is involved. Processor architectures, in contrast, have remained comparatively stable over recent decades. VXA, an archival storage system designed around this observation, archives executable decoders along with the encoded content it stores. VXA decoders run in a specialized virtual machine that implements an OS-independent execution environment based on the standard x86 architecture. The VXA virtual machine strictly limits access to host system services, making decoders safe to run even if an archive contains malicious code. VXA's adoption of a "native" processor architecture instead of type-safe language technology allows reuse of existing "hand-optimized" decoders in C and assembly language, and permits decoders access to performance-enhancing architecture features such as vector processing instructions. The performance cost of VXA's virtualization is typically less than 15% compared with the same decoders running natively. The storage cost of archived decoders, typically 30-130KB each, can be amortized across many archived files sharing the same compression method.Comment: 14 pages, 7 figures, 2 table

    Huddl: the Hydrographic Universal Data Description Language

    Get PDF
    Since many of the attempts to introduce a universal hydrographic data format have failed or have been only partially successful, a different approach is proposed. Our solution is the Hydrographic Universal Data Description Language (HUDDL), a descriptive XML-based language that permits the creation of a standardized description of (past, present, and future) data formats, and allows for applications like HUDDLER, a compiler that automatically creates drivers for data access and manipulation. HUDDL also represents a powerful solution for archiving data along with their structural description, as well as for cataloguing existing format specifications and their version control. HUDDL is intended to be an open, community-led initiative to simplify the issues involved in hydrographic data access

    BlogForever: D3.1 Preservation Strategy Report

    Get PDF
    This report describes preservation planning approaches and strategies recommended by the BlogForever project as a core component of a weblog repository design. More specifically, we start by discussing why we would want to preserve weblogs in the first place and what it is exactly that we are trying to preserve. We further present a review of past and present work and highlight why current practices in web archiving do not address the needs of weblog preservation adequately. We make three distinctive contributions in this volume: a) we propose transferable practical workflows for applying a combination of established metadata and repository standards in developing a weblog repository, b) we provide an automated approach to identifying significant properties of weblog content that uses the notion of communities and how this affects previous strategies, c) we propose a sustainability plan that draws upon community knowledge through innovative repository design

    The Clarens Web Service Framework for Distributed Scientific Analysis in Grid Projects

    Get PDF
    Large scientific collaborations are moving towards service oriented architecutres for implementation and deployment of globally distributed systems. Clarens is a high performance, easy to deploy Web Service framework that supports the construction of such globally distributed systems. This paper discusses some of the core functionality of Clarens that the authors believe is important for building distributed systems based on Web Services that support scientific analysis

    Data production models for the CDF experiment

    Get PDF
    The data production for the CDF experiment is conducted on a large Linux PC farm designed to meet the needs of data collection at a maximum rate of 40 MByte/sec. We present two data production models that exploits advances in computing and communication technology. The first production farm is a centralized system that has achieved a stable data processing rate of approximately 2 TByte per day. The recently upgraded farm is migrated to the SAM (Sequential Access to data via Metadata) data handling system. The software and hardware of the CDF production farms has been successful in providing large computing and data throughput capacity to the experiment.Comment: 8 pages, 9 figures; presented at HPC Asia2005, Beijing, China, Nov 30 - Dec 3, 200

    Ur/Web: A Simple Model for Programming the Web

    Get PDF
    The World Wide Web has evolved gradually from a document delivery platform to an architecture for distributed programming. This largely unplanned evolution is apparent in the set of interconnected languages and protocols that any Web application must manage. This paper presents Ur/Web, a domain-specific, statically typed functional programming language with a much simpler model for programming modern Web applications. Ur/Web's model is unified, where programs in a single programming language are compiled to other "Web standards" languages as needed; modular, supporting novel kinds of encapsulation of Web-specific state; and exposes simple concurrency, where programmers can reason about distributed, multithreaded applications via a mix of transactions and cooperative preemption. We give a tutorial introduction to the main features of Ur/Web, formalize the basic programming model with operational semantics, and discuss the language implementation and the production Web applications that use it.National Science Foundation (U.S.) (Grant CCF-1217501
    • …
    corecore