6,875 research outputs found
XQuery Streaming by Forest Transducers
Streaming of XML transformations is a challenging task and only very few
systems support streaming. Research approaches generally define custom
fragments of XQuery and XPath that are amenable to streaming, and then design
custom algorithms for each fragment. These languages have several shortcomings.
Here we take a more principles approach to the problem of streaming
XQuery-based transformations. We start with an elegant transducer model for
which many static analysis problems are well-understood: the Macro Forest
Transducer (MFT). We show that a large fragment of XQuery can be translated
into MFTs --- indeed, a fragment of XQuery, that can express important features
that are missing from other XQuery stream engines, such as GCX: our fragment of
XQuery supports XPath predicates and let-statements. We then rely on a
streaming execution engine for MFTs, one which uses a well-founded set of
optimizations from functional programming, such as strictness analysis and
deforestation. Our prototype achieves time and memory efficiency comparable to
the fastest known engine for XQuery streaming, GCX. This is surprising because
our engine relies on the OCaml built in garbage collector and does not use any
specialized buffer management, while GCX's efficiency is due to clever and
explicit buffer management.Comment: Full version of the paper in the Proceedings of the 30th IEEE
International Conference on Data Engineering (ICDE 2014
XSIL: Extensible Scientific Interchange Language
We motivate and define the XSIL language as a flexible, hierarchical, extensible transport language for scientific data objects. The entire object may be represented in the file, or there may be metadata in the XSIL file, with a powerful, fault-tolerant linking mechanism to external data. The language is based on XML, and is designed not only for parsing and processing by machines, but also for presentation to humans through web browsers and web-database technology. There is a natural mapping between the elements of the XSIL language and the object model into which they are translated by the parser. As well as common objects (Parameter, Array, Time, Table), we have extended XSIL to include the IGWDFrame, used by gravitational-wave observatories
VXA: A Virtual Architecture for Durable Compressed Archives
Data compression algorithms change frequently, and obsolete decoders do not
always run on new hardware and operating systems, threatening the long-term
usability of content archived using those algorithms. Re-encoding content into
new formats is cumbersome, and highly undesirable when lossy compression is
involved. Processor architectures, in contrast, have remained comparatively
stable over recent decades. VXA, an archival storage system designed around
this observation, archives executable decoders along with the encoded content
it stores. VXA decoders run in a specialized virtual machine that implements an
OS-independent execution environment based on the standard x86 architecture.
The VXA virtual machine strictly limits access to host system services, making
decoders safe to run even if an archive contains malicious code. VXA's adoption
of a "native" processor architecture instead of type-safe language technology
allows reuse of existing "hand-optimized" decoders in C and assembly language,
and permits decoders access to performance-enhancing architecture features such
as vector processing instructions. The performance cost of VXA's virtualization
is typically less than 15% compared with the same decoders running natively.
The storage cost of archived decoders, typically 30-130KB each, can be
amortized across many archived files sharing the same compression method.Comment: 14 pages, 7 figures, 2 table
Huddl: the Hydrographic Universal Data Description Language
Since many of the attempts to introduce a universal hydrographic data format have failed or have been only partially successful, a different approach is proposed. Our solution is the Hydrographic Universal Data Description Language (HUDDL), a descriptive XML-based language that permits the creation of a standardized description of (past, present, and future) data formats, and allows for applications like HUDDLER, a compiler that automatically creates drivers for data access and manipulation. HUDDL also represents a powerful solution for archiving data along with their structural description, as well as for cataloguing existing format specifications and their version control. HUDDL is intended to be an open, community-led initiative to simplify the issues involved in hydrographic data access
Recommended from our members
A workflow for digitalization projects
More and more institutions want to convert their traditional content to
digital formats. In such pro jects the digitalization and metadata stages
often happen asynchronously. This paper identifies the importance of
frequent cross-verification of both. We suggest a workflow to formalise
this process, and a possible technical implementation to automate this
workflow
BlogForever: D3.1 Preservation Strategy Report
This report describes preservation planning approaches and strategies recommended by the BlogForever project as a core component of a weblog repository design. More specifically, we start by discussing why we would want to preserve weblogs in the first place and what it is exactly that we are trying to preserve. We further present a review of past and present work and highlight why current practices in web archiving do not address the needs of weblog preservation adequately. We make three distinctive contributions in this volume: a) we propose transferable practical workflows for applying a combination of established metadata and repository standards in developing a weblog repository, b) we provide an automated approach to identifying significant properties of weblog content that uses the notion of communities and how this affects previous strategies, c) we propose a sustainability plan that draws upon community knowledge through innovative repository design
The Clarens Web Service Framework for Distributed Scientific Analysis in Grid Projects
Large scientific collaborations are moving towards service oriented architecutres for implementation and deployment of globally distributed systems. Clarens is a high performance, easy to deploy Web Service framework that supports the construction of such globally distributed systems. This paper discusses some of the core functionality of Clarens that the authors believe is important for building distributed systems based on Web Services that support scientific analysis
Data production models for the CDF experiment
The data production for the CDF experiment is conducted on a large Linux PC
farm designed to meet the needs of data collection at a maximum rate of 40
MByte/sec. We present two data production models that exploits advances in
computing and communication technology. The first production farm is a
centralized system that has achieved a stable data processing rate of
approximately 2 TByte per day. The recently upgraded farm is migrated to the
SAM (Sequential Access to data via Metadata) data handling system. The software
and hardware of the CDF production farms has been successful in providing large
computing and data throughput capacity to the experiment.Comment: 8 pages, 9 figures; presented at HPC Asia2005, Beijing, China, Nov 30
- Dec 3, 200
Ur/Web: A Simple Model for Programming the Web
The World Wide Web has evolved gradually from a document delivery platform to an architecture for distributed programming. This largely unplanned evolution is apparent in the set of interconnected languages and protocols that any Web application must manage. This paper presents Ur/Web, a domain-specific, statically typed functional programming language with a much simpler model for programming modern Web applications. Ur/Web's model is unified, where programs in a single programming language are compiled to other "Web standards" languages as needed; modular, supporting novel kinds of encapsulation of Web-specific state; and exposes simple concurrency, where programmers can reason about distributed, multithreaded applications via a mix of transactions and cooperative preemption. We give a tutorial introduction to the main features of Ur/Web, formalize the basic programming model with operational semantics, and discuss the language implementation and the production Web applications that use it.National Science Foundation (U.S.) (Grant CCF-1217501
- …