DALiuGE: A Graph Execution Framework for Harnessing the Astronomical Data Deluge
The Data Activated Liu Graph Engine - DALiuGE - is an execution framework for
processing large astronomical datasets at a scale required by the Square
Kilometre Array Phase 1 (SKA1). It includes an interface for expressing complex
data reduction pipelines consisting of both data sets and algorithmic
components and an implementation run-time to execute such pipelines on
distributed resources. By mapping the logical view of a pipeline to its
physical realisation, DALiuGE separates the concerns of multiple stakeholders,
allowing them to collectively optimise large-scale data processing solutions in
a coherent manner. The execution in DALiuGE is data-activated, where each
individual data item autonomously triggers the processing on itself. Such
decentralisation also makes the execution framework very scalable and flexible,
supporting pipeline sizes ranging from fewer than ten tasks running on a laptop
to tens of millions of concurrent tasks on the second fastest supercomputer in
the world. DALiuGE has been used in production for reducing interferometry data
sets from the Karl E. Jansky Very Large Array and the Mingantu Ultrawide
Spectral Radioheliograph; and is being developed as the execution framework
prototype for the Science Data Processor (SDP) consortium of the Square
Kilometre Array (SKA) telescope. This paper presents a technical overview of
DALiuGE and discusses case studies from the CHILES and MUSER projects that use
DALiuGE to execute production pipelines. In a companion paper, we provide
in-depth analysis of DALiuGE's scalability to very large numbers of tasks on
two supercomputing facilities. Comment: 31 pages, 12 figures, currently under review by Astronomy and
Computin
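The data-activated execution described in the abstract above can be illustrated with a minimal sketch. It is not DALiuGE's actual API: the class names `DataItem` and `Task`, the method names, and the toy pipeline are all hypothetical, chosen only to show a data item triggering its own consumers without a central scheduler.

```python
class DataItem:
    """A data node that notifies its consumers once complete (hypothetical name)."""

    def __init__(self, name):
        self.name = name
        self.value = None
        self.consumers = []

    def complete(self, value):
        # Completing the data item is the event that drives execution:
        # the item itself notifies every task that consumes it.
        self.value = value
        for task in self.consumers:
            task.input_ready(self)


class Task:
    """An algorithmic node that runs as soon as all of its inputs are complete."""

    def __init__(self, name, func, inputs, output):
        self.name = name
        self.func = func
        self.inputs = inputs
        self.output = output
        self.pending = {item.name for item in inputs}
        for item in inputs:
            item.consumers.append(self)

    def input_ready(self, item):
        self.pending.discard(item.name)
        if not self.pending:
            # All inputs have arrived: run, then complete the output,
            # which in turn triggers the tasks downstream of it.
            result = self.func(*(i.value for i in self.inputs))
            self.output.complete(result)


def run_demo():
    # Toy pipeline (assumed for illustration): raw -> calibrate -> image.
    raw = DataItem("raw")
    calibrated = DataItem("calibrated")
    image = DataItem("image")
    Task("calibrate", lambda xs: [x * 2 for x in xs], [raw], calibrated)
    Task("make_image", lambda xs: sum(xs), [calibrated], image)
    raw.complete([1, 2, 3])  # the only explicit call; the rest is data-driven
    return image.value
```

In this sketch, calling `run_demo()` completes only the first data item; each later step runs because the data it depends on became available, not because a scheduler dispatched it.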
The LCG POOL Project, General Overview and Project Structure
The POOL project has been created to implement a common persistency framework
for the LHC Computing Grid (LCG) application area. POOL is tasked to store
experiment data and meta data in the multi Petabyte area in a distributed and
grid-enabled way. First production use of the new framework is expected for summer
2003. The project follows a hybrid approach combining C++ Object streaming
technology such as ROOT I/O for the bulk data with a transactionally safe
relational database (RDBMS) store such as MySQL. POOL is based on a strict
component approach - as laid down in the LCG persistency and blue print RTAG
documents - providing navigational access to distributed data without exposing
details of the particular storage technology. This contribution describes the
project breakdown into work packages, the high level interaction between the
main POOL components and summarizes the current status and plans. Comment: Talk from the 2003 Computing in High Energy and Nuclear Physics
(CHEP03), La Jolla, Ca, USA, March 2003, 5 pages. PSN MOKT00
Optimized mobile thin clients through a MPEG-4 BiFS semantic remote display framework
According to the thin client computing principle, the user interface is physically separated from the application logic. In practice, only a viewer component is executed on the client device, rendering the display updates received from the distant application server and capturing the user interaction. Existing remote display frameworks are not optimized to encode the complex scenes of modern applications, which are composed of objects with very diverse graphical characteristics. In order to tackle this challenge, we propose to transfer to the client, in addition to the binary encoded objects, semantic information about the characteristics of each object. Through this semantic knowledge, the client is enabled to react autonomously to user input and does not have to wait for the display update from the server. Because it reduces the interaction latency and mitigates the bursty remote display traffic pattern, the presented framework is of particular interest in a wireless context, where the bandwidth is limited and expensive. In this paper, we describe a generic architecture of a semantic remote display framework. Furthermore, we have developed a prototype using the MPEG-4 Binary Format for Scenes to convey the semantic information to the client. We experimentally compare the bandwidth consumption of MPEG-4 BiFS with existing, non-semantic, remote display frameworks. In a text editing scenario, we realize an average reduction of 23% of the data peaks that are observed in remote display protocol traffic.
AliEnFS - a Linux File System for the AliEn Grid Services
Among the services offered by the AliEn (ALICE Environment
http://alien.cern.ch) Grid framework there is a virtual file catalogue to allow
transparent access to distributed data-sets using various file transfer
protocols. AliEnFS (AliEn File System) integrates the AliEn file catalogue as
a new file system type into the Linux kernel using LUFS, a hybrid user space
file system framework (Open Source http://lufs.sourceforge.net). LUFS uses a
special kernel interface level called VFS (Virtual File System Switch) to
communicate via a generalised file system interface to the AliEn file system
daemon. The AliEn framework is used for authentication, catalogue browsing,
file registration and read/write transfer operations. A C++ API implements the
generic file system operations. The goal of AliEnFS is to allow users easy
interactive access to a worldwide distributed virtual file system using
familiar shell commands (e.g. cp, ls, rm ...). The paper discusses general aspects
of Grid File Systems, the AliEn implementation and present and future
developments for the AliEn Grid File System. Comment: 9 pages, 12 figure
Adapting SAM for CDF
The CDF and D0 experiments probe the high-energy frontier and as they do so
have accumulated hundreds of Terabytes of data on the way to petabytes of data
over the next two years. The experiments have made a commitment to use the
developing Grid based on the SAM system to handle these data. The D0 SAM has
been extended for use in CDF as common patterns of design emerged to meet the
similar requirements of these experiments. The process by which the merger was
achieved is explained with particular emphasis on lessons learned concerning
the database design patterns plus realization of the use cases. Comment: Talk from the 2003 Computing in High Energy and Nuclear Physics
(CHEP03), La Jolla, Ca, USA, March 2003, 4 pages, pdf format, TUAT00
Algorithm Diversity for Resilient Systems
Diversity can significantly increase the resilience of systems, by reducing
the prevalence of shared vulnerabilities and making vulnerabilities harder to
exploit. Work on software diversity for security typically creates variants of
a program using low-level code transformations. This paper is the first to
study algorithm diversity for resilience. We first describe how a method based
on high-level invariants and systematic incrementalization can be used to
create algorithm variants. Executing multiple variants in parallel and
comparing their outputs provides greater resilience than executing one variant.
To prevent different parallel schedules from causing variants' behaviors to
diverge, we present a synchronized execution algorithm for DistAlgo, an
extension of Python for high-level, precise, executable specifications of
distributed algorithms. We propose static and dynamic metrics for measuring
diversity. An experimental evaluation of algorithm diversity combined with
implementation-level diversity for several sequential algorithms and
distributed algorithms shows the benefits of algorithm diversity
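The idea of executing several algorithm variants in parallel and comparing their outputs can be sketched as follows. This is an illustration in plain Python, not the paper's DistAlgo-based synchronized execution algorithm. The variant functions, the helper `run_variants`, and the use of majority voting as the comparison rule are all assumptions made for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def sum_by_loop(values):
    """Variant 1: explicit accumulation."""
    total = 0
    for v in values:
        total += v
    return total


def sum_by_builtin(values):
    """Variant 2: the built-in sum."""
    return sum(values)


def sum_by_halving(values):
    """Variant 3: divide and conquer."""
    if len(values) == 0:
        return 0
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return sum_by_halving(values[:mid]) + sum_by_halving(values[mid:])


def run_variants(variants, argument):
    """Run every variant on the same input and return (majority output, votes).

    Majority voting is an assumed comparison rule; outputs must be hashable.
    """
    with ThreadPoolExecutor(max_workers=len(variants)) as pool:
        results = list(pool.map(lambda variant: variant(argument), variants))
    winner, votes = Counter(results).most_common(1)[0]
    if votes <= len(variants) // 2:
        raise RuntimeError("no majority among variant outputs")
    return winner, votes
```

With this rule, a single variant that returns a wrong answer (through a bug or an exploit) is outvoted by the others, which is the kind of resilience gain over running one variant that the abstract refers to.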
Guppy: Process-Oriented Programming on Embedded Devices
Guppy is a new and experimental process-oriented programming language, taking much inspiration (and some code-base) from the existing occam-pi language. This paper reports on a variety of aspects related to this, specifically language, compiler and run-time system development, enabling Guppy programs to run on desktop and embedded systems. A native code-generation approach is taken, using C as the intermediate language, and with stack-space requirements determined at compile-time
- …