STORM: FUNCTIONAL DESCRIPTION
The StoRM service is a storage resource manager for generic disk-based storage systems, separating the data management layer from the underlying storage system.
Formal Representation of the SS-DB Benchmark and Experimental Evaluation in EXTASCID
Evaluating the performance of scientific data processing systems is a
difficult task considering the plethora of application-specific solutions
available in this landscape and the lack of a generally-accepted benchmark. The
dual structure of scientific data, coupled with the complex nature of the processing, complicates the evaluation procedure further. SS-DB is the first attempt to
define a general benchmark for complex scientific processing over raw and
derived data. It has failed to draw sufficient attention, though, because of its ambiguous plain-language specification and its extraordinary SciDB results. In
this paper, we remedy the shortcomings of the original SS-DB specification by
providing a formal representation in terms of ArrayQL algebra operators and
ArrayQL/SciQL constructs. These are the first formal representations of the
SS-DB benchmark. Starting from the formal representation, we give a reference
implementation and present benchmark results in EXTASCID, a novel system for
scientific data processing. EXTASCID is complete in providing native support
both for array and relational data and extensible in executing any user code
inside the system by means of a configurable metaoperator. These features
result in an order-of-magnitude improvement over SciDB at data loading,
extracting derived data, and operations over derived data.
Comment: 32 pages, 3 figures
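The configurable metaoperator described in the abstract can be illustrated with a minimal sketch. All names here are hypothetical, and this is not EXTASCID's actual interface; it only shows the idea of the system driving a scan over data chunks while handing each chunk to arbitrary user code:

```python
# Minimal sketch of a configurable metaoperator (hypothetical code,
# not EXTASCID's interface): the system iterates over data chunks and
# folds arbitrary user code over them.

def metaoperator(chunks, user_code, init=None):
    """Apply user-supplied code to each chunk, threading state through."""
    state = init
    for chunk in chunks:
        state = user_code(state, chunk)   # user code runs inside the scan
    return state

# Example user code: a running sum over array chunks.
chunks = [[1, 2, 3], [4, 5], [6]]
total = metaoperator(chunks, lambda s, c: (s or 0) + sum(c))
print(total)  # 21
```

The design point is that the scan loop belongs to the system, while the per-chunk logic is pluggable, which is what makes such an operator "extensible in executing any user code."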
LogBase: A Scalable Log-structured Database System in the Cloud
Numerous applications such as financial transactions (e.g., stock trading)
are write-heavy in nature. The shift from reads to writes in web applications
has also been accelerating in recent years. Write-ahead-logging is a common
approach for providing recovery capability while improving performance in most
storage systems. However, the separation of log and application data incurs
extra write overhead, which in write-heavy environments adversely affects the
system's write throughput and recovery time. In this paper, we
introduce LogBase - a scalable log-structured database system that adopts
log-only storage for removing the write bottleneck and supporting fast system
recovery. LogBase is designed to be dynamically deployed on commodity clusters
to take advantage of the elastic scaling property of cloud environments. LogBase
provides in-memory multiversion indexes for supporting efficient access to data
maintained in the log. LogBase also supports transactions that bundle read and
write operations spanning multiple records. We implemented the proposed
system and compared it with HBase and a disk-based log-structured
record-oriented system modeled after RAMCloud. The experimental results show
that LogBase is able to provide sustained write throughput, efficient data
access out of the cache, and effective system recovery.
Comment: VLDB201
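The log-only design with in-memory multiversion indexes can be sketched in a few lines. This is an illustrative toy, with all names hypothetical and none of LogBase's distribution, caching, or transaction machinery: every write is a single append to the log (data and log are one), the index maps each key to the log offsets of its versions, and recovery rebuilds the index by scanning the log.

```python
# Toy sketch of log-only storage with an in-memory multiversion index
# (hypothetical code, not LogBase's implementation).

class LogOnlyStore:
    def __init__(self):
        self.log = []    # append-only log of (key, version, value) records
        self.index = {}  # key -> list of log offsets, one per version

    def put(self, key, value):
        # Data and log are one: a write is a single log append.
        offset = len(self.log)
        version = len(self.index.get(key, []))
        self.log.append((key, version, value))
        self.index.setdefault(key, []).append(offset)

    def get(self, key, version=-1):
        # Reads follow the in-memory index into the log.
        offsets = self.index.get(key)
        if not offsets:
            return None
        return self.log[offsets[version]][2]

    def recover(self):
        # Recovery: rebuild the index with one scan of the log.
        self.index = {}
        for offset, (key, _version, _value) in enumerate(self.log):
            self.index.setdefault(key, []).append(offset)

store = LogOnlyStore()
store.put("acct:1", 100)
store.put("acct:1", 150)
print(store.get("acct:1"))     # latest version: 150
print(store.get("acct:1", 0))  # first version: 100
```

The sketch shows why the log-only layout removes the double write of write-ahead logging: there is no separate data store to update, only the index, which lives in memory and is reconstructible from the log.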
The swiss army knife of job submission tools: grid-control
Grid-control is a lightweight and highly portable open source submission tool
that supports virtually all workflows in high energy physics (HEP). Since 2007
it has been used by a sizeable number of HEP analyses to process tasks that
sometimes consist of up to 100k jobs. grid-control is built around a powerful
plugin and configuration system that allows users to easily specify all
aspects of the desired workflow. Job submission to a wide range of local or
remote batch systems or grid middleware is supported. Tasks can be conveniently
specified through the parameter space that will be processed, which can consist
of any number of variables and data sources with complex dependencies on each
other. Dataset information is processed through a configurable pipeline of
dataset filters, partition plugins and partition filters. The partition plugins
can take the number of files, size of the work units, metadata or combinations
thereof into account. All changes to the input datasets or variables are
propagated through the processing pipeline and can transparently trigger
adjustments to the parameter space and the job submission. While the core
functionality is completely experiment independent, integration with the CMS
computing environment is provided by a small set of plugins.
Comment: 8 pages, 7 figures, Proceedings for the 22nd International Conference
on Computing in High Energy and Nuclear Physics
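Two ideas from the abstract, a parameter space built from the cross product of variables and a partition plugin that groups files into work units by size, can be sketched as follows. The code is purely illustrative (hypothetical names, not grid-control's plugin API):

```python
# Illustrative sketch (not grid-control's API): a parameter space as the
# cross product of named variables, and size-based file partitioning.
import itertools

def parameter_space(**variables):
    """Yield one parameter dict per job: the cross product of all variables."""
    names = list(variables)
    for combo in itertools.product(*(variables[n] for n in names)):
        yield dict(zip(names, combo))

def partition_by_size(files, max_size):
    """Group (name, size) pairs into work units whose sizes stay <= max_size."""
    unit, used = [], 0
    for name, size in files:
        if unit and used + size > max_size:
            yield unit
            unit, used = [], 0
        unit.append(name)
        used += size
    if unit:
        yield unit

jobs = list(parameter_space(era=["2016", "2017"], selection=["loose", "tight"]))
units = list(partition_by_size([("a.root", 3), ("b.root", 2), ("c.root", 4)],
                               max_size=5))
print(len(jobs))  # 4 jobs: 2 eras x 2 selections
print(units)      # [['a.root', 'b.root'], ['c.root']]
```

In the real tool, such partition plugins and parameter sources are composed through its configuration system; the sketch only conveys how variables multiply into jobs and how files fold into work units.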
EMASS (trademark): An expandable solution for NASA space data storage needs
The data acquisition, distribution, processing, and archiving requirements of NASA and other U.S. Government data centers present significant data management challenges that must be met in the 1990s. The Earth Observing System (EOS) project alone is expected to generate daily data volumes greater than 2 Terabytes (2 × 10^12 bytes). As the scientific community makes use of this data, their work will result in larger, increasingly complex data sets to be further exploited and managed. The challenge for data storage systems is to satisfy the initial data management requirements with cost-effective solutions that provide for planned growth. The expandable architecture of the E-Systems Modular Automated Storage System (EMASS(TM)), a mass storage system designed to support NASA's data capture, storage, distribution, and management requirements into the 21st century, is described.