11,953 research outputs found

    STORM: FUNCTIONAL DESCRIPTION

    Get PDF
    The StoRM service is a storage resource manager for generic disk based storage systems separating the data management layer from the underlying storage system

    STORM: FUNCTIONAL DESCRIPTION

    Get PDF
    The StoRM service is a storage resource manager for generic disk based storage systems separating the data management layer from the underlying storage system

    Formal Representation of the SS-DB Benchmark and Experimental Evaluation in EXTASCID

    Full text link
    Evaluating the performance of scientific data processing systems is a difficult task considering the plethora of application-specific solutions available in this landscape and the lack of a generally-accepted benchmark. The dual structure of scientific data coupled with the complex nature of processing complicate the evaluation procedure further. SS-DB is the first attempt to define a general benchmark for complex scientific processing over raw and derived data. It fails to draw sufficient attention though because of the ambiguous plain language specification and the extraordinary SciDB results. In this paper, we remedy the shortcomings of the original SS-DB specification by providing a formal representation in terms of ArrayQL algebra operators and ArrayQL/SciQL constructs. These are the first formal representations of the SS-DB benchmark. Starting from the formal representation, we give a reference implementation and present benchmark results in EXTASCID, a novel system for scientific data processing. EXTASCID is complete in providing native support both for array and relational data and extensible in executing any user code inside the system by the means of a configurable metaoperator. These features result in an order of magnitude improvement over SciDB at data loading, extracting derived data, and operations over derived data.Comment: 32 pages, 3 figure

    LogBase: A Scalable Log-structured Database System in the Cloud

    Full text link
    Numerous applications such as financial transactions (e.g., stock trading) are write-heavy in nature. The shift from reads to writes in web applications has also been accelerating in recent years. Write-ahead-logging is a common approach for providing recovery capability while improving performance in most storage systems. However, the separation of log and application data incurs write overheads observed in write-heavy environments and hence adversely affects the write throughput and recovery time in the system. In this paper, we introduce LogBase - a scalable log-structured database system that adopts log-only storage for removing the write bottleneck and supporting fast system recovery. LogBase is designed to be dynamically deployed on commodity clusters to take advantage of elastic scaling property of cloud environments. LogBase provides in-memory multiversion indexes for supporting efficient access to data maintained in the log. LogBase also supports transactions that bundle read and write operations spanning across multiple records. We implemented the proposed system and compared it with HBase and a disk-based log-structured record-oriented system modeled after RAMCloud. The experimental results show that LogBase is able to provide sustained write throughput, efficient data access out of the cache, and effective system recovery.Comment: VLDB201

    The swiss army knife of job submission tools: grid-control

    Get PDF
    Grid-control is a lightweight and highly portable open source submission tool that supports virtually all workflows in high energy physics (HEP). Since 2007 it has been used by a sizeable number of HEP analyses to process tasks that sometimes consist of up 100k jobs. grid-control is built around a powerful plugin and configuration system, that allows users to easily specify all aspects of the desired workflow. Job submission to a wide range of local or remote batch systems or grid middleware is supported. Tasks can be conveniently specified through the parameter space that will be processed, which can consist of any number of variables and data sources with complex dependencies on each other. Dataset information is processed through a configurable pipeline of dataset filters, partition plugins and partition filters. The partition plugins can take the number of files, size of the work units, metadata or combinations thereof into account. All changes to the input datasets or variables are propagated through the processing pipeline and can transparently trigger adjustments to the parameter space and the job submission. While the core functionality is completely experiment independent, integration with the CMS computing environment is provided by a small set of plugins.Comment: 8 pages, 7 figures, Proceedings for the 22nd International Conference on Computing in High Energy and Nuclear Physic

    EMASS (trademark): An expandable solution for NASA space data storage needs

    Get PDF
    The data acquisition, distribution, processing, and archiving requirements of NASA and other U.S. Government data centers present significant data management challenges that must be met in the 1990's. The Earth Observing System (EOS) project alone is expected to generate daily data volumes greater than 2 Terabytes (2 x 10(exp 12) Bytes). As the scientific community makes use of this data, their work will result in larger, increasingly complex data sets to be further exploited and managed. The challenge for data storage systems is to satisfy the initial data management requirements with cost effective solutions that provide for planned growth. The expendable architecture of the E-Systems Modular Automated Storage System (EMASS(TM)), a mass storage system which is designed to support NASA's data capture, storage, distribution, and management requirements into the 21st century is described
    corecore