134,252 research outputs found

    Towards Exascale Scientific Metadata Management

    Full text link
    Advances in technology and computing hardware are enabling scientists from all areas of science to produce massive amounts of data using large-scale simulations or observational facilities. In this era of data deluge, effective coordination between the data production and the analysis phases hinges on the availability of metadata that describe the scientific datasets. Existing workflow engines have been capturing a limited form of metadata to provide provenance information about the identity and lineage of the data. However, much of the data produced by simulations, experiments, and analyses still need to be annotated manually in an ad hoc manner by domain scientists. Systematic and transparent acquisition of rich metadata becomes a crucial prerequisite to sustain and accelerate the pace of scientific innovation. Yet, ubiquitous and domain-agnostic metadata management infrastructure that can meet the demands of extreme-scale science is notable by its absence. To address this gap in scientific data management research and practice, we present our vision for an integrated approach that (1) automatically captures and manipulates information-rich metadata while the data is being produced or analyzed and (2) stores metadata within each dataset to permeate metadata-oblivious processes and to query metadata through established and standardized data access interfaces. We motivate the need for the proposed integrated approach using applications from plasma physics, climate modeling and neuroscience, and then discuss research challenges and possible solutions

    Ontology-based data semantic management and application in IoT- and cloud-enabled smart homes

    Get PDF
    The application of emerging technologies of Internet of Things (IoT) and cloud computing have increasing the popularity of smart homes, along with which, large volumes of heterogeneous data have been generating by home entities. The representation, management and application of the continuously increasing amounts of heterogeneous data in the smart home data space have been critical challenges to the further development of smart home industry. To this end, a scheme for ontology-based data semantic management and application is proposed in this paper. Based on a smart home system model abstracted from the perspective of implementing users’ household operations, a general domain ontology model is designed by defining the correlative concepts, and a logical data semantic fusion model is designed accordingly. Subsequently, to achieve high-efficiency ontology data query and update in the implementation of the data semantic fusion model, a relational-database-based ontology data decomposition storage method is developed by thoroughly investigating existing storage modes, and the performance is demonstrated using a group of elaborated ontology data query and update operations. Comprehensively utilizing the stated achievements, ontology-based semantic reasoning with a specially designed semantic matching rule is studied as well in this work in an attempt to provide accurate and personalized home services, and the efficiency is demonstrated through experiments conducted on the developed testing system for user behavior reasoning

    Implications of non-volatile memory as primary storage for database management systems

    Get PDF
    Traditional Database Management System (DBMS) software relies on hard disks for storing relational data. Hard disks are cheap, persistent, and offer huge storage capacities. However, data retrieval latency for hard disks is extremely high. To hide this latency, DRAM is used as an intermediate storage. DRAM is significantly faster than disk, but deployed in smaller capacities due to cost and power constraints, and without the necessary persistency feature that disks have. Non-Volatile Memory (NVM) is an emerging storage class technology which promises the best of both worlds. It can offer large storage capacities, due to better scaling and cost metrics than DRAM, and is non-volatile (persistent) like hard disks. At the same time, its data retrieval time is much lower than that of hard disks and it is also byte-addressable like DRAM. In this paper, we explore the implications of employing NVM as primary storage for DBMS. In other words, we investigate the modifications necessary to be applied on a traditional relational DBMS to take advantage of NVM features. As a case study, we have modified the storage engine (SE) of PostgreSQL enabling efficient use of NVM hardware. We detail the necessary changes and challenges such modifications entail and evaluate them using a comprehensive emulation platform. Results indicate that our modified SE reduces query execution time by up to 40% and 14.4% when compared to disk and NVM storage, with average reductions of 20.5% and 4.5%, respectively.The research leading to these results has received funding from the European Union’s 7th Framework Programme under grant agreement number 318633, the Ministry of Science and Technology of Spain under contract TIN2015-65316-P, and a HiPEAC collaboration grant awarded to Naveed Ul Mustafa.Peer ReviewedPostprint (author's final draft

    Leveraging Non-Volatile Memory in Modern Storage Management Architectures

    Get PDF
    Non-volatile memory technologies (NVM) introduce a novel class of devices that combine characteristics of both storage and main memory. Like storage, NVM is not only persistent, but also denser and cheaper than DRAM. Like DRAM, NVM is byte-addressable and has lower access latency. In recent years, NVM has gained a lot of attention both in academia and in the data management industry, with views ranging from skepticism to over excitement. Some critics claim that NVM is not cheap enough to replace flash-based SSDs nor is it fast enough to replace DRAM, while others see it simply as a storage device. Supporters of NVM have observed that its low latency and byte-addressability requires radical changes and a complete rewrite of storage management architectures. This thesis takes a moderate stance between these two views. We consider that, while NVM might not replace flash-based SSD or DRAM in the near future, it has the potential to reduce the gap between them. Furthermore, treating NVM as a regular storage media does not fully leverage its byte-addressability and low latency. On the other hand, completely redesigning systems to be NVM-centric is impractical. Proposals that attempt to leverage NVM to simplify storage management result in completely new architectures that face the same challenges that are already well-understood and addressed by the traditional architectures. Therefore, we take three common storage management architectures as a starting point, and propose incremental changes to enable them to better leverage NVM. First, in the context of log-structured merge-trees, we investigate the impact of storing data in NVM, and devise methods to enable small granularity accesses and NVM-aware caching policies. Second, in the context of B+Trees, we propose to extend the buffer pool and describe a technique based on the concept of optimistic consistency to handle corrupted pages in NVM. Third, we employ NVM to enable larger capacity and reduced costs in a index+log key-value store, and combine it with other techniques to build a system that achieves low tail latency. This thesis aims to describe and evaluate these techniques in order to enable storage management architectures to leverage NVM and achieve increased performance and lower costs, without major architectural changes.:1 Introduction 1.1 Non-Volatile Memory 1.2 Challenges 1.3 Non-Volatile Memory & Database Systems 1.4 Contributions and Outline 2 Background 2.1 Non-Volatile Memory 2.1.1 Types of NVM 2.1.2 Access Modes 2.1.3 Byte-addressability and Persistency 2.1.4 Performance 2.2 Related Work 2.3 Case Study: Persistent Tree Structures 2.3.1 Persistent Trees 2.3.2 Evaluation 3 Log-Structured Merge-Trees 3.1 LSM and NVM 3.2 LSM Architecture 3.2.1 LevelDB 3.3 Persistent Memory Environment 3.4 2Q Cache Policy for NVM 3.5 Evaluation 3.5.1 Write Performance 3.5.2 Read Performance 3.5.3 Mixed Workloads 3.6 Additional Case Study: RocksDB 3.6.1 Evaluation 4 B+Trees 4.1 B+Tree and NVM 4.1.1 Category #1: Buffer Extension 4.1.2 Category #2: DRAM Buffered Access 4.1.3 Category #3: Persistent Trees 4.2 Persistent Buffer Pool with Optimistic Consistency 4.2.1 Architecture and Assumptions 4.2.2 Embracing Corruption 4.3 Detecting Corruption 4.3.1 Embracing Corruption 4.4 Repairing Corruptions 4.5 Performance Evaluation and Expectations 4.5.1 Checksums Overhead 4.5.2 Runtime and Recovery 4.6 Discussion 5 Index+Log Key-Value Stores 5.1 The Case for Tail Latency 5.2 Goals and Overview 5.3 Execution Model 5.3.1 Reactive Systems and Actor Model 5.3.2 Message-Passing Communication 5.3.3 Cooperative Multitasking 5.4 Log-Structured Storage 5.5 Networking 5.6 Implementation Details 5.6.1 NVM Allocation on RStore 5.6.2 Log-Structured Storage and Indexing 5.6.3 Garbage Collection 5.6.4 Logging and Recovery 5.7 Systems Operations 5.8 Evaluation 5.8.1 Methodology 5.8.2 Environment 5.8.3 Other Systems 5.8.4 Throughput Scalability 5.8.5 Tail Latency 5.8.6 Scans 5.8.7 Memory Consumption 5.9 Related Work 6 Conclusion Bibliography A PiBenc

    Power Management Techniques for Data Centers: A Survey

    Full text link
    With growing use of internet and exponential growth in amount of data to be stored and processed (known as 'big data'), the size of data centers has greatly increased. This, however, has resulted in significant increase in the power consumption of the data centers. For this reason, managing power consumption of data centers has become essential. In this paper, we highlight the need of achieving energy efficiency in data centers and survey several recent architectural techniques designed for power management of data centers. We also present a classification of these techniques based on their characteristics. This paper aims to provide insights into the techniques for improving energy efficiency of data centers and encourage the designers to invent novel solutions for managing the large power dissipation of data centers.Comment: Keywords: Data Centers, Power Management, Low-power Design, Energy Efficiency, Green Computing, DVFS, Server Consolidatio
    • …
    corecore