134,252 research outputs found
Towards Exascale Scientific Metadata Management
Advances in technology and computing hardware are enabling scientists from
all areas of science to produce massive amounts of data using large-scale
simulations or observational facilities. In this era of data deluge, effective
coordination between the data production and the analysis phases hinges on the
availability of metadata that describe the scientific datasets. Existing
workflow engines have been capturing a limited form of metadata to provide
provenance information about the identity and lineage of the data. However,
much of the data produced by simulations, experiments, and analyses still need
to be annotated manually in an ad hoc manner by domain scientists. Systematic
and transparent acquisition of rich metadata becomes a crucial prerequisite to
sustain and accelerate the pace of scientific innovation. Yet, ubiquitous and
domain-agnostic metadata management infrastructure that can meet the demands of
extreme-scale science is notable by its absence.
To address this gap in scientific data management research and practice, we
present our vision for an integrated approach that (1) automatically captures
and manipulates information-rich metadata while the data is being produced or
analyzed and (2) stores metadata within each dataset to permeate
metadata-oblivious processes and to query metadata through established and
standardized data access interfaces. We motivate the need for the proposed
integrated approach using applications from plasma physics, climate modeling
and neuroscience, and then discuss research challenges and possible solutions
Ontology-based data semantic management and application in IoT- and cloud-enabled smart homes
The application of emerging technologies of Internet of Things (IoT) and cloud computing have increasing the popularity of smart homes, along with which, large volumes of heterogeneous data have been generating by home entities. The representation, management and application of the continuously increasing amounts of heterogeneous data in the smart home data space have been critical challenges to the further development of smart home industry. To this end, a scheme for ontology-based data semantic management and application is proposed in this paper. Based on a smart home system model abstracted from the perspective of implementing users’ household operations, a general domain ontology model is designed by defining the correlative concepts, and a logical data semantic fusion model is designed accordingly. Subsequently, to achieve high-efficiency ontology data query and update in the implementation of the data semantic fusion model, a relational-database-based ontology data decomposition storage method is developed by thoroughly investigating existing storage modes, and the performance is demonstrated using a group of elaborated ontology data query and update operations. Comprehensively utilizing the stated achievements, ontology-based semantic reasoning with a specially designed semantic matching rule is studied as well in this work in an attempt to provide accurate and personalized home services, and the efficiency is demonstrated through experiments conducted on the developed testing system for user behavior reasoning
Implications of non-volatile memory as primary storage for database management systems
Traditional Database Management System (DBMS) software relies on hard disks for storing relational data. Hard disks are cheap, persistent, and offer huge storage capacities. However, data retrieval latency for hard disks is extremely high. To hide this latency, DRAM is used as an intermediate storage. DRAM is significantly faster than disk, but deployed in smaller capacities due to cost and power constraints, and without the necessary persistency feature that disks have. Non-Volatile Memory (NVM) is an emerging storage class technology which promises the best of both worlds. It can offer large storage capacities, due to better scaling and cost metrics than DRAM, and is non-volatile (persistent) like hard disks. At the same time, its data retrieval time is much lower than that of hard disks and it is also byte-addressable like DRAM. In this paper, we explore the implications of employing NVM as primary storage for DBMS. In other words, we investigate the modifications necessary to be applied on a traditional relational DBMS to take advantage of NVM features. As a case study, we have modified the storage engine (SE) of PostgreSQL enabling efficient use of NVM hardware. We detail the necessary changes and challenges such modifications entail and evaluate them using a comprehensive emulation platform. Results indicate that our modified SE reduces query execution time by up to 40% and 14.4% when compared to disk and NVM storage, with average reductions of 20.5% and 4.5%, respectively.The research leading to these results has received funding from the European Union’s 7th Framework Programme under grant agreement number 318633, the Ministry of Science and Technology of Spain under contract TIN2015-65316-P, and a HiPEAC collaboration grant awarded to Naveed Ul Mustafa.Peer ReviewedPostprint (author's final draft
Leveraging Non-Volatile Memory in Modern Storage Management Architectures
Non-volatile memory technologies (NVM) introduce a novel class of devices that combine characteristics of both storage and main memory. Like storage, NVM is not only persistent, but also denser and cheaper than DRAM. Like DRAM, NVM is byte-addressable and has lower access latency. In recent years, NVM has gained a lot of attention both in academia and in the data management industry, with views ranging from skepticism to over excitement. Some critics claim that NVM is not cheap enough to replace flash-based SSDs nor is it fast enough to replace DRAM, while others see it simply as a storage device. Supporters of NVM have observed that its low latency and byte-addressability requires radical changes and a complete rewrite of storage management architectures.
This thesis takes a moderate stance between these two views. We consider that, while NVM might not replace flash-based SSD or DRAM in the near future, it has the potential to reduce the gap between them. Furthermore, treating NVM as a regular storage media does not fully leverage its byte-addressability and low latency. On the other hand, completely redesigning systems to be NVM-centric is impractical. Proposals that attempt to leverage NVM to simplify storage management result in completely new architectures that face the same challenges that are already well-understood and addressed by the traditional architectures. Therefore, we take three common storage management architectures as a starting point, and propose incremental changes to enable them to better leverage NVM. First, in the context of log-structured merge-trees, we investigate the impact of storing data in NVM, and devise methods to enable small granularity accesses and NVM-aware caching policies. Second, in the context of B+Trees, we propose to extend the buffer pool and describe a technique based on the concept of optimistic consistency to handle corrupted pages in NVM. Third, we employ NVM to enable larger capacity and reduced costs in a index+log key-value store, and combine it with other techniques to build a system that achieves low tail latency. This thesis aims to describe and evaluate these techniques in order to enable storage management architectures to leverage NVM and achieve increased performance and lower costs, without major architectural changes.:1 Introduction
1.1 Non-Volatile Memory
1.2 Challenges
1.3 Non-Volatile Memory & Database Systems
1.4 Contributions and Outline
2 Background
2.1 Non-Volatile Memory
2.1.1 Types of NVM
2.1.2 Access Modes
2.1.3 Byte-addressability and Persistency
2.1.4 Performance
2.2 Related Work
2.3 Case Study: Persistent Tree Structures
2.3.1 Persistent Trees
2.3.2 Evaluation
3 Log-Structured Merge-Trees
3.1 LSM and NVM
3.2 LSM Architecture
3.2.1 LevelDB
3.3 Persistent Memory Environment
3.4 2Q Cache Policy for NVM
3.5 Evaluation
3.5.1 Write Performance
3.5.2 Read Performance
3.5.3 Mixed Workloads
3.6 Additional Case Study: RocksDB
3.6.1 Evaluation
4 B+Trees
4.1 B+Tree and NVM
4.1.1 Category #1: Buffer Extension
4.1.2 Category #2: DRAM Buffered Access
4.1.3 Category #3: Persistent Trees
4.2 Persistent Buffer Pool with Optimistic Consistency
4.2.1 Architecture and Assumptions
4.2.2 Embracing Corruption
4.3 Detecting Corruption
4.3.1 Embracing Corruption
4.4 Repairing Corruptions
4.5 Performance Evaluation and Expectations
4.5.1 Checksums Overhead
4.5.2 Runtime and Recovery
4.6 Discussion
5 Index+Log Key-Value Stores
5.1 The Case for Tail Latency
5.2 Goals and Overview
5.3 Execution Model
5.3.1 Reactive Systems and Actor Model
5.3.2 Message-Passing Communication
5.3.3 Cooperative Multitasking
5.4 Log-Structured Storage
5.5 Networking
5.6 Implementation Details
5.6.1 NVM Allocation on RStore
5.6.2 Log-Structured Storage and Indexing
5.6.3 Garbage Collection
5.6.4 Logging and Recovery
5.7 Systems Operations
5.8 Evaluation
5.8.1 Methodology
5.8.2 Environment
5.8.3 Other Systems
5.8.4 Throughput Scalability
5.8.5 Tail Latency
5.8.6 Scans
5.8.7 Memory Consumption
5.9 Related Work
6 Conclusion
Bibliography
A PiBenc
Power Management Techniques for Data Centers: A Survey
With growing use of internet and exponential growth in amount of data to be
stored and processed (known as 'big data'), the size of data centers has
greatly increased. This, however, has resulted in significant increase in the
power consumption of the data centers. For this reason, managing power
consumption of data centers has become essential. In this paper, we highlight
the need of achieving energy efficiency in data centers and survey several
recent architectural techniques designed for power management of data centers.
We also present a classification of these techniques based on their
characteristics. This paper aims to provide insights into the techniques for
improving energy efficiency of data centers and encourage the designers to
invent novel solutions for managing the large power dissipation of data
centers.Comment: Keywords: Data Centers, Power Management, Low-power Design, Energy
Efficiency, Green Computing, DVFS, Server Consolidatio
- …