1,984 research outputs found

    A Survey on the Integration of NAND Flash Storage in the Design of File Systems and the Host Storage Software Stack

    Full text link
    With the ever-increasing amount of data generate in the world, estimated to reach over 200 Zettabytes by 2025, pressure on efficient data storage systems is intensifying. The shift from HDD to flash-based SSD provides one of the most fundamental shifts in storage technology, increasing performance capabilities significantly. However, flash storage comes with different characteristics than prior HDD storage technology. Therefore, storage software was unsuitable for leveraging the capabilities of flash storage. As a result, a plethora of storage applications have been design to better integrate with flash storage and align with flash characteristics. In this literature study we evaluate the effect the introduction of flash storage has had on the design of file systems, which providing one of the most essential mechanisms for managing persistent storage. We analyze the mechanisms for effectively managing flash storage, managing overheads of introduced design requirements, and leverage the capabilities of flash storage. Numerous methods have been adopted in file systems, however prominently revolve around similar design decisions, adhering to the flash hardware constrains, and limiting software intervention. Future design of storage software remains prominent with the constant growth in flash-based storage devices and interfaces, providing an increasing possibility to enhance flash integration in the host storage software stack

    A Survey on the Integration of NAND Flash Storage in the Design of File Systems and the Host Storage Software Stack

    Get PDF
    With the ever-increasing amount of data generate in the world, estimated to reach over 200 Zettabytes by 2025, pressure on efficient data storage systems is intensifying. The shift from HDD to flash-based SSD provides one of the most fundamental shifts in storage technology, increasing performance capabilities significantly. However, flash storage comes with different characteristics than prior HDD storage technology. Therefore, storage software was unsuitable for leveraging the capabilities of flash storage. As a result, a plethora of storage applications have been design to better integrate with flash storage and align with flash characteristics. In this literature study we evaluate the effect the introduction of flash storage has had on the design of file systems, which providing one of the most essential mechanisms for managing persistent storage. We analyze the mechanisms for effectively managing flash storage, managing overheads of introduced design requirements, and leverage the capabilities of flash storage. Numerous methods have been adopted in file systems, however prominently revolve around similar design decisions, adhering to the flash hardware constrains, and limiting software intervention. Future design of storage software remains prominent with the constant growth in flash-based storage devices and interfaces, providing an increasing possibility to enhance flash integration in the host storage software stack

    DeltaFS: Pursuing Zero Update Overhead via Metadata-Enabled Delta Compression for Log-structured File System on Mobile Devices

    Full text link
    Data compression has been widely adopted to release mobile devices from intensive write pressure. Delta compression is particularly promising for its high compression efficacy over conventional compression methods. However, this method suffers from non-trivial system overheads incurred by delta maintenance and read penalty, which prevents its applicability on mobile devices. To this end, this paper proposes DeltaFS, a metadata-enabled Delta compression on log-structured File System for mobile devices, to achieve utmost compressing efficiency and zero hardware costs. DeltaFS smartly exploits the out-of-place updating ability of Log-structured File System (LFS) to alleviate the problems of write amplification, which is the key bottleneck for delta compression implementation. Specifically, DeltaFS utilizes the inline area in file inodes for delta maintenance with zero hardware cost, and integrates an inline area manage strategy to improve the utilization of constrained inline area. Moreover, a complimentary delta maintenance strategy is incorporated, which selectively maintains delta chunks in the main data area to break through the limitation of constrained inline area. Experimental results show that DeltaFS substantially reduces write traffics by up to 64.8\%, and improves the I/O performance by up to 37.3\%

    Architectural Principles for Database Systems on Storage-Class Memory

    Get PDF
    Database systems have long been optimized to hide the higher latency of storage media, yielding complex persistence mechanisms. With the advent of large DRAM capacities, it became possible to keep a full copy of the data in DRAM. Systems that leverage this possibility, such as main-memory databases, keep two copies of the data in two different formats: one in main memory and the other one in storage. The two copies are kept synchronized using snapshotting and logging. This main-memory-centric architecture yields nearly two orders of magnitude faster analytical processing than traditional, disk-centric ones. The rise of Big Data emphasized the importance of such systems with an ever-increasing need for more main memory. However, DRAM is hitting its scalability limits: It is intrinsically hard to further increase its density. Storage-Class Memory (SCM) is a group of novel memory technologies that promise to alleviate DRAM’s scalability limits. They combine the non-volatility, density, and economic characteristics of storage media with the byte-addressability and a latency close to that of DRAM. Therefore, SCM can serve as persistent main memory, thereby bridging the gap between main memory and storage. In this dissertation, we explore the impact of SCM as persistent main memory on database systems. Assuming a hybrid SCM-DRAM hardware architecture, we propose a novel software architecture for database systems that places primary data in SCM and directly operates on it, eliminating the need for explicit IO. This architecture yields many benefits: First, it obviates the need to reload data from storage to main memory during recovery, as data is discovered and accessed directly in SCM. Second, it allows replacing the traditional logging infrastructure by fine-grained, cheap micro-logging at data-structure level. Third, secondary data can be stored in DRAM and reconstructed during recovery. Fourth, system runtime information can be stored in SCM to improve recovery time. Finally, the system may retain and continue in-flight transactions in case of system failures. However, SCM is no panacea as it raises unprecedented programming challenges. Given its byte-addressability and low latency, processors can access, read, modify, and persist data in SCM using load/store instructions at a CPU cache line granularity. The path from CPU registers to SCM is long and mostly volatile, including store buffers and CPU caches, leaving the programmer with little control over when data is persisted. Therefore, there is a need to enforce the order and durability of SCM writes using persistence primitives, such as cache line flushing instructions. This in turn creates new failure scenarios, such as missing or misplaced persistence primitives. We devise several building blocks to overcome these challenges. First, we identify the programming challenges of SCM and present a sound programming model that solves them. Then, we tackle memory management, as the first required building block to build a database system, by designing a highly scalable SCM allocator, named PAllocator, that fulfills the versatile needs of database systems. Thereafter, we propose the FPTree, a highly scalable hybrid SCM-DRAM persistent B+-Tree that bridges the gap between the performance of transient and persistent B+-Trees. Using these building blocks, we realize our envisioned database architecture in SOFORT, a hybrid SCM-DRAM columnar transactional engine. We propose an SCM-optimized MVCC scheme that eliminates write-ahead logging from the critical path of transactions. Since SCM -resident data is near-instantly available upon recovery, the new recovery bottleneck is rebuilding DRAM-based data. To alleviate this bottleneck, we propose a novel recovery technique that achieves nearly instant responsiveness of the database by accepting queries right after recovering SCM -based data, while rebuilding DRAM -based data in the background. Additionally, SCM brings new failure scenarios that existing testing tools cannot detect. Hence, we propose an online testing framework that is able to automatically simulate power failures and detect missing or misplaced persistence primitives. Finally, our proposed building blocks can serve to build more complex systems, paving the way for future database systems on SCM

    A hierarchal framework for recognising activities of daily life

    Get PDF
    PhDIn today’s working world the elderly who are dependent can sometimes be neglected by society. Statistically, after toddlers it is the elderly who are observed to have higher accident rates while performing everyday activities. Alzheimer’s disease is one of the major impairments that elderly people suffer from, and leads to the elderly person not being able to live an independent life due to forgetfulness. One way to support elderly people who aspire to live an independent life and remain safe in their home is to find out what activities the elderly person is carrying out at a given time and provide appropriate assistance or institute safeguards. The aim of this research is to create improved methods to identify tasks related to activities of daily life and determine a person’s current intentions and so reason about that person’s future intentions. A novel hierarchal framework has been developed, which recognises sensor events and maps them to significant activities and intentions. As privacy is becoming a growing concern, the monitoring of an individual’s behaviour can be seen as intrusive. Hence, the monitoring is based around using simple non intrusive sensors and tags on everyday objects that are used to perform daily activities around the home. Specifically there is no use of any cameras or visual surveillance equipment, though the techniques developed are still relevant in such a situation. Models for task recognition and plan recognition have been developed and tested on scenarios where the plans can be interwoven. Potential targets are people in the first stages of Alzheimer’s disease and in the structuring of the library of kernel plan sequences, typical routines used to sustain meaningful activity have been used. Evaluations have been carried out using volunteers conducting activities of daily life in an experimental home environment. The results generated from the sensors have been interpreted and analysis of developed algorithms has been made. The outcomes and findings of these experiments demonstrate that the developed hierarchal framework is capable of carrying activity recognition as well as being able to carry out intention analysis, e.g. predicting what activity they are most likely to carry out next

    Heterogeneous Anomaly Detection for Software Systems via Semi-supervised Cross-modal Attention

    Full text link
    Prompt and accurate detection of system anomalies is essential to ensure the reliability of software systems. Unlike manual efforts that exploit all available run-time information, existing approaches usually leverage only a single type of monitoring data (often logs or metrics) or fail to make effective use of the joint information among different types of data. Consequently, many false predictions occur. To better understand the manifestations of system anomalies, we conduct a systematical study on a large amount of heterogeneous data, i.e., logs and metrics. Our study demonstrates that logs and metrics can manifest system anomalies collaboratively and complementarily, and neither of them only is sufficient. Thus, integrating heterogeneous data can help recover the complete picture of a system's health status. In this context, we propose Hades, the first end-to-end semi-supervised approach to effectively identify system anomalies based on heterogeneous data. Our approach employs a hierarchical architecture to learn a global representation of the system status by fusing log semantics and metric patterns. It captures discriminative features and meaningful interactions from heterogeneous data via a cross-modal attention module, trained in a semi-supervised manner. We evaluate Hades extensively on large-scale simulated data and datasets from Huawei Cloud. The experimental results present the effectiveness of our model in detecting system anomalies. We also release the code and the annotated dataset for replication and future research.Comment: In Proceedings of the 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). arXiv admin note: substantial text overlap with arXiv:2207.0291

    Audubon Data Project Final Report

    Get PDF
    The Audubon Data Project was initiated as a Clark University Capstone project. The project’s client, Mass Audubon’s Shaping the Future of Your Community program, had identified a need to improve their data management methods and make better use of their data. The Capstone team, composed of Clark University graduate students, met with the client regularly to review the current state of the data and potential improvements to be made. The process began with a data review. During the review we worked with the client to explicitly define the purposes and requirements of the data, the current process for updating and using the data, and the ways that different types of records were related to one another. After the review, we were able to identify the issues in the current system which we would seek to resolve. These included data integrity issues such as ensuring crucial items (such as a town name) were always included when entering data, and data structure issues such as having a relatively user-friendly way to express relationships and update records that were part of a relationship

    Process Mining for Smart Product Design

    Get PDF
    corecore