A Survey on the Integration of NAND Flash Storage in the Design of File Systems and the Host Storage Software Stack
With the ever-increasing amount of data generated in the world, estimated to reach over 200 Zettabytes by 2025, the pressure on efficient data storage systems is intensifying. The shift from HDDs to flash-based SSDs represents one of the most fundamental shifts in storage technology, increasing performance capabilities significantly. However, flash storage comes with different characteristics than prior HDD storage technology, and existing storage software was therefore ill-suited to leveraging its capabilities. As a result, a plethora of storage applications have been designed to better integrate with flash storage and align with flash characteristics.
In this literature study we evaluate the effect the introduction of flash storage has had on the design of file systems, which provide one of the most essential mechanisms for managing persistent storage. We analyze the mechanisms for effectively managing flash storage, managing the overheads of the introduced design requirements, and leveraging the capabilities of flash storage. Numerous methods have been adopted in file systems, but they prominently revolve around similar design decisions: adhering to the flash hardware constraints and limiting software intervention. The future design of storage software remains an important topic given the constant growth in flash-based storage devices and interfaces, which offers increasing opportunities to enhance flash integration in the host storage software stack.
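For context on the hardware constraints the survey alludes to: NAND flash cannot overwrite a page in place, writes happen at page granularity, and erases only at much coarser block granularity, which is why file systems and flash translation layers adopt out-of-place updates. The following is a minimal, hypothetical Python sketch of that idea; the sizes and structures are illustrative assumptions, not anything proposed in the survey:

```python
# Minimal sketch of out-of-place page updates under flash constraints.
# Sizes and structures are illustrative assumptions, not from the survey.

PAGES_PER_BLOCK = 4  # real devices use far larger blocks

class FlashSketch:
    def __init__(self, num_blocks):
        self.pages = {}            # physical page -> data
        self.mapping = {}          # logical page -> physical page
        self.free = list(range(num_blocks * PAGES_PER_BLOCK))
        self.stale = set()         # pages holding superseded data

    def write(self, logical_page, data):
        # Flash forbids in-place overwrite: write to a fresh physical page
        # and remap, leaving the old copy stale until its block is erased.
        if logical_page in self.mapping:
            self.stale.add(self.mapping[logical_page])
        phys = self.free.pop(0)
        self.pages[phys] = data
        self.mapping[logical_page] = phys

    def read(self, logical_page):
        return self.pages[self.mapping[logical_page]]

    def erase_block(self, block):
        # Erase works on whole blocks; live pages must be copied out first
        # (garbage collection), which is the source of write amplification.
        start = block * PAGES_PER_BLOCK
        for phys in range(start, start + PAGES_PER_BLOCK):
            live = [l for l, p in self.mapping.items() if p == phys]
            assert not live, "copy live pages elsewhere before erasing"
            self.pages.pop(phys, None)
            self.stale.discard(phys)
            if phys not in self.free:
                self.free.append(phys)
```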
DeltaFS: Pursuing Zero Update Overhead via Metadata-Enabled Delta Compression for Log-structured File System on Mobile Devices
Data compression has been widely adopted to relieve mobile devices from intensive write pressure. Delta compression is particularly promising for its high compression efficacy over conventional compression methods. However, this method suffers from non-trivial system overheads incurred by delta maintenance and read penalty, which limits its applicability on mobile devices. To this end, this paper proposes DeltaFS, a metadata-enabled delta compression scheme on a log-structured file system for mobile devices, to achieve the utmost compression efficiency at zero hardware cost. DeltaFS smartly exploits the out-of-place updating ability of the Log-structured File System (LFS) to alleviate write amplification, which is the key bottleneck for implementing delta compression. Specifically, DeltaFS utilizes the inline area in file inodes for delta maintenance with zero hardware cost, and integrates an inline area management strategy to improve the utilization of the constrained inline area. Moreover, a complementary delta maintenance strategy is incorporated, which selectively maintains delta chunks in the main data area to break through the limitation of the constrained inline area. Experimental results show that DeltaFS substantially reduces write traffic by up to 64.8% and improves I/O performance by up to 37.3%.
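As a rough illustration of the placement policy described above, the sketch below encodes an update as a delta against the old chunk and keeps it in a small inline area when it fits, falling back to the main data area otherwise. The inline-area budget, the dictionary-based "inode", and the generic byte-diff are assumptions made for illustration only, not DeltaFS's actual on-disk format:

```python
import difflib

INLINE_AREA_BYTES = 128  # illustrative inline-area budget, not DeltaFS's real size

def make_delta(old: bytes, new: bytes) -> bytes:
    """Encode new relative to old as compact copy/insert records.
    A stand-in for a real delta encoder such as xdelta."""
    ops = []
    matcher = difflib.SequenceMatcher(None, old, new)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            ops.append(b"C%d:%d;" % (i1, i2))             # copy old[i1:i2]
        else:
            ops.append(b"I%d;" % (j2 - j1) + new[j1:j2])  # insert literal bytes
    return b"".join(ops)

def store_update(inode, old: bytes, new: bytes):
    """Keep the delta inline when it fits; otherwise place it in the main data area."""
    delta = make_delta(old, new)
    if len(delta) >= len(new):
        inode["base"] = new                    # delta not worthwhile: rewrite the chunk
        inode["deltas_inline"] = []
    elif sum(map(len, inode["deltas_inline"])) + len(delta) <= INLINE_AREA_BYTES:
        inode["deltas_inline"].append(delta)   # inline area: zero extra hardware cost
    else:
        inode["deltas_main"].append(delta)     # complementary strategy: main data area
    return inode
```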
Architectural Principles for Database Systems on Storage-Class Memory
Database systems have long been optimized to hide the higher latency of storage media, yielding complex persistence mechanisms. With the advent of large DRAM capacities, it became possible to keep a full copy of the data in DRAM. Systems that leverage this possibility, such as main-memory databases, keep two copies of the data in two different formats: one in main memory and the other one in storage. The two copies are kept synchronized using snapshotting and logging. This main-memory-centric architecture yields nearly two orders of magnitude faster analytical processing than traditional, disk-centric ones. The rise of Big Data emphasized the importance of such systems with an ever-increasing need for more main memory. However, DRAM is hitting its scalability limits: It is intrinsically hard to further increase its density.
Storage-Class Memory (SCM) is a group of novel memory technologies that promise to alleviate DRAM’s scalability limits. They combine the non-volatility, density, and economic characteristics of storage media with the byte-addressability and a latency close to that of DRAM. Therefore, SCM can serve as persistent main memory, thereby bridging the gap between main memory and storage. In this dissertation, we explore the impact of SCM as persistent main memory on database systems. Assuming a hybrid SCM-DRAM hardware architecture, we propose a novel software architecture for database systems that places primary data in SCM and directly operates on it, eliminating the need for explicit IO. This architecture yields many benefits: First, it obviates the need to reload data from storage to main memory during recovery, as data is discovered and accessed directly in SCM. Second, it allows replacing the traditional logging infrastructure by fine-grained, cheap micro-logging at data-structure level. Third, secondary data can be stored in DRAM and reconstructed during recovery. Fourth, system runtime information can be stored in SCM to improve recovery time. Finally, the system may retain and continue in-flight transactions in case of system failures.
However, SCM is no panacea as it raises unprecedented programming challenges. Given its byte-addressability and low latency, processors can access, read, modify, and persist data in SCM using load/store instructions at a CPU cache line granularity. The path from CPU registers to SCM is long and mostly volatile, including store buffers and CPU caches, leaving the programmer with little control over when data is persisted. Therefore, there is a need to enforce the order and durability of SCM writes using persistence primitives, such as cache line flushing instructions. This in turn creates new failure scenarios, such as missing or misplaced persistence primitives.
We devise several building blocks to overcome these challenges. First, we identify the programming challenges of SCM and present a sound programming model that solves them. Then, we tackle memory management, as the first required building block to build a database system, by designing a highly scalable SCM allocator, named PAllocator, that fulfills the versatile needs of database systems. Thereafter, we propose the FPTree, a highly scalable hybrid SCM-DRAM persistent B+-Tree that bridges the gap between the performance of transient and persistent B+-Trees. Using these building blocks, we realize our envisioned database architecture in SOFORT, a hybrid SCM-DRAM columnar transactional engine. We propose an SCM-optimized MVCC scheme that eliminates write-ahead logging from the critical path of transactions. Since SCM-resident data is near-instantly available upon recovery, the new recovery bottleneck is rebuilding DRAM-based data. To alleviate this bottleneck, we propose a novel recovery technique that achieves nearly instant responsiveness of the database by accepting queries right after recovering SCM-based data, while rebuilding DRAM-based data in the background. Additionally, SCM brings new failure scenarios that existing testing tools cannot detect. Hence, we propose an online testing framework that is able to automatically simulate power failures and detect missing or misplaced persistence primitives. Finally, our proposed building blocks can serve to build more complex systems, paving the way for future database systems on SCM.
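The recovery idea sketched in the last paragraph, answering queries as soon as the SCM-resident primary data is reachable while DRAM-based secondary structures are rebuilt in the background, can be approximated in ordinary Python as follows. This is a simplified, hypothetical model for illustration, not SOFORT's implementation:

```python
import threading

class RecoveringEngine:
    """Toy model: primary data is 'instantly' available after restart (standing in
    for SCM-resident data); a secondary index (standing in for DRAM-resident
    structures) is rebuilt in the background while queries are already served."""

    def __init__(self, primary_rows):
        self.primary = primary_rows          # list of (key, value) pairs
        self.index = None                    # rebuilt lazily after a restart
        self._index_ready = threading.Event()
        threading.Thread(target=self._rebuild_index, daemon=True).start()

    def _rebuild_index(self):
        self.index = {k: v for k, v in self.primary}
        self._index_ready.set()

    def lookup(self, key):
        if self._index_ready.is_set():
            return self.index.get(key)       # fast path once the index is back
        # Before the index is rebuilt, fall back to scanning the primary data,
        # so the system is responsive immediately after recovery.
        for k, v in self.primary:
            if k == key:
                return v
        return None

engine = RecoveringEngine([(1, "a"), (2, "b")])
print(engine.lookup(2))  # answers even if the index rebuild has not finished
```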
A hierarchal framework for recognising activities of daily life
In today’s working world the elderly who are dependent can sometimes be
neglected by society. Statistically, after toddlers it is the elderly who are observed
to have higher accident rates while performing everyday activities. Alzheimer’s
disease is one of the major impairments that elderly people suffer from, and leads
to the elderly person not being able to live an independent life due to forgetfulness.
One way to support elderly people who aspire to live an independent life and
remain safe in their home is to find out what activities the elderly person is
carrying out at a given time and provide appropriate assistance or institute
safeguards.
The aim of this research is to create improved methods to identify tasks related to
activities of daily life and determine a person’s current intentions and so reason
about that person’s future intentions. A novel hierarchal framework has been
developed, which recognises sensor events and maps them to significant activities
and intentions. As privacy is becoming a growing concern, the monitoring of an
individual’s behaviour can be seen as intrusive. Hence, the monitoring is based
around using simple, non-intrusive sensors and tags on everyday objects that are used to perform daily activities around the home. Specifically, there is no use of
any cameras or visual surveillance equipment, though the techniques developed
are still relevant in such a situation.
Models for task recognition and plan recognition have been developed and tested
on scenarios where the plans can be interwoven. Potential targets are people in the first stages of Alzheimer’s disease, and in structuring the library of kernel plan sequences, typical routines used to sustain meaningful activity have been
used. Evaluations have been carried out using volunteers conducting activities of
daily life in an experimental home environment. The results generated from the
sensors have been interpreted, and the developed algorithms have been analysed. The outcomes and findings of these experiments demonstrate that the developed hierarchal framework is capable of carrying out activity recognition as well as intention analysis, e.g. predicting which activity a person is most likely to carry out next.
Heterogeneous Anomaly Detection for Software Systems via Semi-supervised Cross-modal Attention
Prompt and accurate detection of system anomalies is essential to ensure the
reliability of software systems. Unlike manual efforts that exploit all
available run-time information, existing approaches usually leverage only a
single type of monitoring data (often logs or metrics) or fail to make
effective use of the joint information among different types of data.
Consequently, many false predictions occur. To better understand the
manifestations of system anomalies, we conduct a systematic study on a large amount of heterogeneous data, i.e., logs and metrics. Our study demonstrates that logs and metrics can manifest system anomalies collaboratively and complementarily, and that neither alone is sufficient. Thus, integrating
heterogeneous data can help recover the complete picture of a system's health
status. In this context, we propose Hades, the first end-to-end semi-supervised
approach to effectively identify system anomalies based on heterogeneous data.
Our approach employs a hierarchical architecture to learn a global
representation of the system status by fusing log semantics and metric
patterns. It captures discriminative features and meaningful interactions from
heterogeneous data via a cross-modal attention module, trained in a
semi-supervised manner. We evaluate Hades extensively on large-scale simulated
data and datasets from Huawei Cloud. The experimental results demonstrate the effectiveness of our model in detecting system anomalies. We also release the code and the annotated dataset for replication and future research.
Comment: In Proceedings of the 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE).
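To make the cross-modal fusion step more concrete, below is a minimal PyTorch sketch in which log-event embeddings attend over metric embeddings and the pooled result feeds a binary anomaly head. The dimensions, the single attention direction, and the classifier head are illustrative assumptions, not Hades's actual architecture:

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Toy cross-modal attention: log embeddings (queries) attend over metric
    embeddings (keys/values); the pooled result is scored for anomaly."""

    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(dim, 1)  # binary anomaly score

    def forward(self, log_emb, metric_emb):
        # log_emb:    (batch, log_steps, dim)    - e.g. encoded log templates
        # metric_emb: (batch, metric_steps, dim) - e.g. encoded metric windows
        fused, _ = self.attn(query=log_emb, key=metric_emb, value=metric_emb)
        pooled = fused.mean(dim=1)               # simple pooling over time steps
        return torch.sigmoid(self.head(pooled))  # probability-like anomaly score

model = CrossModalFusion()
logs = torch.randn(2, 10, 64)      # dummy batch of 2 sequences
metrics = torch.randn(2, 30, 64)
print(model(logs, metrics).shape)  # torch.Size([2, 1])
```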
Audubon Data Project Final Report
The Audubon Data Project was initiated as a Clark University Capstone project. The project’s client, Mass Audubon’s Shaping the Future of Your Community program, had identified a need to improve their data management methods and make better use of their data. The Capstone team, composed of Clark University graduate students, met with the client regularly to review the current state of the data and potential improvements to be made. The process began with a data review. During the review we worked with the client to explicitly define the purposes and requirements of the data, the current process for updating and using the data, and the ways that different types of records were related to one another. After the review, we were able to identify the issues in the current system which we would seek to resolve. These included data integrity issues such as ensuring crucial items (such as a town name) were always included when entering data, and data structure issues such as having a relatively user-friendly way to express relationships and update records that were part of a relationship.