Cold Storage Data Archives: More Than Just a Bunch of Tapes
The abundance of available sensor and derived data from large scientific
experiments, such as earth observation programs, radio astronomy sky surveys,
and high-energy physics, already exceeds the amount of storage hardware
fabricated globally per year. Consequently, cold storage data archives are the
often overlooked spearheads of modern big data analytics in scientific,
data-intensive application domains. While high-performance data analytics has
received much attention from the research community, the growing number of
problems in designing and deploying cold storage archives has received
very little attention.
In this paper, we take the first step towards bridging this gap in knowledge
by presenting an analysis of four real-world cold storage archives from three
different application domains. In doing so, we highlight (i) workload
characteristics that differentiate these archives from traditional,
performance-sensitive data analytics, (ii) design trade-offs involved in
building cold storage systems for these archives, and (iii) deployment
trade-offs with respect to migration to the public cloud. Based on our
analysis, we discuss several other important research challenges that need to
be addressed by the data management community.
The Design and Implementation of a High-Performance Log-Structured RAID System for ZNS SSDs
Zoned Namespace (ZNS) defines a new abstraction for host software to flexibly
manage storage in flash-based SSDs as append-only zones. It also provides a
Zone Append primitive to further boost the write performance of ZNS SSDs by
exploiting intra-zone parallelism. However, making Zone Append effective for
reliable and scalable storage, in the form of a RAID array of multiple ZNS
SSDs, is non-trivial since Zone Append offloads address management to ZNS SSDs
and requires the host to explicitly manage RAID stripes across multiple drives.
We propose ZapRAID, a high-performance log-structured RAID system for ZNS SSDs
that carefully exploits Zone Append to achieve high write parallelism and
lightweight stripe management. ZapRAID adopts a group-based data layout with a
coarse-grained ordering across multiple groups of stripes, such that it can use
small-size metadata for stripe management on a per-group basis under Zone
Append. It further adopts hybrid data management to simultaneously achieve
intra-zone and inter-zone parallelism through a careful combination of both
Zone Append and Zone Write primitives. We evaluate ZapRAID using
microbenchmarks, trace-driven experiments, and real-application experiments.
Our evaluation results show that ZapRAID achieves high write throughput and
maintains high performance in normal reads, degraded reads, crash recovery, and
full-drive recovery.
Comment: 29 pages
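The abstract gives no code, so the following is only a minimal Python sketch of the general mechanism, not ZapRAID's implementation. It assumes a simulated zone whose `append` method returns the offset the device chose (modelling Zone Append), one XOR parity chunk per stripe as in RAID-5, and a per-group table that records where each stripe's chunks landed. The names `Zone`, `StripeGroup`, `write_stripe`, `degraded_read`, and `xor_chunks` are hypothetical.

```python
from dataclasses import dataclass, field
from functools import reduce


def xor_chunks(chunks: list[bytes]) -> bytes:
    """Bytewise XOR of equal-length chunks (RAID-5 style single parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


@dataclass
class Zone:
    """Simulated append-only zone; the device, not the host, picks the offset."""
    blocks: list[bytes] = field(default_factory=list)

    def append(self, chunk: bytes) -> int:
        # Models Zone Append: the host learns where the chunk landed afterwards.
        self.blocks.append(chunk)
        return len(self.blocks) - 1

    def read(self, offset: int) -> bytes:
        return self.blocks[offset]


@dataclass
class StripeGroup:
    """Hypothetical per-group metadata: stripe id -> chunk offset in each zone."""
    zones: list[Zone]
    offsets: dict[int, list[int]] = field(default_factory=dict)

    def write_stripe(self, stripe_id: int, data: list[bytes]) -> None:
        if len(data) != len(self.zones) - 1:
            raise ValueError("expected one data chunk per non-parity zone")
        chunks = data + [xor_chunks(data)]  # last zone holds the parity chunk
        self.offsets[stripe_id] = [z.append(c) for z, c in zip(self.zones, chunks)]

    def degraded_read(self, stripe_id: int, failed: int) -> bytes:
        """Rebuild the chunk stored on zone `failed` from the surviving chunks."""
        survivors = [
            zone.read(off)
            for i, (zone, off) in enumerate(zip(self.zones, self.offsets[stripe_id]))
            if i != failed
        ]
        return xor_chunks(survivors)
```

Usage: after `group = StripeGroup([Zone() for _ in range(4)])` and `group.write_stripe(0, [b"aaaa", b"bbbb", b"cccc"])`, calling `group.degraded_read(0, failed=1)` rebuilds `b"bbbb"` from the surviving chunks. Recording one offset per chunk is the simplest possible bookkeeping; according to the abstract, ZapRAID's group-based layout with coarse-grained ordering exists precisely to keep this metadata small, which the sketch does not attempt.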
Towards Software-Defined Data Protection: GDPR Compliance at the Storage Layer is Within Reach
Enforcing data protection and privacy rules within large data processing
applications is becoming increasingly important, especially in the light of
GDPR and similar regulatory frameworks. Most modern data processing happens on
top of a distributed storage layer, and securing this layer against accidental
or malicious misuse is crucial to ensuring global privacy guarantees. However,
the performance overhead and additional complexity of doing so are often
assumed to be significant. In this work, we describe a path forward that
tackles both challenges. We propose "Software-Defined Data Protection" (SDP),
an adaptation of the "Software-Defined Storage" approach to non-performance
aspects: a trusted controller translates company- and application-specific
policies into a set of rules deployed on the storage nodes. These, in turn, apply
the rules at line rate but make no decisions on their own. Such an
approach decouples frequently changing policies from request-level enforcement and
allows storage nodes to implement the latter more efficiently.
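To make this separation of concerns concrete, here is a minimal Python sketch under invented assumptions, not the SDP design itself: the controller compiles policies such as "objects tagged `personal` may be read only for the purpose `billing`" into a flat table keyed by (tag, operation), and a storage node merely looks each request up in that table, denying anything not listed. `Policy`, `compile_policies`, `StorageNode`, and the tag/purpose vocabulary are hypothetical.

```python
from dataclasses import dataclass

Rules = dict[tuple[str, str], frozenset[str]]  # (tag, operation) -> purposes


@dataclass(frozen=True)
class Policy:
    """Hypothetical high-level rule: which purposes may touch which data."""
    tag: str                          # data category, e.g. "personal"
    operation: str                    # "read" or "write"
    allowed_purposes: frozenset[str]  # declared purposes that are permitted


def compile_policies(policies: list[Policy]) -> Rules:
    """Controller side: flatten policies into a lookup table for storage nodes."""
    rules: Rules = {}
    for p in policies:
        key = (p.tag, p.operation)
        # Policies on the same (tag, operation) are merged by taking the union.
        rules[key] = rules.get(key, frozenset()) | p.allowed_purposes
    return rules


class StorageNode:
    """Node side: applies installed rules per request, makes no policy decisions."""

    def __init__(self) -> None:
        self.rules: Rules = {}
        self.objects: dict[str, tuple[str, bytes]] = {}  # key -> (tag, payload)

    def install(self, rules: Rules) -> None:
        # Pushed by the trusted controller whenever policies change.
        self.rules = dict(rules)

    def _check(self, tag: str, operation: str, purpose: str) -> None:
        # Default-deny: a missing (tag, operation) entry permits nothing.
        if purpose not in self.rules.get((tag, operation), frozenset()):
            raise PermissionError(f"{operation} of {tag!r} data denied for {purpose!r}")

    def put(self, key: str, tag: str, data: bytes, purpose: str) -> None:
        self._check(tag, "write", purpose)
        self.objects[key] = (tag, data)

    def get(self, key: str, purpose: str) -> bytes:
        tag, data = self.objects[key]
        self._check(tag, "read", purpose)
        return data
```

Because the node performs only a table lookup per request, changing a policy means recompiling and reinstalling the table rather than changing node logic, which mirrors the decoupling of policies from request-level enforcement described above.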
Even though in-storage processing brings challenges, mainly because it can
jeopardize line-rate processing, we argue that today's Smart Storage solutions
can already implement the required functionality, thanks to the separation of
concerns introduced by SDP. We highlight the challenges that remain, especially
that of trusting the storage nodes. These need to be tackled before SDP can
see widespread adoption in cloud environments.
Elevating commodity storage with the SALSA host translation layer
To satisfy increasing storage demands in both capacity and performance,
industry has turned to multiple storage technologies, including Flash SSDs and
SMR disks. These devices employ a translation layer that conceals the
idiosyncrasies of their media and enables random access. Device translation
layers are, however, inherently constrained: resources on the drive are scarce,
they cannot be adapted to application requirements, and they lack visibility across
multiple devices. As a result, the performance and durability of many storage
devices are severely degraded.
In this paper, we present SALSA: a translation layer that executes on the
host and allows unmodified applications to better utilize commodity storage.
SALSA supports a wide range of single- and multi-device optimizations and,
because it is implemented in software, can adapt to specific workloads. We
describe SALSA's design and demonstrate its significant benefits using
microbenchmarks and case studies based on three applications: MySQL, the Swift
object store, and a video server.
Comment: Presented at 2018 IEEE 26th International Symposium on Modeling,
Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS
- …
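SALSA's internals are not described in this abstract. As a generic illustration of what a host-side translation layer does, the following Python sketch implements a minimal log-structured mapping, assuming fixed-size segments over a flat physical address space: logical-block writes are always appended at a write pointer, a mapping table redirects reads, overwritten blocks become garbage, and a greedy garbage collector relocates the valid blocks of the closed segment with the fewest valid blocks. `HostTranslationLayer` and its methods are hypothetical names.

```python
from typing import Optional


class HostTranslationLayer:
    """Minimal log-structured LBA -> PBA mapping kept on the host (illustrative)."""

    def __init__(self, num_segments: int, segment_size: int) -> None:
        self.seg_size = segment_size
        self.num_segments = num_segments
        self.media: list[Optional[bytes]] = [None] * (num_segments * segment_size)
        self.l2p: dict[int, int] = {}  # logical block -> physical block
        self.p2l: dict[int, int] = {}  # physical block -> logical block (valid only)
        self.free = list(range(1, num_segments))  # segment 0 starts as active
        self.active, self.wp = 0, 0  # active segment and its write pointer

    def _append(self, lba: int, data: bytes) -> None:
        """Place `data` at the write pointer and remap `lba` to it."""
        if self.wp == self.seg_size:
            if not self.free:
                raise RuntimeError("out of free segments; run collect()")
            self.active, self.wp = self.free.pop(0), 0
        pba = self.active * self.seg_size + self.wp
        self.wp += 1
        self.media[pba] = data
        old = self.l2p.get(lba)
        if old is not None:
            del self.p2l[old]  # the previous copy is now garbage
        self.l2p[lba], self.p2l[pba] = pba, lba

    def write(self, lba: int, data: bytes) -> None:
        # Random logical writes become sequential physical appends.
        self._append(lba, data)

    def read(self, lba: int) -> bytes:
        return self.media[self.l2p[lba]]

    def valid_blocks(self, seg: int) -> list[int]:
        start = seg * self.seg_size
        return [p for p in range(start, start + self.seg_size) if p in self.p2l]

    def collect(self) -> Optional[int]:
        """Greedy GC: free the closed segment with the fewest valid blocks."""
        closed = [s for s in range(self.num_segments)
                  if s != self.active and s not in self.free]
        if not closed:
            return None
        victim = min(closed, key=lambda s: len(self.valid_blocks(s)))
        for pba in self.valid_blocks(victim):
            self._append(self.p2l[pba], self.media[pba])  # relocate live data
        start = victim * self.seg_size
        self.media[start:start + self.seg_size] = [None] * self.seg_size
        self.free.append(victim)
        return victim
```

Usage: with `HostTranslationLayer(num_segments=3, segment_size=2)`, writing and then rewriting logical blocks 0 and 1 leaves the first segment with no valid blocks, so `collect()` reclaims it without copying anything. Greedy victim selection is used here only for brevity and is not claimed to be SALSA's policy.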