PERSES: Data layout for low impact failures
Growth in disk capacity continues to outpace advances in read speed and device reliability. This has led to storage systems spending increasing amounts of time in a degraded state while failed disks are reconstructed. Users and applications that do not use the data on the failed or degraded drives are negligibly impacted by the failure, increasing the perceived performance of the system. We leverage this observation with PERSES, a statistical data allocation scheme that reduces the performance impact of reconstruction after disk failure. PERSES reduces degradation from the user's perspective by clustering data on disks such that data with a high probability of co-access is placed on the same device as often as possible. Trace-driven simulations show that, by laying out data with PERSES, we can reduce the perceived time lost due to failure over three years by up to 80% compared to arbitrary allocation.
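The clustering idea can be sketched in a few lines. The greedy pairing below is an illustrative stand-in for PERSES's statistical allocator, not the paper's actual algorithm: co-access counts come from a trace, the most co-accessed files are merged into clusters, and clusters are packed onto the least-loaded disk.

```python
from collections import defaultdict

def co_access_counts(trace):
    """Count how often each pair of files appears in the same access window."""
    counts = defaultdict(int)
    for window in trace:                 # each window: a set of co-accessed files
        files = sorted(window)
        for i in range(len(files)):
            for j in range(i + 1, len(files)):
                counts[(files[i], files[j])] += 1
    return counts

def allocate(trace, files, n_disks):
    """Greedily merge the most co-accessed files, then pack clusters onto disks."""
    cap = len(files) // n_disks + 1      # rough per-disk limit
    cluster = {f: {f} for f in files}
    for (a, b), _ in sorted(co_access_counts(trace).items(), key=lambda kv: -kv[1]):
        ca, cb = cluster[a], cluster[b]
        if ca is not cb and len(ca) + len(cb) <= cap:
            ca |= cb
            for f in cb:
                cluster[f] = ca
    disks, seen = [[] for _ in range(n_disks)], set()
    for f in files:
        c = cluster[f]
        if id(c) not in seen:
            seen.add(id(c))
            min(disks, key=len).extend(sorted(c))
    return disks

trace = [{"a", "b"}, {"a", "b"}, {"c", "d"}, {"c", "d"}, {"a", "b"}]
disks = allocate(trace, ["a", "b", "c", "d"], n_disks=2)
# "a"/"b" and "c"/"d" land on separate disks: losing either disk
# leaves the other working set fully intact.
```

With this layout, a failure of one disk degrades only the users whose working set lives there, which is exactly the effect the abstract measures.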
Comparative Analysis of Distributed and Parallel File Systems' Internal Techniques
File system optimization is the most common task in the file system field and is usually seen as the key file system problem; it is certainly the dominant activity in commercial development. The problem of developing a new file system architecture arises more frequently in academia. End users tend to treat performance as the key problem of file systems' evolution as a technology, an understanding that stems from the common view of persistent memory as a slow subsystem. As a result, the problem of improving data-processing performance is treated as a problem of file system performance optimization. However, the evolution of physical technologies for persistent data storage requires significant changes in the concepts and approaches behind file systems' internal techniques. Generally speaking, merely trying to improve file system efficiency cannot resolve every issue facing file systems as a technological direction; on the contrary, it can impede the evolution of file system technology as a whole. It is impossible to satisfy end users' expectations by means of file system optimization alone. Without revolutionary new file system approaches, new persistent storage technologies may call the very necessity of file systems into question. However, a file system embodies a paradigm of information structuring that is very important to the end user as a human being. Two classes of tasks need to be distinguished: (1) optimization tasks and (2) tasks of elaborating a new architectural vision or paradigm. Frequently, though, a project goal that really requires elaborating a new paradigm degenerates into an optimization task. End-user expectations are a complex and contradictory set of requirements, and optimization tasks alone cannot address all of the end user's current needs in the file system field; those expectations require resolving tasks of elaborating a new architectural vision or paradigm.
Mirrored and Hybrid Disk Arrays: Organization, Scheduling, Reliability, and Performance
Basic mirroring (BM), classified as RAID level 1, replicates data on two disks, thus doubling disk access bandwidth for read requests. RAID1/0 is an array of BM pairs with balanced loads due to striping. When a disk fails, the read load on its pair is doubled, which halves the maximum attainable bandwidth. We review RAID1 organizations that attain a balanced load upon disk failure but, as shown by reliability analysis, tend to be less reliable than RAID1/0. Hybrid disk arrays, which store XORed instead of replicated data, tend to have higher reliability than mirrored disks but incur a higher overhead in updating data. Read response time can be improved by processing reads at a higher priority than writes, since reads have a direct effect on application response time. Shortest-seek-distance and affinity-based routing both shorten seek time, and anticipatory arm placement positions arms optimally to minimize seek distance. An analysis of RAID1 in normal, degraded, and rebuild modes is provided to quantify RAID1/0 performance. We compare the reliability of mirrored disk organizations against each other, hybrid disks, and erasure-coded disk arrays.
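Shortest-seek routing in a mirrored pair is easy to sketch. The class below is a minimal illustration, not the paper's analytic model: reads go to whichever live replica's arm is closer to the target cylinder, and degraded mode falls back to the surviving disk.

```python
class MirrorPair:
    """RAID1 pair with shortest-seek-distance read routing."""
    def __init__(self, cylinders=1000):
        self.cylinders = cylinders
        self.arm = [0, 0]             # current arm cylinder per replica
        self.alive = [True, True]

    def read(self, cyl):
        """Route a read to the live replica with the shorter seek;
        returns (chosen disk, seek distance)."""
        live = [d for d in (0, 1) if self.alive[d]]
        if not live:
            raise IOError("both replicas failed")
        d = min(live, key=lambda d: abs(self.arm[d] - cyl))
        seek = abs(self.arm[d] - cyl)
        self.arm[d] = cyl
        return d, seek

pair = MirrorPair()
pair.arm = [100, 900]
assert pair.read(150) == (0, 50)      # disk 0's arm is closer
assert pair.read(800) == (1, 100)     # disk 1's arm is closer
pair.alive[0] = False                 # failure: all reads now hit disk 1
assert pair.read(150)[0] == 1
```

The degraded branch makes the abstract's point concrete: with one replica gone, every read lands on the survivor, doubling its load.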
Authenticated Key-Value Stores with Hardware Enclaves
Authenticated data storage on an untrusted platform is an important computing paradigm for cloud applications ranging from big-data outsourcing, to cryptocurrency, to certificate transparency logs. These modern applications increasingly feature update-intensive workloads, whereas existing authenticated data structures (ADSs) designed around in-place updates handle such workloads inefficiently. In this paper, we address this issue and propose a novel authenticated log-structured merge tree (eLSM) based key-value store that leverages Intel SGX enclaves.
We present a system design that runs the code of the eLSM store inside an enclave. To circumvent the limited enclave memory (128 MB on the latest Intel CPUs), we propose to place the memory buffer of the eLSM store outside the enclave and protect it with a new authenticated data structure that digests individual LSM-tree levels. We design protocols to support query authentication for data integrity, completeness (under range queries), and freshness. The proofs in our protocol are kept small by including only the Merkle proofs at selected levels.
We implement eLSM on top of Google LevelDB and Facebook RocksDB with minimal code change and performance interference. We evaluate the performance of eLSM under the YCSB workload benchmark and show a performance advantage of up to 4.5X. Comment: 18 pages.
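The level-digesting idea can be illustrated with a per-level Merkle root. This sketch assumes a plain list of key-value pairs per level and SHA-256; eLSM's real structure, proof format, and enclave interaction are more involved.

```python
import hashlib

def h(*parts):
    """SHA-256 over the concatenation of byte/str parts."""
    m = hashlib.sha256()
    for p in parts:
        m.update(p if isinstance(p, bytes) else p.encode())
    return m.digest()

def merkle_root(leaves):
    """Merkle root over (key, value) leaves; duplicates the last node on odd levels."""
    nodes = [h(k, v) for k, v in leaves]
    if not nodes:
        return h(b"empty")
    while len(nodes) > 1:
        if len(nodes) % 2:
            nodes.append(nodes[-1])
        nodes = [h(nodes[i], nodes[i + 1]) for i in range(0, len(nodes), 2)]
    return nodes[0]

# One digest per LSM level; the store's trusted summary is just this list.
levels = [
    [("k1", "v1"), ("k2", "v2")],                  # newest (memtable-like) level
    [("k1", "old"), ("k3", "v3"), ("k4", "v4")],   # older, larger level
]
roots = [merkle_root(level) for level in levels]
# A lookup proof for "k3" only needs Merkle siblings inside level 1
# plus the short list of level roots -- not a path over the whole store.
```

Digesting each level separately is what lets the proof include only the levels a query actually touches, which is how the abstract's proofs stay small.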
Page Cache Attacks
We present a new hardware-agnostic side-channel attack that targets one of
the most fundamental software caches in modern computer systems: the operating
system page cache. The page cache is a pure software cache that contains all
disk-backed pages, including program binaries, shared libraries, and other
files, and our attacks thus work across cores and CPUs. Our side-channel
permits unprivileged monitoring of some memory accesses of other processes,
with a spatial resolution of 4KB and a temporal resolution of 2 microseconds on
Linux (restricted to 6.7 measurements per second) and 466 nanoseconds on
Windows (restricted to 223 measurements per second); this is roughly the same
order of magnitude as the current state-of-the-art cache attacks. We
systematically analyze our side channel by demonstrating different local
attacks, including a sandbox-bypassing high-speed covert channel, timed
user-interface redressing attacks, and an attack recovering automatically
generated temporary passwords. We further show that we can trade off the side
channel's hardware agnostic property for remote exploitability. We demonstrate
this via a low profile remote covert channel that uses this page-cache
side-channel to exfiltrate information from a malicious sender process through
innocuous server requests. Finally, we propose mitigations for some of our
attacks, which have been acknowledged by operating system vendors and slated
for future security patches.
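The covert-channel primitive can be modeled abstractly. The code below is a conceptual simulation, not an exploit: a real receiver would query residency with `mincore()` on Linux or `QueryWorkingSetEx()` on Windows, as the paper's attacks do, rather than with the toy `PageCache` class assumed here.

```python
class PageCache:
    """Toy stand-in for the OS page cache. Real attacks query residency
    with mincore() on Linux or QueryWorkingSetEx() on Windows."""
    def __init__(self):
        self.resident = set()
    def access(self, page):        # reading a disk-backed file caches its pages
        self.resident.add(page)
    def evict(self, page):         # attacker-forced eviction (e.g. memory pressure)
        self.resident.discard(page)
    def is_resident(self, page):
        return page in self.resident

def send(cache, pages, bits):
    """Sender encodes one bit per agreed-upon shared, disk-backed page."""
    for page, bit in zip(pages, bits):
        if bit:
            cache.access(page)
        else:
            cache.evict(page)

def receive(cache, pages):
    """Receiver decodes by observing residency -- no shared memory needed."""
    return [1 if cache.is_resident(p) else 0 for p in pages]

cache = PageCache()
pages = [f"libshared.so:page{i}" for i in range(8)]
message = [1, 0, 1, 1, 0, 0, 1, 0]
send(cache, pages, message)
assert receive(cache, pages) == message
```

Because both parties only touch a shared disk-backed file, the channel works across processes, cores, and CPUs, which is why the attack is hardware-agnostic.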
Simulating Data Access Profiles of Computational Jobs in Data Grids
The data access patterns of applications running in computing grids are
changing due to the recent proliferation of high speed local and wide area
networks. The data-intensive jobs are no longer strictly required to run at the
computing sites, where the respective input data are located. Instead, jobs may
access the data employing arbitrary combinations of data-placement, stage-in
and remote data access. These data access profiles exhibit partially
non-overlapping throughput bottlenecks. This fact can be exploited in order to
minimize the time jobs spend waiting for input data. In this work we present a
novel grid computing simulator, which puts a heavy emphasis on the various data
access profiles. The fundamental assumptions underlying our simulator are
justified by empirical experiments performed in the Worldwide LHC Computing
Grid (WLCG) at CERN. We demonstrate how to calibrate the simulator parameters
in accordance with the true system using posterior inference with
likelihood-free Markov Chain Monte Carlo. Thereafter, we validate the
simulator's output with respect to an authentic production workload from WLCG,
demonstrating its remarkable accuracy.
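The calibration step can be sketched as ABC-style likelihood-free MCMC. Everything below is a toy under stated assumptions: the `simulator` function stands in for the grid simulator, the summary statistic is a mean job wait time, and the acceptance rule is simplified (the prior and proposal-ratio terms of a full ABC-MCMC kernel are omitted).

```python
import random

def simulator(theta, n=200, rng=random):
    """Toy grid 'simulator': mean job wait time under parameter theta."""
    return sum(rng.expovariate(1.0 / theta) for _ in range(n)) / n

def abc_mcmc(observed, steps=1500, eps=0.4, seed=1):
    """Likelihood-free chain: accept a proposed parameter when the
    simulated summary lands within eps of the observed one."""
    rng = random.Random(seed)
    theta, chain = observed, []          # pilot start near the data
    for _ in range(steps):
        prop = max(theta + rng.gauss(0.0, 0.3), 1e-3)
        if abs(simulator(prop, rng=rng) - observed) < eps:
            theta = prop                 # accept: simulator matches observation
        chain.append(theta)
    return chain

observed = simulator(3.0, rng=random.Random(0))   # pretend true parameter is 3.0
chain = abc_mcmc(observed)
estimate = sum(chain[500:]) / len(chain[500:])    # posterior mean after burn-in
```

The same accept-if-close loop is what calibrating against WLCG measurements amounts to, with the real simulator and real summary statistics in place of the toy ones.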
ReCA: an Efficient Reconfigurable Cache Architecture for Storage Systems with Online Workload Characterization
In recent years, SSDs have gained tremendous attention in computing and
storage systems due to significant performance improvement over HDDs. The cost
per capacity of SSDs, however, prevents them from entirely replacing HDDs in
such systems. One approach to take effective advantage of SSDs is to use them as a caching layer that stores performance-critical data blocks, reducing the number of accesses to the disk subsystem. Due to characteristics of flash-based SSDs such as limited write endurance and long write latency, employing caching algorithms at the Operating System (OS) level necessitates taking such characteristics into consideration. Previous caching techniques are
optimized towards only one type of application, which affects both generality
and applicability. In addition, they are not adaptive when the workload pattern
changes over time. This paper presents an efficient Reconfigurable Cache
Architecture (ReCA) for storage systems using a comprehensive workload
characterization to find an optimal cache configuration for I/O intensive
applications. For this purpose, we first investigate various types of I/O
workloads and classify them into five major classes. Based on this
characterization, an optimal cache configuration is presented for each class of
workloads. Then, using the main features of each class, we continuously monitor an application's characteristics at runtime and reconfigure the cache organization if the application moves from one workload class to another. Reconfiguration is performed online, and the set of workload classes can be extended to cover emerging I/O workloads, maintaining ReCA's efficiency as the characteristics of I/O requests evolve. Experimental results obtained by implementing ReCA in a server running Linux show that the proposed architecture improves performance and lifetime by up to 24% and 33%, respectively.
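A minimal sketch of the monitor-classify-reconfigure loop follows; the class names, three-class classifier, and thresholds are invented for illustration (ReCA uses five workload classes and a richer characterization).

```python
from collections import deque

# Illustrative cache configurations, one per (assumed) workload class.
CONFIGS = {
    "read_random":     {"policy": "LRU", "write_cache": False},
    "read_sequential": {"policy": "MRU", "write_cache": False},
    "write_heavy":     {"policy": "LRU", "write_cache": True},
}

def classify(window):
    """Rough stand-in for ReCA's workload characterization."""
    writes = sum(1 for op, _ in window if op == "W")
    if writes / len(window) > 0.5:
        return "write_heavy"
    addrs = [a for op, a in window if op == "R"]
    runs = sum(1 for x, y in zip(addrs, addrs[1:]) if y == x + 1)
    if addrs and runs / max(len(addrs) - 1, 1) > 0.8:
        return "read_sequential"
    return "read_random"

class ReconfigurableCache:
    def __init__(self, window_size=100):
        self.window = deque(maxlen=window_size)
        self.current = "read_random"

    def observe(self, op, addr):
        """Record one request; reconfigure when the class changes."""
        self.window.append((op, addr))
        if len(self.window) == self.window.maxlen:
            cls = classify(self.window)
            if cls != self.current:
                self.current = cls       # online reconfiguration point
        return CONFIGS[self.current]

cache = ReconfigurableCache(window_size=10)
for i in range(10):
    cache.observe("R", i)                # sequential read phase
assert cache.current == "read_sequential"
for i in range(10):
    cache.observe("W", i)                # workload shifts to writes
assert cache.current == "write_heavy"
```

The key design point the abstract describes is visible in `observe`: classification runs continuously over a sliding window, so the switch happens online rather than at deployment time.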
Bandana: Using Non-volatile Memory for Storing Deep Learning Models
Typical large-scale recommender systems use deep learning models that are
stored on a large amount of DRAM. These models often rely on embeddings, which
consume most of the required memory. We present Bandana, a storage system that
reduces the DRAM footprint of embeddings, by using Non-volatile Memory (NVM) as
the primary storage medium, with a small amount of DRAM as cache. The main
challenge in storing embeddings on NVM is its limited read bandwidth compared
to DRAM. Bandana uses two primary techniques to address this limitation: first,
it stores embedding vectors that are likely to be read together in the same
physical location, using hypergraph partitioning, and second, it decides the
number of embedding vectors to cache in DRAM by simulating dozens of small
caches. These techniques allow Bandana to increase the effective read bandwidth
of NVM by 2-3x and thereby significantly reduce the total cost of ownership.
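The second technique, sizing the DRAM cache by simulating small caches, can be sketched directly: replay a sampled trace through several simulated LRU caches and pick the smallest size whose hit rate is near the best. The trace, sizes, tolerance, and LRU policy here are illustrative assumptions, not Bandana's exact parameters.

```python
from collections import OrderedDict

def lru_hit_rate(trace, capacity):
    """Replay `trace` through a simulated LRU cache of `capacity` slots."""
    cache, hits = OrderedDict(), 0
    for key in trace:
        if key in cache:
            hits += 1
            cache.move_to_end(key)          # mark most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)   # evict least recently used
            cache[key] = True
    return hits / len(trace)

def pick_cache_size(trace, sizes, tolerance=0.02):
    """Smallest simulated cache within `tolerance` of the best hit rate."""
    rates = {s: lru_hit_rate(trace, s) for s in sizes}
    best = max(rates.values())
    return min(s for s in sizes if rates[s] >= best - tolerance)

# A toy trace: a small hot set followed by a wider working set.
trace = [i % 4 for i in range(100)] + [i % 32 for i in range(100)]
size = pick_cache_size(trace, sizes=[4, 8, 16, 32, 64])
assert size == 32   # 64 slots buy no extra hits on this trace
```

Running dozens of such miniature simulations is cheap, and it avoids over-provisioning DRAM for embedding vectors that NVM can serve just as well.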
A Survey on Tiering and Caching in High-Performance Storage Systems
Although every storage technology yet invented represents a big step toward perfection, none of them is spotless. Essential data-store requirements such as performance, availability, and recoverability have never been met together in a single, economically affordable medium. One of the most influential factors is price, so there has always been a trade-off between having a desired set of storage properties and the cost. To address this issue, a network of various types of storage media is used to deliver the high performance of expensive devices, such as solid state drives and non-volatile memories, along with the high capacity of inexpensive ones, like hard disk drives. In software, caching and tiering are long-established concepts for handling file operations, moving data automatically within such a storage network, and managing data backup on low-cost media. Intelligently moving data between devices based on need is the key insight here. In this survey, we discuss recent research on improving high-performance storage systems with caching and tiering techniques. Comment: Ph.D. research exam report.
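The tiering idea the survey covers can be sketched as a heat-based promotion policy; the class below is a toy illustration (tier names, the access counter, and the rebalance rule are assumptions, not from any particular surveyed system).

```python
from collections import Counter

class TieredStore:
    """Heat-based tiering: hot blocks live on the fast tier (SSD/NVM),
    everything else stays on the capacity tier (HDD)."""
    def __init__(self, fast_capacity):
        self.fast_capacity = fast_capacity
        self.fast = set()        # blocks currently on the fast tier
        self.heat = Counter()    # per-block access counts

    def access(self, block):
        self.heat[block] += 1
        return "fast" if block in self.fast else "slow"

    def rebalance(self):
        """Periodic migration: promote the hottest blocks, demote the rest."""
        self.fast = {b for b, _ in self.heat.most_common(self.fast_capacity)}

store = TieredStore(fast_capacity=2)
for block in ["a"] * 5 + ["b"] * 3 + ["c"]:
    store.access(block)
store.rebalance()
assert store.fast == {"a", "b"}          # two hottest blocks promoted
assert store.access("c") == "slow"       # cold block stays on HDD
```

Caching differs from this tiering sketch mainly in that cached blocks are copies (the HDD keeps the authoritative data), whereas tiering migrates the only copy between devices.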
knor: A NUMA-Optimized In-Memory, Distributed and Semi-External-Memory k-means Library
k-means is one of the most influential and utilized machine learning
algorithms. Its computation limits the performance and scalability of many
statistical analysis and machine learning tasks. We rethink and optimize
k-means in terms of modern NUMA architectures to develop a novel
parallelization scheme that delays and minimizes synchronization barriers. The
\textit{k-means NUMA Optimized Routine} (\textsf{knor}) library has (i)
in-memory (\textsf{knori}), (ii) distributed memory (\textsf{knord}), and (iii)
semi-external memory (\textsf{knors}) modules that radically improve the
performance of k-means for varying memory and hardware budgets. \textsf{knori}
boosts performance for single machine datasets by an order of magnitude or
more. \textsf{knors} improves the scalability of k-means on a memory budget
using SSDs. \textsf{knors} scales to billions of points on a single machine,
using a fraction of the resources that distributed in-memory systems require.
\textsf{knord} retains \textsf{knori}'s performance characteristics, while
scaling in-memory through distributed computation in the cloud. \textsf{knor}
modifies Elkan's triangle inequality pruning algorithm such that we utilize it
on billion-point datasets without the significant memory overhead of the
original algorithm. We demonstrate \textsf{knor} outperforms distributed commercial products like H2O, Turi (formerly Dato, GraphLab) and Spark's MLlib by more than an order of magnitude for large datasets.
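The triangle-inequality pruning that \textsf{knor} adapts can be sketched in the assignment step: if the distance between a point's current center and another center is at least twice the point's distance to its current center, the other center cannot be closer, so its distance is never computed. This toy keeps only the center-center distance matrix, in the spirit of avoiding Elkan's full per-point bound matrices; it is a sketch, not \textsf{knor}'s implementation.

```python
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def assign(points, centers):
    """One k-means assignment pass with center-center pruning."""
    cc = [[dist(a, b) for b in centers] for a in centers]
    labels, skipped = [], 0
    for p in points:
        best, d_best = 0, dist(p, centers[0])
        for j in range(1, len(centers)):
            # Triangle inequality: d(p, c_j) >= cc[best][j] - d_best,
            # so if cc[best][j] >= 2 * d_best, center j cannot be closer.
            if cc[best][j] >= 2 * d_best:
                skipped += 1
                continue
            d = dist(p, centers[j])
            if d < d_best:
                best, d_best = j, d
        labels.append(best)
    return labels, skipped

points = [(0.1, 0.0), (0.2, 0.1), (10.0, 10.0), (9.9, 10.1)]
centers = [(0.0, 0.0), (10.0, 10.0)]
labels, skipped = assign(points, centers)
assert labels == [0, 0, 1, 1]
assert skipped == 2      # the near-origin points never measured center 1
```

With well-separated clusters most distance computations are pruned, which is what makes the approach viable at billion-point scale without per-point bound storage.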