On Failure Diagnosis of the Storage Stack
Diagnosing storage system failures is challenging even for professionals. One
example is the "When Solid State Drives Are Not That Solid" incident that
occurred at the Algolia data center, where Samsung SSDs were mistakenly blamed
for failures caused by a Linux kernel bug. As system complexity keeps
increasing, such obscure failures will likely occur more often. As one step
toward addressing the challenge, we present our ongoing effort called X-Ray.
Different from traditional methods that focus on either the software or the
hardware, X-Ray leverages virtualization to collect events across layers and
correlates them to generate a correlation tree. Moreover, by applying simple
rules, X-Ray can highlight critical nodes automatically. Preliminary results
based on 5 failure cases show that X-Ray can effectively narrow down the
search space for failures.
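The cross-layer correlation idea can be illustrated with a minimal sketch (this is not the actual X-Ray implementation; the event names, layers, and the latency-threshold rule are all hypothetical stand-ins for the paper's "simple rules"):

```python
# Sketch: correlate cross-layer events into a tree and flag suspicious
# nodes with a simple rule. Event names and the threshold are invented
# for illustration only.
from dataclasses import dataclass, field

@dataclass
class Event:
    name: str          # e.g. "app.write", "fs.sync", "blk.flush"
    layer: str         # layer of the stack that emitted the event
    latency_ms: float  # observed latency of the operation
    children: list = field(default_factory=list)

def highlight(node, threshold_ms, flagged=None):
    """Rule: flag any node whose latency exceeds the threshold."""
    if flagged is None:
        flagged = []
    if node.latency_ms > threshold_ms:
        flagged.append(node.name)
    for child in node.children:
        highlight(child, threshold_ms, flagged)
    return flagged

root = Event("app.write", "application", 120.0, [
    Event("fs.sync", "filesystem", 115.0, [
        Event("blk.flush", "block", 2.0),
    ]),
])
print(highlight(root, threshold_ms=50.0))  # ['app.write', 'fs.sync']
```

The flagged path (application and filesystem layers slow, block layer fast) is the kind of signal that narrows a diagnosis toward software rather than the device.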
Finding Crash-Consistency Bugs with Bounded Black-Box Crash Testing
We present a new approach to testing file-system crash consistency: bounded
black-box crash testing (B3). B3 tests the file system in a black-box manner
using workloads of file-system operations. Since the space of possible
workloads is infinite, B3 bounds this space based on parameters such as the
number of file-system operations or which operations to include, and
exhaustively generates workloads within this bounded space. Each workload is
tested on the target file system by simulating power-loss crashes while the
workload is being executed, and checking if the file system recovers to a
correct state after each crash. B3 builds upon insights derived from our study
of crash-consistency bugs reported in Linux file systems in the last five
years. We observed that most reported bugs can be reproduced using small
workloads of three or fewer file-system operations on a newly-created file
system, and that all reported bugs result from crashes after fsync()-related
system calls. We build two tools, CrashMonkey and ACE, to demonstrate the
effectiveness of this approach. Our tools are able to find 24 out of the 26
crash-consistency bugs reported in the last five years. Our tools also revealed
10 new crash-consistency bugs in widely-used, mature Linux file systems, seven
of which existed in the kernel since 2014. Our tools also found a
crash-consistency bug in a verified file system, FSCQ. The new bugs result in
severe consequences like broken rename atomicity and loss of persisted files.
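The bounded-exhaustive generation idea can be sketched as follows (a simplification of what ACE does; the operation set and bound here are assumptions for illustration, not the tools' actual workload grammar):

```python
# Sketch of bounded black-box workload generation: enumerate every
# sequence of at most `max_ops` operations from a small operation set.
# The real ACE also enumerates operation arguments; this sketch keeps
# only the sequence structure.
from itertools import product

OPS = ["creat", "write", "rename", "fsync"]  # assumed operation set

def generate_workloads(max_ops):
    for length in range(1, max_ops + 1):
        for combo in product(OPS, repeat=length):
            yield combo

workloads = list(generate_workloads(2))
print(len(workloads))  # 4 + 4*4 = 20 bounded workloads
```

Each generated sequence would then be replayed under simulated power-loss crashes, which is the role CrashMonkey plays in the paper's pipeline.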
Intrusion Detection: A Text Mining Based Approach
Intrusion is one of the major threats to an organization. Intrusion detection
using text processing has been a research interest gaining significant
importance among researchers. In text-mining-based approaches to intrusion
detection, system calls serve as the source for mining and predicting the
possibility of an intrusion or attack. When an application runs, several
system calls may be initiated in the background. These system calls form the
strong basis and deciding factor for intrusion detection. In this paper, we
mainly discuss an approach to intrusion detection based on a distance measure
that is designed by taking the conventional Gaussian function and modifying it
to suit the need for a similarity function. A framework for intrusion
detection is also discussed as part of this research.
Comment: 13 pages, 4 figures, Special issue on Computing Applications and Data
Mining, Paper 01021609, International Journal of Computer Science and
Information Security (IJCSIS), Vol. 14 S1, February 201
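A Gaussian-style similarity over system-call traces can be sketched as below. This is a hedged illustration of the general idea only: the paper's exact distance measure may differ, and the trace contents and sigma value here are invented.

```python
# Sketch: Gaussian similarity exp(-d^2 / (2*sigma^2)) over the Euclidean
# distance between system-call frequency vectors. Traces and sigma are
# illustrative assumptions.
import math
from collections import Counter

def gaussian_similarity(trace_a, trace_b, sigma=2.0):
    ca, cb = Counter(trace_a), Counter(trace_b)
    calls = set(ca) | set(cb)
    d2 = sum((ca[c] - cb[c]) ** 2 for c in calls)
    return math.exp(-d2 / (2 * sigma ** 2))

normal = ["open", "read", "read", "close"]
attack = ["open", "execve", "socket", "connect"]
print(gaussian_similarity(normal, normal))        # 1.0 (identical traces)
print(gaussian_similarity(normal, attack) < 0.5)  # True (dissimilar)
```

A low similarity to known-normal traces would then flag the run as a possible intrusion.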
Optimal Repair Layering for Erasure-Coded Data Centers: From Theory to Practice
Repair performance in hierarchical data centers is often bottlenecked by
cross-rack network transfer. Recent theoretical results show that the
cross-rack repair traffic can be minimized through repair layering, whose idea
is to partition a repair operation into inner-rack and cross-rack layers.
However, how repair layering should be implemented and deployed in practice
remains an open issue. In this paper, we address this issue by proposing a
practical repair layering framework called DoubleR. We design two families of
practical double regenerating codes (DRC), which not only minimize the
cross-rack repair traffic, but also have several practical properties that
improve on state-of-the-art regenerating codes. We implement and deploy DoubleR
atop Hadoop Distributed File System (HDFS), and show that DoubleR maintains the
theoretical guarantees of DRC and improves the repair performance of
regenerating codes in both node recovery and degraded read operations.
Comment: 24 pages. Accepted by ACM Transactions on Storage
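The traffic saving from repair layering can be illustrated with a toy count (this is not the DRC construction itself; the block counts are invented, and real regenerating codes send coded combinations, not raw blocks):

```python
# Sketch of the repair-layering intuition: each helper rack first
# combines its surviving blocks locally (inner-rack layer), then sends
# a single aggregated block across the rack boundary (cross-rack layer).
def cross_rack_traffic(blocks_per_rack, racks, layered):
    """Blocks crossing rack boundaries to repair one lost block."""
    if layered:
        return racks                 # one aggregated block per helper rack
    return blocks_per_rack * racks   # every surviving block travels

naive = cross_rack_traffic(blocks_per_rack=4, racks=3, layered=False)
twolayer = cross_rack_traffic(blocks_per_rack=4, racks=3, layered=True)
print(naive, twolayer)  # 12 3
```

Since cross-rack links are the bottleneck in hierarchical data centers, reducing the cross-rack count from 12 to 3 is where the repair speedup comes from.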
Applied Erasure Coding in Networks and Distributed Storage
The amount of digital data is rapidly growing. There is an increasing use of
a wide range of computer systems, from mobile devices to large-scale data
centers, and mitigating the occurrence and the impact of errors in digital
data is important for the reliable operation of all computer systems. The demand
for new ultra-fast and highly reliable coding techniques for data at rest and
for data in transit is a major research challenge. Reliability is one of the
most important design requirements. The simplest way of providing a degree of
reliability is by using data replication techniques. However, replication is
highly inefficient in terms of capacity utilization. Erasure coding has
therefore become a viable alternative to replication since it provides the same
level of reliability as replication with significantly less storage overhead.
The present thesis investigates efficient constructions of erasure codes for
different applications. Methods from both coding and information theory have
been applied to network coding, Optical Packet Switching (OPS) networks and
distributed storage systems. The following four issues are addressed:
- Construction of binary and non-binary erasure codes;
- Reduction of the header overhead due to the encoding coefficients in
network coding;
- Construction and implementation of new erasure codes for large-scale
distributed storage systems that provide savings in the storage and network
resources compared to state-of-the-art codes; and
- Provision of a unified view on Quality of Service (QoS) in OPS networks
when erasure codes are used, with the focus on Packet Loss Rate (PLR),
survivability, and secrecy.
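The overhead argument against replication can be made concrete with the simplest possible erasure code, a single XOR parity block (a minimal example only; the thesis's constructions are far more general):

```python
# Minimal erasure-coding example: k data blocks + 1 XOR parity block
# tolerates any single block loss at (k+1)/k storage overhead, versus
# 3x for triple replication.
def xor_parity(blocks):
    parity = bytes(len(blocks[0]))
    for b in blocks:
        parity = bytes(x ^ y for x, y in zip(parity, b))
    return parity

data = [b"abcd", b"efgh", b"ijkl"]   # k = 3 data blocks
parity = xor_parity(data)

# Recover a lost block by XOR-ing the parity with the survivors.
recovered = xor_parity([parity, data[0], data[2]])
print(recovered == b"efgh")  # True: data[1] reconstructed

# Overhead: (k+1)/k = 1.33x here, vs 3x for 3-way replication.
print(round((len(data) + 1) / len(data), 2))  # 1.33
```

Codes like Reed-Solomon generalize this to tolerate multiple losses while keeping the overhead well below replication, which is the trade-off the thesis investigates.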
The Design and Implementation of a Rekeying-aware Encrypted Deduplication Storage System
Rekeying refers to an operation of replacing an existing key with a new key
for encryption. It renews security protection, so as to protect against key
compromise and enable dynamic access control in cryptographic storage. However,
it is non-trivial to realize efficient rekeying in encrypted deduplication
storage systems, which use deterministic content-derived encryption keys to
allow deduplication on ciphertexts. We design and implement REED, a
rekeying-aware encrypted deduplication storage system. REED builds on a
deterministic version of all-or-nothing transform (AONT), such that it enables
secure and lightweight rekeying, while preserving the deduplication capability.
We propose two REED encryption schemes that trade off performance against
security, and extend REED for dynamic access control. We implement a REED
prototype with various performance optimization techniques and demonstrate how
we can exploit similarity to mitigate key generation overhead. Our trace-driven
testbed evaluation shows that our REED prototype maintains high performance and
storage efficiency.
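The deterministic, content-derived keys that encrypted deduplication relies on can be sketched as follows. This shows only the baseline message-locked idea; REED's actual AONT-based scheme (and its rekeying support) is more involved, and the XOR "cipher" here is a toy stand-in:

```python
# Sketch: message-locked encryption for deduplication. The key is
# derived from the chunk's content, so identical plaintexts produce
# identical ciphertexts and duplicates remain detectable. The XOR
# cipher is a placeholder for a real cipher, for illustration only.
import hashlib

def content_key(chunk: bytes) -> bytes:
    return hashlib.sha256(chunk).digest()

def toy_encrypt(chunk: bytes, key: bytes) -> bytes:
    return bytes(c ^ key[i % len(key)] for i, c in enumerate(chunk))

a = toy_encrypt(b"same chunk", content_key(b"same chunk"))
b = toy_encrypt(b"same chunk", content_key(b"same chunk"))
print(a == b)  # True: duplicate ciphertexts can be deduplicated
```

The rekeying difficulty the paper addresses follows directly from this determinism: because the key is a pure function of the content, it cannot simply be replaced without destroying the deduplication property.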
Scalable, Fast Cloud Computing with Execution Templates
Large scale cloud data analytics applications are often CPU bound. Most of
these cycles are wasted: benchmarks written in C++ run 10-51 times faster than
frameworks such as Naiad and Spark. However, calling faster implementations
from those frameworks yields only moderate (3-5x) speedups because their
control planes cannot schedule work fast enough.
This paper presents execution templates, a control plane abstraction for
CPU-bound cloud applications, such as machine learning. Execution templates
leverage highly repetitive control flow to cache scheduling decisions as {\it
templates}. Rather than reschedule hundreds of thousands of tasks on every loop
execution, nodes instantiate these templates. A controller's template specifies
the execution across all worker nodes, which it partitions into per-worker
templates. To ensure that templates execute correctly, controllers dynamically
patch templates to match program control flow. We have implemented execution
templates in Nimbus, a C++ cloud computing framework. Running in Nimbus,
analytics benchmarks can run 16-43 times faster than in Naiad and Spark.
Nimbus's control plane can scale out to run these faster benchmarks on up to
100 nodes (800 cores).
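The core caching idea behind execution templates can be sketched as below (a strong simplification: Nimbus templates capture per-worker task graphs and are dynamically patched, while this sketch only caches a whole-iteration placement; the scheduler and task names are invented):

```python
# Sketch: compute a task placement once, cache it as a "template", and
# on later loop iterations instantiate the cached template instead of
# re-running the scheduler for hundreds of thousands of tasks.
def schedule(tasks, workers):
    """Expensive path: decide a placement (round-robin stand-in)."""
    return {t: workers[i % len(workers)] for i, t in enumerate(tasks)}

template_cache = {}

def run_iteration(tasks, workers):
    key = (tuple(tasks), tuple(workers))
    if key not in template_cache:        # first iteration: build template
        template_cache[key] = schedule(tasks, workers)
    return template_cache[key]           # later iterations: instantiate

placement = run_iteration(["t0", "t1", "t2"], ["w0", "w1"])
placement = run_iteration(["t0", "t1", "t2"], ["w0", "w1"])  # cache hit
print(placement)  # {'t0': 'w0', 't1': 'w1', 't2': 'w0'}
```

The payoff is the same as in the paper's design: the per-iteration control-plane cost drops from full rescheduling to a cheap template instantiation, which only needs patching when control flow diverges.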
Isolate First, Then Share: a New OS Architecture for Datacenter Computing
This paper presents the "isolate first, then share" OS model in which the
processor cores, memory, and devices are divided up between disparate OS
instances and a new abstraction, subOS, is proposed to encapsulate an OS
instance that can be created, destroyed, and resized on-the-fly. The intuition
is that this avoids shared kernel states between applications, which in turn
reduces performance loss caused by contention. We decompose the OS into the
supervisor and several subOSes running at the same privilege level: a subOS
directly manages physical resources, while the supervisor can create, destroy,
and resize a subOS on-the-fly. The supervisor and subOSes share little state,
but fast inter-subOS communication mechanisms are provided on demand.
We present the first implementation, RainForest, which supports unmodified
Linux binaries. Our comprehensive evaluation shows that RainForest outperforms
Linux with four different kernels, LXC, and Xen in terms of worst-case and
average performance most of the time when running a large number of benchmarks.
The source code will be available soon.
Comment: 14 pages, 13 figures, 5 tables
Cuckoo++ Hash Tables: High-Performance Hash Tables for Networking Applications
Hash tables are an essential data-structure for numerous networking
applications (e.g., connection tracking, firewalls, network address
translators). Among these, cuckoo hash tables provide excellent performance by
allowing lookups to be processed with very few memory accesses (2 to 3 per
lookup). Yet, for large tables, cuckoo hash tables remain memory bound and each
memory access impacts performance. In this paper, we propose algorithmic
improvements to cuckoo hash tables that eliminate some unnecessary memory
accesses; these changes are made without altering the properties of the
original cuckoo hash table, so that all existing theoretical analyses remain
applicable. On a single core, our hash table achieves 37M lookups per second
for positive lookups (i.e., when the key looked up is present in the table),
and 60M lookups per second for negative lookups, a 50% improvement over the
implementation included in DPDK. On an 18-core machine, with mostly positive
lookups, our implementation achieves 496M lookups per second, a 45% improvement
over DPDK.
Comment: 13 pages
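Why a cuckoo lookup needs at most two bucket probes can be seen in a minimal two-table sketch (baseline cuckoo hashing only; the paper's Cuckoo++ improvements, which avoid some of the second probes on negative lookups, are not shown, and the hash functions here are simplistic):

```python
# Minimal two-table cuckoo hash: every key lives in one of exactly two
# candidate slots, so a lookup touches at most two buckets. Inserts may
# displace existing entries along a bounded chain.
class CuckooTable:
    def __init__(self, size=8):
        self.t1 = [None] * size
        self.t2 = [None] * size
        self.size = size

    def _h1(self, key):
        return hash(key) % self.size

    def _h2(self, key):
        return (hash(key) // self.size) % self.size

    def insert(self, key, value):
        slot = self._h1(key)
        for _ in range(self.size):  # bounded displacement chain
            if self.t1[slot] is None or self.t1[slot][0] == key:
                self.t1[slot] = (key, value)
                return True
            # Evict the occupant of t1 and try its t2 slot.
            (key, value), self.t1[slot] = self.t1[slot], (key, value)
            slot2 = self._h2(key)
            if self.t2[slot2] is None or self.t2[slot2][0] == key:
                self.t2[slot2] = (key, value)
                return True
            (key, value), self.t2[slot2] = self.t2[slot2], (key, value)
            slot = self._h1(key)
        return False  # displacement chain too long; rehash needed

    def lookup(self, key):
        for entry in (self.t1[self._h1(key)], self.t2[self._h2(key)]):
            if entry is not None and entry[0] == key:
                return entry[1]
        return None  # at most two probes either way

t = CuckooTable()
t.insert("flow:10.0.0.1", "allow")
print(t.lookup("flow:10.0.0.1"))  # allow
print(t.lookup("flow:10.0.0.2"))  # None (still only two probes)
```

The fixed two-probe bound is what makes cuckoo tables attractive for connection tracking and similar per-packet lookups, where every memory access counts.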
A Survey on Large Scale Metadata Server for Big Data Storage
Big Data is defined as a high volume and variety of data with an exponential
growth rate. Data are amalgamated to generate revenue, which results in large
data silos. Data are the oil of modern IT industries, and therefore data are
growing at an exponential pace. The access mechanism for these data silos is
defined by metadata. The metadata are decoupled from the data server for
various beneficial reasons, such as ease of maintenance. The metadata are
stored in a metadata server (MDS); therefore, studying the MDS is essential
when designing a large-scale storage system. Many parameters must be
considered in the MDS architecture, which depends on the demands of the
storage system's requirements. Thus, MDSs are categorized in various ways
depending on the underlying architecture and design methodology. This article
surveys the various kinds of MDS architectures, designs, and methodologies.
It emphasizes clustered MDS (cMDS), and the reports are prepared based on
a) Bloom-filter-based MDS, b) Client-funded MDS, c) Geo-aware MDS,
d) Cache-aware MDS, e) Load-aware MDS, f) Hash-based MDS, and g) Tree-based
MDS. Additionally, the article presents the issues and challenges of MDS for
mammoth-sized data.
Comment: Submitted to ACM for possible publication
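One of the surveyed categories, Bloom-filter-based MDS, can be sketched as follows (a hedged illustration of the general design, not any specific system from the survey; server names, paths, and filter parameters are invented):

```python
# Sketch: each metadata server advertises a Bloom filter of the paths
# it owns, so a client can route a metadata lookup to candidate servers
# instead of broadcasting to all of them. False positives cause extra
# queries; false negatives cannot occur.
import hashlib

class BloomFilter:
    def __init__(self, bits=256, hashes=3):
        self.bits, self.hashes = bits, hashes
        self.array = 0  # bit vector packed into an int

    def _positions(self, item):
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.bits

    def add(self, item):
        for p in self._positions(item):
            self.array |= 1 << p

    def might_contain(self, item):
        return all((self.array >> p) & 1 for p in self._positions(item))

mds_filters = {"mds0": BloomFilter(), "mds1": BloomFilter()}
mds_filters["mds0"].add("/home/alice/data.csv")
mds_filters["mds1"].add("/var/log/app.log")

# Route: query only the servers whose filter may hold the path.
candidates = [s for s, f in mds_filters.items()
              if f.might_contain("/home/alice/data.csv")]
print(candidates)  # contains 'mds0'; false positives are possible
```

The trade-off the survey discusses follows from this structure: smaller filters save memory and gossip bandwidth but raise the false-positive rate, and hence the number of wasted metadata queries.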