67,835 research outputs found
Increasing Availability in Distributed Storage Systems via Clustering
We introduce the Fixed Cluster Repair System (FCRS) as a novel architecture
for Distributed Storage Systems (DSS), achieving a small repair bandwidth while
guaranteeing a high availability. Specifically we partition the set of servers
in a DSS into clusters and allow a failed server to choose any cluster
other than its own as its repair group. Thereby, we guarantee an availability
of . We characterize the repair bandwidth vs. storage trade-off for the
FCRS under functional repair and show that the minimum repair bandwidth can be
improved by an asymptotic multiplicative factor of compared to the state
of the art coding techniques that guarantee the same availability. We further
introduce Cubic Codes designed to minimize the repair bandwidth of the FCRS
under the exact repair model. We prove an asymptotic multiplicative improvement
of in the minimum repair bandwidth compared to the existing exact repair
coding techniques that achieve the same availability. We show that Cubic Codes
are information-theoretically optimal for the FCRS with and complete
clusters. Furthermore, under the repair-by-transfer model, Cubic Codes are
optimal irrespective of the number of clusters
The power of indirect social ties
While direct social ties have been intensely studied in the context of
computer-mediated social networks, indirect ties (e.g., friends of friends)
have seen little attention. Yet in real life, we often rely on friends of our
friends for recommendations (of good doctors, good schools, or good
babysitters), for introduction to a new job opportunity, and for many other
occasional needs. In this work we attempt to 1) quantify the strength of
indirect social ties, 2) validate it, and 3) empirically demonstrate its
usefulness for distributed applications on two examples. We quantify social
strength of indirect ties using a(ny) measure of the strength of the direct
ties that connect two people and the intuition provided by the sociology
literature. We validate the proposed metric experimentally by comparing
correlations with other direct social tie evaluators. We show via data-driven
experiments that the proposed metric for social strength can be used
successfully for social applications. Specifically, we show that it alleviates
known problems in friend-to-friend storage systems by addressing two previously
documented shortcomings: reduced set of storage candidates and data
availability correlations. We also show that it can be used for predicting the
effects of a social diffusion with an accuracy of up to 93.5%.Comment: Technical Repor
Enabling Social Applications via Decentralized Social Data Management
An unprecedented information wealth produced by online social networks,
further augmented by location/collocation data, is currently fragmented across
different proprietary services. Combined, it can accurately represent the
social world and enable novel socially-aware applications. We present
Prometheus, a socially-aware peer-to-peer service that collects social
information from multiple sources into a multigraph managed in a decentralized
fashion on user-contributed nodes, and exposes it through an interface
implementing non-trivial social inferences while complying with user-defined
access policies. Simulations and experiments on PlanetLab with emulated
application workloads show the system exhibits good end-to-end response time,
low communication overhead and resilience to malicious attacks.Comment: 27 pages, single ACM column, 9 figures, accepted in Special Issue of
Foundations of Social Computing, ACM Transactions on Internet Technolog
When Things Matter: A Data-Centric View of the Internet of Things
With the recent advances in radio-frequency identification (RFID), low-cost
wireless sensor devices, and Web technologies, the Internet of Things (IoT)
approach has gained momentum in connecting everyday objects to the Internet and
facilitating machine-to-human and machine-to-machine communication with the
physical world. While IoT offers the capability to connect and integrate both
digital and physical entities, enabling a whole new class of applications and
services, several significant challenges need to be addressed before these
applications and services can be fully realized. A fundamental challenge
centers around managing IoT data, typically produced in dynamic and volatile
environments, which is not only extremely large in scale and volume, but also
noisy, and continuous. This article surveys the main techniques and
state-of-the-art research efforts in IoT from data-centric perspectives,
including data stream processing, data storage models, complex event
processing, and searching in IoT. Open research issues for IoT data management
are also discussed
Storage Solutions for Big Data Systems: A Qualitative Study and Comparison
Big data systems development is full of challenges in view of the variety of
application areas and domains that this technology promises to serve.
Typically, fundamental design decisions involved in big data systems design
include choosing appropriate storage and computing infrastructures. In this age
of heterogeneous systems that integrate different technologies for optimized
solution to a specific real world problem, big data system are not an exception
to any such rule. As far as the storage aspect of any big data system is
concerned, the primary facet in this regard is a storage infrastructure and
NoSQL seems to be the right technology that fulfills its requirements. However,
every big data application has variable data characteristics and thus, the
corresponding data fits into a different data model. This paper presents
feature and use case analysis and comparison of the four main data models
namely document oriented, key value, graph and wide column. Moreover, a feature
analysis of 80 NoSQL solutions has been provided, elaborating on the criteria
and points that a developer must consider while making a possible choice.
Typically, big data storage needs to communicate with the execution engine and
other processing and visualization technologies to create a comprehensive
solution. This brings forth second facet of big data storage, big data file
formats, into picture. The second half of the research paper compares the
advantages, shortcomings and possible use cases of available big data file
formats for Hadoop, which is the foundation for most big data computing
technologies. Decentralized storage and blockchain are seen as the next
generation of big data storage and its challenges and future prospects have
also been discussed
- …