Exploring heterogeneity of unreliable machines for p2p backup
P2P architecture is a viable option for enterprise backup. In contrast to
dedicated backup servers, which are nowadays the standard solution, making backups
directly on an organization's workstations should be cheaper (as existing hardware is
used), more efficient (as there is no single bottleneck server) and more
reliable (as the machines are geographically dispersed).
We present the architecture of a p2p backup system that uses pairwise
replication contracts between a data owner and a replicator. In contrast to
standard p2p storage systems that use a DHT directly, the contracts allow our
system to optimize replica placement according to a specific optimization
strategy, and thus to take advantage of the heterogeneity of the machines and the
network. Such optimization is particularly appealing in the context of backup:
replicas can be geographically dispersed, the load sent over the network can be
minimized, or the optimization goal can be to minimize the backup/restore time.
However, managing the contracts, keeping them consistent and adjusting them in
response to a dynamically changing environment is challenging.
We built a scientific prototype and ran experiments on 150 workstations
in the university's computer laboratories and, separately, on 50 PlanetLab
nodes. We found that the main factor affecting the quality of the system is
the availability of the machines. Yet, our main conclusion is that it is
possible to build an efficient and reliable backup system on highly unreliable
machines (our computers had just 13% average availability).
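To illustrate why replica count and machine heterogeneity matter at such low availability, here is a minimal Python sketch of one hypothetical placement strategy: greedily pick the most available machines until the probability that at least one replica holder is online reaches a target. It assumes machines go offline independently and that availability is the only criterion; the function name `greedy_replicators` and the target values are illustrative, not the system's actual contract or optimization logic.

```python
from typing import Dict, List, Optional

def greedy_replicators(availability: Dict[str, float], target: float) -> Optional[List[str]]:
    """Hypothetical placement: choose replicators in order of decreasing
    availability until P(at least one replica online) >= target.
    Assumes machines are online independently of each other."""
    chosen: List[str] = []
    p_all_offline = 1.0
    # Most available machines first; sorted() is stable, so ties keep input order.
    for machine, a in sorted(availability.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(machine)
        p_all_offline *= 1.0 - a
        if 1.0 - p_all_offline >= target:
            return chosen
    return None  # target unreachable with the given machines

# Heterogeneous example: the two most available machines are not enough for 0.75.
print(greedy_replicators({"a": 0.5, "b": 0.4, "c": 0.1, "d": 0.3}, 0.75))
```

With identical machines at the 13% average availability reported above, the independence assumption would call for 34 replicas to keep at least one copy online 99% of the time, since 1 - 0.87^33 ≈ 0.9899 < 0.99 ≤ 1 - 0.87^34 ≈ 0.9912. Preferring more available machines lowers that count, which is one way a contract-based design can exploit heterogeneity.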
Parallel Tempering Simulation of the three-dimensional Edwards-Anderson Model with Compact Asynchronous Multispin Coding on GPU
Monte Carlo simulations of the Ising model play an important role in the
field of computational statistical physics, and they have revealed many
properties of the model over the past few decades. However, the effect of
frustration due to random disorder, in particular the possible spin glass
phase, remains a crucial but poorly understood problem. One of the obstacles in
the Monte Carlo simulation of random frustrated systems is their long
relaxation time, which makes an efficient parallel implementation on state-of-the-art
computation platforms highly desirable. The Graphics Processing Unit (GPU) is
such a platform that provides an opportunity to significantly enhance the
computational performance and thus gain new insight into this problem. In this
paper, we present optimization and tuning approaches for the CUDA
implementation of the spin glass simulation on GPUs. We discuss the integration
of various design alternatives, such as GPU kernel construction with minimal
communication, memory tiling, and look-up tables. We present a binary data
format, Compact Asynchronous Multispin Coding (CAMSC), which provides an
additional speedup compared with the traditionally used Asynchronous
Multispin Coding (AMSC). Our overall design sustains a performance of 33.5
picoseconds per spin flip attempt for simulating the three-dimensional
Edwards-Anderson model with parallel tempering, which significantly improves
the performance over existing GPU implementations. Comment: 15 pages, 18 figures
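Multispin coding packs the spins of many independent replicas into the bits of one machine word, so that a single bitwise operation evaluates or updates all of them at once. The Python sketch below shows only that core idea, applied to the Edwards-Anderson energy E = -sum J_ij s_i s_j: with bit 1 meaning s = -1 (or J = -1), bond (i, j) is unsatisfied in replica k exactly when bit k of s_i XOR s_j XOR J_ij is 1, so E = 2U - B, where U counts unsatisfied bonds and B is the number of bonds. The 64-bit word width, the bit convention, the periodic-lattice helper and all function names are assumptions for illustration; the sketch does not reproduce the paper's AMSC/CAMSC layouts, look-up tables or CUDA kernels.

```python
import random

W = 64  # replicas packed into one word, one bit per replica (assumption)

def cubic_bonds(L):
    """Nearest-neighbour bonds of an L x L x L lattice with periodic boundaries."""
    def idx(x, y, z):
        return (x % L) * L * L + (y % L) * L + (z % L)
    bonds = []
    for x in range(L):
        for y in range(L):
            for z in range(L):
                i = idx(x, y, z)
                bonds += [(i, idx(x + 1, y, z)), (i, idx(x, y + 1, z)), (i, idx(x, y, z + 1))]
    return bonds

def unsatisfied_masks(spins, couplings, bonds):
    """One XOR per bond evaluates that bond in all W replicas at once.

    spins[i]:     bit k is 1 if spin i is -1 in replica k
    couplings[b]: bit k is 1 if J_b is -1 in replica k
    Bit k of the result for bond b is 1 iff the bond is unsatisfied in replica k.
    """
    return [spins[i] ^ spins[j] ^ couplings[b] for b, (i, j) in enumerate(bonds)]

def energies(spins, couplings, bonds):
    """Per-replica energy E = -sum J s_i s_j = 2 * (#unsatisfied) - (#bonds)."""
    masks = unsatisfied_masks(spins, couplings, bonds)
    return [2 * sum((m >> k) & 1 for m in masks) - len(bonds) for k in range(W)]

# Random packed spins and couplings for 64 replicas on a 4^3 lattice.
rng = random.Random(1)
L = 4
bonds = cubic_bonds(L)
spins = [rng.getrandbits(W) for _ in range(L ** 3)]
couplings = [rng.getrandbits(W) for _ in bonds]
E = energies(spins, couplings, bonds)
```

Extracting bit k one replica at a time, as done here for clarity, gives up most of the speed; production multispin codes instead keep the per-replica bond counts bit-sliced so the accept/reject decision also runs on whole words.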
Creating a Relational Distributed Object Store
In and of itself, data storage has apparent business utility. But when we can
convert data to information, the utility of stored data increases dramatically.
It is the layering of relations atop the data mass that is the engine for such
conversion. Explicit relations among discrete, sporadically ingested objects are
rare, which makes synthesizing such relations all the more
challenging, but the challenge must be met if we are ever to see an equivalent
business value for unstructured data as we already have with structured data.
This paper describes a novel construct, referred to as a relational distributed
object store (RDOS), that seeks to solve the twin problems of how to
persistently and reliably store petabytes of unstructured data while
simultaneously creating and persisting relations amongst billions of objects. Comment: 12 pages, 5 figures
Communication Cost for Updating Linear Functions when Message Updates are Sparse: Connections to Maximally Recoverable Codes
We consider a communication problem in which an update of the source message
needs to be conveyed to one or more distant receivers that are interested in
maintaining specific linear functions of the source message. The setting is one
in which the updates are sparse in nature, and where neither the source nor the
receiver(s) is aware of the exact difference vector; both know only the
amount of sparsity that is present in the difference vector. Under this
setting, we are interested in devising linear encoding and decoding schemes
that minimize the communication cost involved. We show that the optimal
solution to this problem is closely related to the notion of maximally
recoverable codes (MRCs), which were originally introduced in the context of
coding for storage systems. In the context of storage, MRCs guarantee optimal
erasure protection when the system is partially constrained to have local
parity relations among the storage nodes. In our problem, we show that optimal
solutions exist if and only if MRCs of a certain kind (identified by the desired
linear functions) exist. We consider point-to-point and broadcast versions of
the problem, and identify connections to MRCs under both these settings. For
the point-to-point setting, we show that our linear-encoder based achievable
scheme is optimal even when non-linear encoding is permitted. The theory is
illustrated in the context of updating erasure coded storage nodes. We present
examples based on modern storage codes such as the minimum bandwidth
regenerating codes. Comment: To appear in IEEE Transactions on Information Theory
Gossip Algorithms for Distributed Signal Processing
Gossip algorithms are attractive for in-network processing in sensor networks
because they do not require any specialized routing, there is no bottleneck or
single point of failure, and they are robust to unreliable wireless network
conditions. Recently, there has been a surge of activity in the computer
science, control, signal processing, and information theory communities,
developing faster and more robust gossip algorithms and deriving theoretical
performance guarantees. This article presents an overview of recent work in the
area. We describe convergence rate results, which are related to the number of
transmitted messages and thus the amount of energy consumed in the network for
gossiping. We discuss issues related to gossiping over wireless links,
including the effects of quantization and noise, and we illustrate the use of
gossip algorithms for canonical signal processing tasks including distributed
estimation, source localization, and compression. Comment: Submitted to Proceedings of the IEEE, 29 pages
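The simplest member of this family, randomized pairwise gossip for distributed averaging, fits in a few lines of Python: at each tick one random edge wakes up and its two endpoints replace their values with the pair's average. Each exchange preserves the network-wide sum, so on a connected graph all values converge to the initial average. The ring topology, the sequential simulation loop and the function name are illustrative assumptions; real deployments run asynchronously over wireless links, where the quantization and noise effects mentioned above come into play.

```python
import random

def pairwise_gossip(values, edges, steps, seed=0):
    """Randomized pairwise gossip averaging on an undirected graph."""
    rng = random.Random(seed)
    x = list(values)
    for _ in range(steps):
        i, j = rng.choice(edges)      # one random link wakes up
        avg = (x[i] + x[j]) / 2.0     # the two endpoints exchange values...
        x[i] = x[j] = avg             # ...and both keep the pair average
    return x

# Ten sensors on a ring, initial readings 0..9 (true average 4.5).
n = 10
ring = [(k, (k + 1) % n) for k in range(n)]
result = pairwise_gossip(range(n), ring, steps=5000)
```

Each step costs one pair of messages, which is why convergence-rate results translate directly into transmission counts and energy budgets; no routing tables or fusion center are needed, only knowledge of one's neighbours.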
Measuring Catastrophic Forgetting in Neural Networks
Deep neural networks are used in many state-of-the-art systems for machine
perception. Once a network is trained to do a specific task, e.g., bird
classification, it cannot easily be trained to do new tasks, e.g.,
incrementally learning to recognize additional bird species or learning an
entirely different task such as flower recognition. When new tasks are added,
typical deep neural networks are prone to catastrophically forgetting previous
tasks. Networks that are capable of assimilating new information incrementally,
much like how humans form new memories over time, will be more efficient than
re-training the model from scratch each time a new task needs to be learned.
There have been multiple attempts to develop schemes that mitigate catastrophic
forgetting, but these methods have not been directly compared, the tests used
to evaluate them vary considerably, and these methods have only been evaluated
on small-scale problems (e.g., MNIST). In this paper, we introduce new metrics
and benchmarks for directly comparing five different mechanisms designed to
mitigate catastrophic forgetting in neural networks: regularization,
ensembling, rehearsal, dual-memory, and sparse-coding. Our experiments on
real-world images and sounds show that the mechanism(s) that are critical for
optimal performance vary based on the incremental training paradigm and type of
data being used, but they all demonstrate that the catastrophic forgetting
problem has yet to be solved. Comment: To appear in AAAI 201
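Incremental-learning benchmarks typically record an accuracy matrix: the accuracy on each previously learned task, measured after every training session. As an illustration of how such a matrix can be reduced to a single forgetting score, the Python sketch below computes a commonly used average-forgetting measure (the drop from each earlier task's best accuracy to its final accuracy). This measure and the function name are assumptions chosen for illustration; they are not necessarily the metrics introduced in this paper, and the accuracies in the usage example are toy values.

```python
from typing import List

def average_forgetting(acc: List[List[float]]) -> float:
    """acc[t][j] is the accuracy on task j after training session t (j <= t).

    For every task except the last, forgetting is the best accuracy it reached
    before the final session minus its accuracy after the final session.
    """
    T = len(acc)
    if T < 2:
        raise ValueError("need at least two training sessions")
    drops = []
    for j in range(T - 1):
        best = max(acc[t][j] for t in range(j, T - 1))
        drops.append(best - acc[T - 1][j])
    return sum(drops) / len(drops)

# Toy accuracies after three sessions (illustrative values only).
acc = [
    [0.90],
    [0.60, 0.85],
    [0.50, 0.70, 0.80],
]
score = average_forgetting(acc)  # (0.40 + 0.15) / 2
```

A score near zero means earlier tasks were retained; larger values indicate catastrophic forgetting. Comparing such scores across mechanisms only makes sense when the training paradigm and data type are held fixed, consistent with the finding above that the best mechanism depends on both.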