Quick and energy-efficient Bayesian computing of binocular disparity using stochastic digital signals
Reconstruction of the three-dimensional geometry of a visual scene from binocular disparity information is an important problem in computer vision and mobile robotics, and it can be formulated as a Bayesian inference problem. However, computing the full disparity distribution with an advanced Bayesian model is usually intractable, and it proves computationally challenging even with a simple model. In this paper, we show how probabilistic hardware using distributed memory and an alternate representation of data as stochastic bitstreams can solve that problem with high performance and energy efficiency. We put forward a way to express discrete probability distributions using stochastic data representations and to perform Bayesian fusion over those representations, and we show how this approach can be applied to disparity computation. We evaluate the system using a simulated stochastic implementation and discuss possible hardware implementations of such architectures and their potential for sensorimotor processing and robotics.
Comment: Preprint of an article submitted for publication in the International Journal of Approximate Reasoning and accepted pending minor revision.
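To make the bitstream idea concrete, here is a minimal Python sketch (not the authors' implementation) of how a probability value can be encoded as a Bernoulli bitstream and how two disparity likelihoods could be fused by a bitwise AND, which approximates their product, followed by normalization. Function names and the toy distributions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def to_bitstream(p, n_bits=4096):
    # Encode a probability p in [0, 1] as a stream of Bernoulli(p) bits.
    return rng.random(n_bits) < p

def from_bitstream(bits):
    # Estimate the encoded probability as the fraction of ones in the stream.
    return bits.mean()

def fuse(dist_a, dist_b, n_bits=4096):
    # Bayesian fusion of two discrete distributions: for each candidate
    # disparity, the product of the two probabilities is approximated by the
    # bitwise AND of two independent bitstreams; the result is renormalized.
    fused = np.array([
        from_bitstream(to_bitstream(pa, n_bits) & to_bitstream(pb, n_bits))
        for pa, pb in zip(dist_a, dist_b)
    ])
    return fused / fused.sum()

# Two noisy likelihoods over five candidate disparities (illustrative values).
left  = np.array([0.10, 0.20, 0.40, 0.20, 0.10])
right = np.array([0.05, 0.15, 0.50, 0.20, 0.10])

print(fuse(left, right))           # stochastic estimate of the fused distribution
exact = left * right
print(exact / exact.sum())         # exact fusion, for comparison
```

In stochastic computing, an AND gate applied to two independent unipolar bitstreams computes the product of the encoded probabilities directly on the serial bit flows, which is what makes this style of Bayesian fusion attractive for low-power hardware.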
Building Resilient Cloud Over Unreliable Commodity Infrastructure
Cloud computing has emerged as a successful paradigm for efficiently utilizing managed compute infrastructure such as high-speed rack-mounted servers connected with high-speed networking and reliable storage. Such infrastructure is usually dedicated, physically secured, and backed by reliable power and networking. However, much of our idle compute capacity resides in unmanaged infrastructure such as idle desktops, lab machines, physically distant server machines, and laptops. We present a scheme to utilize this idle compute capacity on a best-effort basis and provide high availability even in the face of failure of individual components or facilities. We run virtual machines on the commodity infrastructure and present a cloud interface to our end users. The primary challenge is to maintain availability in the presence of node failures, network failures, and power failures. To achieve availability, we redundantly run multiple copies of a virtual machine (VM) on geographically dispersed physical machines; if one running copy of a VM fails, we seamlessly switch over to another running copy. We use VM record/replay capability to implement this redundancy and switchover. So far, we have implemented VM record/replay for uniprocessor machines over Linux/KVM and are currently working on VM record/replay for shared-memory multiprocessor machines. We report initial experimental results based on our implementation.
Comment: Oral presentation at IEEE "Cloud Computing for Emerging Markets", Oct. 11-12, 2012, Bangalore, India.
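As a rough illustration of the switchover logic described above (not the paper's actual mechanism, which relies on VM record/replay inside Linux/KVM), the following Python sketch keeps one of several redundant VM copies as the primary and fails over to a surviving copy when the primary's liveness check fails. Class names and host names are hypothetical.

```python
class VmReplica:
    """Hypothetical handle to one running copy of a VM on a commodity machine."""
    def __init__(self, host):
        self.host = host
        self.alive = True          # a real system would use a liveness probe

    def is_alive(self):
        return self.alive

def pick_primary(replicas, current):
    # Keep the current primary while it is live; otherwise fail over to the
    # next live copy, mimicking the seamless switchover described above.
    if current is not None and current.is_alive():
        return current
    for replica in replicas:
        if replica.is_alive():
            return replica
    return None                    # all copies lost: availability is gone

# Three copies of the same VM on geographically dispersed machines.
replicas = [VmReplica("desktop-lab-1"), VmReplica("laptop-7"), VmReplica("remote-9")]
primary = pick_primary(replicas, None)
replicas[0].alive = False          # simulate a node or power failure
primary = pick_primary(replicas, primary)
print(primary.host)                # -> "laptop-7"
```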
Contrasting Views of Complexity and Their Implications For Network-Centric Infrastructures
There exists a widely recognized need to better understand and manage complex "systems of systems," ranging from biology, ecology, and medicine to network-centric technologies. This is motivating the search for universal laws of highly evolved systems and driving demand for new mathematics and methods that are consistent, integrative, and predictive. However, the theoretical frameworks available today are not merely fragmented but sometimes contradictory and incompatible. We argue that complexity arises in highly evolved biological and technological systems primarily to provide mechanisms to create robustness. However, this complexity itself can be a source of new fragility, leading to "robust yet fragile" tradeoffs in system design. We focus on the role of robustness and architecture in networked infrastructures, and we highlight recent advances in the theory of distributed control driven by network technologies. This view of complexity in highly organized technological and biological systems is fundamentally different from the dominant perspective in the mainstream sciences, which downplays function, constraints, and tradeoffs, and tends to minimize the role of organization and design.
DeSyRe: on-Demand System Reliability
The DeSyRe project builds on-demand adaptive and reliable Systems-on-Chip (SoCs). As fabrication technology scales down, chips are becoming less reliable, thereby incurring increased power and performance costs for fault tolerance. To make matters worse, power density is becoming a significant limiting factor in SoC design in general. In the face of such changes in the technological landscape, current solutions for fault tolerance are expected to introduce excessive overheads in future systems. Moreover, attempting to design and manufacture a totally defect- and fault-free system would impact heavily, even prohibitively, the design, manufacturing, and testing costs, as well as the system performance and power consumption. In this context, DeSyRe delivers a new generation of systems that are reliable by design at well-balanced power, performance, and design costs. To reduce the overheads of fault tolerance, only a small fraction of the chip is built to be fault-free. This fault-free part is then employed to manage the remaining fault-prone resources of the SoC. The DeSyRe framework is applied to two medical systems with high safety requirements (measured using the IEC 61508 functional safety standard) and tight power and performance constraints.
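A minimal sketch of that management idea, using a toy SoC model invented purely for illustration: a small fault-free controller tracks which fault-prone cores remain usable and keeps work off the ones that have failed.

```python
class SoCManager:
    """Toy stand-in for the fault-free part that manages fault-prone cores."""
    def __init__(self, n_cores):
        self.healthy = set(range(n_cores))

    def report_fault(self, core_id):
        # A detected permanent fault removes the core from the usable pool.
        self.healthy.discard(core_id)

    def schedule(self, tasks):
        # Round-robin the tasks over whatever cores are still usable.
        cores = sorted(self.healthy)
        return {task: cores[i % len(cores)] for i, task in enumerate(tasks)}

mgr = SoCManager(n_cores=4)
mgr.report_fault(2)
print(mgr.schedule(["filter", "fft", "match"]))   # core 2 never appears
```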
A Unified Coded Deep Neural Network Training Strategy Based on Generalized PolyDot Codes for Matrix Multiplication
This paper has two contributions. First, we propose a novel coded matrix multiplication technique called Generalized PolyDot codes that advances on existing methods for coded matrix multiplication under storage and communication constraints. This technique uses "garbage alignment," i.e., aligning computations in coded computing that are not part of the desired output. Generalized PolyDot codes bridge between Polynomial codes and MatDot codes, trading off between recovery threshold and communication costs. Second, we demonstrate that Generalized PolyDot codes can be used for training large Deep Neural Networks (DNNs) on unreliable nodes prone to soft errors. This requires us to address three additional challenges: (i) the prohibitively large overhead of coding the weight matrices in each layer of the DNN at each iteration; (ii) nonlinear operations during training, which are incompatible with linear coding; and (iii) not assuming the presence of an error-free master node, requiring us to architect a fully decentralized implementation without any "single point of failure." We allow all primary DNN training steps, namely matrix multiplication, nonlinear activation, Hadamard product, and update steps, as well as the encoding/decoding, to be error-prone. We consider both unit and larger mini-batch sizes, leveraging coded matrix-vector products and matrix-matrix products respectively. The problem of DNN training under soft errors also motivates an interesting, probabilistic error model under which a real-number MDS code is shown to correct more errors with probability 1 than under the more conventional, adversarial error model. We also demonstrate that our proposed strategy can provide unbounded gains in error tolerance over a competing replication strategy and a preliminary MDS-code-based strategy for both of these error models.
Comment: Presented in part at the IEEE International Symposium on Information Theory 2018 (submission date: Jan 12, 2018); currently under review at the IEEE Transactions on Information Theory.
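To illustrate the family of techniques that Generalized PolyDot codes belong to, here is a small numpy sketch of a plain polynomial-style code for a matrix-vector product (not Generalized PolyDot itself): the rows of A are split into k blocks, each worker receives an encoded block, and any k of the n worker results suffice to recover A·b, so the computation tolerates n - k erasures. All sizes, names, and the choice of failed workers are illustrative.

```python
import numpy as np

def encode_blocks(blocks, x):
    # Evaluate the matrix polynomial sum_j blocks[j] * x**j at the point x.
    return sum(block * (x ** j) for j, block in enumerate(blocks))

def coded_matvec(A, b, n_workers=5, k=3):
    # Toy polynomial-coded A @ b that survives up to n_workers - k erasures.
    blocks = np.split(A, k)                      # k row-blocks of equal size
    xs = np.arange(1, n_workers + 1, dtype=float)

    # Each worker multiplies its encoded block by b (simulated locally here).
    results = {i: encode_blocks(blocks, xs[i]) @ b for i in range(n_workers)}

    # Pretend two workers failed (soft errors / stragglers); drop their results.
    del results[1], results[4]

    # Any k surviving evaluations determine the degree-(k-1) polynomial whose
    # coefficients are the row-block products blocks[j] @ b: invert a Vandermonde.
    survivors = sorted(results)[:k]
    V = np.vander(xs[survivors], k, increasing=True)
    coeffs = np.linalg.solve(V, np.array([results[i] for i in survivors]))
    return coeffs.ravel()

A = np.arange(36.0).reshape(6, 6)
b = np.ones(6)
print(np.allclose(coded_matvec(A, b), A @ b))    # -> True
```

The trade-off the abstract describes sits inside this framing: how the blocks are formed and encoded (by rows, by columns, or both) shifts the balance between the recovery threshold and the communication cost per worker.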