Exploiting Errors for Efficiency: A Survey from Circuits to Algorithms
When a computational task tolerates a relaxation of its specification or when
an algorithm tolerates the effects of noise in its execution, hardware,
programming languages, and system software can trade deviations from correct
behavior for lower resource usage. We present, for the first time, a synthesis
of research results on computing systems that only make as many errors as their
users can tolerate, from across the disciplines of computer aided design of
circuits, digital system design, computer architecture, programming languages,
operating systems, and information theory.
Rather than over-provisioning resources at each layer to avoid errors, it can
be more efficient to exploit the masking of errors occurring at one layer,
thereby preventing them from propagating to a higher layer. We survey
tradeoffs for
individual layers of computing systems from the circuit level to the operating
system level, and demonstrate the potential benefits of end-to-end approaches
using two illustrative examples. To tie the survey together, we present a
consistent formalization of terminology across the layers, one that does not
significantly deviate from the terminology traditionally used by research
communities in their layer of focus.
Comment: 35 pages.
Optimal Placement of Cores, Caches and Memory Controllers in Network On-Chip
Parallel programming is growing rapidly, and intensive applications need more
resources, so there is a huge demand for on-chip multiprocessors. L1 caches
next to the cores are the fastest storage after registers, but the size of
private caches cannot grow because of design, cost, and technology limits.
Consequently, split I-caches and D-caches are used together with a shared
last-level cache (LLC). For a unified shared LLC, the bus interface is not
scalable, so a distributed shared LLC (DSLLC) appears to be the better choice.
Most papers assume a distributed shared LLC slice beside each core in the
on-chip network; however, we show that this design ignores the effect of
traffic congestion in the on-chip network. Our work instead focuses on the
optimal placement of cores, DSLLC slices, and even memory controllers to
minimize the expected latency, based on traffic load, in a mesh on-chip
network with a fixed number of cores and a fixed total cache capacity. We
derive an analytical model of the intended cost function and then optimize
the mean delay of on-chip network communication. This work is to be verified
using traffic patterns run on the CSIM simulator.
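To make the cost function concrete, here is a minimal sketch of the kind of
placement objective the abstract describes: the traffic-weighted mean hop
count between cores and DSLLC slices on a small mesh, minimized by brute
force. The mesh size, traffic matrix, and all names are illustrative
assumptions, not the paper's model.

```python
import itertools

# Minimal sketch: score a tile placement on a small mesh NoC by the
# traffic-weighted mean hop count between cores and DSLLC slices.
# Assumes dimension-ordered (XY) routing, one hop per link, and a
# uniform traffic matrix; all names here are illustrative.

def hops(a, b):
    """Manhattan distance = hop count under XY routing."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def expected_latency(core_pos, slices, traffic):
    """Mean hops weighted by traffic[(core, slice)] request rates."""
    total = sum(traffic.values())
    return sum(rate * hops(core_pos[c], slices[s])
               for (c, s), rate in traffic.items()) / total

def best_slice_placement(core_pos, tiles, traffic, n_slices):
    """Brute-force search over slice positions (tiny meshes only)."""
    free = [t for t in tiles if t not in core_pos.values()]
    best = None
    for combo in itertools.combinations(free, n_slices):
        slices = dict(enumerate(combo))
        cost = expected_latency(core_pos, slices, traffic)
        if best is None or cost < best[0]:
            best = (cost, slices)
    return best

# 3x3 mesh, 4 cores in the corners, 2 LLC slices, uniform traffic.
tiles = [(x, y) for x in range(3) for y in range(3)]
cores = {0: (0, 0), 1: (2, 0), 2: (0, 2), 3: (2, 2)}
traffic = {(c, s): 1.0 for c in cores for s in range(2)}
cost, slices = best_slice_placement(cores, tiles, traffic, 2)
print(f"best slices: {sorted(slices.values())}, mean hops: {cost:.2f}")
```

The paper's analytical model replaces this exhaustive search with a derived
cost function, but the objective being optimized is of this general shape.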
Distributed Deep Learning Using Synchronous Stochastic Gradient Descent
We design and implement a distributed multi-node synchronous SGD algorithm
without altering hyperparameters, compressing data, or changing algorithmic
behavior. We perform a detailed analysis of scaling and identify optimal
design points for different networks. We demonstrate scaling of CNNs on
hundreds of nodes and present what we believe to be record training
throughputs. A VGG-A CNN training run with a minibatch of 512 is scaled 90X
on 128 nodes, and 256-minibatch VGG-A and OverFeat-FAST networks are scaled
53X and 42X, respectively, on a 64-node cluster. We also demonstrate the
generality of our approach via best-in-class 6.5X scaling for a 7-layer DNN
on 16 nodes. Finally, we attempt to democratize deep learning by training on
an Ethernet-based AWS cluster, showing ~14X scaling on 16 nodes.
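As a concrete illustration of the scheme the abstract describes, the sketch
below simulates data-parallel synchronous SGD in NumPy: each simulated node
computes a gradient on its own data shard, the gradients are averaged (an
in-process stand-in for an MPI/NCCL allreduce), and every node applies the
identical update. The model, sizes, and learning rate are illustrative
assumptions, not the paper's configuration.

```python
import numpy as np

# Minimal sketch of data-parallel synchronous SGD on a least-squares
# model, with the allreduce step simulated by an in-process average.
# All shapes, rates, and names are illustrative, not the paper's setup.

rng = np.random.default_rng(0)
X = rng.normal(size=(1024, 16))          # full dataset
y = X @ rng.normal(size=16) + 0.01 * rng.normal(size=1024)

n_nodes, lr = 8, 0.05
shards = np.array_split(np.arange(len(X)), n_nodes)  # one shard per node
w = np.zeros(16)                          # replicated parameters

def local_grad(w, idx):
    """Gradient of mean squared error on one node's shard."""
    Xi, yi = X[idx], y[idx]
    return 2.0 * Xi.T @ (Xi @ w - yi) / len(idx)

for step in range(200):
    # Each node computes its shard gradient; averaging the results is
    # what an MPI/NCCL allreduce would do across real nodes.
    g = np.mean([local_grad(w, idx) for idx in shards], axis=0)
    w -= lr * g                           # identical update on every node

print("final loss:", np.mean((X @ w - y) ** 2))
```

Because the average of the shard gradients equals the full-batch gradient,
the synchronous scheme tracks single-node large-batch SGD step for step,
which is why no hyperparameter changes are needed.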
A Survey of Neuromorphic Computing and Neural Networks in Hardware
Neuromorphic computing has come to refer to a variety of brain-inspired
computers, devices, and models that contrast with the pervasive von Neumann
computer
architecture. This biologically inspired approach has created highly connected
synthetic neurons and synapses that can be used to model neuroscience theories
as well as solve challenging machine learning problems. The promise of the
technology is to create a brain-like ability to learn and adapt, but the
technical challenges are significant, starting with an accurate neuroscience
model of how the brain works, to finding materials and engineering
breakthroughs to build devices to support these models, to creating a
programming framework so the systems can learn, to creating applications with
brain-like capabilities. In this work, we provide a comprehensive survey of the
research and motivations for neuromorphic computing over its history. We begin
with a 35-year review of the motivations and drivers of neuromorphic computing,
then look at the major research areas of the field, which we define as
neuro-inspired models, algorithms and learning approaches, hardware and
devices, supporting systems, and finally applications. We conclude with a broad
discussion on the major research topics that need to be addressed in the coming
years to see the promise of neuromorphic computing fulfilled. The goals of this
work are to provide an exhaustive review of the research conducted in
neuromorphic computing since the inception of the term, and to motivate further
work by illuminating gaps in the field where new research is needed.
Recent Advances in Convolutional Neural Network Acceleration
In recent years, convolutional neural networks (CNNs) have shown great
performance in various fields such as image classification, pattern
recognition, and multimedia compression. Two of their structural properties,
local connectivity and weight sharing, reduce the number of parameters and
increase processing speed during training and inference. However, as data
dimensionality grows and CNN architectures become more complicated, the
end-to-end or combined use of CNNs is computationally intensive, which limits
their further deployment. It is therefore both necessary and urgent to make
CNNs run faster. In this paper, we first summarize acceleration methods that
apply to, but are not limited to, CNNs by reviewing a broad variety of
research papers. We propose a taxonomy of acceleration methods with three
levels: the structure level, the algorithm level, and the implementation
level. We also analyze the acceleration methods in terms of CNN architecture
compression, algorithm optimization, and hardware-based improvement. Finally,
we discuss different perspectives on these acceleration and optimization
methods within each level. The discussion shows that the methods at each
level still leave a large space for exploration. By incorporating such a wide
range of disciplines, we expect to provide a comprehensive reference for
researchers interested in CNN acceleration.
Comment: submitted to Neurocomputing.
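To make the parameter-saving claim concrete, a small worked count (with
illustrative layer sizes, not taken from the paper) contrasts a fully
connected layer with a convolutional one:

```python
# Minimal sketch: why local connectivity and weight sharing shrink the
# parameter count. Layer sizes below are illustrative, not from the paper.

H = W = 32          # input feature map, 32 x 32
C_in, C_out = 3, 16 # input / output channels
K = 3               # convolution kernel size

# Fully connected layer mapping the whole input to the whole output:
# every output unit sees every input unit, and no weights are shared.
dense_params = (H * W * C_in) * (H * W * C_out)

# Convolutional layer: each output unit sees only a K x K window
# (local connectivity) and the same kernel slides everywhere
# (weight sharing), so the count is independent of H and W.
conv_params = C_out * C_in * K * K + C_out  # weights + biases

print(f"dense: {dense_params:,} parameters")   # 50,331,648
print(f"conv:  {conv_params:,} parameters")    # 448
```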
Stochastic Collocation with Non-Gaussian Correlated Process Variations: Theory, Algorithms and Applications
Stochastic spectral methods have achieved great success in the uncertainty
quantification of many engineering problems, including electronic and photonic
integrated circuits influenced by fabrication process variations. Existing
techniques employ a generalized polynomial-chaos expansion, and they almost
always assume that all random parameters are mutually independent or Gaussian
correlated. However, this assumption is rarely true in real applications. How
to handle non-Gaussian correlated random parameters is a long-standing and
fundamental challenge. A main bottleneck is the lack of theory and
computational methods to perform a projection step in a correlated uncertain
parameter space. This paper presents an optimization-based approach to
automatically determine the quadrature nodes and weights required in a
projection step, and develops an efficient stochastic collocation algorithm for
systems with non-Gaussian correlated parameters. We also provide some
theoretical proofs for the complexity and error bound of our proposed method.
Numerical experiments on synthetic, electronic and photonic integrated circuit
examples show the nearly exponential convergence rate and excellent efficiency
of our proposed approach. Many other challenging uncertainty-related problems
can be further solved based on this work.
Comment: 14 pages, 11 figures, 4 tables.
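For context, a sketch of the projection step in generic generalized
polynomial-chaos notation (not necessarily the paper's own): the output
y(ξ) is expanded in basis functions orthonormal under the joint density of
the random parameters ξ, and each coefficient is a projection integral
approximated by an M-point quadrature rule:

```latex
% Generic gPC notation: y(\boldsymbol{\xi}) is the quantity of interest,
% \{\Phi_\alpha\} a basis orthonormal under the joint parameter density.
y(\boldsymbol{\xi}) \approx \sum_{|\alpha| \le p}
    c_\alpha \, \Phi_\alpha(\boldsymbol{\xi}),
\qquad
c_\alpha = \mathbb{E}\!\left[\, y(\boldsymbol{\xi})\,
    \Phi_\alpha(\boldsymbol{\xi}) \,\right]
  \approx \sum_{k=1}^{M} w_k \, y(\boldsymbol{\xi}_k)\,
    \Phi_\alpha(\boldsymbol{\xi}_k).
```

For independent or Gaussian-correlated parameters, the nodes ξ_k and weights
w_k follow from classical Gauss quadrature; the difficulty the paper
addresses is that no such rules exist for non-Gaussian correlated densities,
which is why it computes the nodes and weights by optimization.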
A Comprehensive Survey of Recent Advancements in Molecular Communication
With much advancement in the fields of nanotechnology, bioengineering, and
synthetic biology over the past decade, microscale and nanoscale devices are
becoming a reality. Yet engineering a reliable communication system between
these tiny devices remains an open problem. At the same time, despite
the prevalence of radio communication, there are still areas where traditional
electromagnetic waves find it difficult or expensive to reach. Points of
interest in industry, cities, and medical applications often lie in embedded
and entrenched areas, accessible only by ventricles at scales too small for
conventional radio waves and microwaves, or they are located in such a way that
directional high frequency systems are ineffective. Inspired by nature, one
solution to these problems is molecular communication (MC), where chemical
signals are used to transfer information. Although biologists have studied MC
for decades, it has been researched through a communication engineering lens
for only roughly ten years. A significant number of papers have been published
to date, but owing to the need for interdisciplinary work, many of the results
are preliminary. In this paper, the recent advancements in the
field of MC engineering are highlighted. First, the biological, chemical, and
physical processes used by an MC system are discussed. This includes different
components of the MC transmitter and receiver, as well as the propagation and
transport mechanisms. Then, a comprehensive survey of some of the recent works
on MC through a communication engineering lens is provided. The paper ends with
a technology readiness analysis of MC and future research directions.
Comment: Accepted for publication in IEEE Communications Surveys & Tutorials.
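As one concrete example of the propagation mechanisms such surveys cover, a
standard free-diffusion formulation from the MC literature (generic notation,
not this survey's own) gives the channel impulse response of an impulsive
point release and the fraction of molecules captured by an absorbing
spherical receiver:

```latex
% Q molecules released at the origin at t = 0 diffuse with coefficient D;
% the resulting concentration at distance r and time t is
c(r, t) = \frac{Q}{(4 \pi D t)^{3/2}}
          \exp\!\left( -\frac{r^2}{4 D t} \right),
% and for a fully absorbing spherical receiver of radius R centered at
% distance d from the source, the fraction absorbed by time t is
F(t) = \frac{R}{d} \,
       \operatorname{erfc}\!\left( \frac{d - R}{2 \sqrt{D t}} \right).
```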
Reconfigurable Hardware Accelerators: Opportunities, Trends, and Challenges
With the emergence of big data applications such as machine learning, speech
recognition, artificial intelligence, and DNA sequencing in recent years, the
computer architecture research community is facing explosive growth in data
scale. To achieve high efficiency for data-intensive computing, studies of
heterogeneous accelerators that target the latest applications have become a
hot topic in the computer architecture domain. At present, heterogeneous
accelerators are mainly implemented with heterogeneous computing units such
as Application-Specific Integrated Circuits (ASICs), Graphics Processing
Units (GPUs), and Field Programmable Gate Arrays (FPGAs). Among these
heterogeneous architectures, FPGA-based reconfigurable accelerators have two
merits. First, an FPGA contains a large number of reconfigurable circuits,
which satisfy the requirements of high performance and low power consumption
when specific applications are running. Second, FPGA-based reconfigurable
architectures enable rapid prototyping and feature excellent customizability
and reconfigurability. Nowadays, a batch of acceleration works based on FPGAs
and other reconfigurable architectures is emerging at top-tier computer
architecture conferences. To review this recent work on reconfigurable
computing accelerators, this survey takes the latest research on
reconfigurable accelerator architectures and algorithm applications as its
basis. We compare hot research issues and domains of concern, and we analyze
and illuminate the advantages, disadvantages, and challenges of
reconfigurable accelerators. Finally, we look ahead to the development trends
of accelerator architectures, hoping to provide a reference for computer
architecture researchers.
Molecular Communication Systems Design for Future City
An area of interest in the modern age is the human migration from rural areas
to cities. Cities are characterized by a dense concentration of buildings and
key infrastructures. However, what has been lacking is a pervasive sensor
technology that can monitor the performance of these structures. This is
partly because the information collected by sensors cannot easily be
transported from the embedded location to an external data hub. Examples of
structural health monitoring include monitoring corrosion, fracture stress,
and material delamination. Example scenarios include pipelines, tunnel
networks, and some industrial and medical settings such as hospitals (to
minimise electromagnetic interference) and turbines.
Comment: PhD report. arXiv admin note: text overlap with arXiv:1310.0070 by
other authors.
Recent Advances in Physical Reservoir Computing: A Review
Reservoir computing is a computational framework suited for
temporal/sequential data processing. It is derived from several recurrent
neural network models, including echo state networks and liquid state machines.
A reservoir computing system consists of a reservoir for mapping inputs into a
high-dimensional space and a readout for pattern analysis from the
high-dimensional states in the reservoir. The reservoir is fixed and only the
readout is trained with a simple method such as linear regression and
classification. Thus, the major advantage of reservoir computing compared to
other recurrent neural networks is fast learning, resulting in low training
cost. Another advantage is that the reservoir without adaptive updating is
amenable to hardware implementation using a variety of physical systems,
substrates, and devices. In fact, such physical reservoir computing has
attracted increasing attention in diverse fields of research. The purpose of
this review is to provide an overview of recent advances in physical reservoir
computing by classifying them according to the type of the reservoir. We
discuss the current issues and perspectives related to physical reservoir
computing, in order to further expand its practical applications and develop
next-generation machine learning systems.
Comment: 62 pages, 13 figures.
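A minimal echo state network sketch (with illustrative sizes and a toy delay
task, not taken from the review) shows the division of labor the abstract
describes: a fixed random reservoir supplies the high-dimensional states, and
only the linear readout is trained, here by closed-form ridge regression.

```python
import numpy as np

# Minimal echo state network sketch: a fixed random reservoir maps the
# input sequence into a high-dimensional state, and only a linear
# readout is trained (ridge regression). Sizes and the toy delay task
# below are illustrative choices, not from the review.

rng = np.random.default_rng(1)
n_in, n_res = 1, 200

W_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in))   # fixed input weights
W = rng.normal(size=(n_res, n_res))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))           # spectral radius < 1

def run_reservoir(u):
    """Collect reservoir states for input sequence u of shape (T, n_in)."""
    x = np.zeros(n_res)
    states = []
    for u_t in u:
        x = np.tanh(W_in @ u_t + W @ x)             # fixed, untrained dynamics
        states.append(x.copy())
    return np.array(states)

# Toy task: predict u(t-2) from the sequence u (needs short-term memory).
T = 1000
u = rng.uniform(-1, 1, size=(T, 1))
target = np.roll(u[:, 0], 2)
X = run_reservoir(u)[50:]                            # drop warm-up steps
y = target[50:]

# Train the readout only: ridge regression in closed form.
lam = 1e-6
W_out = np.linalg.solve(X.T @ X + lam * np.eye(n_res), X.T @ y)
pred = X @ W_out
print("NRMSE:", np.sqrt(np.mean((pred - y) ** 2) / np.var(y)))
```

Since W_in and W are never updated, training reduces to a single linear
solve over the collected states, which is the low training cost the review
highlights; physical reservoir computing replaces the tanh update with the
dynamics of a physical substrate.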