
    Exploiting Errors for Efficiency: A Survey from Circuits to Algorithms

    When a computational task tolerates a relaxation of its specification, or when an algorithm tolerates the effects of noise in its execution, hardware, programming languages, and system software can trade deviations from correct behavior for lower resource usage. We present, for the first time, a synthesis of research results on computing systems that only make as many errors as their users can tolerate, from across the disciplines of computer-aided design of circuits, digital system design, computer architecture, programming languages, operating systems, and information theory. Rather than over-provisioning resources at each layer to avoid errors, it can be more efficient to exploit the masking of errors occurring at one layer, which can prevent them from propagating to a higher layer. We survey tradeoffs for individual layers of computing systems from the circuit level to the operating system level and illustrate the potential benefits of end-to-end approaches using two illustrative examples. To tie together the survey, we present a consistent formalization of terminology, across the layers, which does not significantly deviate from the terminology traditionally used by research communities in their layer of focus. Comment: 35 pages.
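
    As a rough illustration of the tradeoff this abstract describes, the sketch below uses loop perforation, one simple relaxation technique: it processes only a sampled fraction of the data and rescales the result, trading a small, tolerable error for proportionally less work. The function and numbers are assumed for illustration and are not taken from the paper.

    # Illustrative sketch (assumed example, not the paper's method): trade accuracy
    # for work by summing only a sampled fraction of the data and rescaling.
    import random

    def approx_sum(values, fraction=0.25, seed=0):
        """Estimate sum(values) while visiting only `fraction` of the elements."""
        rng = random.Random(seed)
        k = max(1, int(len(values) * fraction))
        sample = rng.sample(values, k)
        return sum(sample) * (len(values) / k)  # rescale to the full population

    data = list(range(10_000))
    exact = sum(data)                    # full cost, zero error
    approx = approx_sum(data, 0.25)      # ~25% of the work, small relative error
    print(exact, approx, abs(exact - approx) / exact)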

    Optimal Placement of Cores, Caches and Memory Controllers in Network On-Chip

    Parallel programming is spreading fast and data-intensive applications need more resources, so there is a huge demand for on-chip multiprocessors. L1 caches beside the cores are the fastest storage after registers, but the size of private caches cannot grow because of design, cost, and technology limits. Split I-caches and D-caches are therefore used together with a shared LLC (last-level cache). For a unified shared LLC, a bus interface is not scalable, so a distributed shared LLC (DSLLC) seems to be a better choice. Most papers assume a distributed shared LLC slice beside each core in the on-chip network; however, we show that this design ignores the effect of traffic congestion in the on-chip network. Our work instead focuses on the optimal placement of cores, DSLLC slices, and even memory controllers to minimize the expected latency based on the traffic load in a mesh on-chip network with a fixed number of cores and a fixed total cache capacity. We build an analytical model that yields the intended cost function and then optimize the mean delay of on-chip network communication. The approach is to be verified using traffic patterns run on the CSIM simulator.
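
    To make the cost model concrete, here is a minimal sketch of the kind of placement search the abstract describes: score a placement by traffic-weighted Manhattan hop distance on a tiny mesh and pick the permutation with the lowest expected latency. The mesh size and traffic matrix are assumed for illustration and are not the paper's model.

    # Illustrative sketch (assumed traffic and mesh, not the paper's model):
    # choose where to place cores, an LLC slice, and a memory controller on a
    # 2x2 mesh so that traffic-weighted hop distance is minimized.
    from itertools import permutations

    MESH_W = 2
    tiles = ["core0", "core1", "llc", "mc"]
    traffic = {("core0", "llc"): 5, ("core1", "llc"): 5,   # hypothetical loads
               ("llc", "mc"): 2, ("core0", "core1"): 1}    # (messages per cycle)

    def hops(a, b):
        ax, ay = a % MESH_W, a // MESH_W
        bx, by = b % MESH_W, b // MESH_W
        return abs(ax - bx) + abs(ay - by)                 # XY-routing hop count

    def cost(placement):                                   # placement: tile -> node index
        return sum(load * hops(placement[s], placement[d])
                   for (s, d), load in traffic.items())

    best = min(permutations(range(4)), key=lambda p: cost(dict(zip(tiles, p))))
    print(dict(zip(tiles, best)), cost(dict(zip(tiles, best))))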

    Distributed Deep Learning Using Synchronous Stochastic Gradient Descent

    We design and implement a distributed multi-node synchronous SGD algorithm without altering hyperparameters, compressing data, or altering algorithmic behavior. We perform a detailed analysis of scaling and identify optimal design points for different networks. We demonstrate scaling of CNNs on hundreds of nodes and present what we believe to be record training throughputs. A 512-minibatch VGG-A CNN training run is scaled 90X on 128 nodes. VGG-A and OverFeat-FAST networks with a 256 minibatch are scaled 53X and 42X, respectively, on a 64-node cluster. We also demonstrate the generality of our approach via best-in-class 6.5X scaling for a 7-layer DNN on 16 nodes. Finally, we attempt to democratize deep learning by training on an Ethernet-based AWS cluster and show ~14X scaling on 16 nodes.
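
    The core of the algorithm is easy to state: every worker computes a gradient on its own shard of the data, the gradients are averaged (an allreduce), and all workers apply the identical update. The toy sketch below simulates this with four workers on a linear-regression objective; the model, data, and learning rate are assumed, and the real system distributes the averaging over the interconnect rather than a loop.

    # Minimal sketch (assumed toy problem, not the paper's implementation):
    # synchronous data-parallel SGD with gradient averaging across 4 "workers".
    import numpy as np

    def grad(w, X, y):                 # gradient of mean squared error for y ~ X @ w
        return 2.0 * X.T @ (X @ w - y) / len(y)

    rng = np.random.default_rng(0)
    X = rng.normal(size=(256, 4))
    true_w = np.array([1.0, -2.0, 0.5, 3.0])
    y = X @ true_w
    shards = np.array_split(np.arange(256), 4)                # each worker owns a shard

    w, lr = np.zeros(4), 0.1
    for step in range(200):
        local = [grad(w, X[idx], y[idx]) for idx in shards]   # per-worker gradients
        g = np.mean(local, axis=0)                            # "allreduce": average
        w -= lr * g                                           # same update on every worker
    print(np.round(w, 3))                                     # approaches true_w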

    A Survey of Neuromorphic Computing and Neural Networks in Hardware

    Neuromorphic computing has come to refer to a variety of brain-inspired computers, devices, and models that contrast with the pervasive von Neumann computer architecture. This biologically inspired approach has created highly connected synthetic neurons and synapses that can be used to model neuroscience theories as well as solve challenging machine learning problems. The promise of the technology is to create a brain-like ability to learn and adapt, but the technical challenges are significant: they start with building an accurate neuroscience model of how the brain works, and extend to finding the materials and engineering breakthroughs needed to build devices that support these models, to creating a programming framework so the systems can learn, and to creating applications with brain-like capabilities. In this work, we provide a comprehensive survey of the research on, and motivations for, neuromorphic computing over its history. We begin with a 35-year review of the motivations and drivers of neuromorphic computing, then look at the major research areas of the field, which we define as neuro-inspired models, algorithms and learning approaches, hardware and devices, supporting systems, and finally applications. We conclude with a broad discussion of the major research topics that need to be addressed in the coming years for the promise of neuromorphic computing to be fulfilled. The goals of this work are to provide an exhaustive review of the research conducted in neuromorphic computing since the inception of the term, and to motivate further work by illuminating gaps in the field where new research is needed.

    Recent Advances in Convolutional Neural Network Acceleration

    In recent years, convolutional neural networks (CNNs) have shown great performance in various fields such as image classification, pattern recognition, and multimedia compression. Two of their properties, local connectivity and weight sharing, can reduce the number of parameters and increase processing speed during training and inference. However, as the dimensionality of the data grows and CNN architectures become more complicated, the end-to-end approach, or the combined use of CNNs, is computationally intensive, which limits further deployment of CNNs. It is therefore necessary and urgent to make CNNs faster. In this paper, we first summarize the acceleration methods that apply to, but are not limited to, CNNs by reviewing a broad variety of research papers. We propose a taxonomy of acceleration methods with three levels: structure level, algorithm level, and implementation level. We also analyze the acceleration methods in terms of CNN architecture compression, algorithm optimization, and hardware-based improvement. Finally, we discuss different perspectives on these acceleration and optimization methods within each level. The discussion shows that the methods at each level still leave a large space for exploration. By incorporating such a wide range of disciplines, we expect to provide a comprehensive reference for researchers who are interested in CNN acceleration. Comment: submitted to Neurocomputing.
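
    As one concrete example of a structure-level method from this design space, the sketch below applies magnitude-based weight pruning to a convolution weight tensor, keeping only the largest weights and zeroing the rest; the layer shape and keep ratio are assumed for illustration.

    # Illustrative sketch (one structure-level technique; shapes assumed):
    # magnitude pruning of a convolution weight tensor.
    import numpy as np

    def prune(weights, keep=0.3):
        """Zero all but the largest-magnitude `keep` fraction of the weights."""
        threshold = np.quantile(np.abs(weights), 1.0 - keep)
        mask = np.abs(weights) >= threshold
        return weights * mask, mask

    rng = np.random.default_rng(0)
    conv_w = rng.normal(size=(64, 32, 3, 3))          # out_ch, in_ch, kH, kW
    pruned, mask = prune(conv_w, keep=0.3)
    print(f"nonzero weights kept: {mask.mean():.0%} of {conv_w.size}")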

    Stochastic Collocation with Non-Gaussian Correlated Process Variations: Theory, Algorithms and Applications

    Stochastic spectral methods have achieved great success in the uncertainty quantification of many engineering problems, including electronic and photonic integrated circuits influenced by fabrication process variations. Existing techniques employ a generalized polynomial-chaos expansion, and they almost always assume that all random parameters are mutually independent or Gaussian correlated. However, this assumption is rarely true in real applications. How to handle non-Gaussian correlated random parameters is a long-standing and fundamental challenge. A main bottleneck is the lack of theory and computational methods to perform the projection step in a correlated uncertain parameter space. This paper presents an optimization-based approach to automatically determine the quadrature nodes and weights required in the projection step, and develops an efficient stochastic collocation algorithm for systems with non-Gaussian correlated parameters. We also provide theoretical proofs for the complexity and error bound of the proposed method. Numerical experiments on synthetic, electronic, and photonic integrated circuit examples show the nearly exponential convergence rate and excellent efficiency of the proposed approach. Many other challenging uncertainty-related problems can be further solved based on this work. Comment: 14 pages, 11 figures, 4 tables.
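
    As a heavily simplified picture of what determining quadrature weights by optimization can mean, the sketch below fixes a handful of candidate nodes drawn from a correlated non-Gaussian (lognormal) distribution and solves a least-squares problem so that the weighted node sums reproduce the distribution's low-order moments. The distribution, basis, and node count are assumed, and this is not the paper's algorithm.

    # Heavily simplified sketch (assumed setup, not the paper's algorithm):
    # fit quadrature weights so a 12-node rule matches low-order moments of a
    # correlated lognormal parameter distribution.
    import numpy as np

    rng = np.random.default_rng(0)
    cov = np.array([[1.0, 0.6], [0.6, 1.0]])
    samples = np.exp(rng.multivariate_normal([0, 0], cov, size=20000))  # moment reference

    def basis(x):                      # monomials up to total degree 2 in two variables
        x1, x2 = x[..., 0], x[..., 1]
        return np.stack([np.ones_like(x1), x1, x2, x1 * x2, x1**2, x2**2], axis=-1)

    nodes = np.exp(rng.multivariate_normal([0, 0], cov, size=12))  # candidate nodes
    target = basis(samples).mean(axis=0)                           # E[phi(x)] via Monte Carlo
    weights, *_ = np.linalg.lstsq(basis(nodes).T, target, rcond=None)

    f = lambda x: 1.0 + x[..., 0] * x[..., 1]     # lies in the span of the basis
    print(weights @ f(nodes), f(samples).mean())  # the two estimates agree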

    A Comprehensive Survey of Recent Advancements in Molecular Communication

    With much advancement in the fields of nanotechnology, bioengineering, and synthetic biology over the past decade, microscale and nanoscale devices are becoming a reality. Yet engineering a reliable communication system between tiny devices remains an open problem. At the same time, despite the prevalence of radio communication, there are still areas where traditional electromagnetic waves are difficult or expensive to use. Points of interest in industry, cities, and medical applications often lie in embedded and entrenched areas, accessible only through ventricles at scales too small for conventional radio waves and microwaves, or they are located in such a way that directional high-frequency systems are ineffective. Inspired by nature, one solution to these problems is molecular communication (MC), where chemical signals are used to transfer information. Although biologists have studied MC for decades, it has only been studied through a communication-engineering lens for roughly 10 years. A significant number of papers have been published to date, but owing to the need for interdisciplinary work, many of the results are preliminary. In this paper, the recent advancements in the field of MC engineering are highlighted. First, the biological, chemical, and physical processes used by an MC system are discussed. This includes the different components of the MC transmitter and receiver, as well as the propagation and transport mechanisms. Then, a comprehensive survey of some of the recent works on MC through a communication engineering lens is provided. The paper ends with a technology readiness analysis of MC and future research directions. Comment: Accepted for publication in IEEE Communications Surveys & Tutorials.
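
    One widely used building block when modeling the propagation step is the first-passage behavior of freely diffusing molecules. The sketch below evaluates the standard closed-form fraction of molecules, released by a point transmitter, that have been absorbed by a spherical receiver by time t under 3-D Brownian motion; the diffusion coefficient and geometry are assumed values for illustration.

    # Illustrative sketch (textbook diffusion result; parameter values assumed):
    # fraction of released molecules absorbed by a spherical receiver by time t.
    from math import erfc, sqrt

    def hit_fraction(t, D, r0, rr):
        """First-passage probability to an absorbing sphere of radius rr,
        transmitter at distance r0 from the sphere centre, diffusion coeff. D."""
        return (rr / r0) * erfc((r0 - rr) / (2.0 * sqrt(D * t)))

    D = 79.4e-12      # m^2/s, roughly a small molecule in water (assumed)
    r0 = 1e-6         # 1 um transmitter-to-receiver-centre distance (assumed)
    rr = 0.5e-6       # 0.5 um receiver radius (assumed)
    for t in (0.01, 0.1, 1.0):
        print(f"t = {t:5.2f} s  fraction absorbed ~ {hit_fraction(t, D, r0, rr):.3f}")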

    Reconfigurable Hardware Accelerators: Opportunities, Trends, and Challenges

    With the emergence of big-data applications such as machine learning, speech recognition, artificial intelligence, and DNA sequencing in recent years, the computer architecture research community is facing an explosive growth in data. To achieve high efficiency for data-intensive computing, heterogeneous accelerators targeting the latest applications have become a hot topic in the computer architecture domain. At present, heterogeneous accelerators are mainly implemented with heterogeneous computing units such as Application-Specific Integrated Circuits (ASICs), Graphics Processing Units (GPUs), and Field-Programmable Gate Arrays (FPGAs). Among these architectures, FPGA-based reconfigurable accelerators have two merits. First, an FPGA contains a large number of reconfigurable circuits, which satisfy the requirements of high performance and low power consumption when specific applications are running. Second, FPGA-based reconfigurable architectures allow prototype systems to be built rapidly and offer excellent customizability and reconfigurability. A batch of acceleration works based on FPGAs and other reconfigurable architectures has recently appeared at top-tier computer architecture conferences. To review this recent work, this survey takes the latest research on reconfigurable accelerator architectures and their algorithmic applications as its basis. We compare the hot research issues and application domains, and analyze the advantages, disadvantages, and challenges of reconfigurable accelerators. Finally, we discuss likely trends in the development of accelerator architectures, hoping to provide a reference for computer architecture researchers.

    Molecular Communication Systems Design for Future City

    An area of interest in the modern age is the human migration from rural areas to cities. Cities are characterized by a dense concentration of buildings and key infrastructure. However, what has been lacking is a pervasive sensor technology that can monitor the performance of these structures. This is partly because the information collected by sensors cannot easily be transported from their embedded locations to an external data hub. Examples of structural health monitoring include monitoring corrosion, fracture stress, and material delamination. Example scenarios include pipelines, tunnel networks, turbines, and some industrial and medical environments such as hospitals (where electromagnetic interference must be minimised). Comment: PhD report. arXiv admin note: text overlap with arXiv:1310.0070 by other authors.

    Recent Advances in Physical Reservoir Computing: A Review

    Reservoir computing is a computational framework suited for temporal/sequential data processing. It is derived from several recurrent neural network models, including echo state networks and liquid state machines. A reservoir computing system consists of a reservoir, which maps inputs into a high-dimensional space, and a readout, which performs pattern analysis on the high-dimensional states of the reservoir. The reservoir is fixed and only the readout is trained, with a simple method such as linear regression or classification. Thus, the major advantage of reservoir computing over other recurrent neural networks is fast learning, resulting in low training cost. Another advantage is that the reservoir, since it requires no adaptive updating, is amenable to hardware implementation using a variety of physical systems, substrates, and devices. In fact, such physical reservoir computing has attracted increasing attention in diverse fields of research. The purpose of this review is to provide an overview of recent advances in physical reservoir computing by classifying them according to the type of reservoir. We discuss the current issues and perspectives related to physical reservoir computing, in order to further expand its practical applications and develop next-generation machine learning systems. Comment: 62 pages, 13 figures.
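
    A minimal software analogue of this setup is the echo state network: a fixed random reservoir expands the input into a high-dimensional state, and only a linear readout is trained, here with ridge regression on a toy delay-recall task. The task, reservoir size, and spectral radius are assumed for illustration and are not taken from the review.

    # Minimal echo-state-network sketch (assumed toy task, not from the review).
    import numpy as np

    rng = np.random.default_rng(0)
    N, T = 200, 2000                              # reservoir size, sequence length
    u = rng.uniform(-0.5, 0.5, size=T)            # scalar input sequence
    target = np.roll(u, 5)                        # task: recall the input 5 steps back

    W_in = rng.uniform(-0.5, 0.5, size=N)         # fixed input weights
    W = rng.normal(size=(N, N))
    W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # spectral radius < 1 (echo state)

    x, states = np.zeros(N), []
    for t in range(T):
        x = np.tanh(W @ x + W_in * u[t])          # reservoir update, never trained
        states.append(x.copy())
    X = np.array(states)[100:]                    # drop the initial transient
    y = target[100:]

    ridge = 1e-6                                  # ridge-regression readout
    W_out = np.linalg.solve(X.T @ X + ridge * np.eye(N), X.T @ y)
    print("recall MSE:", np.mean((X @ W_out - y) ** 2))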