    Connections between two cycles — a new design of dense processor interconnection networks

    In this paper we attempt to maximize the order of graphs of given degree Δ and diameter D. These graphs, which are known as (Δ, D) graphs, are used as dense interconnection networks, i.e., networks in which processors with relatively few links are connected by relatively short paths. The method described in this paper uses periodic connections between two cycles of the same length. The results obtained give a significant improvement on the known lower bounds in many cases. Large bipartite graphs with a given degree and diameter were also obtained by our method; again, the improvement of the lower bounds is significant.
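    As a rough illustration of the flavour of such constructions (the paper's exact chord pattern is not reproduced here), the sketch below joins two cycles of length n by periodic cross edges and reports the resulting degree and diameter. The networkx dependency and the step parameter are assumptions of this example, not details from the paper.

        # Illustrative two-cycle construction: two n-cycles joined by periodic
        # cross edges. The chord pattern is hypothetical, not the paper's design.
        import networkx as nx

        def two_cycle_graph(n, step):
            """Two n-cycles; vertex i of cycle A is joined to vertex step*i mod n of cycle B."""
            G = nx.Graph()
            for i in range(n):
                G.add_edge(('a', i), ('a', (i + 1) % n))     # first cycle
                G.add_edge(('b', i), ('b', (i + 1) % n))     # second cycle
                G.add_edge(('a', i), ('b', (step * i) % n))  # periodic cross connections
            return G

        G = two_cycle_graph(16, 3)   # gcd(3, 16) = 1, so every vertex has degree 3
        print(max(d for _, d in G.degree()), nx.diameter(G))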

    New results for the degree/diameter problem

    The results of computer searches for large graphs with given (small) degree and diameter are presented. The new graphs are Cayley graphs of semidirect products of cyclic groups and related groups. One fundamental use of our "dense graphs" is in the design of efficient communication network topologies. Comment: 15 pages.
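    A minimal sketch of the underlying idea, assuming a semidirect product Z_m ⋊ Z_n with twist r satisfying r^n ≡ 1 (mod m) and multiplication (a, b)(c, d) = (a + r^b·c mod m, b + d mod n). The toy group and generating set below are hypothetical, not the record graphs found by the searches.

        # Cayley graph of a semidirect product of two cyclic groups (sketch).
        from itertools import product
        import networkx as nx

        def cayley_graph(m, n, r, gens):
            assert pow(r, n, m) == 1            # the twist r must have order dividing n
            def mul(x, y):                      # (a, b) * (c, d) = (a + r^b c, b + d)
                (a, b), (c, d) = x, y
                return ((a + pow(r, b, m) * c) % m, (b + d) % n)
            G = nx.Graph()
            for g in product(range(m), range(n)):
                for s in gens:
                    G.add_edge(g, mul(g, s))    # undirected; gens assumed closed under inverses
            return G

        # Toy parameters: Z_7 twisted by Z_3 via r = 2, with a 4-element generating set.
        G = cayley_graph(7, 3, 2, [(1, 0), (6, 0), (0, 1), (0, 2)])
        print(G.number_of_nodes(), max(d for _, d in G.degree()), nx.diameter(G))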

    Introduction to a system for implementing neural net connections on SIMD architectures

    Neural networks have attracted much interest recently, and using parallel architectures to simulate neural networks is a natural and necessary application. The SIMD model of parallel computation is chosen because systems of this type can be built with large numbers of processing elements. However, such systems are not naturally suited to generalized communication. A method is proposed that allows an implementation of neural network connections on massively parallel SIMD architectures. The key to this system is an algorithm permitting the formation of arbitrary connections between the neurons. A further feature is the ability to add new connections quickly; the system also recovers from errors and is robust across a variety of network topologies. Simulations of the general connection system, and its implementation on the Connection Machine, indicate that the time and space requirements are proportional to the product of the average number of connections per neuron and the diameter of the interconnection network.
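    The quoted complexity lends itself to a back-of-the-envelope model. The 2D-mesh topology and the concrete numbers below are assumptions for illustration, not details taken from the paper.

        # Cost model implied by the abstract: delivering all neuron-to-neuron
        # messages costs on the order of (average fan-out) * (network diameter).
        def mesh_diameter(rows, cols):
            return (rows - 1) + (cols - 1)    # worst-case hop count on a 2D mesh

        def estimated_time_steps(num_neurons, num_connections, rows, cols):
            avg_fanout = num_connections / num_neurons
            return avg_fanout * mesh_diameter(rows, cols)

        # e.g. 4096 neurons, 64 connections each, on a 64 x 64 processor mesh:
        print(estimated_time_steps(4096, 4096 * 64, 64, 64))   # 64 * 126 = 8064.0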

    Efficient Interconnection Schemes for VLSI and Parallel Computation

    This thesis is primarily concerned with two problems of interconnecting components in VLSI technologies. In the first case, the goal is to construct efficient interconnection networks for general-purpose parallel computers. The second problem is a more specialized problem in the design of VLSI chips, namely multilayer channel routing. In addition, a final part of this thesis provides lower bounds on the area required for VLSI implementations of finite-state machines. This thesis shows that networks based on Leiserson's fat-tree architecture are nearly as good as any network built in a comparable amount of physical space. It shows that these universal networks can efficiently simulate competing networks by means of an appropriate correspondence between network components and efficient algorithms for routing messages on the universal network. In particular, a universal network of area A can simulate competing networks with O(lg^3 A) slowdown (in bit-times), using a very simple randomized routing algorithm and simple network components. Alternatively, a packet routing scheme of Leighton, Maggs, and Rao can be used in conjunction with more sophisticated switching components to achieve O(lg^2 A) slowdown. Several other important aspects of universality are also discussed. It is shown that universal networks can be constructed in area linear in the number of processors, so that there is no need to restrict the density of processors in competing networks. Results are also presented for comparisons between networks of different sizes or with processors of different sizes (as determined by the amount of attached memory). Of particular interest is the fact that a universal network built from sufficiently small processors can simulate (with the slowdown already quoted) any competing network of comparable size regardless of the size of processors in the competing network. In addition, many of the results given do not require the usual assumption of unit wire delay. Finally, though most of the discussion is in the two-dimensional world, the results are shown to apply in three dimensions by way of a simple demonstration of general results on graph layout in three dimensions.

    The second main problem considered in this thesis is channel routing when many layers of interconnect are available, a scenario that is becoming more and more meaningful as chip fabrication technologies advance. This thesis describes a system MulCh for multilayer channel routing which extends the Chameleon system developed at U. C. Berkeley. Like Chameleon, MulCh divides a multilayer problem into essentially independent subproblems of at most three layers, but unlike Chameleon, MulCh considers the possibility of using partitions consisting of a single layer instead of only partitions of two or three layers. Experimental results show that MulCh often performs better than Chameleon in terms of channel width, total net length, and number of vias. In addition to a description of MulCh as implemented, this thesis provides improved algorithms for subtasks performed by MulCh, thereby indicating potential improvements in the speed and performance of multilayer channel routing. In particular, a linear time algorithm is given for determining the minimum width required for a single-layer channel routing problem, and an algorithm is given for maintaining the density of a collection of nets in logarithmic time per net insertion (see the sketch following this abstract).
The last part of this thesis shows that straightforward techniques for implementing finite-state machines are optimal in the worst case. Specifically, for any s and k, there is a deterministic finite-state machine with s states and k symbols such that any layout algorithm requires Ω(ks lg s) area to lay out its realization. For nondeterministic machines, there is an analogous lower bound of Ω(ks^2) area.
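    Regarding the single-layer channel-routing subtask mentioned above: the classical lower bound on the width of a single-layer channel is its density, i.e. the maximum number of nets crossing any column. The sweep below computes it; it is a generic O(n log n) sketch of that standard quantity, not necessarily the thesis's linear-time algorithm.

        # Channel density: maximum number of nets crossing any column,
        # a standard lower bound on the required channel width.
        def channel_density(nets):
            """nets: list of (left, right) column intervals, one per net."""
            events = []
            for left, right in nets:
                events.append((left, 1))          # the net enters at its left end
                events.append((right + 1, -1))    # and leaves just past its right end
            density = current = 0
            for _, delta in sorted(events):       # sorting makes this O(n log n), not linear
                current += delta
                density = max(density, current)
            return density

        print(channel_density([(0, 4), (2, 6), (3, 5), (7, 9)]))   # -> 3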

    MorphIC: A 65-nm 738k-Synapse/mm² Quad-Core Binary-Weight Digital Neuromorphic Processor with Stochastic Spike-Driven Online Learning

    Recent trends in the field of neural network accelerators investigate weight quantization as a means to increase the resource- and power-efficiency of hardware devices. As full on-chip weight storage is necessary to avoid the high energy cost of off-chip memory accesses, memory reduction requirements for weight storage pushed toward the use of binary weights, which were demonstrated to have a limited accuracy reduction on many applications when quantization-aware training techniques are used. In parallel, spiking neural network (SNN) architectures are explored to further reduce power when processing sparse event-based data streams, while on-chip spike-based online learning appears as a key feature for applications constrained in power and resources during the training phase. However, designing power- and area-efficient spiking neural networks still requires the development of specific techniques in order to leverage on-chip online learning on binary weights without compromising the synapse density. In this work, we demonstrate MorphIC, a quad-core binary-weight digital neuromorphic processor embedding a stochastic version of the spike-driven synaptic plasticity (S-SDSP) learning rule and a hierarchical routing fabric for large-scale chip interconnection. The MorphIC SNN processor embeds a total of 2k leaky integrate-and-fire (LIF) neurons and more than two million plastic synapses for an active silicon area of 2.86 mm² in 65-nm CMOS, achieving a high density of 738k synapses/mm². MorphIC demonstrates an order-of-magnitude improvement in the area-accuracy tradeoff on the MNIST classification task compared to previously proposed SNNs, while having no penalty in the energy-accuracy tradeoff. Comment: This document is the paper as accepted for publication in the IEEE Transactions on Biomedical Circuits and Systems journal (2019); the fully-edited paper is available at https://ieeexplore.ieee.org/document/876400
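    A heavily simplified sketch of what a stochastic, binary-weight, spike-driven update can look like: on a presynaptic spike the weight moves toward the state suggested by the postsynaptic potential, but the flip is only committed with some probability, which lets a single stored bit emulate finer-grained learning on average. The threshold, the commit probability, and the omission of SDSP's calcium gating variable are assumptions for illustration, not MorphIC's actual S-SDSP rule.

        # Simplified stochastic spike-driven plasticity on a 1-bit weight.
        import random

        THETA = 0.5     # membrane potential above which a spike potentiates (illustrative)
        P_COMMIT = 0.1  # probability of committing a candidate update (illustrative)

        def sdsp_update(weight, v_membrane, rng=random):
            """Return the binary weight after a presynaptic spike arrives."""
            candidate = 1 if v_membrane > THETA else 0   # potentiate vs. depress
            if candidate != weight and rng.random() < P_COMMIT:
                return candidate                         # stochastic commit
            return weight

        w = 0
        for _ in range(100):           # repeated spikes onto a depolarized neuron
            w = sdsp_update(w, v_membrane=0.8)
        print(w)                       # almost certainly 1 after 100 spikes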

    Three Highly Parallel Computer Architectures and Their Suitability for Three Representative Artificial Intelligence Problems

    Virtually all current Artificial Intelligence (AI) applications are designed to run on sequential (von Neumann) computer architectures. As a result, current systems do not scale up. As knowledge is added to these systems, a point is reached where their performance quickly degrades. The performance of a von Neumann machine is limited by the bandwidth between memory and processor (the von Neumann bottleneck). The bottleneck is avoided by distributing the processing power across the memory of the computer. In this scheme the memory becomes the processor (a "smart memory"). This paper highlights the relationship between three representative AI application domains, namely knowledge representation, rule-based expert systems, and vision, and their parallel hardware realizations. Three machines, covering a wide range of fundamental properties of parallel processors, namely module granularity, concurrency control, and communication geometry, are reviewed: the Connection Machine (a fine-grained SIMD hypercube), DADO (a medium-grained MIMD/SIMD/MSIMD tree machine), and the Butterfly (a coarse-grained MIMD butterfly-switch machine).

    A review of High Performance Computing foundations for scientists

    The increase of available computational capabilities has made simulation emerge as a third discipline of Science, lying midway between the experimental and purely theoretical branches [1, 2]. Simulation enables the evaluation of quantities which otherwise would not be accessible, helps to improve experiments, and provides new insights into the systems being analysed [3-6]. Knowing the fundamentals of computation can be very useful for scientists, for it can help them to improve the performance of their theoretical models and simulations. This review includes some technical essentials that can be useful to this end, and it is devised as a complement for researchers whose education is focused on scientific issues and not on technological aspects. In this document we attempt to discuss the fundamentals of High Performance Computing (HPC) [7] in a way which is easy to understand without much previous background. We sketch the way standard computers and supercomputers work, discuss distributed computing, and cover essential aspects to take into account when running scientific calculations on computers. Comment: 33 pages.