
1,240 research outputs found

Parallel Architectures for Planetary Exploration Requirements (PAPER)

Author: Cezzar Ruknet
Sen Ranjan K.

The Parallel Architectures for Planetary Exploration Requirements (PAPER) project is essentially research oriented towards technology insertion issues for NASA's unmanned planetary probes. It was initiated to complement and augment the long-term efforts for space exploration, with particular reference to NASA/LaRC's (NASA Langley Research Center) research needs for planetary exploration missions of the mid and late 1990s. The requirements for space missions as given in the somewhat dated Advanced Information Processing Systems (AIPS) requirements document are contrasted with the new requirements from JPL/Caltech involving sensor data capture and scene analysis. It is shown that more stringent requirements have arisen as a result of technological advancements. Two possible architectures, the AIPS Proof of Concept (POC) configuration and the MAX fault-tolerant dataflow multiprocessor, were evaluated. The main observation was that the AIPS design is biased towards fault tolerance and may not be an ideal architecture for planetary and deep space probes due to its high cost and complexity. The MAX concept appears to be a promising candidate, although more detailed information about it is required. The feasibility of adding neural computation capability to this architecture needs to be studied. Key impact issues for the architectural design of computing systems meant for planetary missions were also identified.

NASA Technical Reports Server

Expansion of layouts of complete binary trees into grids

Author: Lin Y.-B.
Miller Z.
Perkel M.
Pritikin D.
Sudborough I.H.
Publication venue: Elsevier B.V.
Publication date: 28/09/2003

Let $T_h$ be the complete binary tree of height $h$. Let $M$ be the infinite grid graph with vertex set $\mathbb{Z}^2$, where two vertices $(x_1,y_1)$ and $(x_2,y_2)$ of $M$ are adjacent if and only if $|x_1-x_2|+|y_1-y_2|=1$. Suppose that $T$ is a tree which is a subdivision of $T_h$ and is also isomorphic to a subgraph of $M$. Motivated by issues in optimal VLSI design, we show that the point expansion ratio $n(T)/n(T_h)=n(T)/(2^{h+1}-1)$ is bounded below by 1.122 for $h$ sufficiently large. That is, we give bounds on how many vertices of degree 2 must be inserted along the edges of $T_h$ in order that the resulting tree can be laid out in the grid. Concerning the constructive end of VLSI design, suppose that $T$ is a tree which is a subdivision of $T_h$ and is also isomorphic to a subgraph of the $n\times n$ grid graph. Define the expansion ratio of such a layout to be $n^2/n(T_h)=n^2/(2^{h+1}-1)$. We show constructively that the minimum possible expansion ratio over all layouts of $T_h$ is bounded above by 1.4656 for sufficiently large $h$. That is, we give efficient layouts of complete binary trees into square grids, making improvements upon the previous work of others. We also give bounds for the point expansion and expansion problems for layouts of $T_h$ into extended grids, i.e., grids with added diagonals.

Elsevier - Publisher Connector
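The two ratios defined above can be made concrete with a small checker. This is an illustrative sketch, not the authors' construction: it assumes the vertices of $T_h$ are numbered in heap order (children of $i$ are $2i+1$ and $2i+2$), and the helper names `is_grid_layout`, `point_expansion_ratio` and `expansion_ratio` are hypothetical. It verifies that a proposed placement is injective and maps every tree edge to a unit grid step, then computes both ratios for a trivial layout of $T_2$ that needs no subdivision vertices.

```python
from typing import Dict, List, Tuple

# Hypothetical helpers for illustration only; not taken from the paper.
Point = Tuple[int, int]


def complete_tree_size(h: int) -> int:
    """n(T_h) = 2^(h+1) - 1, the vertex count of the complete binary tree of height h."""
    return 2 ** (h + 1) - 1


def is_grid_layout(edges: List[Tuple[int, int]], pos: Dict[int, Point]) -> bool:
    """True if pos puts vertices on distinct points of Z^2 and every tree edge
    joins two points with |x1 - x2| + |y1 - y2| == 1 (adjacent in M)."""
    if len(set(pos.values())) != len(pos):
        return False  # two vertices share a grid point
    for u, v in edges:
        (x1, y1), (x2, y2) = pos[u], pos[v]
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            return False
    return True


def point_expansion_ratio(pos: Dict[int, Point], h: int) -> float:
    """n(T) / n(T_h), where T is the (possibly subdivided) tree placed by pos."""
    return len(pos) / complete_tree_size(h)


def expansion_ratio(pos: Dict[int, Point], h: int) -> float:
    """n^2 / n(T_h) for the smallest n x n square grid containing the layout."""
    xs = [x for x, _ in pos.values()]
    ys = [y for _, y in pos.values()]
    side = max(max(xs) - min(xs), max(ys) - min(ys)) + 1
    return side * side / complete_tree_size(h)


# T_2 in heap order: root 0, children 1 and 2, grandchildren 3..6.
edges_t2 = [(0, 1), (0, 2), (1, 3), (1, 4), (2, 5), (2, 6)]
layout_t2 = {0: (0, 0), 1: (-1, 0), 2: (1, 0),
             3: (-1, 1), 4: (-1, -1), 5: (1, 1), 6: (1, -1)}

print(is_grid_layout(edges_t2, layout_t2))       # True
print(point_expansion_ratio(layout_t2, 2))       # 1.0
print(round(expansion_ratio(layout_t2, 2), 4))   # 1.2857 (a 3 x 3 grid for 7 vertices)
```

For small $h$ no subdivision is needed, but a grid ball of radius $h$ holds only $O(h^2)$ points while $T_h$ has $2^{h+1}-1$ vertices within distance $h$ of its root, so degree-2 vertices become unavoidable as $h$ grows.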

Fixed Orientation Interconnection Problems: Theory, Algorithms and Applications

Author: Zachariasen Martin
Publication venue: Museum Tusculanum
Publication date: 01/01/2009

Copenhagen University Research Information System

A new-generation class of parallel architectures and their performance evaluation

Author: Wang Qian
Publication venue: Digital Commons @ NJIT
Publication date: 31/05/1999

The development of computers with hundreds or thousands of processors and capability for very high performance is absolutely essential for many computation problems, such as weather modeling, fluid dynamics, and aerodynamics. Several interconnection networks have been proposed for parallel computers. Nevertheless, the majority of them are plagued by rather poor topological properties that result in large memory latencies for DSM (Distributed Shared-Memory) computers. On the other hand, scalable networks with very good topological properties are often impossible to build because of their prohibitively high VLSI (e.g., wiring) complexity. One such network is the generalized hypercube (GH). The GH supports full connectivity of its nodes in each dimension and is characterized by outstanding topological properties. In addition, low-dimensional GHs have very large bisection widths. We propose in this dissertation a new class of processor interconnections, namely HOWs (Highly Overlapping Windows), that are more generic than the GH, are highly scalable, and have comparable performance. We analyze the communications capabilities of 2-D HOW systems and demonstrate that in practical cases HOW systems perform much better than binary hypercubes for important communications patterns. These properties are in addition to the good scalability and low hardware complexity of HOW systems. We present algorithms for one-to-one, one-to-all broadcasting, all-to-all broadcasting, one-to-all personalized, and all-to-all personalized communications on HOW systems. These algorithms are developed and evaluated for several communication models. In addition, we develop techniques for the efficient embedding of popular topologies, such as the ring, the torus, and the hypercube, into 1-D and 2-D HOW systems. The objective is to show that 2-D HOW systems are not only scalable and easy to implement, but also yield good embeddings of several classical topologies.

Digital Commons @ New Jersey Institute of Technology (NJIT)
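The trade-off described above, excellent topology bought with heavy wiring, can be seen by comparing a generalized hypercube with a binary hypercube of the same size. The sketch assumes the usual GH definition with radices $(m_1,\dots,m_n)$: nodes are mixed-radix tuples, and two nodes are linked when they differ in exactly one coordinate (full connectivity within each dimension). The function names are hypothetical, and HOW networks are not modeled because the abstract does not define them. A breadth-first search from one node gives the diameter, since the GH is vertex-transitive.

```python
from collections import deque
from math import prod
from typing import Dict, Iterator, Tuple

# Illustrative sketch; function names are hypothetical.
Node = Tuple[int, ...]


def gh_neighbors(node: Node, radices: Tuple[int, ...]) -> Iterator[Node]:
    """All nodes that differ from `node` in exactly one coordinate."""
    for dim, m in enumerate(radices):
        for value in range(m):
            if value != node[dim]:
                yield node[:dim] + (value,) + node[dim + 1:]


def gh_stats(radices: Tuple[int, ...]) -> Dict[str, int]:
    nodes = prod(radices)
    degree = sum(m - 1 for m in radices)  # every other node in each dimension
    links = nodes * degree // 2            # rough proxy for wiring cost
    # BFS from the all-zero node; by vertex-transitivity its eccentricity
    # equals the diameter of the whole network.
    start = tuple(0 for _ in radices)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in gh_neighbors(u, radices):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return {"nodes": nodes, "degree": degree, "links": links,
            "diameter": max(dist.values())}


print(gh_stats((8, 8)))    # 2-D GH with 64 nodes
print(gh_stats((2,) * 6))  # binary 6-cube with 64 nodes
```

With 64 nodes, the 8 × 8 GH has diameter 2 against the binary 6-cube's 6, but it needs 448 links instead of 192, which is the wiring-complexity problem the abstract points to.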

YodaNN: An Architecture for Ultra-Low Power Binary-Weight CNN Acceleration

Author: Andri Renzo
Benini Luca
Cavigelli Lukas
Rossi Davide
Publication date: 24/02/2017

Convolutional neural networks (CNNs) have revolutionized the world of computer vision over the last few years, pushing image classification beyond human accuracy. The computational effort of today's CNNs requires power-hungry parallel processors or GP-GPUs. Recent developments in CNN accelerators for system-on-chip integration have reduced energy consumption significantly. Unfortunately, even these highly optimized devices are above the power envelope imposed by mobile and deeply embedded applications and face hard limitations caused by CNN weight I/O and storage. This prevents the adoption of CNNs in future ultra-low power Internet of Things end-nodes for near-sensor analytics. Recent algorithmic and theoretical advancements enable competitive classification accuracy even when limiting CNNs to binary (+1/-1) weights during training. These new findings bring major optimization opportunities in the arithmetic core by removing the need for expensive multiplications, as well as reducing I/O bandwidth and storage. In this work, we present an accelerator optimized for binary-weight CNNs that achieves 1510 GOp/s at 1.2 V on a core area of only 1.33 MGE (Million Gate Equivalent) or 0.19 mm² and with a power dissipation of 895 μW in UMC 65 nm technology at 0.6 V. Our accelerator significantly outperforms the state-of-the-art in terms of energy and area efficiency, achieving 61.2 TOp/s/W at 0.6 V and 1135 GOp/s/MGE at 1.2 V, respectively.

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna
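The multiplication-free arithmetic that binary weights allow can be illustrated in a few lines. This is a software sketch under simplifying assumptions (integer activations, a single 1-D filter, weights stored as sign bits) and does not describe the accelerator's actual datapath; the function names are hypothetical. Each +1 or -1 weight turns a multiply-accumulate into an add or a subtract, and a reference version using ordinary multiplications confirms the result.

```python
from typing import List, Sequence

# Illustrative sketch; not the YodaNN hardware design.


def binary_weight_dot(x: Sequence[int], w_positive: Sequence[bool]) -> int:
    """Dot product with weights in {+1, -1}; w_positive[i] is True for +1.
    Each term is an addition or a subtraction, so no multiplier is needed."""
    acc = 0
    for xi, positive in zip(x, w_positive):
        acc = acc + xi if positive else acc - xi
    return acc


def binary_weight_conv1d(x: Sequence[int], w_positive: Sequence[bool]) -> List[int]:
    """Valid-mode 1-D correlation (the usual CNN 'convolution') with a binary kernel."""
    k = len(w_positive)
    return [binary_weight_dot(x[i:i + k], w_positive) for i in range(len(x) - k + 1)]


def reference_conv1d(x: Sequence[int], w: Sequence[int]) -> List[int]:
    """The same operation with ordinary multiplications, for comparison."""
    k = len(w)
    return [sum(a * b for a, b in zip(x[i:i + k], w)) for i in range(len(x) - k + 1)]


x = [3, -1, 4, 1, 5]
w_bits = [True, False, True]  # encodes weights +1, -1, +1 using one bit per weight
print(binary_weight_conv1d(x, w_bits))  # [8, -4, 8]
print(reference_conv1d(x, [1, -1, 1]))  # [8, -4, 8]
```

Storing one sign bit per weight, instead of a multi-bit value, is also what reduces weight I/O and storage. As a sanity check on the reported figures, 1510 GOp/s divided by 1.33 MGE gives about 1135 GOp/s/MGE, matching the stated area efficiency.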

Representing Graph Families with Edge Grammars

Author: Francine Berman
Gregory Shannon
Publication venue: Purdue University (bepress)
Publication date: 08/05/1985

An edge grammar is a formal mechanism for representing families of related graphs (binary trees, hypercubes, meshes, etc.). Given an edge grammar, larger graphs in the family are derived from simple basis graphs using edge rewriting rules. A drawback to many graph grammars is that they cannot represent some important, highly regular graph families such as the family of shuffle-exchange graphs. Edge grammars, however, exist for all "computable" graph families, and simple edge grammars exist for most regular graph families. In this paper, we define and illustrate edge grammars and analyze them in the context of formal language theory. Our results include hierarchy and decidability properties. Since this work was originally motivated by a need to represent graph families found in parallel computation, the application of edge grammars in this context is also discussed.

CiteSeerX

Purdue E-Pubs
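The idea of growing a graph family from a basis graph by rewriting edges can be sketched with a toy rule. The rule below is hypothetical and much simpler than the edge grammars defined in the paper: vertices are bit strings, and each edge $(u, v)$ is rewritten into the copies $(0u, 0v)$ and $(1u, 1v)$ plus the links $(0u, 1u)$ and $(0v, 1v)$. Starting from the single edge $(0, 1)$, repeated rewriting yields the binary hypercubes, one of the families the abstract mentions. The helper names are illustrative only.

```python
from typing import FrozenSet, Set

# Toy edge-rewriting rule (hypothetical); not the paper's formalism.
Edge = FrozenSet[str]


def edge(u: str, v: str) -> Edge:
    return frozenset((u, v))


def rewrite(e: Edge) -> Set[Edge]:
    """Copy the edge into the '0' and '1' halves and link each endpoint
    to its twin in the other half."""
    u, v = sorted(e)
    return {edge("0" + u, "0" + v), edge("1" + u, "1" + v),
            edge("0" + u, "1" + u), edge("0" + v, "1" + v)}


def derive(basis: Set[Edge], steps: int) -> Set[Edge]:
    """Apply the rule to every edge in parallel, `steps` times.
    Duplicate twin links produced by neighboring edges merge in the set."""
    graph = set(basis)
    for _ in range(steps):
        graph = set().union(*(rewrite(e) for e in graph))
    return graph


q3 = derive({edge("0", "1")}, 2)  # basis Q_1, two rewriting steps
vertices = set().union(*q3)
print(len(vertices), len(q3))  # 8 12
```

Because each step doubles the vertex set and links every vertex to its twin, the third derivation is exactly the 3-cube: eight 3-bit labels, each edge joining labels that differ in one bit.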

Evaluating the communications capabilities of the generalized hypercube interconnection network

Author: Krishnamurthy Sanjay
Publication venue: Digital Commons @ NJIT
Publication date: 31/01/1998
Field of study

This thesis presents results of evaluating the communications capabilities of the generalized hypercube interconnection network. The generalized hypercube has outstanding topological properties, but it has not been implemented on a large scale because of its very high wiring complexity. For this reason, this network has not been studied extensively in the past. However, recent and expected technological advancements will soon render this network viable for massively parallel systems. We first present implementations of randomized many-to-all broadcasting and multicasting on generalized hypercubes, using as a basis the one-to-all broadcast algorithm presented in [3]. We test the proposed implementations under realistic communication traffic patterns and message generations, for the all-port model of communication. Our results show that the size of the intermediate message buffers has a significant effect on the total communication time, and this effect becomes very dramatic for large systems with large numbers of dimensions. We also propose a modification of this multicast algorithm that applies congestion control to improve its performance. The results illustrate a significant improvement in the total execution time and a reduction in the number of message contentions, and also prove that the generalized hypercube is a very versatile interconnection network.

Digital Commons @ New Jersey Institute of Technology (NJIT)
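Under the all-port model, a one-to-all broadcast on a generalized hypercube can finish in one step per dimension, because each dimension is fully connected. The sketch below simulates that dimension-ordered scheme. It is a simplified illustration assumed here, not necessarily the algorithm of [3] or the randomized many-to-all variants evaluated in the thesis, and it ignores message buffers and contention entirely. The function name is hypothetical.

```python
from typing import List, Tuple

# Simplified all-port broadcast simulation; function name is hypothetical.
Node = Tuple[int, ...]


def dimension_ordered_broadcast(radices: Tuple[int, ...], source: Node) -> List[int]:
    """Return how many nodes hold the message after each step.
    In step d, every informed node sends to all nodes that differ from it
    only in coordinate d; under the all-port model these sends are simultaneous."""
    informed = {source}
    counts = [len(informed)]
    for dim, m in enumerate(radices):
        reached = {
            node[:dim] + (value,) + node[dim + 1:]
            for node in informed
            for value in range(m)
            if value != node[dim]
        }
        informed |= reached
        counts.append(len(informed))
    return counts


print(dimension_ordered_broadcast((4, 4, 4), (0, 0, 0)))  # [1, 4, 16, 64]
```

After step $d$ the informed set is a full subcube over the first $d$ dimensions, so every node is reached after as many steps as there are dimensions. Real traffic, as the thesis shows, adds buffer and contention effects that this model leaves out.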

Efficient Physical Embedding of Topologically Complex Information Processing Networks in Brains and Computer Circuits

Author: AL Barabasi
Andreas Meyer-Lindenberg
BC Bernhardt
BL Chen
C Song
C Zhang
CR Houser
D Attwell
D Meunier
D Stroobandt
Daniel L. Greenfield
Daniel R. Weinberger
Danielle S. Bassett
DB Tower
DJ Watts
DS Bassett
E Bullmore
E Dubois
EC Bush
Edward T. Bullmore
EG Jones
ET Bullmore
F Brglez
G Concas
G Schlenska
GB West
H Simon
HB Bakoglu
HD Frahm
HM Ozaktas
IC Wright
J Jiang
J Ozik
J Partzsch
JA Fodor
JE Niven
JK Rilling
JP Lerch
JW Prothero
K Shahookar
Karl J. Friston
L Deuker
L Hagen
L Zhang
M Abeles
M Kaiser
M Müller-Linow
M Sales-Pardo
MA Changizi
MA Hofman
MEJ Newman
MG Kitzbichler
N Kashtan
N Masuda
O Sporns
P Christie
P Hagmann
P Verplaetse
PA Robinson
PW Woodruff
R Insausti
RT Gray
S Herculano-Houzel
S Maslov
S Reda
Simon W. Moore
V Beiu
VD Blondel
W Callebaut
WE Donath
Y Choe
Y He
YT Wu
Publication venue: Public Library of Science
Publication date: 01/04/2010

Nervous systems are information processing networks that evolved by natural selection, whereas very large scale integrated (VLSI) computer circuits have evolved by commercially driven technology development. Here we follow historic intuition that all physical information processing systems will share key organizational properties, such as modularity, that generally confer adaptivity of function. It has long been observed that modular VLSI circuits demonstrate an isometric scaling relationship between the number of processing elements and the number of connections, known as Rent's rule, which is related to the dimensionality of the circuit's interconnect topology and its logical capacity. We show that human brain structural networks, and the nervous system of the nematode C. elegans, also obey Rent's rule, and exhibit some degree of hierarchical modularity. We further show that the estimated Rent exponent of human brain networks, derived from MRI data, can explain the allometric scaling relations between gray and white matter volumes across a wide range of mammalian species, again suggesting that these principles of nervous system design are highly conserved. For each of these fractal modular networks, the dimensionality of the interconnect topology was greater than the 2 or 3 Euclidean dimensions of the space in which it was embedded. This relatively high complexity entailed extra cost in physical wiring: although all networks were economically or cost-efficiently wired, they did not strictly minimize wiring costs. Artificial and biological information processing systems may both evolve to optimize a trade-off between physical cost and topological complexity, resulting in the emergence of homologous principles of economical, fractal and modular design across many different kinds of nervous and computational networks.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central
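Rent's rule relates the number of external connections $C$ of a module to its number of processing elements $N$ as $C = k N^{p}$, where $p$ is the Rent exponent. The sketch below is an assumed illustration, not the network-partitioning procedure used in the paper: it counts boundary edges of square blocks in a regular 2-D grid, fits $k$ and $p$ by least squares on $\log C$ against $\log N$, and converts $p$ into a dimension via $D = 1/(1-p)$, the relation that holds for regular $D$-dimensional lattices. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

# Illustrative Rent's-rule fit on a regular 2-D grid; names are hypothetical.


def boundary_edges_of_square_block(s: int) -> int:
    """Edges of the infinite 2-D grid with exactly one endpoint in an s x s block."""
    inside = {(x, y) for x in range(s) for y in range(s)}
    count = 0
    for x, y in inside:
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            if (x + dx, y + dy) not in inside:
                count += 1
    return count


def fit_rent(ns: Sequence[int], cs: Sequence[int]) -> Tuple[float, float]:
    """Least-squares fit of log C = log k + p log N; returns (k, p)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(c) for c in cs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    p = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    k = math.exp(my - p * mx)
    return k, p


sides = range(1, 9)
ns = [s * s for s in sides]                               # elements per block
cs = [boundary_edges_of_square_block(s) for s in sides]  # 4s external edges
k, p = fit_rent(ns, cs)
print(round(k, 3), round(p, 3))  # 4.0 0.5
print(round(1 / (1 - p), 3))     # 2.0, the grid's two dimensions
```

For a planar grid the fitted dimension matches the embedding space exactly; the abstract's point is that brain and VLSI networks yield a higher topological dimension than the 2 or 3 dimensions they are embedded in, which is why they cannot be wired as cheaply as a lattice.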