111 research outputs found

Submicron Systems Architecture Project : Semiannual Technical Report

Author: Martin Alain J.
Seitz Charles L.
Van de Snepscheut Jan L. A.
Publication venue: 'California Institute of Technology Library'
Publication date: 01/01/1992

The Mosaic C is an experimental fine-grain multicomputer based on single-chip nodes. The Mosaic C chip includes 64KB of fast dynamic RAM, processor, packet interface, ROM for bootstrap and self-test, and a two-dimensional self-timed router. The chip architecture provides low-overhead and low-latency handling of message packets, and high memory and network bandwidth. Sixty-four Mosaic chips are packaged by tape-automated bonding (TAB) in an 8 x 8 array on circuit boards that can, in turn, be arrayed in two dimensions to build arbitrarily large machines. These 8 x 8 boards are now in prototype production under a subcontract with Hewlett-Packard. We are planning to construct a 16K-node Mosaic C system from 256 of these boards. The suite of Mosaic C hardware also includes host-interface boards and high-speed communication cables. The hardware developments and activities of the past eight months are described in section 2.1. The programming system that we are developing for the Mosaic C is based on the same message-passing, reactive-process, computational model that we have used with earlier multicomputers, but the model is implemented for the Mosaic in a way that supports fine-grain concurrency. A process executes only in response to receiving a message, and may in execution send messages, create new processes, and modify its persistent variables before it either exits or becomes dormant in preparation for receiving another message. These computations are expressed in an object-oriented programming notation, a derivative of C++ called C+-. The computational model and the C+- programming notation are described in section 2.2. The Mosaic C runtime system, which is written in C+-, provides automatic process placement and highly distributed management of system resources. The Mosaic C runtime system is described in section 2.3.

Caltech Authors
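
The reactive-process model summarized in the Mosaic C abstract above (a process runs only when a message arrives, may send messages, create processes, and update persistent variables, then exits or goes dormant) can be illustrated with a minimal single-threaded sketch. This is not C+- and not the Mosaic C runtime; the names `Runtime`, `Counter`, `Logger`, `spawn`, `send`, and `receive` are hypothetical and chosen only for illustration.

```python
from collections import deque


class Runtime:
    """Toy message-driven scheduler (hypothetical, single-threaded)."""

    def __init__(self):
        self.processes = {}   # pid -> live process object
        self.queue = deque()  # pending (pid, message) pairs
        self.next_pid = 0
        self.log = []         # messages recorded by Logger processes

    def spawn(self, proc):
        pid = self.next_pid
        self.next_pid += 1
        self.processes[pid] = proc
        return pid

    def send(self, pid, msg):
        self.queue.append((pid, msg))

    def run(self):
        # A process executes only in response to a delivered message.
        while self.queue:
            pid, msg = self.queue.popleft()
            proc = self.processes.get(pid)
            if proc is None:
                continue  # destination already exited; drop the message
            if not proc.receive(msg, self):
                del self.processes[pid]  # the process chose to exit


class Logger:
    """Records one message, then exits."""

    def receive(self, msg, runtime):
        runtime.log.append(msg)
        return False  # exit


class Counter:
    """Accumulates numbers in a persistent variable between messages."""

    def __init__(self):
        self.total = 0  # persistent variable

    def receive(self, msg, runtime):
        if msg == "stop":
            return False  # exit
        self.total += msg
        if self.total >= 3:
            # While executing, a process may create a process and message it.
            runtime.send(runtime.spawn(Logger()), self.total)
        return True  # dormant until the next message


rt = Runtime()
counter = rt.spawn(Counter())
for m in (1, 2, "stop"):
    rt.send(counter, m)
rt.run()
```

Returning a boolean from `receive` is one simple way to express the "exit or become dormant" choice; a real fine-grain runtime would also have to place processes on nodes and route messages between them, which this sketch omits.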

Highly parallel computation

Author: Denning Peter J.
Tichy Walter F.

Highly parallel computing architectures are the only means to achieve the computation rates demanded by advanced scientific problems. A decade of research has demonstrated the feasibility of such machines, and current research focuses on which architectures perform best. The architectures designated as multiple instruction multiple datastream (MIMD) and single instruction multiple datastream (SIMD) have produced the best results to date; neither shows a decisive advantage for most near-homogeneous scientific problems. For scientific problems with many dissimilar parts, more speculative architectures such as neural networks or data flow may be needed.

NASA Technical Reports Server

A Distributed Discrete-Time Neural Network Architecture for Pattern Allocation and Control

Author: Chronopoulos A.T.
Sarangapani Jagannathan
Publication venue: Scholars' Mine
Publication date: 01/01/2002

Missouri University of Science and Technology (Missouri S&T): Scholars' Mine

Efficient Mapping of Neural Network Models on a Class of Parallel Architectures.

Author: Arad Behnam Seyed
Publication venue: LSU Digital Commons
Publication date: 01/01/1997

This dissertation develops a formal and systematic methodology for efficient mapping of several contemporary artificial neural network (ANN) models on k-ary n-cube parallel architectures (KNCs). We apply the general mapping to several important ANN models including feedforward ANNs trained with the backpropagation algorithm, radial basis function networks, cascade correlation learning, and adaptive resonance theory networks. Our approach utilizes a parallel task graph representing concurrent operations of the ANN model during training. The mapping of the ANN is performed in two steps. First, the parallel task graph of the ANN is mapped to a virtual KNC of compatible dimensionality. This involves decomposing each operation into its atomic tasks. Second, the dimensionality of the virtual KNC architecture is recursively reduced through a sequence of transformations until a desired metric is optimized. We refer to this process as folding the virtual architecture. The optimization criteria we consider in this dissertation are defined in terms of the iteration time of the algorithm on the folded architecture. If necessary, the mapping scheme may utilize a subset of the processors of a given KNC architecture if it results in the most efficient simulation. A unique feature of our mapping is that it systematically selects an appropriate degree of parallelism leading to a highly efficient realization of the ANN model on KNC architectures. A novel feature of our work is its ability to efficiently map unit-allocating ANNs. These networks possess a dynamic structure which grows during training. We present a highly efficient scheme for simulating such networks on existing KNC parallel architectures. We assume an upper bound on the size of the neural network. We perform the folding such that the iteration time of the largest network is minimized. We show that our mapping leads to near-optimal simulation of smaller instances of the neural network. In addition, based on our mapping no data migration or task rescheduling is needed as the size of the network grows.

Louisiana State University

Parallel mapping and circuit partitioning heuristics based on mean field annealing

Author: Bultan Tevfik
Publication venue: Bilkent University
Publication date: 01/01/1992

Ankara: Department of Computer Engineering and Information Science and the Institute of Engineering and Science of Bilkent University, 1992. Thesis (Master's) -- Bilkent University, 1992. Includes bibliographical references. Mean Field Annealing (MFA) algorithm, recently proposed for solving combinatorial optimization problems, combines the characteristics of neural networks and simulated annealing. In this thesis, MFA is formulated for the mapping problem and the circuit partitioning problem. Efficient implementation schemes, which decrease the complexity of the proposed algorithms by asymptotical factors, are also given. Performances of the proposed MFA algorithms are evaluated in comparison with two well-known heuristics: simulated annealing and Kernighan-Lin. Results of the experiments indicate that MFA can be used as an alternative heuristic for the mapping problem and the circuit partitioning problem. Inherent parallelism of the MFA is exploited by designing efficient parallel algorithms for the proposed MFA heuristics. Parallel MFA algorithms proposed for solving the circuit partitioning problem are implemented on an iPSC/2 hypercube multicomputer. Experimental results show that the proposed heuristics can be efficiently parallelized, which is crucial for algorithms that solve such computationally hard problems. Bultan, Tevfik. M.S.

Bilkent University Institutional Repository

Automatic visual recognition using parallel machines

Author: Chen Yui-Liang
Publication venue: Digital Commons @ NJIT
Publication date: 31/10/1995

Invariant features and quick matching algorithms are two major concerns in the area of automatic visual recognition. The former reduces the size of an established model database, and the latter shortens the computation time. This dissertation discusses both line invariants under perspective projection and the parallel implementation of a dynamic programming technique for shape recognition. The feasibility of using parallel machines can be demonstrated through the dramatically reduced time complexity. In this dissertation, our algorithms are implemented on the AP1000 MIMD parallel machines. For processing an object with n features, the time complexity of the proposed parallel algorithm is O(n), while that of a uniprocessor is O(n^2). Two applications, one for shape matching and the other for chain-code extraction, are used to demonstrate the usefulness of our methods. Invariants from four general lines under perspective projection are also discussed here. In contrast to the approach which uses the epipolar geometry, we investigate the invariants under isotropy subgroups. Theoretically speaking, two independent invariants can be found for four general lines in 3D space. In practice, we show how to obtain these two invariants from the projective images of four general lines without the need for camera calibration. A projective invariant recognition system based on a hypothesis-generation-testing scheme is run on the hypercube parallel architecture. Object recognition is achieved by matching the scene projective invariants to the model projective invariants, called transfer. Then a hypothesis-generation-testing scheme is implemented on the hypercube parallel architecture.

Digital Commons @ New Jersey Institute of Technology (NJIT)
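
The O(n) versus O(n^2) claim in the abstract above is typical of dynamic programming tables whose cells can be filled one anti-diagonal at a time. The abstract does not say which recurrence the dissertation uses, so the sketch below substitutes a standard edit-distance table as a stand-in; `edit_distance_wavefront` is a hypothetical name. Every cell on anti-diagonal i + j = k depends only on diagonals k - 1 and k - 2, so with enough processors each diagonal could be one parallel step.

```python
def edit_distance_wavefront(a, b):
    """Edit distance filled by anti-diagonals.

    Returns (distance, number_of_wavefront_steps). The inner loop is run
    sequentially here, but its iterations are independent of each other
    and could be assigned to separate processors.
    """
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # delete i leading symbols of a
    for j in range(m + 1):
        d[0][j] = j  # insert j leading symbols of b

    steps = 0
    for k in range(2, n + m + 1):  # anti-diagonal with i + j == k
        steps += 1
        for i in range(max(1, k - m), min(n, k - 1) + 1):
            j = k - i
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion (diagonal k - 1)
                d[i][j - 1] + 1,         # insertion (diagonal k - 1)
                d[i - 1][j - 1] + cost,  # substitution (diagonal k - 2)
            )
    return d[n][m], steps


distance, steps = edit_distance_wavefront("kitten", "sitting")
```

For two non-empty sequences of lengths n and m the table has n * m interior cells but only n + m - 1 wavefront steps, which is where a linear parallel time bound of this kind comes from.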

Centre for Information Science Research Annual Report, 1987-1991

Author: Australian National University
Publication date: 09/11/2018

Annual reports from various departments of the AN

The Australian National University

Parallel Computers and Complex Systems

Author: G.C. Fox
P.D. Coddington
Publication venue: University Press

We present an overview of the state of the art and future trends in high performance parallel and distributed computing, and discuss techniques for using such computers in the simulation of complex problems in computational science. The use of high performance parallel computers can help improve our understanding of complex systems, and the converse is also true --- we can apply techniques used for the study of complex systems to improve our understanding of parallel computing. We consider parallel computing as the mapping of one complex system --- typically a model of the world --- into another complex system --- the parallel computer. We study static, dynamic, spatial and temporal properties of both the complex systems and the map between them. The result is a better understanding of which computer architectures are good for which problems, and of software structure, automatic partitioning of data, and the performance of parallel machines.

CiteSeerX