Three Highly Parallel Computer Architectures and Their Suitability for Three Representative Artificial Intelligence Problems
Virtually all current Artificial Intelligence (AI) applications are designed to run on sequential (von Neumann) computer architectures. As a result, current systems do not scale up: as knowledge is added to these systems, a point is reached where their performance quickly degrades. The performance of a von Neumann machine is limited by the bandwidth between memory and processor (the von Neumann bottleneck). The bottleneck is avoided by distributing the processing power across the memory of the computer; in this scheme the memory becomes the processor (a "smart memory").
This paper highlights the relationship between three representative AI application domains, namely knowledge representation, rule-based expert systems, and vision, and their parallel hardware realizations. Three machines, covering a wide range of fundamental properties of parallel processors, namely module granularity, concurrency control, and communication geometry, are reviewed: the Connection Machine (a fine-grained SIMD hypercube), DADO (a medium-grained MIMD/SIMD/MSIMD tree machine), and the Butterfly (a coarse-grained MIMD butterfly-switch machine).
The NON-VON Supercomputer Project: Current Ideology and Three-Year Plan
While we have learned a great deal during the first two years of the NON-VON Supercomputer Project, I am reluctant to commit myself at this point to anything that might be called a "position" regarding the direction and ultimate outcome of current research in the field of parallel architectures. In part, my hesitation reflects an appreciation for the difficulty of objectively assessing the state of the field as a whole while enmeshed in the "cult of personality" surrounding a particular machine. Fortunately, our local dogma has not yet become so rigid as to preclude the possibility of significant revisions of our beliefs in response to the experiences and ideas of our colleagues. At the same time, it is clear that our understanding of the essential issues of parallel machine design in general is colored by the particular challenges we have faced in the context of the NON-VON Project. In the following discussion, I will thus try to avoid any claims regarding the ideological correctness or historical inevitability of any of the architectural principles to which I now subscribe. In their place, I will attempt to list a few of our current architectural objectives, and to outline our tentative hardware implementation plans for the next three years. Software considerations will not be discussed in this document, despite the fact that they have occupied a large fraction of our time.
Extending Static Synchronization Beyond SIMD and VLIW
A key advantage of SIMD (Single Instruction stream, Multiple Data stream) architectures is that synchronization is effected statically at compile time; hence the execution-time cost of synchronization between "processes" is essentially zero. VLIW (Very Long Instruction Word) machines are successful in large part because they preserve this property while providing more flexibility in terms of what kinds of operations can be parallelized. In this paper, we propose a new kind of architecture, the "static barrier MIMD" or SBM, which can be viewed as a further generalization of the parallel execution abilities of static synchronization machines. Barrier MIMDs are asynchronous Multiple Instruction stream, Multiple Data stream architectures capable of parallel execution of loops, subprogram calls, and variable-execution-time instructions; however, little or no run-time synchronization is needed. When a group of processors within a barrier MIMD has just encountered a barrier, any conceptual synchronizations between the processors are statically accomplished with zero cost, as in a SIMD or VLIW machine and using similar compiler technology. Unlike these machines, however, as execution continues the relative timing of the processors may become less precisely knowable as a static, compile-time quantity. Where this imprecision becomes too large, the compiler simply inserts a synchronization barrier to ensure that the timing imprecision at that point is zero, and again employs purely static, implicit synchronization. Both the architecture and the supporting compiler technology are discussed in detail.
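The compile-time bookkeeping the abstract describes can be sketched as follows. This is an illustrative model only, not the paper's actual algorithm: each processor's program is reduced to per-instruction (min, max) cycle counts, and a single skew threshold stands in for the compiler's timing analysis.

```python
# Sketch of static barrier insertion: track each processor's completion-time
# uncertainty and insert a barrier whenever the cross-processor skew grows
# too large, restoring exactly known relative timing at zero run-time cost.
# The (min, max) cycle model and single threshold are simplifying assumptions.

def schedule_barriers(instr_times, max_skew):
    """instr_times: one list per processor of (min_cycles, max_cycles) pairs,
    one pair per instruction step. Returns the step indices after which the
    compiler would insert a synchronization barrier."""
    n_steps = len(instr_times[0])
    lo = [0] * len(instr_times)   # earliest possible completion time per processor
    hi = [0] * len(instr_times)   # latest possible completion time per processor
    barriers = []
    for step in range(n_steps):
        for p, prog in enumerate(instr_times):
            lo[p] += prog[step][0]
            hi[p] += prog[step][1]
        # skew: how imprecisely the relative timing is known across processors
        skew = max(hi) - min(lo)
        if skew > max_skew:
            barriers.append(step)
            # after a barrier all processors resume together: timing is exact again
            t = max(hi)
            lo = [t] * len(lo)
            hi = [t] * len(hi)
    return barriers
```

With a tight threshold, a variable-execution-time instruction (here step 1 of processor 0, taking 1 to 3 cycles) forces a barrier immediately after it; with a loose threshold, no barriers are needed at all.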
Image Understanding Algorithms on Fine-Grained Tree-Structured SIMD Machines
An important goal for researchers in computer vision is the construction of vision systems that interpret image data in real time. Such systems typically require a large amount of computation for processing raw image data at the lowest level, and for sophisticated decision making at the highest level. Recent advances in VLSI circuitry have led to several proposals for parallel architectures for computer vision systems. In this thesis, we demonstrate that fine-grained tree-structured SIMD machines, which have favorable characteristics for efficient VLSI implementation, can be used for the rapid execution of a wide range of image understanding tasks. We also identify the limitations of these architectures and propose methods to ameliorate these difficulties. The NON-VON supercomputer, currently being constructed at Columbia University, is an example of such an architecture. The major contribution of this thesis is the development and analysis of several parallel image understanding algorithms for the class of architectures under consideration. The algorithms developed in this research have been selected to span different levels of computer vision tasks. They include image correlation, histogramming, connected component labeling, the computation of geometric properties, set operations, the Hough transform method for detecting object boundaries, and the correspondence problem in moving light display applications. The algorithms incorporate novel approaches to reduce the effects of the communication bottleneck usually associated with tree architectures.
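One of the tasks listed, histogramming, illustrates why tree-structured machines suit such workloads: partial results merge pairwise up the tree in a logarithmic number of communication steps. The sketch below is a sequential simulation of that reduction, under the simplifying assumptions that each leaf PE holds one pixel and the leaf count is a power of two; it is not the thesis's own algorithm.

```python
# Simulation of histogramming by tree reduction: leaf PEs each hold one
# pixel, and every internal node merges its two children's partial
# histograms, so the full histogram reaches the root in log2(n) steps.
# Assumes len(pixels) is a power of two (a complete binary tree).

from collections import Counter

def tree_histogram(pixels):
    """Return (histogram, communication_rounds) for a pixel list."""
    level = [Counter([p]) for p in pixels]      # leaf PEs: one-pixel histograms
    rounds = 0
    while len(level) > 1:
        # each internal node combines its two children; on the real machine
        # all merges at one level happen in parallel in a single step
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
        rounds += 1
    return level[0], rounds
```

Four pixels thus need only two communication rounds, whereas a sequential scan touches every pixel in turn.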
The DADO Parallel Computer
DADO is a parallel, tree-structured machine designed to provide significant performance improvements in the execution of large production systems. A full-scale production version of the DADO machine would comprise a large (on the order of a hundred thousand) set of processing elements (PE's), each containing its own processor, a small amount (8K bytes, in the current prototype design) of local random access memory, and a specialized I/O switch. The PE's are interconnected to form a complete binary tree. This paper describes the organization of, and programming language for, two prototypes of the DADO system. We also detail a general procedure for the parallel execution of production systems on the DADO machine and outline how this procedure can be extended to include commutative and multiple, independent production systems. We then compare this with the RETE matching algorithm, and indicate how PROLOG programs may be implemented directly on DADO.
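The recognize-act cycle that DADO accelerates can be sketched sequentially. The rule representation below (condition/action pairs, first-match conflict resolution) is a deliberate simplification for illustration; on the machine itself, the match phase modeled here as a list comprehension would run simultaneously across the PE's of the tree.

```python
# Minimal sketch of one production-system recognize-act cycle. On DADO,
# each PE would test its rule's condition against working memory in
# parallel; matches are reported back up the tree via the I/O switches.

def production_cycle(rules, wm):
    """Run one cycle: match all rules, fire the first match.
    Returns (new_working_memory, fired_flag)."""
    matched = [r for r in rules if r["condition"](wm)]   # parallel match phase
    if not matched:
        return wm, False                                 # quiescence: no rule fires
    winner = matched[0]                                  # trivial conflict resolution
    return winner["action"](wm), True
```

Working memory is modeled as a plain set of facts; real production systems add pattern variables and far more sophisticated conflict resolution, which is where a RETE-style matcher earns its keep.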
Tree Machines: Architectures and Algorithms A Survey Paper
Recent advances in very large scale integrated (VLSI) circuit technology have led to a surge in research aimed at finding new computer organizations that support a great deal of concurrency. Computer organizations based on tree structures appear well-suited to several kinds of parallel computations. In this paper we discuss the performance of tree machines as well as issues related to their implementation in VLSI. Examples of tree machines are presented, with an emphasis on the way the processing elements communicate in the machine. A taxonomy of tree algorithms, based on a taxonomy of parallel algorithms proposed by Kung in 1979, is introduced. Examples of tree algorithms are also given.
Parallel architectures for image analysis
This thesis is concerned with the problem of designing an architecture specifically for the application of image analysis and object recognition. Image analysis is a complex subject area that remains only partially defined and only partially solved. This makes the task of designing an architecture aimed at efficiently implementing image analysis and recognition algorithms a difficult one.
Within this work a massively parallel heterogeneous architecture, the Warwick Pyramid Machine, is described. This architecture combines SIMD, MIMD, and MSIMD modes of parallelism, each directed at a different part of the problem. The performance of this architecture is analysed with respect to many tasks drawn from very different areas of the image analysis problem. These tasks include an efficient straight-line extraction algorithm and a robust and novel geometric model-based recognition system. The straight-line extraction method is based on the local extraction of line segments using a Hough-style algorithm, followed by careful global matching and merging. The recognition system avoids quantising the pose space, hence overcoming many of the problems inherent in this class of methods, and includes an analytical verification stage. Results and detailed implementations of both of these tasks are given.
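The Hough-style voting that underlies such line extraction can be illustrated with a minimal accumulator. The parameters below (180 theta bins, unit rho resolution) and the dictionary-based accumulator are arbitrary illustrative choices, not those of the thesis.

```python
# Minimal Hough transform for straight lines: each edge point votes for
# every (theta, rho) line that could pass through it; collinear points
# pile their votes into the same accumulator cell, which shows up as a peak.

import math

def hough_lines(points, n_theta=180, rho_res=1.0):
    """Return (peak_cell, accumulator) where cells are (theta_bin, rho_bin)."""
    acc = {}
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            # normal form of a line: rho = x*cos(theta) + y*sin(theta)
            rho = round((x * math.cos(theta) + y * math.sin(theta)) / rho_res)
            acc[(t, rho)] = acc.get((t, rho), 0) + 1
    return max(acc, key=acc.get), acc
```

The voting loop is trivially data-parallel over points, which is why the thesis can extract segments locally on the SIMD array before the global matching and merging stage.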
A multiple-SIMD architecture for image and tracking analysis
The computational requirements of real-time image-based applications are such as to warrant the use of a parallel architecture. Commonly used parallel architectures conform to the classifications of Single Instruction Multiple Data (SIMD) or Multiple Instruction Multiple Data (MIMD). Each class of architecture has its advantages and disadvantages. For example, SIMD architectures suit data-parallel problems, such as the processing of an image, whereas MIMD architectures are more flexible and better suited to general-purpose computing. Both types of processing are typically required for the analysis of the contents of an image.
This thesis describes a novel massively parallel heterogeneous architecture, implemented as the Warwick Pyramid Machine. Both SIMD and MIMD processor types are combined within this architecture. Furthermore, the SIMD array is partitioned into smaller SIMD sub-arrays, forming a Multiple-SIMD array. Thus, local data-parallel, global data-parallel, and control-parallel processing are all supported.
After describing the present options available in the design of massively parallel machines and the nature of the image analysis problem, the architecture of the Warwick Pyramid Machine is described in some detail. The performance of this architecture is then analysed, both in terms of peak available computational power and in terms of representative applications in image analysis and numerical computation. Two tracking applications are also analysed to show the performance of this architecture. In addition, they illustrate the possible partitioning of applications between the SIMD and MIMD processor arrays.
Load-balancing techniques are then described which have the potential to increase the utilisation of the Warwick Pyramid Machine at run-time. These include mapping techniques for image regions across the Multiple-SIMD arrays, and for the compression of sparse data. It is envisaged that these techniques may be found useful in other parallel systems.
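As a generic illustration of such region-to-sub-array mapping (the thesis's own techniques are not reproduced here), a longest-processing-time greedy assignment balances region workloads across sub-arrays: place each region, largest first, on the currently least-loaded sub-array.

```python
# Greedy longest-processing-time (LPT) load balancing: a standard heuristic
# for mapping unevenly sized image regions onto a fixed set of SIMD
# sub-arrays. Illustrative only; not the Warwick Pyramid Machine's scheme.

def map_regions(region_sizes, n_subarrays):
    """Assign each region index to a sub-array; return (assignment, loads)."""
    loads = [0] * n_subarrays
    assignment = {}
    # largest regions first, so late small regions can even out imbalances
    for idx in sorted(range(len(region_sizes)), key=lambda i: -region_sizes[i]):
        target = loads.index(min(loads))     # least-loaded sub-array so far
        assignment[idx] = target
        loads[target] += region_sizes[idx]
    return assignment, loads
```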
Solution of partial differential equations on vector and parallel computers
The present status of numerical methods for partial differential equations on vector and parallel computers is reviewed. The relevant aspects of these computers are discussed, and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations, as well as explicit and implicit methods for initial boundary value problems. The intent is to point out attractive methods, as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed.
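As a concrete instance of the iterative methods such a review covers, one Jacobi sweep for a one-dimensional model problem shows why the method maps well onto vector and parallel machines: every interior point is updated from the previous iterate only, so all updates are mutually independent. The code is a minimal sketch, not drawn from the review itself.

```python
# One Jacobi sweep for the model problem -u'' = f on a uniform grid with
# fixed (Dirichlet) boundary values. Each update reads only the previous
# iterate, never the new values, so all interior points can be computed
# simultaneously on a vector or parallel machine.

def jacobi_step(u, f, h):
    """Return the next Jacobi iterate; u[0] and u[-1] are boundary values."""
    new = u[:]                       # boundaries stay fixed
    for i in range(1, len(u) - 1):
        new[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
    return new
```

By contrast, a Gauss-Seidel sweep uses freshly computed neighbors and so carries a sequential dependence, which is exactly the kind of algorithm-architecture mismatch the review is concerned with.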