587 research outputs found
Recommended from our members
On the Application of Massively Parallel SIMD Tree Machines to Certain Intermediate-Level Vision Tasks
In this paper, we examine the implementation of two middle-level image understanding tasks on fine-grained tree-structured SIMD machines, which have highly efficient VLSI implementations. We first present one such massively parallel machine called NON-VON, and summarize the cost/performance trade-offs of such machines for vision taks. We follow with a more detailed description of the NON-VON architecture (a prototype of which has been operational since January 1985), and of the high-level parallel language in which our algorithms have been written and simulated. The heart of the paper consists of the description and analysis of algorithms for a representative Hough transform, and of an algorithm for the interpretation of moving light displays. Novel algorithmic techniques are motivated and described, and simulation timings are presented and discussed. We conclude that it is possible to exploit the available massive parallelism while avoiding many of the communication bottlenecks common at this level of image understanding, by carefully and inexpensively duplicating data and/or control information, and by delaying or avoiding the reporting of intermediate results
Three Highly Parallel Computer Architectures and Their Suitability for Three Representative Artificial Intelligence Problems
Virtually all current Artificial Intelligence (AI) applications are designed to run on sequential (von Neumann) computer architectures. As a result, current systems do not scale up. As knowledge is added to these systems, a point is reached where their performance quickly degrades. The performance of a von Neumann machine is limited by the bandwidth between memory and processor (the von Neumann bottleneck). The bottleneck is avoided by distributing the processing power across the memory of the computer. In this scheme the memory becomes the processor (a smart memory ).
This paper highlights the relationship between three representative AI application domains, namely knowledge representation, rule-based expert systems, and vision, and their parallel hardware realizations. Three machines, covering a wide range of fundamental properties of parallel processors, namely module granularity, concurrency control, and communication geometry, are reviewed: the Connection Machine (a fine-grained SIMD hypercube), DADO (a medium-grained MIMD/SIMD/MSIMD tree-machine), and the Butterfly (a coarse-grained MIMD Butterflyswitch machine)
Recommended from our members
SIMD Tree Algorithms for Image Correlation
This paper examines the applicability of fine-grained tree-structured SIMD machines, which are amenable to highly efficient VLSI implementation to image correlation which is a representative of image window-based operations. Several algorithms are presented for image shifting and correlation operations. A particular massively parallel machine called NON-VON is used for purposes of explication and performance evaluation. Although the most recent version of the NON-VON architecture also supports other interconnection topologies and execution modes, only its tree-structured communication capabilities and its SIMD mode of execution are considered in this paper. Novel algorithmic techniques are described, such as vertical pipelining, subproblem partitioning, associative matching, and data duplication that effectively exploit the massive parallelism available in fine-grained SIMD tree machines while avoiding communication bottlenecks. Simulation results are presented and compared with results obtained or forecast for other highly parallel machines. The relative advantages and limitations of the class of machines under consideration are then outlined
Performance analysis of massively parallel embedded hardware architectures for retinal image processing
This paper examines the implementation of a retinal vessel tree extraction technique on different hardware platforms and architectures. Retinal vessel tree extraction is a representative application of those found in the domain of medical image processing. The low signal-to-noise ratio of the images leads to a large amount of low-level tasks in order to meet the accuracy requirements. In some applications, this might compromise computing speed. This paper is focused on the assessment of the performance of a retinal vessel tree extraction method on different hardware platforms. In particular, the retinal vessel tree extraction method is mapped onto a massively parallel SIMD (MP-SIMD) chip, a massively parallel processor array (MPPA) and onto an field-programmable gate arrays (FPGA)This work is funded by Xunta de Galicia under the projects 10PXIB206168PR and 10PXIB206037PR and the program Maria BarbeitoS
Dynamically reconfigurable architecture for embedded computer vision systems
The objective of this research work is to design, develop and implement a new architecture which integrates on the same chip all the processing levels of a complete Computer Vision system, so that the execution is efficient without compromising the power consumption while keeping a reduced cost. For this purpose, an analysis and classification of different mathematical operations and algorithms commonly used in Computer Vision are carried out, as well as a in-depth review of the image processing capabilities of current-generation hardware devices. This permits to determine the requirements and the key aspects for an efficient architecture. A representative set of algorithms is employed as benchmark to evaluate the proposed architecture, which is implemented on an FPGA-based system-on-chip. Finally, the prototype is compared to other related approaches in order to determine its advantages and weaknesses
Parallel Architectures and Parallel Algorithms for Integrated Vision Systems
Computer vision is regarded as one of the most complex and computationally intensive problems. An integrated vision system (IVS) is a system that uses vision algorithms from all levels of processing to perform for a high level application (e.g., object recognition). An IVS normally involves algorithms from low level, intermediate level, and high level vision. Designing parallel architectures for vision systems is of tremendous interest to researchers. Several issues are addressed in parallel architectures and parallel algorithms for integrated vision systems
Report from the MPP Working Group to the NASA Associate Administrator for Space Science and Applications
NASA's Office of Space Science and Applications (OSSA) gave a select group of scientists the opportunity to test and implement their computational algorithms on the Massively Parallel Processor (MPP) located at Goddard Space Flight Center, beginning in late 1985. One year later, the Working Group presented its report, which addressed the following: algorithms, programming languages, architecture, programming environments, the way theory relates, and performance measured. The findings point to a number of demonstrated computational techniques for which the MPP architecture is ideally suited. For example, besides executing much faster on the MPP than on conventional computers, systolic VLSI simulation (where distances are short), lattice simulation, neural network simulation, and image problems were found to be easier to program on the MPP's architecture than on a CYBER 205 or even a VAX. The report also makes technical recommendations covering all aspects of MPP use, and recommendations concerning the future of the MPP and machines based on similar architectures, expansion of the Working Group, and study of the role of future parallel processors for space station, EOS, and the Great Observatories era
PyCUDA and PyOpenCL: A Scripting-Based Approach to GPU Run-Time Code Generation
High-performance computing has recently seen a surge of interest in
heterogeneous systems, with an emphasis on modern Graphics Processing Units
(GPUs). These devices offer tremendous potential for performance and efficiency
in important large-scale applications of computational science. However,
exploiting this potential can be challenging, as one must adapt to the
specialized and rapidly evolving computing environment currently exhibited by
GPUs. One way of addressing this challenge is to embrace better techniques and
develop tools tailored to their needs. This article presents one simple
technique, GPU run-time code generation (RTCG), along with PyCUDA and PyOpenCL,
two open-source toolkits that support this technique.
In introducing PyCUDA and PyOpenCL, this article proposes the combination of
a dynamic, high-level scripting language with the massive performance of a GPU
as a compelling two-tiered computing platform, potentially offering significant
performance and productivity advantages over conventional single-tier, static
systems. The concept of RTCG is simple and easily implemented using existing,
robust infrastructure. Nonetheless it is powerful enough to support (and
encourage) the creation of custom application-specific tools by its users. The
premise of the paper is illustrated by a wide range of examples where the
technique has been applied with considerable success.Comment: Submitted to Parallel Computing, Elsevie
- …