587 research outputs found

On the Application of Massively Parallel SIMD Tree Machines to Certain Intermediate-Level Vision Tasks

Author: Ibrahim Hussein
Kender John R.
Shaw David Elliot
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/1985

In this paper, we examine the implementation of two middle-level image understanding tasks on fine-grained tree-structured SIMD machines, which have highly efficient VLSI implementations. We first present one such massively parallel machine called NON-VON, and summarize the cost/performance trade-offs of such machines for vision tasks. We follow with a more detailed description of the NON-VON architecture (a prototype of which has been operational since January 1985), and of the high-level parallel language in which our algorithms have been written and simulated. The heart of the paper consists of the description and analysis of algorithms for a representative Hough transform, and of an algorithm for the interpretation of moving light displays. Novel algorithmic techniques are motivated and described, and simulation timings are presented and discussed. We conclude that it is possible to exploit the available massive parallelism while avoiding many of the communication bottlenecks common at this level of image understanding, by carefully and inexpensively duplicating data and/or control information, and by delaying or avoiding the reporting of intermediate results.

Columbia University Academic Commons

Three Highly Parallel Computer Architectures and Their Suitability for Three Representative Artificial Intelligence Problems

Author: Katriel Ron
Publication venue: ScholarlyCommons
Publication date: 28/09/1987

Virtually all current Artificial Intelligence (AI) applications are designed to run on sequential (von Neumann) computer architectures. As a result, current systems do not scale up. As knowledge is added to these systems, a point is reached where their performance quickly degrades. The performance of a von Neumann machine is limited by the bandwidth between memory and processor (the von Neumann bottleneck). The bottleneck is avoided by distributing the processing power across the memory of the computer. In this scheme the memory becomes the processor (a smart memory). This paper highlights the relationship between three representative AI application domains, namely knowledge representation, rule-based expert systems, and vision, and their parallel hardware realizations. Three machines, covering a wide range of fundamental properties of parallel processors, namely module granularity, concurrency control, and communication geometry, are reviewed: the Connection Machine (a fine-grained SIMD hypercube), DADO (a medium-grained MIMD/SIMD/MSIMD tree machine), and the Butterfly (a coarse-grained MIMD Butterfly-switch machine).

ScholarlyCommons@Penn

SIMD Tree Algorithms for Image Correlation

Author: Ibrahim Hussein
Kender John R.
Shaw David Elliot
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/1986

This paper examines the applicability of fine-grained tree-structured SIMD machines, which are amenable to highly efficient VLSI implementation, to image correlation, which is representative of image window-based operations. Several algorithms are presented for image shifting and correlation operations. A particular massively parallel machine called NON-VON is used for purposes of explication and performance evaluation. Although the most recent version of the NON-VON architecture also supports other interconnection topologies and execution modes, only its tree-structured communication capabilities and its SIMD mode of execution are considered in this paper. Novel algorithmic techniques, such as vertical pipelining, subproblem partitioning, associative matching, and data duplication, are described that effectively exploit the massive parallelism available in fine-grained SIMD tree machines while avoiding communication bottlenecks. Simulation results are presented and compared with results obtained or forecast for other highly parallel machines. The relative advantages and limitations of the class of machines under consideration are then outlined.

Columbia University Academic Commons

Performance analysis of massively parallel embedded hardware architectures for retinal image processing

Author: Alejandro Nieto
David L Vilariño
Roberto R Osorio
Victor Brea
Publication venue: Springer Nature
Publication date: 01/01/2011

This paper examines the implementation of a retinal vessel tree extraction technique on different hardware platforms and architectures. Retinal vessel tree extraction is a representative application of those found in the domain of medical image processing. The low signal-to-noise ratio of the images leads to a large number of low-level tasks in order to meet the accuracy requirements. In some applications, this might compromise computing speed. This paper is focused on the assessment of the performance of a retinal vessel tree extraction method on different hardware platforms. In particular, the retinal vessel tree extraction method is mapped onto a massively parallel SIMD (MP-SIMD) chip, a massively parallel processor array (MPPA), and a field-programmable gate array (FPGA). This work is funded by Xunta de Galicia under the projects 10PXIB206168PR and 10PXIB206037PR and the program Maria Barbeito.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Springer - Publisher Connector

Repositorio Institucional da Universidade de Santiago de Compostela

Dynamically reconfigurable architecture for embedded computer vision systems

Author: Nieto Lareo Alejandro Manuel
Publication date: 01/01/2013

The objective of this research work is to design, develop and implement a new architecture which integrates on the same chip all the processing levels of a complete Computer Vision system, so that the execution is efficient without compromising the power consumption while keeping a reduced cost. For this purpose, an analysis and classification of different mathematical operations and algorithms commonly used in Computer Vision are carried out, as well as an in-depth review of the image processing capabilities of current-generation hardware devices. This makes it possible to determine the requirements and the key aspects for an efficient architecture. A representative set of algorithms is employed as a benchmark to evaluate the proposed architecture, which is implemented on an FPGA-based system-on-chip. Finally, the prototype is compared to other related approaches in order to determine its advantages and weaknesses.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio Institucional da Universidade de Santiago de Compostela

Parallel Architectures and Parallel Algorithms for Integrated Vision Systems

Author: Choudhary Alok Nidhi

Computer vision is regarded as one of the most complex and computationally intensive problems. An integrated vision system (IVS) is a system that uses vision algorithms from all levels of processing to carry out a high-level application (e.g., object recognition). An IVS normally involves algorithms from low-level, intermediate-level, and high-level vision. Designing parallel architectures for vision systems is of tremendous interest to researchers. Several issues are addressed in parallel architectures and parallel algorithms for integrated vision systems.

NASA Technical Reports Server

Computer vision algorithms on reconfigurable logic arrays

Author: A.K. Jain
N.K. Ratha
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Crossref

Report from the MPP Working Group to the NASA Associate Administrator for Space Science and Applications

Author: Fischer James R.
Grosch Chester
Mcanulty Michael
Odonnell John
Storey Owen

NASA's Office of Space Science and Applications (OSSA) gave a select group of scientists the opportunity to test and implement their computational algorithms on the Massively Parallel Processor (MPP) located at Goddard Space Flight Center, beginning in late 1985. One year later, the Working Group presented its report, which addressed the following: algorithms, programming languages, architecture, programming environments, the way theory relates, and performance measured. The findings point to a number of demonstrated computational techniques for which the MPP architecture is ideally suited. For example, besides executing much faster on the MPP than on conventional computers, systolic VLSI simulation (where distances are short), lattice simulation, neural network simulation, and image problems were found to be easier to program on the MPP's architecture than on a CYBER 205 or even a VAX. The report also makes technical recommendations covering all aspects of MPP use, and recommendations concerning the future of the MPP and machines based on similar architectures, expansion of the Working Group, and study of the role of future parallel processors for space station, EOS, and the Great Observatories era

NASA Technical Reports Server

PyCUDA and PyOpenCL: A Scripting-Based Approach to GPU Run-Time Code Generation

Author: Ahmed Fasih
Andreas Klöckner
Bryan Catanzaro
Nicolas Pinto
Paul Ivanov
Yunsup Lee
Publication venue: 'Elsevier BV'
Publication date: 29/03/2011

High-performance computing has recently seen a surge of interest in heterogeneous systems, with an emphasis on modern Graphics Processing Units (GPUs). These devices offer tremendous potential for performance and efficiency in important large-scale applications of computational science. However, exploiting this potential can be challenging, as one must adapt to the specialized and rapidly evolving computing environment currently exhibited by GPUs. One way of addressing this challenge is to embrace better techniques and develop tools tailored to their needs. This article presents one simple technique, GPU run-time code generation (RTCG), along with PyCUDA and PyOpenCL, two open-source toolkits that support this technique. In introducing PyCUDA and PyOpenCL, this article proposes the combination of a dynamic, high-level scripting language with the massive performance of a GPU as a compelling two-tiered computing platform, potentially offering significant performance and productivity advantages over conventional single-tier, static systems. The concept of RTCG is simple and easily implemented using existing, robust infrastructure. Nonetheless, it is powerful enough to support (and encourage) the creation of custom application-specific tools by its users. The premise of the paper is illustrated by a wide range of examples where the technique has been applied with considerable success. Comment: Submitted to Parallel Computing, Elsevier

arXiv.org e-Print Archive

Crossref

Performance analysis of massively parallel embedded hardware architectures for retinal image processing

Author: Alejandro Nieto
David L Vilariño
Roberto R Osorio
Victor Brea
Publication venue: 'Springer Science and Business Media LLC'

Crossref