241 research outputs found

Parallel Architectures and Parallel Algorithms for Integrated Vision Systems

Author: Choudhary Alok Nidhi

Computer vision is regarded as one of the most complex and computationally intensive problems. An integrated vision system (IVS) is a system that uses vision algorithms from all levels of processing to perform a high-level application (e.g., object recognition). An IVS normally involves algorithms from low-level, intermediate-level, and high-level vision. Designing parallel architectures for vision systems is of tremendous interest to researchers. Several issues are addressed in parallel architectures and parallel algorithms for integrated vision systems.

NASA Technical Reports Server

Three Highly Parallel Computer Architectures and Their Suitability for Three Representative Artificial Intelligence Problems

Author: Katriel Ron
Publication venue: ScholarlyCommons
Publication date: 28/09/1987

Virtually all current Artificial Intelligence (AI) applications are designed to run on sequential (von Neumann) computer architectures. As a result, current systems do not scale up: as knowledge is added to these systems, a point is reached where their performance quickly degrades. The performance of a von Neumann machine is limited by the bandwidth between memory and processor (the von Neumann bottleneck). The bottleneck is avoided by distributing the processing power across the memory of the computer. In this scheme the memory becomes the processor (a "smart memory"). This paper highlights the relationship between three representative AI application domains, namely knowledge representation, rule-based expert systems, and vision, and their parallel hardware realizations. Three machines, covering a wide range of fundamental properties of parallel processors, namely module granularity, concurrency control, and communication geometry, are reviewed: the Connection Machine (a fine-grained SIMD hypercube), DADO (a medium-grained MIMD/SIMD/MSIMD tree machine), and the Butterfly (a coarse-grained MIMD Butterfly-switch machine).

ScholarlyCommons@Penn

Improving GPU performance : reducing memory conflicts and latency

Author: Braak van den, G.J.W.
Publication venue: Technische Universiteit Eindhoven
Publication date: 25/11/2015

Pure OAI Repository

Computer vision algorithms on reconfigurable logic arrays

Author: A.K. Jain
N.K. Ratha
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Crossref

GPU accelerated parallel Iris segmentation

Author: Kurani Kritika
Publication date: 12/05/2014

A biometric system provides automatic identification of an individual based on a unique feature or characteristic possessed by the person. Iris recognition systems are among the most definitive biometric systems, since complex random iris patterns are unique to each individual and do not change with time. Iris recognition is divided into three steps: iris segmentation (localization), feature extraction, and template matching. To gain performance for the entire system, it is vital to improve the performance of each individual step. Localization of the iris borders in an eye image is a vital step in the iris recognition process because of the heavy processing it requires. Iris segmentation algorithms are currently implemented on general-purpose sequential processing systems, such as common Central Processing Units (CPUs). This thesis presents a more direct, parallel processing alternative using the graphics processing unit (GPU), which was originally used exclusively for visualization and has evolved into an extremely powerful coprocessor, offering an opportunity to increase speed and potentially improve the resulting system performance. To realize a speedup in iris segmentation, NVIDIA's Compute Unified Device Architecture (CUDA) programming model has been used. Iris localization is achieved by applying the circular Hough transform to the edge image obtained with the Canny edge detection technique. Parallelism is employed in the Hough transform step.
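The voting step of a circular Hough transform, which the abstract names as the part that is parallelized, can be illustrated with a minimal sequential sketch. This is not the thesis's CUDA implementation: the function names `hough_circles` and `best_circle`, the angle sampling, and the candidate radii are all assumptions made for illustration. Each edge pixel votes independently of the others, which is what makes the step a natural candidate for one-thread-per-pixel execution on a GPU.

```python
import math
from collections import Counter


def hough_circles(edge_points, radii, n_angles=72):
    """Accumulate votes for circle parameters (a, b, r).

    edge_points: iterable of (x, y) integer edge-pixel coordinates.
    radii: candidate radii to test (hypothetical search range).
    Returns a Counter mapping (a, b, r) to the number of edge pixels
    that voted for that centre and radius.
    """
    votes = Counter()
    for x, y in edge_points:  # independent per pixel: the parallel axis
        for r in radii:
            cells = set()  # each pixel votes at most once per cell
            for k in range(n_angles):
                theta = 2.0 * math.pi * k / n_angles
                a = round(x - r * math.cos(theta))
                b = round(y - r * math.sin(theta))
                cells.add((a, b, r))
            for cell in cells:
                votes[cell] += 1
    return votes


def best_circle(votes):
    """Return the (a, b, r) cell with the most votes."""
    return max(votes.items(), key=lambda kv: kv[1])[0]
```

In a GPU version the accumulator would typically be a shared array updated with atomic increments, whereas here a `Counter` plays that role; that mapping is an assumption, not something the abstract states.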

ethesis@nitr

Real-time registration and simulation in medical imaging

Author: Shams Ramtin
Publication date: 21/11/2018

The Australian National University

Programming issues for video analysis on Graphics Processing Units

Author: Gómez Luna Juan
Publication venue: Universidad de Córdoba, Servicio de Publicaciones
Publication date: 01/01/2012

Video processing is the part of signal processing in which the input and/or output signals are video sequences. It covers a wide variety of applications that are, in general, computationally intensive because of their algorithmic complexity. Moreover, many of these applications demand real-time operation. Meeting these requirements makes it necessary to use hardware accelerators such as Graphics Processing Units (GPUs). General-purpose processing on GPUs has been a successful trend in high-performance computing since the launch of the NVIDIA CUDA architecture and programming model. This doctoral thesis deals with the efficient parallelization of video processing applications on GPUs. This goal is approached from two angles: on the one hand, programming the GPU appropriately for video applications; on the other, treating the GPU as part of a heterogeneous system. Because video sequences are composed of frames, which are regular data structures, many components of video applications are inherently parallelizable. Other components, however, are irregular in the sense that they perform computations that depend on the workload, suffer from write contention, or contain inherently sequential or load-imbalanced parts. This thesis proposes strategies for dealing with these aspects through several case studies. It also describes an optimized approach to histogram calculation based on a memory performance model. Video sequences are continuous streams that must be transferred from the host (CPU) to the device (GPU), and the results from the device back to the host. This doctoral thesis proposes the use of CUDA streams to implement the stream processing paradigm on the GPU, in order to control the concurrent execution of data transfers and computation. It also proposes performance models that enable optimal execution.

Repositorio Institucional de la Universidad de Córdoba

Fast algorithm for real-time rings reconstruction

Author: Ammendola R.
Bauce Matteo
Biagioni A.
Capuani S.
Chiozzi Stefano
Cotta Ramusino Angelo
Di Domenico Giovanni
Fantechi R.
Fiorini Massimiliano
Giagu S.
Gianoli Alberto
Graverini E.
Lamanna Gianluca
Lonardo A.
Messina A.
Neri Ilaria
Palombo Marco
Pantaleo F.
Paolucci P.S.
Piandani R.
Pontisso L.
Rescigno M.
Simula F.
Sozzi Marco
Vicini P.
Publication venue: Verlag Deutsches Elektronen-Synchrotron
Publication date: 01/01/2015

The GAP project is dedicated to studying the application of GPUs in several contexts in which a real-time response is important for taking decisions. The definition of real time depends on the application under study, ranging from response times of μs up to several hours for very computing-intensive tasks. At this conference we presented our work on low-level triggers [1] [2] and high-level triggers [3] in high-energy physics experiments, and specific applications for nuclear magnetic resonance (NMR) [4] [5] and cone-beam CT [6]. Apart from the study of dedicated solutions to decrease the latency due to data transport and preparation, the computing algorithms play an essential role in any GPU application. In this contribution, we show an original algorithm developed for trigger applications to accelerate ring reconstruction in RICH detectors when it is not possible to have seeds for reconstruction from external trackers.

DESY Publication Database

DESY

Archivio istituzionale della ricerca - Università di Ferrara

Archivio della ricerca- Università di Roma La Sapienza

CERN Document Server

NETRA - A Parallel Architecture for Integrated Vision Systems II: Algorithms and Performance Evaluation

Author: Ahuja Narendra
Choudhary Alok N.
Patel Janak H.
Publication venue: Coordinated Science Laboratory, University of Illinois at Urbana-Champaign
Publication date: 01/12/1989

Coordinated Science Laboratory was formerly known as Control Systems Laboratory
National Aeronautics and Space Administration / NASA NAG-1-61

Illinois Digital Environment for Access to Learning and Scholarship Repository

NASA Technical Reports Server