Search CORE

47 research outputs found

Efficient implementation of video processing algorithms on FPGA

Author: Sims Oliver
Publication venue
Publication date: 01/01/2007
Field of study

The work contained in this portfolio thesis was carried out as part of an Engineering Doctorate (Eng.D) programme from the Institute for System Level Integration. The work was sponsored by Thales Optronics, and focuses on issues surrounding the implementation of video processing algorithms on field programmable gate arrays (FPGA). A description is given of FPGA technology and the currently dominant methods of designing and verifying firmware. The problems of translating a description of behaviour into one of structure are discussed, and some of the latest methodologies for tackling this problem are introduced. A number of algorithms are then looked at, including methods of contrast enhancement, deconvolution, and image fusion. Algorithms are characterised according to the nature of their execution flow, and this is used as justification for some of the design choices that are made. An efficient method of performing large two-dimensional convolutions is also described. The portfolio also contains a discussion of an FPGA implementation of a PID control algorithm, an overview of FPGA dynamic reconfigurability, and the development of a demonstration platform for rapid deployment of video processing algorithms in FPGA hardware

Glasgow Theses Service

OpenGrey Repository

Poise: Balancing Thread-Level Parallelism and Memory System Performance in GPUs using Machine Learning

Author: Dublish Saumay
Nagarajan Vijayanand
Topham Nigel
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 28/03/2019
Field of study

Crossref

Edinburgh Research Explorer

Hardware acceleration of the trace transform for vision applications

Author: Fahmy Suhaib A.
Publication venue: Department of Electrical and Electronic Engineering, Imperial College London
Publication date: 01/01/2008
Field of study

Computer Vision is a rapidly developing field in which machines process visual data to extract meaningful information. Digitised images in their pixels and bits serve no purpose of their own. It is only by interpreting the data, and extracting higher level information that a scene can be understood. The algorithms that enable this process are often complex, and data-intensive, limiting the processing rate when implemented in software. Hardware-accelerated implementations provide a significant performance boost that can enable real- time processing. The Trace Transform is a newly proposed algorithm that has been proven effective in image categorisation and recognition tasks. It is flexibly defined allowing the mathematical details to be tailored to the target application. However, it is highly computationally intensive, which limits its applications. Modern heterogeneous FPGAs provide an ideal platform for accelerating the Trace transform for real-time performance, while also allowing an element of flexibility, which highly suits the generality of the Trace transform. This thesis details the implementation of an extensible Trace transform architecture for vision applications, before extending this architecture to a full flexible platform suited to the exploration of Trace transform applications. As part of the work presented, a general set of architectures for large-windowed median and weighted median filters are presented as required for a number of Trace transform implementations. Finally an acceleration of Pseudo 2-Dimensional Hidden Markov Model decoding, usable in a person detection system, is presented. Such a system can be used to extract frames of interest from a video sequence, to be subsequently processed by the Trace transform. All these architectures emphasise the need for considered, platform-driven design in achieving maximum performance through hardware acceleration

Spiral - Imperial College Digital Repository

Recommended from our members

A graph theoretic approach to transputer network design for computer vision

Author: Omarouayache S.
Publication venue
Publication date
Field of study

The work in this thesis is concerned with parallel architectures based on the Inmos transputer-type processors and parallelisation of some computer vision tasks chosen from low to high level. The transputer is a microprocessor with a micro-programmed scheduler and four serial communication links. It directly supports parallel processing since several transputers can be connected through their links to co-operate on solving a problem. Also several processes can be run on the same transputer. A major issue in parallel processing is the communication overhead introduced by parallelising a given task. This overhead is not present in sequential processing and must be curbed if the implementation of a task on a parallel machine is to be successful. The interconnection network underlying the architecture of a parallel computer is therefore of the utmost importance. Computer Vision consists of a hierarchy of tasks ranging from low-level operations dealing with large amounts of relatively simple data to high level operations handling increasingly complex structures. In this work a novel edge detector based on adaptive filtering and an edge detector operating on colour images are presented and implemented on a number of transputers. These parallel implementations together with implementations of vector quantisation, Fourier descriptors for shape discrimination, the Hough transform and the Maximum clique algorithm, offer a notable performance increase when compared with sequential implementations. However, every algorithm required the design of a specific network of transputers to take advantage of the parallelism and data dependencies inherent in each. Consequently, attention is focused on the topology of interconnection networks. In particular, the communication requirements of computer vision algorithms as identified by the various computer vision tasks are analysed. These requirements together with graph theoretical considerations are then used to suggest a topology for large transputer networks. The latter is based on sub-graphs, with proven performance when used to implement interconnection networks, combined to form an architecture with improved performance. This architecture consists of a fixed structure supplemented with a dynamically reconfigured network. After describing this topology, a routing algorithm that conveys messages along shortest paths in the network is given and implemented. And finally, some practical issues in the use of transputers are considered and solutions proposed

City Research Online

Architecture matérielle logicielle pour l'exécution à latence réduite d'applications de télécommunications émergentes sur centre de données

Author: Gémieux Michel
Publication venue
Publication date: 01/04/2020
Field of study

RÉSUMÉ L’industrie des technologies de l’information et des communications fait face à une demande croissante de services sans fil et Internet omniprésents. Cette demande est alimentée par une explosion du nombre d’appareils mobiles riches en multimédia. Il a été estimé qu’à partir de cette année, 2020, le volume de trafic de données mobiles doublera chaque année pour plusieurs années. En conséquence, il en résulte une augmentation significative des dépenses en capital pour les systèmes construits sur les technologies actuelles de réseau d’accès ra-dio qui sont essentiellement basées sur des architectures avec une structure fixe utilisant des plates-formes propriétaires et des mécanismes de contrôle et de gestion de réseau distribués. D’autre part, pour garantir la qualité de service requise, les sous-systèmes sont dimensionnés en fonction des demandes de pointe. Par conséquent, l’extension du réseau aura un impact considérable sur les dépenses d’exploitation. La recherche proposée vise à développer une architecture matérielle et logicielle adaptée à une grappe d’unités de traitement virtualisée pour les signaux en bande de base d’accès radio en nuagique. Ce type d’architecture de-vra prendre en charge le traitement en temps réel avec des processeurs généralistes sur une plateforme hétérogène. Cela soulève deux défis principaux : la planification des tâches en temps réel et leur exécution d’une manière plus déterministe par rapport aux plates-formes généralistes existantes. Ainsi, les mécanismes d’allocation et de gestion des ressources dans les grappes informatiques doivent être revus. Le deuxième défi est d’obtenir un comporte-ment à faible variance qui implique deux préoccupations majeures : le temps de calcul et le délai de communication. Essentiellement, la variation du temps de calcul est inhérente à tous les processeurs généralistes. Néanmoins, l’infrastructure de communication des grappes informatiques existantes ne fournit aucun soutien pour les communications à faible variance. La recherche proposée est divisée en deux principaux sujets : Le calcul dynamique, l’allocation et la gestion des ressources réseau dans une grappeinformatique (hétérogène) : les algorithmes d’allocation dynamique des ressources et de planification des tâches en temps réel formeront la fonctionnalité de base prise en charge par le plan de contrôle. Afin de répondre aux fortes contraintes en temps réel de cette classe d’applications, une implémentation matérielle parallèle basée sur circuit logique programmable (FPGA) du plan de contrôle est proposée.----------ABSTRACT The Information and Communications Technology industry is facing an increasing demand for ubiquitous wireless and Internet services introduced by an explosion of multimedia-rich mobile devices. It is estimated that starting this year, 2020, the volume of mobile data traÿcs will double every year. Consequently, it results in significant increases of capital expenditures for systems built on the current Radio Access Network technologies, which are essentially based on architectures with a fixed structure (not reconfigurable) using proprietary platforms with distributed network control and management mechanisms. To ensure the required quality of service, subsystems are dimensioned with respect to the peak demands. Therefore, network expansion will considerably impact on operating expenditures. This thesis aims at developing an architecture at both hardware and software levels suitable for a virtualized Baseband Processing Unit pool in Cloud Radio Acces Network in order to support real-time processing in a General Purpose Processor based platform. This raises two main challenges: scheduling tasks in real-time and executing them in a manner that is reduces variance compared to the existing General Purpose Processor based platforms. Real-time tasks from radio air interface in the Cloud Radio Access Network must be scheduled at a finer grain and must be completed within a given timeslot. Thus, mechanisms for resource allocation and management in computing clusters must be revisited. The second challenge is obtaining a behavior with reduced variability that involves two major concerns: computing time and communication delay. Nevertheless, the communication infrastructure of existing computing clusters does not provide any support for low variance communications. The proposed research is divided into the following main subjects:Adaptive computing and network resource allocation and management in (hetero-geneous) computing clusters: The algorithms for dynamic resources allocation and real-time task scheduling will form the core functionality that the control plane will support. In order to meet the hard real-time constraints of that class of applications, a parallel Field Programable Gate Array based hardware implementation of the control plane is proposed

PolyPublie

A transputer based parallel database system.

Author: Gray Jonathan Patrick.
Publication venue: Sheffield Hallam University,
Publication date
Field of study

A sophisticated database application generation environment known as DB4GL has been developed at Sheffield City Polytechnic. A unique feature of DB4GL is the object-oriented application model used to specify and generate database applications. Although DB4GL has many advanced and powerful features, such as a self-describing data dictionary and extensive integrity rule processing facilities; the system has not been designed for high performance in either the generation tools or the generated database applications. The Parallel-DB4GL (P-DB4GL) project represents an attempt to improve the performance of the generated database applications, by constructing a new concurrent implementation of DB4GL for execution on transputer-based parallel hardware. This thesis describes the DB4GL system as developed to the commencement of the P-DB4GL project. A prototype P-DB4GL system has been implemented that demonstrates how significant performance gains can be obtained from a concurrent implementation on transputer-based parallel hardware. Based on the successful results of this prototype system, designs for a fully functional multiprocessor P-DB4GL system are proposed. The details of this prototype and the fully functional designs are presented in this thesis. The thesis also provides an evaluation of the P-DB4GL project as a whole, and concludes with some suggestions for further research in the areas of parallel databases and object-oriented system implementation

Sheffield Hallam University Research Archive

The application of optimal transputer architecture to concurrent processing in the implementation of vision processing algorithms

Author: Bennett Ian
Publication venue
Publication date: 01/05/1989
Field of study

University of South Wales Research Explorer

High level design and control of adaptive multiprocessor system-on-chips

Author: An Xin
Publication venue: HAL CCSD
Publication date: 16/10/2013
Field of study

The design of modern embedded systems is getting more and more complex, as more func- tionality is integrated into these systems. At the same time, in order to meet the compu- tational requirements while keeping a low level power consumption, MPSoCs have emerged as the main solutions for such embedded systems. Furthermore, embedded systems are be- coming more and more adaptive, as the adaptivity can bring a number of benefits, such as software flexibility and energy efficiency. This thesis targets the safe design of such adaptive MPSoCs. First, each system configuration must be analyzed concerning its functional and non- functional properties. We present an abstract design and analysis framework, which allows for faster and cost-effective implementation decisions. This framework is intended as an intermediate reasoning support for system level software/hardware co-design environments. It can prune the design space at its largest, and identify candidate design solutions in a fast and efficient way. In the framework, we use an abstract clock-based encoding to model system behaviors. Different mapping and scheduling scenarios of applications on MPSoCs are analyzed via clock traces representing system simulations. Among properties of interest are functional behavioral correctness, temporal performance and energy consumption. Second, the reconfiguration management of adaptive MPSoCs must be addressed. We are specially interested in MPSoCs implemented on reconfigurable hardware architectures (i.e., FPGA fabrics), which provide a good flexibility and computational efficiency for adap- tive MPSoCs. We propose a general design framework based on the discrete controller syn- thesis (DCS) technique to address this issue. The main advantage of this technique is that it allows the automatic controller synthesis w.r.t. a given specification of control objectives. In the framework, the system reconfiguration behavior is modeled in terms of synchronous parallel automata. The reconfiguration management computation problem w.r.t. multiple objectives regarding e.g., resource usages, performance and power consumption is encoded as a DCS problem. The existing BZR programming language and Sigali tool are employed to perform DCS and generate a controller that satisfies the system requirements. Finally, we investigate two different ways of combining the two proposed design frame- works for adaptive MPSoCs. Firstly, they are combined to construct a complete design flow for adaptive MPSoCs. Secondly, they are combined to present how the designed run-time manager by the second framework can be integrated into the first framework so that high level simulations can be performed to assess the run-time manager.La conception de systèmes embarqués modernes est de plus en plus complexe, car plus de fonctionnalités sont intégrées dans ces systèmes. En même temps, afin de répondre aux exigences de calcul tout en conservant une consommation d'énergie de faible niveau, MPSoCs sont apparus comme les principales solutions pour tels systèmes embarqués. En outre, les systèmes embarqués sont de plus en plus adaptatifs, comme l’adaptabilité peut apporter un certain nombre d'avantages, tels que la flexibilité du logiciel et l'efficacité énergétique. Cette thèse vise la conception sécuritaire de ces MPSoCs adaptatifs. Tout d'abord, chaque configuration de système doit être analysée en ce qui concerne ses propriétés fonctionnelles et non fonctionnelles. Nous présentons un cadre abstraite de conception et d’analyse qui permet des décisions d’implémentation plus rapide et plus rentable. Ce cadre est conçu comme un support de raisonnement intermédiaire pour les environnements de co-conception de logiciel / matériel au niveau de système. Il peut élaguer l'espace de conception à sa plus grande portée, et identifier les candidats de solutions de conception de manière rapide et efficace. Dans ce cadre, nous utilisons un codage basé sur l’horloge abstrait pour modéliser les comportements du système. Différents scénarios d'applications de mapping et de planification sur MPSoCs sont analysés via les traces d'horloge qui représentent les simulations du système. Les propriétés d'intérêt sont l’exactitude du comportement fonctionnel, la performance temporelle et la consommation d'énergie. Deuxièmement, la gestion de la reconfiguration de MPSoCs adaptatifs doit être abordée. Nous sommes particulièrement intéressés par les MPSoCs implémentés sur des architectures reconfigurables de hardware (ex. FPGA tissus) qui offrent une bonne flexibilité et une efficacité de calcul pour les MPSoCs adaptatifs. Nous proposons un cadre général de conception basésur la technique de la synthèse de contrôleurs discrets (SCD) pour résoudre ce problème. L’avantage principal de cette technique est qu'elle permet une synthèse d'un contrôleur automatique vis-à-vis d’une spécification donnée des objectifs de contrôle. Dans ce cadre, le comportement de reconfiguration du système est modélisé en termes d'automates synchrones en parallèle. Le problème de calcul de la gestion reconfiguration vis-à-vis de multiples objectifs concernant, par exemple, les usages des ressources, la performance et la consommation d’énergie est codé comme un problème de SCD . Le langage de programmation BZR existant et l’outil Sigali sont employés pour effectuer SCD et générer un contrôleur qui satisfait aux exigences du système. Finalement, nous étudions deux façons différentes de combiner les deux cadres de conception proposées pour MPSoCs adaptatifs. Tout d'abord, ils sont combinés pour construire un flot de conception complet pour MPSoCs adaptatifs. Deuxièmement, ils sont combinés pour présenter la façon dont le gestionnaire d'exécution conçu dans le second cadre peut être intégré dans le premier cadre de sorte que les simulations de haut niveau peuvent être effectuées pour évaluer le gestionnaire d'exécution

Thèses en Ligne

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server