Search CORE

834 research outputs found

A systematic implementation of image processing algorithms on configurable computing hardware

Author: Levine Benjamin Alexander
Publication venue: TRACE: Tennessee Research and Creative Exchange
Publication date: 01/12/1999
Field of study

Configurable computing hardware has many advantages over both general-purpose processors and application specific hardware. However, the difficulty of using this type of hardware has limited its use. An automated system for implementing image Processing applications in configurable hardware, called CHAMPION, is under development at the University of Tennessee. CHAMPION will map applications in the Khoros Cantata graphical programming environment to hardware. A relatively complex automatic target recognition (ATR) application was manually mapped from Cantata to a commercially available configurable computing platform. This manual implementation was done to assist in the development of function libraries and hardware for use in the CHAMPION systems, as well as to develop procedures to perform the application mapping. The mapping techniques used were developed in such a way that they could serve as the basis for the automated system. Many important considerations for the mapping process were identified and included in the mapping algorithms. The manual mapping was successful, allowing the ATR application to be run on a Wildforce-XL configurable computing board. The successful application implementation validated the basic hardware design and mapping concepts to be used in CHAMPION. Nearly a tenfold performance increase was realized in the hardware implementation and performance bottlenecks were identified which should enable even greater performance improvements to be realized in the automated system. The manual implementation also helped to identify some of the challenges that must be overcome to complete the development of the automated system

University of Tennessee, Knoxville: Trace

FPGAs in Industrial Control Applications

Author: Bahri Imene
Cirstea Marcian N.
Idkhajine Lahoucine
Monmasson Eric
Naouar Mohamed W.
Tisan Alin
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/05/2011
Field of study

The aim of this paper is to review the state-of-the-art of Field Programmable Gate Array (FPGA) technologies and their contribution to industrial control applications. Authors start by addressing various research fields which can exploit the advantages of FPGAs. The features of these devices are then presented, followed by their corresponding design tools. To illustrate the benefits of using FPGAs in the case of complex control applications, a sensorless motor controller has been treated. This controller is based on the Extended Kalman Filter. Its development has been made according to a dedicated design methodology, which is also discussed. The use of FPGAs to implement artificial intelligence-based industrial controllers is then briefly reviewed. The final section presents two short case studies of Neural Network control systems designs targeting FPGAs

HAL-CentraleSupelec

Crossref

Anglia Ruskin Research

Hal-Diderot

HAL-Rennes 1

Efficient Hardware Architectures for Accelerating Deep Neural Networks: Survey

Author: Boppu Srinivas
Cenkeramaddi Linga Reddy
Dhilleswararao Pudi
Manikandan M. Sabarimalai
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2022
Field of study

In the modern-day era of technology, a paradigm shift has been witnessed in the areas involving applications of Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL). Specifically, Deep Neural Networks (DNNs) have emerged as a popular field of interest in most AI applications such as computer vision, image and video processing, robotics, etc. In the context of developed digital technologies and the availability of authentic data and data handling infrastructure, DNNs have been a credible choice for solving more complex real-life problems. The performance and accuracy of a DNN is a way better than human intelligence in certain situations. However, it is noteworthy that the DNN is computationally too cumbersome in terms of the resources and time to handle these computations. Furthermore, general-purpose architectures like CPUs have issues in handling such computationally intensive algorithms. Therefore, a lot of interest and efforts have been invested by the research fraternity in specialized hardware architectures such as Graphics Processing Unit (GPU), Field Programmable Gate Array (FPGA), Application Specific Integrated Circuit (ASIC), and Coarse Grained Reconfigurable Array (CGRA) in the context of effective implementation of computationally intensive algorithms. This paper brings forward the various research works carried out on the development and deployment of DNNs using the aforementioned specialized hardware architectures and embedded AI accelerators. The review discusses the detailed description of the specialized hardware-based accelerators used in the training and/or inference of DNN. A comparative study based on factors like power, area, and throughput, is also made on the various accelerators discussed. Finally, future research and development directions are discussed, such as future trends in DNN implementation on specialized hardware accelerators. This review article is intended to serve as a guide for hardware architectures for accelerating and improving the effectiveness of deep learning research.publishedVersio

Agder University Research Archive

Design for novel enhanced weightless neural network and multi-classifier.

Author: Lorrentz Pierre
Publication venue
Publication date: 01/11/2009
Field of study

Weightless neural systems have often struggles in terms of speed, performances, and memory issues. There is also lack of sufficient interfacing of weightless neural systems to others systems. Addressing these issues motivates and forms the aims and objectives of this thesis. In addressing these issues, algorithms are formulated, classifiers, and multi-classifiers are designed, and hardware design of classifier are also reported. Specifically, the purpose of this thesis is to report on the algorithms and designs of weightless neural systems. A background material for the research is a weightless neural network known as Probabilistic Convergent Network (PCN). By introducing two new and different interfacing method, the word "Enhanced" is added to PCN thereby giving it the name Enhanced Probabilistic Convergent Network (EPCN). To solve the problem of speed and performances when large-class databases are employed in data analysis, multi-classifiers are designed whose composition vary depending on problem complexity. It also leads to the introduction of a novel gating function with application of EPCN as an intelligent combiner. For databases which are not very large, single classifiers suffices. Speed and ease of application in adverse condition were considered as improvement which has led to the design of EPCN in hardware. A novel hashing function is implemented and tested on hardware-based EPCN. Results obtained have indicated the utility of employing weightless neural systems. The results obtained also indicate significant new possible areas of application of weightless neural systems

Kent Academic Repository

Automatic mapping of graphical programming applications to microelectronic technologies

Author: Ong Sze-Wei
Publication venue: TRACE: Tennessee Research and Creative Exchange
Publication date: 01/05/2001
Field of study

Adaptive computing systems (ACSs) and application-specific integrated circuits (ASICs) can serve as flexible hardware accelerators for applications in domains such as image processing and digital signal processing. However, the mapping of applications onto ACSs and ASICs using the traditional methods can take months for a hardware engineer to develop and debug. In this dissertation, a new approach for automatic mapping of software applications onto ACSs and ASICs has been developed, implemented and validated. This dissertation presents the design flow of the software environment called CHAMPION, which is being developed at the University of Tennessee. This environment permits high-level design entry using the Cantata graphical programming software fromKRI. Using Cantata as the design entry, CHAMPION hides from the user the low-level details of the hardware architecture and the finer issues of application mapping onto the hardware. Validation of the CHAMPION environment was performed using multiple applications of moderate complexity. In one case, theapplication mapping time which required six weeks to perform manually took only six minutes for CHAMPION, yet comparable results were produced. Furthermore, the CHAMPION environment was constructed such that retargeting to a new adaptive computing system could be accomplished in just a few hours as opposed to weeks using manual methods. Thus, CHAMPION permits both ACSs and ASICs to be utilized by a wider audience and application development accomplished in less time

University of Tennessee, Knoxville: Trace

Real-Time Image Analysis of Living Cellular-Biology Measurements of Intelligent Chemistry

Author: Budge Scott E.
Majors P.D.
Rex B.
Solinsky J.C.
Publication venue: Hosted by Utah State University Libraries
Publication date: 01/05/2003
Field of study

This paper reports on the Pacific Northwest National Laboratory (PNNL) DOE Initiative in Image Science and Technology (ISAT) research, which is developing algorithms and software tool sets for remote sensing and biological applications. In particular, the PNNL ISAT work is applying these research results to the automated analysis of real-time cellular biology imagery to assist the biologist in determining the correct data collection region for the current state of a conglomerate of living cells in three-dimensional motion. The real-time computation of the typical 120 MB/sec multi-spectral data sets is executed in a Field Programmable Gate Array (FPGA) technology, which has very high processing rates due to large-scale parallelism. The outcome of this artificial vision work will allow the biologist to work with imagery as a creditable set of dye-tagged chemistry measurements in formats for individual cell tracking through regional feature extraction, and animation visualization through individual object isolation/characterization of the microscopy imagery

DigitalCommons@USU

Performance and energy-efficient implementation of a smart city application on FPGAs

Author: Arslan Arif
Felipe A. Barrigon
Francesco Gregoretti
Javed Iqbal
Javier L. L. Segura
Liang Ma
Luciano Lavagno
Manuel Palomino
Mihai Teodor Lazarescu
Publication venue
Publication date: 01/01/2018
Field of study

The continuous growth of modern cities and the request for better quality of life, coupled with the increased availability of computing resources, lead to an increased attention to smart city services. Smart cities promise to deliver a better life to their inhabitants while simultaneously reducing resource requirements and pollution. They are thus perceived as a key enabler to sustainable growth. Out of many other issues, one of the major concerns for most cities in the world is traffic, which leads to a huge waste of time and energy, and to increased pollution. To optimize traffic in cities, one of the first steps is to get accurate information in real time about the traffic flows in the city. This can be achieved through the application of automated video analytics to the video streams provided by a set of cameras distributed throughout the city. Image sequence processing can be performed both peripherally and centrally. In this paper, we argue that, since centralized processing has several advantages in terms of availability, maintainability and cost, it is a very promising strategy to enable effective traffic management even in large cities. However, the computational costs are enormous, and thus require an energy-efficient High-Performance Computing approach. Field Programmable Gate Arrays (FPGAs) provide comparable computational resources to CPUs and GPUs, yet require much lower amounts of energy per operation (around 6

\times

and 10

\times

for the application considered in this case study). They are thus preferred resources to reduce both energy supply and cooling costs in the huge datacenters that will be needed by Smart Cities. In this paper, we describe efficient implementations of high-performance algorithms that can process traffic camera image sequences to provide traffic flow information in real-time at a low energy and power cost

Open Access Repository

Developments and experimental evaluation of partitioning algorithms for adaptive computing systems

Author: Kerkiz Nabil
Publication venue: TRACE: Tennessee Research and Creative Exchange
Publication date: 01/12/2000
Field of study

Multi-FPGA systems offer the potential to deliver higher performance solutions than traditional computers for some low-level computing tasks. This requires a flexible hardware substrate and an automated mapping system. CHAMPION is an automated mapping system for implementing image processing applications in multi-FPGA systems under development at the University of Tennessee. CHAMPION will map applications in the Khoros Cantata graphical programming environment to hardware. The work described in this dissertation involves the automation of the CHAMPION backend design flow, which includes the partitioning problem, netlist to structural VHDL conversion, synthesis and placement and routing, and host code generation. The primary goal is to investigate the development and evaluation of three different k-way partitioning approaches. In the first and the second approaches, we discuss the development and implementation of two existing algorithms. The first approach is a hierarchical partitioning method based on topological ordering (HP). The second approach is a recursive algorithm based on the Fiduccia and Mattheyses bipartitioning heuristic (RP). We extend these algorithms to handle the multiple constraints imposed by adaptive computing systems. We also introduce a new recursive partitioning method based on topological ordering and levelization (RPL). In addition to handling the partitioning constraints, the new approach efficiently addresses the problem of minimizing the number of FPGAs used and the amount of computation, thereby overcoming some of the weaknesses of the HP and RP algorithms

University of Tennessee, Knoxville: Trace

Can my chip behave like my brain?

Author: George Suma
Publication venue: Georgia Institute of Technology
Publication date: 27/05/2016
Field of study

Many decades ago, Carver Mead established the foundations of neuromorphic systems. Neuromorphic systems are analog circuits that emulate biology. These circuits utilize subthreshold dynamics of CMOS transistors to mimic the behavior of neurons. The objective is to not only simulate the human brain, but also to build useful applications using these bio-inspired circuits for ultra low power speech processing, image processing, and robotics. This can be achieved using reconfigurable hardware, like field programmable analog arrays (FPAAs), which enable configuring different applications on a cross platform system. As digital systems saturate in terms of power efficiency, this alternate approach has the potential to improve computational efficiency by approximately eight orders of magnitude. These systems, which include analog, digital, and neuromorphic elements combine to result in a very powerful reconfigurable processing machine.Ph.D

Scholarly Materials And Research @ Georgia Tech

Approaches for MATLAB Applications Acceleration Using High Performance Reconfigurable Computers

Author: Merchant Saumil Girish
Publication venue: TRACE: Tennessee Research and Creative Exchange
Publication date: 01/08/2003
Field of study

A lot of raw computing power is needed in many scientific computing applications and simulations. MATLAB®† is one of the popular choices as a language for technical computing. Presented here are approaches for MATLAB based applications acceleration using High Performance Reconfigurable Computing (HPRC) machines. Typically, these are a cluster of Von Neumann architecture based systems with none or more FPGA reconfigurable boards. As a case study, an Image Correlation Algorithm has been ported on this architecture platform. As a second case study, the recursive training process in an Artificial Neural Network (ANN) to realize an optimum network has been accelerated, by porting it to HPC Systems. The approaches taken are analyzed with respect to target scenarios, end users perspective, programming efficiency and performance. Disclaimer: Some material in this text has been used and reproduced with appropriate references and permissions where required. † MATLAB® is a registered trademark of The Mathworks, Inc. ©1994-2003

University of Tennessee, Knoxville: Trace