Search CORE

616 research outputs found

Recommended from our members

Parallel pipelined histogram architectures

Author: J. Cadenas
Muller
P. Huerta
R.S. Sherratt
Shahbahrami
Publication venue: 'Institution of Engineering and Technology (IET)'
Publication date: 01/01/2011
Field of study

Proposed is a unique cell histogram architecture which will process k data items in parallel to compute 2q histogram bins per time step. An array of m/2q cells computes an m-bin histogram with a speed-up factor of k; k ⩾ 2 makes it faster than current dual-ported memory implementations. Furthermore, simple mechanisms for conflict-free storing of the histogram bins into an external memory array are discussed

Central Archive at the University of Reading

Crossref

Kent Academic Repository

High-performance hardware accelerators for image processing in space applications

Author: Rolfo Daniele
Publication venue: Politecnico di Torino
Publication date: 01/01/2015
Field of study

Mars is a hard place to reach. While there have been many notable success stories in getting probes to the Red Planet, the historical record is full of bad news. The success rate for actually landing on the Martian surface is even worse, roughly 30%. This low success rate must be mainly credited to the Mars environment characteristics. In the Mars atmosphere strong winds frequently breath. This phenomena usually modifies the lander descending trajectory diverging it from the target one. Moreover, the Mars surface is not the best place where performing a safe land. It is pitched by many and close craters and huge stones, and characterized by huge mountains and hills (e.g., Olympus Mons is 648 km in diameter and 27 km tall). For these reasons a mission failure due to a landing in huge craters, on big stones or on part of the surface characterized by a high slope is highly probable. In the last years, all space agencies have increased their research efforts in order to enhance the success rate of Mars missions. In particular, the two hottest research topics are: the active debris removal and the guided landing on Mars. The former aims at finding new methods to remove space debris exploiting unmanned spacecrafts. These must be able to autonomously: detect a debris, analyses it, in order to extract its characteristics in terms of weight, speed and dimension, and, eventually, rendezvous with it. In order to perform these tasks, the spacecraft must have high vision capabilities. In other words, it must be able to take pictures and process them with very complex image processing algorithms in order to detect, track and analyse the debris. The latter aims at increasing the landing point precision (i.e., landing ellipse) on Mars. Future space-missions will increasingly adopt Video Based Navigation systems to assist the entry, descent and landing (EDL) phase of space modules (e.g., spacecrafts), enhancing the precision of automatic EDL navigation systems. For instance, recent space exploration missions, e.g., Spirity, Oppurtunity, and Curiosity, made use of an EDL procedure aiming at following a fixed and precomputed descending trajectory to reach a precise landing point. This approach guarantees a maximum landing point precision of 20 km. By comparing this data with the Mars environment characteristics, it is possible to understand how the mission failure probability still remains really high. A very challenging problem is to design an autonomous-guided EDL system able to even more reduce the landing ellipse, guaranteeing to avoid the landing in dangerous area of Mars surface (e.g., huge craters or big stones) that could lead to the mission failure. The autonomous behaviour of the system is mandatory since a manual driven approach is not feasible due to the distance between Earth and Mars. Since this distance varies from 56 to 100 million of km approximately due to the orbit eccentricity, even if a signal transmission at the light speed could be possible, in the best case the transmission time would be around 31 minutes, exceeding so the overall duration of the EDL phase. In both applications, algorithms must guarantee self-adaptability to the environmental conditions. Since the Mars (and in general the space) harsh conditions are difficult to be predicted at design time, these algorithms must be able to automatically tune the internal parameters depending on the current conditions. Moreover, real-time performances are another key factor. Since a software implementation of these computational intensive tasks cannot reach the required performances, these algorithms must be accelerated via hardware. For this reasons, this thesis presents my research work done on advanced image processing algorithms for space applications and the associated hardware accelerators. My research activity has been focused on both the algorithm and their hardware implementations. Concerning the first aspect, I mainly focused my research effort to integrate self-adaptability features in the existing algorithms. While concerning the second, I studied and validated a methodology to efficiently develop, verify and validate hardware components aimed at accelerating video-based applications. This approach allowed me to develop and test high performance hardware accelerators that strongly overcome the performances of the actual state-of-the-art implementations. The thesis is organized in four main chapters. Chapter 2 starts with a brief introduction about the story of digital image processing. The main content of this chapter is the description of space missions in which digital image processing has a key role. A major effort has been spent on the missions in which my research activity has a substantial impact. In particular, for these missions, this chapter deeply analizes and evaluates the state-of-the-art approaches and algorithms. Chapter 3 analyzes and compares the two technologies used to implement high performances hardware accelerators, i.e., Application Specific Integrated Circuits (ASICs) and Field Programmable Gate Arrays (FPGAs). Thanks to this information the reader may understand the main reasons behind the decision of space agencies to exploit FPGAs instead of ASICs for high-performance hardware accelerators in space missions, even if FPGAs are more sensible to Single Event Upsets (i.e., transient error induced on hardware component by alpha particles and solar radiation in space). Moreover, this chapter deeply describes the three available space-grade FPGA technologies (i.e., One-time Programmable, Flash-based, and SRAM-based), and the main fault-mitigation techniques against SEUs that are mandatory for employing space-grade FPGAs in actual missions. Chapter 4 describes one of the main contribution of my research work: a library of high-performance hardware accelerators for image processing in space applications. The basic idea behind this library is to offer to designers a set of validated hardware components able to strongly speed up the basic image processing operations commonly used in an image processing chain. In other words, these components can be directly used as elementary building blocks to easily create a complex image processing system, without wasting time in the debug and validation phase. This library groups the proposed hardware accelerators in IP-core families. The components contained in a same family share the same provided functionality and input/output interface. This harmonization in the I/O interface enables to substitute, inside a complex image processing system, components of the same family without requiring modifications to the system communication infrastructure. In addition to the analysis of the internal architecture of the proposed components, another important aspect of this chapter is the methodology used to develop, verify and validate the proposed high performance image processing hardware accelerators. This methodology involves the usage of different programming and hardware description languages in order to support the designer from the algorithm modelling up to the hardware implementation and validation. Chapter 5 presents the proposed complex image processing systems. In particular, it exploits a set of actual case studies, associated with the most recent space agency needs, to show how the hardware accelerator components can be assembled to build a complex image processing system. In addition to the hardware accelerators contained in the library, the described complex system embeds innovative ad-hoc hardware components and software routines able to provide high performance and self-adaptable image processing functionalities. To prove the benefits of the proposed methodology, each case study is concluded with a comparison with the current state-of-the-art implementations, highlighting the benefits in terms of performances and self-adaptability to the environmental conditions

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Power-Aware Design Methodologies for FPGA-Based Implementation of Video Processing Systems

Author: Ngo Hau Trung
Publication venue: ODU Digital Commons
Publication date: 01/01/2007
Field of study

The increasing capacity and capabilities of FPGA devices in recent years provide an attractive option for performance-hungry applications in the image and video processing domain. FPGA devices are often used as implementation platforms for image and video processing algorithms for real-time applications due to their programmable structure that can exploit inherent spatial and temporal parallelism. While performance and area remain as two main design criteria, power consumption has become an important design goal especially for mobile devices. Reduction in power consumption can be achieved by reducing the supply voltage, capacitances, clock frequency and switching activities in a circuit. Switching activities can be reduced by architectural optimization of the processing cores such as adders, multipliers, multiply and accumulators (MACS), etc. This dissertation research focuses on reducing the switching activities in digital circuits by considering data dependencies in bit level, word level and block level neighborhoods in a video frame. The bit level data neighborhood dependency consideration for power reduction is illustrated in the design of pipelined array, Booth and log-based multipliers. For an array multiplier, operands of the multipliers are partitioned into higher and lower parts so that the probability of the higher order parts being zero or one increases. The gating technique for the pipelined approach deactivates part(s) of the multiplier when the above special values are detected. For the Booth multiplier, the partitioning and gating technique is integrated into the Booth recoding scheme. In addition, a delay correction strategy is developed for the Booth multiplier to reduce the switching activities of the sign extension part in the partial products. A novel architecture design for the computation of log and inverse-log functions for the reduction of power consumption in arithmetic circuits is also presented. This also utilizes the proposed partitioning and gating technique for further dynamic power reduction by reducing the switching activities. The word level and block level data dependencies for reducing the dynamic power consumption are illustrated by presenting the design of a 2-D convolution architecture. Here the similarities of the neighboring pixels in window-based operations of image and video processing algorithms are considered for reduced switching activities. A partitioning and detection mechanism is developed to deactivate the parallel architecture for window-based operations if higher order parts of the pixel values are the same. A neighborhood dependent approach (NDA) is incorporated with different window buffering schemes. Consideration of the symmetry property in filter kernels is also applied with the NDA method for further reduction of switching activities. The proposed design methodologies are implemented and evaluated in a FPGA environment. It is observed that the dynamic power consumption in FPGA-based circuit implementations is significantly reduced in bit level, data level and block level architectures when compared to state-of-the-art design techniques. A specific application for the design of a real-time video processing system incorporating the proposed design methodologies for low power consumption is also presented. An image enhancement application is considered and the proposed partitioning and gating, and NDA methods are utilized in the design of the enhancement system. Experimental results show that the proposed multi-level power aware methodology achieves considerable power reduction. Research work is progressing In utilizing the data dependencies in subsequent frames in a video stream for the reduction of circuit switching activities and thereby the dynamic power consumption

Old Dominion University

Neuromorphic deep convolutional neural network learning systems for FPGA in real time

Author: Tapiador Morales Ricardo
Publication venue
Publication date: 13/12/2019
Field of study

Deep Learning algorithms have become one of the best approaches for pattern recognition in several fields, including computer vision, speech recognition, natural language processing, and audio recognition, among others. In image vision, convolutional neural networks stand out, due to their relatively simple supervised training and their efficiency extracting features from a scene. Nowadays, there exist several implementations of convolutional neural networks accelerators that manage to perform these networks in real time. However, the number of operations and power consumption of these implementations can be reduced using a different processing paradigm as neuromorphic engineering. Neuromorphic engineering field studies the behavior of biological and inner systems of the human neural processing with the purpose of design analog, digital or mixed-signal systems to solve problems inspired in how human brain performs complex tasks, replicating the behavior and properties of biological neurons. Neuromorphic engineering tries to give an answer to how our brain is capable to learn and perform complex tasks with high efficiency under the paradigm of spike-based computation. This thesis explores both frame-based and spike-based processing paradigms for the development of hardware architectures for visual pattern recognition based on convolutional neural networks. In this work, two FPGA implementations of convolutional neural networks accelerator architectures for frame-based using OpenCL and SoC technologies are presented. Followed by a novel neuromorphic convolution processor for spike-based processing paradigm, which implements the same behaviour of leaky integrate-and-fire neuron model. Furthermore, it reads the data in rows being able to perform multiple layers in the same chip. Finally, a novel FPGA implementation of Hierarchy of Time Surfaces algorithm and a new memory model for spike-based systems are proposed

idUS. Depósito de Investigación Universidad de Sevilla

Intelligent Wireless Sensor Nodes for Human Footstep Sound Classification for Security Application

Author: Chakrabarti Indrajit
Duggisetty Divya Lakshmi
Mukhopadhyay Anand Kumar
Prabhakar Naligala Moses
Sharad Mrigank
Publication venue
Publication date: 23/12/2019
Field of study

Sensor nodes present in a wireless sensor network (WSN) for security surveillance applications should preferably be small, energy-efficient and inexpensive with on-sensor computational abilities. An appropriate data processing scheme in the sensor node can help in reducing the power dissipation of the transceiver through compression of information to be communicated. In this paper, authors have attempted a simulation-based study of human footstep sound classification in natural surroundings using simple time-domain features. We used a spiking neural network (SNN), a computationally low weight classifier, derived from an artificial neural network (ANN), for classification. A classification accuracy greater than 85% is achieved using an SNN, degradation of ~5% as compared to ANN. The SNN scheme, along with the required feature extraction scheme, can be amenable to low power sub-threshold analog implementation. Results show that all analog implementation of the proposed SNN scheme can achieve significant power savings over the digital implementation of the same computing scheme and also over other conventional digital architectures using frequency-domain feature extraction and ANN-based classification.Comment: 12 pages, Journa

arXiv.org e-Print Archive

Neuromorphic audio processing through real-time embedded spiking neural networks.

Author: Domínguez Morales Juan Pedro
Publication venue
Publication date: 03/12/2018
Field of study

In this work novel speech recognition and audio processing systems based on a spiking artificial cochlea and neural networks are proposed and implemented. First, the biological behavior of the animal’s auditory system is analyzed and studied, along with the classical mechanisms of audio signal processing for sound classification, including Deep Learning techniques. Based on these studies, novel audio processing and automatic audio signal recognition systems are proposed, using a bio-inspired auditory sensor as input. A desktop software tool called NAVIS (Neuromorphic Auditory VIsualizer) for post-processing the information obtained from spiking cochleae was implemented, allowing to analyze these data for further research. Next, using a 4-chip SpiNNaker hardware platform and Spiking Neural Networks, a system is proposed for classifying different time-independent audio signals, making use of a Neuromorphic Auditory Sensor and frequency studies obtained with NAVIS. To prove the robustness and analyze the limitations of the system, the input audios were disturbed, simulating extreme noisy environments. Deep Learning mechanisms, particularly Convolutional Neural Networks, are trained and used to differentiate between healthy persons and pathological patients by detecting murmurs from heart recordings after integrating the spike information from the signals using a neuromorphic auditory sensor. Finally, a similar approach is used to train Spiking Convolutional Neural Networks for speech recognition tasks. A novel SCNN architecture for timedependent signals classification is proposed, using a buffered layer that adapts the information from a real-time input domain to a static domain. The system was deployed on a 48-chip SpiNNaker platform. Finally, the performance and efficiency of these systems were evaluated, obtaining conclusions and proposing improvements for future works.Premio Extraordinario de Doctorado U

idUS. Depósito de Investigación Universidad de Sevilla

Formal and Informal Methods for Multi-Core Design Space Exploration

Author: Kempf Jean-Francois
Lebeltel Olivier
Maler Oded
Publication venue: 'Open Publishing Association'
Publication date: 01/06/2014
Field of study

We propose a tool-supported methodology for design-space exploration for embedded systems. It provides means to define high-level models of applications and multi-processor architectures and evaluate the performance of different deployment (mapping, scheduling) strategies while taking uncertainty into account. We argue that this extension of the scope of formal verification is important for the viability of the domain.Comment: In Proceedings QAPL 2014, arXiv:1406.156

arXiv.org e-Print Archive

Directory of Open Access Journals

Recommended from our members

Nanowire nanocomputer as a finite-state machine

Author: Das Shamik
Ellenbogen James C.
Klemic James F.
Lieber Charles M.
Yan Hao
Yao Jun
Publication venue: 'Proceedings of the National Academy of Sciences'
Publication date: 05/03/2014
Field of study

Implementation of complex computer circuits assembled from the bottom up and integrated on the nanometer scale has long been a goal of electronics research. It requires a design and fabrication strategy that can address individual nanometer-scale electronic devices, while enabling large-scale assembly of those devices into highly-organized, integrated computational circuits. We describe how such a strategy has led to the design, construction, and demonstration of a nanoelectronic finite-state machine (nanoFSM). The system was fabricated using a design- oriented approach enabled by a deterministic, bottom-up assembly process that does not require individual nanowire registration. This methodology allowed construction of the nanoFSM through modular design employing a multi-tile architecture. Each tile/module consists of two interconnected crossbar nanowire arrays, with each cross-point consisting of a programmable nanowire transistor node. The nanoFSM integrates 180 programmable nanowire transistor nodes in three tiles or six total crossbar arrays, and incorporates both sequential and arithmetic logic, with extensive inter-tile and intra-tile communication that exhibits rigorous input/output (I/O) matching. Our system realizes the complete 2-bit logic flow and clocked control over state registration that are required for a FSM or computer. The programmable multi-tile circuit was also re-programmed to a functionally-distinct 2-bit full adder with 32-set matched and complete logic output. These steps forward and the ability of our new design-oriented deterministic methodology to yield more extensive multi-tile systems, suggest that proposed general-purpose nanocomputers can be realized in the near future.Chemistry and Chemical BiologyEngineering and Applied Science

Harvard University - DASH

Time-to-digital converters and histogram builders in SPAD arrays for pulsed-LiDAR

Author: Incoronato A.
Madonini F.
Sesta V.
Villa F.
Publication venue
Publication date: 01/01/2023
Field of study

Light Detection and Ranging (LiDAR) is a 3D imaging technique widely used in many applications such as augmented reality, automotive, machine vision, spacecraft navigation and landing. Pulsed-LiDAR is one of the most diffused LiDAR techniques which relies on the measurement of the round-trip travel time of an optical pulse back-scattered from a distant target. Besides the light source and the detector, Time-to-Digital Converters (TDCs) are fundamental components in pulsed-LiDAR systems, since they allow to measure the back-scattered photon arrival times and their performance directly impact on LiDAR system requirements (i.e., range, precision, and measurements rate). In this work, we present a review of recent TDC architectures suitable to be integrated in SPAD-based CMOS arrays and a review of data processing solutions to derive the TOF information. Furthermore, main TDC parameters and processing techniques are described and analyzed considering pulsed-LiDAR requirements

Archivio istituzionale della ricerca - Politecnico di Milano

A novel algorithm and hardware architecture for fast video-based shape reconstruction of space debris

Author: Di Carlo S.
Prinetto P.
Rolfo D.
Sansonne N.
Trotta P.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014
Field of study

In order to enable the non-cooperative rendezvous, capture, and removal of large space debris, automatic recognition of the target is needed. Video-based techniques are the most suitable in the strict context of space missions, where low-energy consumption is fundamental, and sensors should be passive in order to avoid any possible damage to external objects as well as to the chaser satellite. This paper presents a novel fast shape-from-shading (SfS) algorithm and a field-programmable gate array (FPGA)-based system hardware architecture for video-based shape reconstruction of space debris. The FPGA-based architecture, equipped with a pair of cameras, includes a fast image pre-processing module, a core implementing a feature-based stereo-vision approach, and a processor that executes the novel SfS algorithm. Experimental results show the limited amount of logic resources needed to implement the proposed architecture, and the timing improvements with respect to other state-of-the-art SfS methods. The remaining resources available in the FPGA device can be exploited to integrate other vision-based techniques to improve the comprehension of debris model, allowing a fast evaluation of associated kinematics in order to select the most appropriate approach for capture of the target space debris

Crossref

Springer - Publisher Connector

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino