4,870 research outputs found

    CMOS-3D smart imager architectures for feature detection

    Get PDF
    This paper reports a multi-layered smart image sensor architecture for feature extraction based on detection of interest points. The architecture is conceived for 3-D integrated circuit technologies consisting of two layers (tiers) plus memory. The top tier includes sensing and processing circuitry aimed to perform Gaussian filtering and generate Gaussian pyramids in fully concurrent way. The circuitry in this tier operates in mixed-signal domain. It embeds in-pixel correlated double sampling, a switched-capacitor network for Gaussian pyramid generation, analog memories and a comparator for in-pixel analog-to-digital conversion. This tier can be further split into two for improved resolution; one containing the sensors and another containing a capacitor per sensor plus the mixed-signal processing circuitry. Regarding the bottom tier, it embeds digital circuitry entitled for the calculation of Harris, Hessian, and difference-of-Gaussian detectors. The overall system can hence be configured by the user to detect interest points by using the algorithm out of these three better suited to practical applications. The paper describes the different kind of algorithms featured and the circuitry employed at top and bottom tiers. The Gaussian pyramid is implemented with a switched-capacitor network in less than 50 μs, outperforming more conventional solutions.Xunta de Galicia 10PXIB206037PRMinisterio de Ciencia e Innovación TEC2009-12686, IPT-2011-1625-430000Office of Naval Research N00014111031

    Efficient DSP and Circuit Architectures for Massive MIMO: State-of-the-Art and Future Directions

    Full text link
    Massive MIMO is a compelling wireless access concept that relies on the use of an excess number of base-station antennas, relative to the number of active terminals. This technology is a main component of 5G New Radio (NR) and addresses all important requirements of future wireless standards: a great capacity increase, the support of many simultaneous users, and improvement in energy efficiency. Massive MIMO requires the simultaneous processing of signals from many antenna chains, and computational operations on large matrices. The complexity of the digital processing has been viewed as a fundamental obstacle to the feasibility of Massive MIMO in the past. Recent advances on system-algorithm-hardware co-design have led to extremely energy-efficient implementations. These exploit opportunities in deeply-scaled silicon technologies and perform partly distributed processing to cope with the bottlenecks encountered in the interconnection of many signals. For example, prototype ASIC implementations have demonstrated zero-forcing precoding in real time at a 55 mW power consumption (20 MHz bandwidth, 128 antennas, multiplexing of 8 terminals). Coarse and even error-prone digital processing in the antenna paths permits a reduction of consumption with a factor of 2 to 5. This article summarizes the fundamental technical contributions to efficient digital signal processing for Massive MIMO. The opportunities and constraints on operating on low-complexity RF and analog hardware chains are clarified. It illustrates how terminals can benefit from improved energy efficiency. The status of technology and real-life prototypes discussed. Open challenges and directions for future research are suggested.Comment: submitted to IEEE transactions on signal processin

    An Efficient Energy Aware Adaptive System-On-Chip Architecture For Real-Time Video Analytics

    Get PDF
    The video analytics applications which are mostly running on embedded devices have become prevalent in today’s life. This proliferation has necessitated the development of System-on-Chips (SoC) to perform utmost processing in a single chip rather than discrete components. Embedded vision is bounded by stringent requirements, namely real-time performance, limited energy, and Adaptivity to cope with the standards evolution. Additionally, to design such complex SoCs, particularly in Zynq All Programmable SoC, the traditional hardware/software codesign approaches, which rely on software profiling to perform the hardware/software partitioning, have fallen short of achieving this task because profiling cannot predict the performance of application on hardware, thus, a model that relates the application characteristics to the platform performance is inevitable. Delivering real-time performance for the fast-growing video resolutions while maintaining the architecture flexibility is non-viable on processors, Graphic Processing Unit, Digital Signal Processor, and Application Specific Integrated Circuit. Furthermore, with semiconductor technology scaling, increased power dissipation is expected; whereas, the battery capacity is not expected to increase significantly. A Performance model for Zynq is developed using analytical method and used in hardware/software codesign to facilitate algorithms mapping to hardware. Afterwards, an SoC for real-time video analytics is realized on Zynq using Harris corner detection algorithm. A careful analysis of the algorithm and efficient utilization of Zynq resources results in highly parallelized and pipelined architecture outperforms the state-of-the-art. Running on a developed energy-aware adaptive SoC and utilizing dynamic partial reconfiguration, a context-aware configuration scheduler adheres to operating context and trades off between video resolution and energy consumption to sustain the uttermost operation time while delivering real-time performance. A realtime corners detection at 79.8, 176.9, and 504.2 frame per second for HD1080, HD720, and VGA, respectively, is achieved which outperform the state-of-the-art for HD720 by 31 times and for VGA by 3.5 times. The scheduler configures, at run-time, the appropriate hardware that satisfies the operating context and user-defined constraints among the accelerators that are developed for HD1080, HD720, and VGA video standards. The self-adaptive method achieves 1.77 times longer operation time than a parametrized IP core for the same battery capacity, with negligible reconfiguration energy overhead. A marginal effect of reconfiguration time overhead is observed, for instance, only two video frames are dropped for HD1080p60 during the reconfiguration. Facilitating the design process by using analytical modeling, and the efficient utilization of Zynq resources along with self-adaptivity results in an efficient energyaware SoC that provides real-time performance for video analytics

    A visual sensor network for object recognition: Testbed realization

    Get PDF
    This work describes the implementation of an object recognition service on top of energy and resource-constrained hardware. A complete pipeline for object recognition based on the BRISK visual features is implemented on Intel Imote2 sensor devices. The reference implementation is used to assess the performance of the object recognition pipeline in terms of processing time and recognition accuracy

    A hardware accelerator for ORB-SLAM

    Get PDF
    Simultaneous Localization And Mapping (SLAM) is a key component of self-driving cars. We study ORB-SLAM, a SLAM state-of-the-art solution, and develop a hardware accelerator for a critical part of it: ORB feature extraction. The accelerator achieve 8x speedup and 2000x energy consumption reduction

    High-performance hardware accelerators for image processing in space applications

    Get PDF
    Mars is a hard place to reach. While there have been many notable success stories in getting probes to the Red Planet, the historical record is full of bad news. The success rate for actually landing on the Martian surface is even worse, roughly 30%. This low success rate must be mainly credited to the Mars environment characteristics. In the Mars atmosphere strong winds frequently breath. This phenomena usually modifies the lander descending trajectory diverging it from the target one. Moreover, the Mars surface is not the best place where performing a safe land. It is pitched by many and close craters and huge stones, and characterized by huge mountains and hills (e.g., Olympus Mons is 648 km in diameter and 27 km tall). For these reasons a mission failure due to a landing in huge craters, on big stones or on part of the surface characterized by a high slope is highly probable. In the last years, all space agencies have increased their research efforts in order to enhance the success rate of Mars missions. In particular, the two hottest research topics are: the active debris removal and the guided landing on Mars. The former aims at finding new methods to remove space debris exploiting unmanned spacecrafts. These must be able to autonomously: detect a debris, analyses it, in order to extract its characteristics in terms of weight, speed and dimension, and, eventually, rendezvous with it. In order to perform these tasks, the spacecraft must have high vision capabilities. In other words, it must be able to take pictures and process them with very complex image processing algorithms in order to detect, track and analyse the debris. The latter aims at increasing the landing point precision (i.e., landing ellipse) on Mars. Future space-missions will increasingly adopt Video Based Navigation systems to assist the entry, descent and landing (EDL) phase of space modules (e.g., spacecrafts), enhancing the precision of automatic EDL navigation systems. For instance, recent space exploration missions, e.g., Spirity, Oppurtunity, and Curiosity, made use of an EDL procedure aiming at following a fixed and precomputed descending trajectory to reach a precise landing point. This approach guarantees a maximum landing point precision of 20 km. By comparing this data with the Mars environment characteristics, it is possible to understand how the mission failure probability still remains really high. A very challenging problem is to design an autonomous-guided EDL system able to even more reduce the landing ellipse, guaranteeing to avoid the landing in dangerous area of Mars surface (e.g., huge craters or big stones) that could lead to the mission failure. The autonomous behaviour of the system is mandatory since a manual driven approach is not feasible due to the distance between Earth and Mars. Since this distance varies from 56 to 100 million of km approximately due to the orbit eccentricity, even if a signal transmission at the light speed could be possible, in the best case the transmission time would be around 31 minutes, exceeding so the overall duration of the EDL phase. In both applications, algorithms must guarantee self-adaptability to the environmental conditions. Since the Mars (and in general the space) harsh conditions are difficult to be predicted at design time, these algorithms must be able to automatically tune the internal parameters depending on the current conditions. Moreover, real-time performances are another key factor. Since a software implementation of these computational intensive tasks cannot reach the required performances, these algorithms must be accelerated via hardware. For this reasons, this thesis presents my research work done on advanced image processing algorithms for space applications and the associated hardware accelerators. My research activity has been focused on both the algorithm and their hardware implementations. Concerning the first aspect, I mainly focused my research effort to integrate self-adaptability features in the existing algorithms. While concerning the second, I studied and validated a methodology to efficiently develop, verify and validate hardware components aimed at accelerating video-based applications. This approach allowed me to develop and test high performance hardware accelerators that strongly overcome the performances of the actual state-of-the-art implementations. The thesis is organized in four main chapters. Chapter 2 starts with a brief introduction about the story of digital image processing. The main content of this chapter is the description of space missions in which digital image processing has a key role. A major effort has been spent on the missions in which my research activity has a substantial impact. In particular, for these missions, this chapter deeply analizes and evaluates the state-of-the-art approaches and algorithms. Chapter 3 analyzes and compares the two technologies used to implement high performances hardware accelerators, i.e., Application Specific Integrated Circuits (ASICs) and Field Programmable Gate Arrays (FPGAs). Thanks to this information the reader may understand the main reasons behind the decision of space agencies to exploit FPGAs instead of ASICs for high-performance hardware accelerators in space missions, even if FPGAs are more sensible to Single Event Upsets (i.e., transient error induced on hardware component by alpha particles and solar radiation in space). Moreover, this chapter deeply describes the three available space-grade FPGA technologies (i.e., One-time Programmable, Flash-based, and SRAM-based), and the main fault-mitigation techniques against SEUs that are mandatory for employing space-grade FPGAs in actual missions. Chapter 4 describes one of the main contribution of my research work: a library of high-performance hardware accelerators for image processing in space applications. The basic idea behind this library is to offer to designers a set of validated hardware components able to strongly speed up the basic image processing operations commonly used in an image processing chain. In other words, these components can be directly used as elementary building blocks to easily create a complex image processing system, without wasting time in the debug and validation phase. This library groups the proposed hardware accelerators in IP-core families. The components contained in a same family share the same provided functionality and input/output interface. This harmonization in the I/O interface enables to substitute, inside a complex image processing system, components of the same family without requiring modifications to the system communication infrastructure. In addition to the analysis of the internal architecture of the proposed components, another important aspect of this chapter is the methodology used to develop, verify and validate the proposed high performance image processing hardware accelerators. This methodology involves the usage of different programming and hardware description languages in order to support the designer from the algorithm modelling up to the hardware implementation and validation. Chapter 5 presents the proposed complex image processing systems. In particular, it exploits a set of actual case studies, associated with the most recent space agency needs, to show how the hardware accelerator components can be assembled to build a complex image processing system. In addition to the hardware accelerators contained in the library, the described complex system embeds innovative ad-hoc hardware components and software routines able to provide high performance and self-adaptable image processing functionalities. To prove the benefits of the proposed methodology, each case study is concluded with a comparison with the current state-of-the-art implementations, highlighting the benefits in terms of performances and self-adaptability to the environmental conditions

    SA-FEMIP: A Self-Adaptive Features Extractor and Matcher IP-Core Based on Partially Reconfigurable FPGAs for Space Applications

    Get PDF
    Video-based navigation (VBN) is increasingly used in space applications to enable autonomous entry, descent, and landing of aircrafts. VBN algorithms require real-time performances and high computational capabilities, especially to perform features extraction and matching (FEM). In this context, field-programmable gate arrays (FPGAs) can be employed as efficient hardware accelerators. This paper proposes an improved FPGA-based FEM module. Online self-adaptation of the parameters of both the image noise filter and the features extraction algorithm is adopted to improve the algorithm robustness. Experimental results demonstrate the effectiveness of the proposed self-adaptive module. It introduces a marginal resource overhead and no timing performance degradation when compared with the reference state-of-the-art architecture
    corecore