162 research outputs found
Towards a Scalable Hardware/Software Co-Design Platform for Real-time Pedestrian Tracking Based on a ZYNQ-7000 Device
Currently, most designers face a daunting task to
research different design flows and learn the intricacies of
specific software from various manufacturers in
hardware/software co-design. An urgent need of creating a
scalable hardware/software co-design platform has become a key
strategic element for developing hardware/software integrated
systems. In this paper, we propose a new design flow for building
a scalable co-design platform on FPGA-based system-on-chip.
We employ an integrated approach to implement a histogram
oriented gradients (HOG) and a support vector machine (SVM)
classification on a programmable device for pedestrian tracking.
Not only was hardware resource analysis reported, but the
precision and success rates of pedestrian tracking on nine open
access image data sets are also analysed. Finally, our proposed
design flow can be used for any real-time image processingrelated
products on programmable ZYNQ-based embedded
systems, which benefits from a reduced design time and provide a
scalable solution for embedded image processing products
An Optimized and Fast Scheme for Real-time Human Detection using Raspberry Pi
This paper has been presented at : The International Conference on Digital Image Computing: Techniques and Applications (DICTA 2016)Real-time human detection is a challenging task due to appearance variance, occlusion and rapidly changing content; therefore it requires efficient hardware and optimized software. This paper presents a real-time human detection scheme on a Raspberry Pi. An efficient algorithm for human detection is proposed by processing regions of interest (ROI) based upon foreground estimation. Different number of scales have been considered for computing Histogram of Oriented Gradients (HOG) features for the selected ROI. Support vector machine (SVM) is employed for classification of HOG feature vectors into detected and non-detected human regions. Detected human regions are further filtered by analyzing the area of overlapping regions. Considering the limited capabilities of Raspberry Pi, the proposed scheme is evaluated using six different testing schemes on Town Centre and CAVIAR datasets. Out of these six testing schemes, Single Window with two Scales (SW2S) processes 3 frames per second with acceptable less accuracy than the original HOG. The proposed algorithm is about 8 times faster than the original multi-scale HOG and recommended to be used for real-time human detection on a Raspberry Pi
Low-power pedestrian detection system on FPGA
Pedestrian detection is one of the key problems in the emerging self-driving car industry. In addition, the Histogram of Gradients (HOG) algorithm proved to provide good accuracy for pedestrian detection. Many research works focused on accelerating HOG algorithm on FPGA(Field-Programmable Gate Array) due to its low-power and high-throughput characteristics. In this paper, we present an energy-efficient HOG-based implementation for pedestrian detection system on a low-cost FPGA system-on-chip platform. The hardware accelerator implements the HOG computation and the Support Vector Machine classifier, the rest of the algorithm is mapped to software in the embedded processor. The hardware runs at 50 Mhz (lower frequency than previous works), thus achieving the best pixels processed per clock and the lower power design
Recommended from our members
Computer Vision System-On-Chip Designs for Intelligent Vehicles
Intelligent vehicle technologies are growing rapidly that can enhance road safety, improve transport efficiency, and aid driver operations through sensors and intelligence. Advanced driver assistance system (ADAS) is a common platform of intelligent vehicle technologies. Many sensors like LiDAR, radar, cameras have been deployed on intelligent vehicles. Among these sensors, optical cameras are most widely used due to their low costs and easy installation. However, most computer vision algorithms are complicated and computationally slow, making them difficult to be deployed on power constraint systems. This dissertation investigates several mainstream ADAS applications, and proposes corresponding efficient digital circuits implementations for these applications. This dissertation presents three ways of software / hardware algorithm division for three ADAS applications: lane detection, traffic sign classification, and traffic light detection. Using FPGA to offload critical parts of the algorithm, the entire computer vision system is able to run in real time while maintaining a low power consumption and a high detection rate. Catching up with the advent of deep learning in the field of computer vision, we also present two deep learning based hardware implementations on application specific integrated circuits (ASIC) to achieve even lower power consumption and higher accuracy.
The real time lane detection system is implemented on Xilinx Zynq platform, which has a dual core ARM processor and FPGA fabric. The Xilinx Zynq platform integrates the software programmability of an ARM processor with the hardware programmability of an FPGA. For the lane detection task, the FPGA handles the majority of the task: region-of-interest extraction, edge detection, image binarization, and hough transform. After then, the ARM processor takes in hough transform results and highlights lanes using the hough peaks algorithm. The entire system is able to process 1080P video stream at a constant speed of 69.4 frames per second, realizing real time capability.
An efficient system-on-chip (SOC) design which classifies up to 48 traffic signs in real time is presented in this dissertation. The traditional histogram of oriented gradients (HoG) and support vector machine (SVM) are proven to be very effective on traffic sign classification with an average accuracy rate of 93.77%. For traffic sign classification, the biggest challenge comes from the low execution efficiency of the HoG on embedded processors. By dividing the HoG algorithm into three fully pipelined stages, as well as leveraging extra on-chip memory to store intermediate results, we successfully achieved a throughput of 115.7 frames per second at 1080P resolution. The proposed generic HoG hardware implementation could also be used as an individual IP core by other computer vision systems.
A real time traffic signal detection system is implemented to present an efficient hardware implementation of the traditional grass-fire blob detection. The traditional grass-fire blob detection method iterates the input image multiple times to calculate connected blobs. In digital circuits, five extra on-chip block memories are utilized to save intermediate results. By using additional memories, all connected blob information could be obtained through one-pass image traverse. The proposed hardware friendly blob detection can run at 72.4 frames per second with 1080P video input. Applying HoG + SVM as feature extractor and classifier, 92.11% recall rate and 99.29% precision rate are obtained on red lights, and 94.44% recall rate and 98.27% precision rate on green lights.
Nowadays, convolutional neural network (CNN) is revolutionizing computer vision due to learnable layer by layer feature extraction. However, when coming into inference, CNNs are usually slow to train and slow to execute. In this dissertation, we studied the implementation of principal component analysis based network (PCANet), which strikes a balance between algorithm robustness and computational complexity. Compared to a regular CNN, the PCANet only needs one iteration training, and typically at most has a few tens convolutions on a single layer. Compared to hand-crafted features extraction methods, the PCANet algorithm well reflects the variance in the training dataset and can better adapt to difficult conditions. The PCANet algorithm achieves accuracy rates of 96.8% and 93.1% on road marking detection and traffic light detection, respectively. Implementing in Synopsys 32nm process technology, the proposed chip can classify 724,743 32-by-32 image candidates in one second, with only 0.5 watt power consumption.
In this dissertation, binary neural network (BNN) is adopted as a potential detector for intelligent vehicles. The BNN constrains all activations and weights to be +1 or -1. Compared to a CNN with the same network configuration, the BNN achieves 50 times better resource usage with only 1% - 2% accuracy loss. Taking car detection and pedestrian detection as examples, the BNN achieves an average accuracy rate of over 95%. Furthermore, a BNN accelerator implemented in Synopsys 32nm process technology is presented in our work. The elastic architecture of the BNN accelerator makes it able to process any number of convolutional layers with high throughput. The BNN accelerator only consumes 0.6 watt and doesn\u27t rely on external memory for storage
Real-time pedestrian recognition on low computational resources
Pedestrian recognition has successfully been applied to security, autonomous
cars, Aerial photographs. For most applications, pedestrian recognition on
small mobile devices is important. However, the limitations of the computing
hardware make this a challenging task. In this work, we investigate real-time
pedestrian recognition on small physical-size computers with low computational
resources for faster speed. This paper presents three methods that work on the
small physical size CPUs system. First, we improved the Local Binary Pattern
(LBP) features and Adaboost classifier. Second, we optimized the Histogram of
Oriented Gradients (HOG) and Support Vector Machine. Third, We implemented fast
Convolutional Neural Networks (CNNs). The results demonstrate that the three
methods achieved real-time pedestrian recognition at an accuracy of more than
95% and a speed of more than 5 fps on a small physical size computational
platform with a 1.8 GHz Intel i5 CPU. Our methods can be easily applied to
small mobile devices with high compatibility and generality
Hardware accelerated real-time Linux video anonymizer
Dissertação de mestrado em Engenharia Eletrónica Industrial e ComputadoresOs Sistemas Embebidos estão presentes atualmente numa variada gama de equipamentos do
quotidiano do ser humano. Desde TV-boxes, televisões, routers até ao indispensável telemóvel.
O Sistema Operativo Linux, com a sua filosofia de distribuição ”one-size-fits-all” tornou-se
uma alternativa viável, fornecendo um vasto suporte de hardware, técnicas de depuração, suporte
dos protocolos de comunicação de rede, entre outros serviços, que se tornaram no conjunto
standard de requisitos na maioria dos sistemas embebidos atuais.
Este sistema operativo torna-se apelativo pela sua filosofia open-source que disponibiliza ao
utilizador um vasto conjunto de bibliotecas de software que possibilitam o desenvolvimento num
determinado domínio com maior celeridade e facilidade de integração de software complexo.
Os algoritmos deMachine Learning são desenvolvidos para a automização de tarefas e estão
presentes nas mais variadas tecnologias, desde o sistema de foco de imagem nosmartphone até
ao sistema de deteção dos limites de faixa de rodagem de um sistema de condução autónoma.
Estes são algoritmos que quando compilados para as plataformas de sistemas embebidos,
resultam num esforço de processamento e de consumo de recursos, como o footprint de memória,
que na maior parte dos casos supera em larga escala o conjunto de recursos disponíveis para a
aplicação do sistema, sendo necessária a implementação de componentes que requerem maior
poder de processamento através de elementos de hardware para garantir que as métricas tem porais sejam satisfeitas.
Esta dissertação propõe-se, por isso, à criação de um sistema de anonimização de vídeo
que adquire, processa e manipula as frames, com o intuito de garantir o anonimato, mesmo na
transmissão.
A sua implementação inclui técnicas de Deteção de Objectos, fazendo uso da combinação
das tecnologias de aceleração por hardware: paralelização e execução em hardware especial izado. É proposta então uma implementação restringida tanto temporalmente como no consumo
de recursos ao nível do hardware e software.Embedded Systems are currently present in a wide range of everyday equipment. From TV-boxes,
televisions and routers to the indispensable smartphone.
Linux Operating System, with its ”one-size-fits-all” distribution philosophy, has become a
viable alternative, providing extensive support for hardware, debugging techniques, network com munication protocols, among other functionalities, which have become the standard set of re quirements in most modern embedded systems.
This operating system is appealing due to its open-source philosophy, which provides the
user with a vast set of software libraries that enable development in a given domain with greater
speed and ease the integration of complex software.
Machine Learning algorithms are developed to execute tasks autonomously, i.e., without
human supervision, and are present in the most varied technologies, from the image focus system
on the smartphone to the detection system of the lane limits of an autonomous driving system.
These are algorithms that, when compiled for embedded systems platforms, require an ef fort to process and consume resources, such as the memory footprint, which in most cases far
outweighs the set of resources available for the application of the system, requiring the imple mentation of components that need greater processing power through elements of hardware to
ensure that the time metrics are satisfied.
This dissertation proposes the creation of a video anonymization system that acquires, pro cesses, and manipulates the frames, in order to guarantee anonymity, even during the transmis sion.
Its implementation includes Object Detection techniques, making use of the combination
of hardware acceleration technologies: parallelization and execution in specialized hardware.
An implementation is then proposed, restricted both in time and in resource consumption at
hardware and software levels
Histogram of oriented gradients front end processing: an FPGA based processor approach
The Field Programmable Gate Array (FPGA) implementation of the commonly used Histogram of Oriented Gradients (HOG) algorithm is explored. The HOG algorithm is employed to extract features for object detection. A key focus has been to explore the use of a new FPGA-based processor which has been targeted at image processing. The paper gives details of the mapping and scheduling factors that influence the performance and the stages that were undertaken to allow the algorithm to be deployed on FPGA hardware, whilst taking into account the specific IPPro architecture features. We show that multi-core IPPro performance can exceed that of against state-of-the-art FPGA designs by up to 3.2 times with reduced design and implementation effort and increased flexibility all on a low cost, Zynq programmable system
Real-Time Robot Vision on Low-Performance Computing Hardware
Small robots have numerous interesting applications in domains like industry, education, scientific research, and services. For most applications vision is important, however, the limitations of the computing hardware make this a challenging task. In this paper, we address the problem of real-time object recognition and propose the Fast Regions of Interest Search (FROIS) algorithm to quickly find the ROIs of the objects in small robots with low-performance hardware. Subsequently, we use two methods to analyze the ROIs. First, we develop a Convolutional Neural Network on a desktop and deploy it onto the low-performance hardware for object recognition. Second, we adopt the Histogram of Oriented Gradients descriptor and linear Support Vector Machines classifier and optimize the HOG component for faster speed. The experimental results show that the methods work well on our small robots with Raspberry Pi 3 embedded 1.2 GHz ARM CPUs to recognize the objects. Furthermore, we obtain valuable insights about the trade-offs between speed and accuracy
Entwicklung von Objekterkennungssystemen für SoC-FPGA Plattformen
Object detection is a crucial and challenging task in the field of embedded vision and robotics. Over the last 15 years, various object detection algorithms/systems (e.g., face detection, traffic sign detection) are proposed by different researchers and companies. Most of these research are either focused on improving the detection performance or devoted to boosting the detection speed, which reveals two typically employed criteria while evaluating an object detection algorithm/system: detection accuracy and execution speed. Considering these two factors and the application of object detection to the domains such as service robots and Advanced Driving Assistance System (ADAS), FPGA is a promising platform to achieve accurate object detection in real-time. Therefore, FPGA-based robust object detection systems are designed in this work.
The main work of this thesis can be divided into two parts: promising algorithm obtaining and hardware design on SoC-FPGA. Firstly, representative object detection algorithms are selected, implemented, and evaluated. Thereafter, a generalized object detection framework is created. With this framework, pedestrian detection, traffic sign detection, and head detection algorithms are realized and tested. The experiments verify that promising detection results can be obtained by employing the generalized object detection framework. For the work of hardware design on FPGA, the platform of the object detection system, which consists of stereo OV7670 cameras, Xilinx Zedboard, and a monitor that can visualize the detection results, is created. After that, IP cores that correspond to each block of the framework are designed. Configurable parameters are provided by each IP core so that the IPs, especially feature calculation IP and feature scaler IP, can be correctly instanced according to the fast feature pyramid theory. Finally, by employing the designed IP cores, pedestrian detection system, traffic sign detection system, and head detection system are designed and evaluated. The on-board testing results show that real-time object (e.g., pedestrian, traffic sign, head) detection with promising accuracy can all be achieved. In addition, with the generalized object detection framework and the designed IP-toolbox, the object detection system that targets any instance of objects can be designed and implemented rapidly.Objekterkennung ist eine essenzielle und herausfordernde Aufgabe in den Forschungsgebieten der Embedded Vision und der Robotik. In den letzten 15 Jahren wurden verschiedene Objekterkennungsalgorithmen und Systeme (z.B. Gesichtserkennung) aus der Forschung sowie der Industrie präsentiert. Der Forschungsschwerpunkt liegt dabei typischerweise entweder bei der Verbesserung der Erkennungsqualität oder bei der Beschleunigung der Erkennungsgeschwindigkeit. Hieraus leiten sich direkt die Kriterien für den Einsatz von Objekterkennungsalgorithmen und Systemen ab: Erkennungsgenauigkeit und die Erkennungslatenz. Unter Berücksichtigung dieser Kriterien und dem konkreten Einsatzgebiet der Objekterkennung für Serviceroboter und ADAS haben sich FPGA Plattformen als vielversprechende Kandidaten für die Implementierung von hoch genauen Objekterkennungsalgorithmen mit strikten Echtzeitanforderungen herausgestellt. Auf Bases dessen wurden im Rahmen dieser Arbeit eine robuste Objekterkennung entwickelt.
Der Fokus dieser Arbeit ist geteilt in zwei Aspekte: Identifikation und Analyse geeigneter Objekterkennungsalgorithmen und deren Implementierung in einer FPGA basierten SoC-Plattform. Zu Beginn wurden eine Reihe von repräsentativen Algorithmen ausgewählt, in einer Testumgebung implementiert und evaluiert. Darauf aufbauend wurde ein Entwicklungs-Frameworks für die Erkennung von Passanten, Verkehrszeichen sowie Kopfdetektion entwickelt und analysiert. Für die Entwicklung einer FPGA basierten Plattform wurde ein Objekterkennungssystem erstellt. Eine Sammlung aus IP-Cores, die das Entwicklungs-Framework bilden, wurden für die FPGA Plattform implementiert. Die IP-Cores bieten eine Reihe von Konfigurationsparametern für den flexiblen Einsatz von neuen Komponenten, im Besonderen IP-Cores für die Berechnung und Skalierung von Erkennungseigenschaften, basierend auf der Fast Feature Pyramid Theorie. Abschliessend wurden die entwickelten IP-Cores zur Erkennung von Passanten, Verkehrszeichen sowie die Kopfdetektion integriert und gemeinsam evaluiert. Die gesammelten Ergebnisse aus verschiedenen Testscenarios der entwickelten FPGA-Plattform zeigen, dass die Objekterkennung in Echtzeit mit vielversprechender Genauigkeit erreichbar ist. Darüber hinaus können mit dem generalisierten Objekterkennungsrahmen und der entwickelten IP-Toolbox schnell und flexibel belibige Objekterkennungssysteme entwickelt und implementiert werden
- …