
    Real-time human action recognition on an embedded, reconfigurable video processing architecture

    Copyright © 2008 Springer-Verlag. In recent years, automatic human motion recognition has been widely researched within the computer vision and image processing communities. Here we propose a real-time embedded vision solution for human motion recognition implemented on a ubiquitous device. There are three main contributions in this paper. Firstly, we have developed a fast human motion recognition system with simple motion features and a linear Support Vector Machine (SVM) classifier. The method has been tested on a large, public human action dataset and achieved competitive performance for the temporal template (e.g. "motion history image") class of approaches. Secondly, we have developed a reconfigurable, FPGA-based video processing architecture. One advantage of this architecture is that the system's processing performance can be reconfigured for a particular application by adding new or replicated processing cores. Finally, we have successfully implemented a human motion recognition system on this reconfigurable architecture. With a small number of human actions (hand gestures), this stand-alone system performs reliably, with an 80% average recognition rate using limited training data. This type of system has applications in security systems, man-machine communications and intelligent environments. Funded by the DTI and Broadcom Ltd.

    FPGA implementation of real-time human motion recognition on a reconfigurable video processing architecture

    In recent years, automatic human motion recognition has been widely researched within the computer vision and image processing communities. Here we propose a real-time embedded vision solution for human motion recognition implemented on a ubiquitous device. There are three main contributions in this paper. Firstly, we have developed a fast human motion recognition system with simple motion features and a linear Support Vector Machine (SVM) classifier. The method has been tested on a large, public human action dataset and achieved competitive performance for the temporal template (e.g. "motion history image") class of approaches. Secondly, we have developed a reconfigurable, FPGA-based video processing architecture. One advantage of this architecture is that the system's processing performance can be reconfigured for a particular application by adding new or replicated processing cores. Finally, we have successfully implemented a human motion recognition system on this reconfigurable architecture. With a small number of human actions (hand gestures), this stand-alone system performs reliably, with an 80% average recognition rate using limited training data. This type of system has applications in security systems, man-machine communications and intelligent environments.
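    The pipeline described in both abstracts above, temporal-template motion features fed to a linear SVM, can be sketched compactly in software. The following is a minimal illustration, not the authors' FPGA implementation; the decay constant TAU, the difference threshold DELTA, and the helper names are illustrative assumptions.

```python
# Minimal sketch of the temporal-template approach: build a motion history
# image (MHI) from frame differences, flatten it into a feature vector, and
# classify with a linear SVM. TAU/DELTA values are illustrative assumptions.
import numpy as np
from sklearn.svm import LinearSVC

TAU = 30        # how many frames a motion trace persists in the MHI
DELTA = 25      # frame-difference threshold marking "motion" pixels

def update_mhi(mhi, prev_frame, frame):
    """One MHI update step: set moving pixels to TAU, decay the rest."""
    motion = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16)) > DELTA
    return np.where(motion, TAU, np.maximum(mhi - 1, 0))

def mhi_feature(frames):
    """Collapse a grayscale clip into one MHI and flatten it as the feature."""
    mhi = np.zeros(frames[0].shape, dtype=np.int16)
    for prev, cur in zip(frames, frames[1:]):
        mhi = update_mhi(mhi, prev, cur)
    return (mhi / TAU).ravel()  # normalize persistence values to [0, 1]

# Usage: X holds one flattened MHI per training clip, y the action labels.
# clf = LinearSVC().fit(X, y); clf.predict(mhi_feature(test_clip)[None, :])
```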

    Coagulation time detection by means of a real-time image processing

    Several techniques for semi-automatic or automatic detection of coagulation time in blood or plasma analysis are available in the literature. However, these techniques either are complex and demand specialized equipment, or allow the analysis of only very few samples in parallel. In this paper a new system based on computer vision is presented. A simple image processing algorithm has been developed, which leads to an accurate estimation of the coagulation time of several samples in parallel. The estimation can be performed in real time using a transputer architecture supported by a PC.
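    The abstract does not spell out the algorithm, but per-sample coagulation detection from video is commonly done by tracking the intensity of each sample's region over time. The sketch below follows that generic pattern; the ROI layout, baseline window, and threshold are assumptions, not details from the paper.

```python
# Hedged sketch: track the mean intensity of each sample's region of interest
# (ROI) across frames, and report the first time it departs from its baseline
# by more than a relative threshold. All parameters are illustrative.
import numpy as np

def coagulation_times(frames, rois, fps, rel_threshold=0.15):
    """frames: list of 2-D grayscale arrays; rois: list of (y0, y1, x0, x1)."""
    times = []
    for (y0, y1, x0, x1) in rois:
        curve = np.array([f[y0:y1, x0:x1].mean() for f in frames])
        baseline = curve[:10].mean()   # assume first ~10 frames pre-coagulation
        departed = np.abs(curve - baseline) > rel_threshold * baseline
        idx = int(np.argmax(departed)) if departed.any() else None
        times.append(idx / fps if idx is not None else None)
    return times
```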

    Dynamically reconfigurable architecture for embedded computer vision systems

    The objective of this research work is to design, develop and implement a new architecture that integrates on the same chip all the processing levels of a complete Computer Vision system, so that execution is efficient without compromising power consumption, while keeping the cost low. For this purpose, an analysis and classification of different mathematical operations and algorithms commonly used in Computer Vision are carried out, as well as an in-depth review of the image processing capabilities of current-generation hardware devices. This makes it possible to determine the requirements and the key aspects of an efficient architecture. A representative set of algorithms is employed as a benchmark to evaluate the proposed architecture, which is implemented on an FPGA-based system-on-chip. Finally, the prototype is compared to other related approaches in order to determine its advantages and weaknesses.

    AIDI: An adaptive image denoising FPGA-based IP-core for real-time applications

    The presence of noise in images can significantly impact the performance of digital image processing and computer vision algorithms. Thus, it should be removed to improve the robustness of the entire processing flow. Noise estimation in an image is also a key factor since, to be most effective, algorithms and denoising filters should be tuned to the actual level of noise. Moreover, the complexity of these algorithms brings a new challenge to real-time image processing applications, requiring high computing capacity. In this context, hardware acceleration is crucial, and Field Programmable Gate Arrays (FPGAs) best fit the growing demand for computational capability. This paper presents an Adaptive Image Denoising IP-core (AIDI) for real-time applications. The core first estimates the level of noise in the input image, then applies an adaptive Gaussian smoothing filter to remove the estimated noise. The filtering parameters are computed on the fly, adapting them to the level of noise in the image, and pixel by pixel, to preserve image information (e.g., edges or corners). The FPGA-based architecture is presented, highlighting its improvements w.r.t. a standard static filtering approach.
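    As a software illustration of the estimate-then-adapt flow described above (not the FPGA IP-core itself), the sketch below estimates the noise level with Immerkaer's fast method, derives the Gaussian filter strength from it, and blends filtered and original pixels so edges are preserved. The noise-to-sigma mapping and the edge-blending rule are assumed stand-ins.

```python
# Software sketch of adaptive denoising: estimate noise, pick the filter
# strength from it, then blend per pixel so edges stay sharp.
import numpy as np
from scipy.ndimage import convolve, gaussian_filter, sobel

def estimate_noise_sigma(img):
    """Immerkaer's fast noise estimate from a Laplacian-difference mask."""
    mask = np.array([[1, -2, 1], [-2, 4, -2], [1, -2, 1]], dtype=np.float64)
    resp = convolve(img.astype(np.float64), mask)
    h, w = img.shape
    return np.sqrt(np.pi / 2) / (6 * (h - 2) * (w - 2)) * np.abs(resp).sum()

def adaptive_denoise(img, edge_k=50.0):
    img = img.astype(np.float64)
    sigma_n = estimate_noise_sigma(img)
    # Ad-hoc mapping from noise level to Gaussian strength (assumption).
    smoothed = gaussian_filter(img, sigma=max(sigma_n / 10, 0.5))
    # Per-pixel blend: keep the original where the gradient magnitude is high.
    grad = np.hypot(sobel(img, axis=0), sobel(img, axis=1))
    alpha = np.exp(-grad / edge_k)   # ~1 in flat regions, ~0 at edges
    return alpha * smoothed + (1 - alpha) * img
```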

    Semantic Image Segmentation via Deep Parsing Network

    This paper addresses semantic image segmentation by incorporating rich information into a Markov Random Field (MRF), including high-order relations and a mixture of label contexts. Unlike previous works that optimized MRFs using iterative algorithms, we solve the MRF by proposing a Convolutional Neural Network (CNN), namely the Deep Parsing Network (DPN), which enables deterministic end-to-end computation in a single forward pass. Specifically, DPN extends a contemporary CNN architecture to model unary terms, and additional layers are carefully devised to approximate the mean field (MF) algorithm for pairwise terms. It has several appealing properties. First, unlike recent works that combined CNN and MRF, where many iterations of MF were required for each training image during back-propagation, DPN is able to achieve high performance by approximating one iteration of MF. Second, DPN represents various types of pairwise terms, making many existing models its special cases. Third, DPN makes MF easier to parallelize and accelerate on a Graphics Processing Unit (GPU). DPN is thoroughly evaluated on the PASCAL VOC 2012 dataset, where a single DPN model yields a new state-of-the-art segmentation accuracy.
    Comment: To appear in International Conference on Computer Vision (ICCV) 2015
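    To make the "one iteration of MF" idea concrete, below is a toy NumPy version of a single dense-CRF-style mean-field update: the current label distribution is smoothed spatially (message passing), mixed through a label-compatibility matrix, and renormalized. The Gaussian kernel and the compatibility matrix are generic stand-ins, not DPN's learned layers.

```python
# Toy single mean-field update for pairwise terms in segmentation.
import numpy as np
from scipy.ndimage import gaussian_filter

def softmax(x, axis=0):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mean_field_step(unary, compat, sigma=3.0):
    """unary: (L, H, W) class scores; compat: (L, L) label-compatibility penalties."""
    q = softmax(unary, axis=0)                        # initial label distribution
    # Message passing: spatially smooth each label map (Gaussian stand-in).
    msg = np.stack([gaussian_filter(q[l], sigma) for l in range(q.shape[0])])
    # Compatibility transform mixes labels, producing the pairwise penalty.
    pairwise = np.einsum('kl,lhw->khw', compat, msg)
    return softmax(unary - pairwise, axis=0)          # one MF update, renormalized
```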

    Extending SSL patches spatial relations in Vision Transformers for object detection and instance segmentation tasks

    The Vision Transformer (ViT) architecture has become a de-facto standard in computer vision, achieving state-of-the-art performance in various tasks. This popularity stems from its remarkable computational efficiency and its global self-attention mechanism. However, in contrast with convolutional neural networks (CNNs), ViTs require large amounts of data to improve their generalization ability. In particular, for small datasets, their lack of inductive bias (i.e. translational equivariance, locality) can lead to poor results. To overcome this issue, self-supervised learning (SSL) techniques based on understanding the spatial relations among image patches without human annotations (e.g. positions, angles and Euclidean distances) are extremely useful and easy to integrate into the ViT architecture. The corresponding model, dubbed RelViT, was shown to improve overall image classification accuracy, optimizing token encoding and providing new visual representations of the data. This work demonstrates the effectiveness of these SSL strategies for object detection and instance segmentation tasks as well. RelViT outperforms the standard ViT architecture on multiple datasets in the majority of the related benchmarking metrics. In particular, testing on a small subset of COCO, results showed gains of +2.70% and +2.20% in mAP for image segmentation and object detection, respectively.
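    A minimal sketch of this kind of patch-relation pretext task is shown below: the ViT embeddings of two patches are fed to a small head that predicts their spatial relation, with labels derived for free from the patches' grid positions. The head design, the 8-way relation classes, and all shapes are illustrative assumptions, not RelViT's exact formulation.

```python
# Self-supervised patch-relation head: predicts the relative position of one
# patch with respect to another from their ViT token embeddings.
import torch
import torch.nn as nn

class PatchRelationHead(nn.Module):
    """Predicts one of 8 relative positions (N, NE, E, SE, ...) for a patch pair."""
    def __init__(self, dim=768, n_relations=8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.GELU(), nn.Linear(dim, n_relations)
        )

    def forward(self, tok_a, tok_b):
        # Concatenate the two patch embeddings and classify their relation.
        return self.mlp(torch.cat([tok_a, tok_b], dim=-1))

# Training sketch: sample patch pairs, derive relation labels from their known
# grid coordinates (free supervision), and minimize cross-entropy:
# loss = nn.functional.cross_entropy(head(tok_a, tok_b), relation_labels)
```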

    Deep interpretable architecture for plant diseases classification

    Recently, many works have been inspired by the success of deep learning in computer vision for plant disease classification. Unfortunately, these end-to-end deep classifiers lack transparency, which can limit their adoption in practice. In this paper, we propose a new trainable visualization method for plant disease classification based on a Convolutional Neural Network (CNN) architecture composed of two deep classifiers, the first named Teacher and the second Student. This architecture leverages multitask learning to train the Teacher and the Student jointly. Then, the representation communicated between the Teacher and the Student is used as a proxy to visualize the most important image regions for classification. This new architecture produces sharper visualizations than existing methods in the plant disease context. All experiments are conducted on the PlantVillage dataset, which contains 54,306 plant images.
    Comment: 10 pages, 8 figures, Submitted to Signal Processing Algorithms, Architectures, Arrangements and Applications (SPA2019), https://github.com/Tahedi1/Teacher_Student_Architectur
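    One plausible way to wire such a jointly trained Teacher-Student pair, with the communicated representation kept spatial so it can be rendered as a visualization map, is sketched below. Every module, shape, and loss term here is a hypothetical stand-in, not the paper's actual architecture (which is in the linked repository).

```python
# Hypothetical Teacher-Student wiring: the Teacher's features produce a
# spatial map that gates the Student's input; the map doubles as the
# visualization proxy. Shapes and modules are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TeacherStudent(nn.Module):
    def __init__(self, teacher_backbone, student_backbone, n_classes, feat_dim=512):
        super().__init__()
        self.teacher, self.student = teacher_backbone, student_backbone
        self.bridge = nn.Conv2d(feat_dim, 1, kernel_size=1)  # communicated map
        self.head_t = nn.Linear(feat_dim, n_classes)
        self.head_s = nn.Linear(feat_dim, n_classes)

    def forward(self, x):
        feat_t = self.teacher(x)                   # assumed (B, feat_dim, H', W')
        attn = torch.sigmoid(self.bridge(feat_t))  # (B, 1, H', W') saliency proxy
        logits_t = self.head_t(feat_t.mean(dim=(2, 3)))
        gated = x * F.interpolate(attn, size=x.shape[2:])  # focus the Student
        feat_s = self.student(gated)
        logits_s = self.head_s(feat_s.mean(dim=(2, 3)))
        return logits_t, logits_s, attn            # attn is the visualization map

# Multitask training sketch: both classifiers share the same label y.
# loss = F.cross_entropy(logits_t, y) + F.cross_entropy(logits_s, y)
```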