    STV-based Video Feature Processing for Action Recognition

    In comparison to still image-based processes, video features can provide rich and intuitive information about dynamic events that occur over a period of time, such as human actions, crowd behaviours, and other subject pattern changes. Although substantial progress has been made in image processing over the last decade, with successful applications in face matching and object recognition, video-based event detection remains one of the most difficult challenges in computer vision research due to its complex continuous or discrete input signals, arbitrary dynamic feature definitions, and often ambiguous analytical methods. In this paper, a Spatio-Temporal Volume (STV) and region intersection (RI) based 3D shape-matching method is proposed to facilitate the definition and recognition of human actions recorded in videos. The distinctive characteristics and the performance gain of the devised approach stem from a coefficient factor-boosted 3D region intersection and matching mechanism developed in this research. This paper also reports an investigation into techniques for efficient STV data filtering to reduce the number of voxels (volumetric pixels) that need to be processed in each operational cycle of the implemented system. The encouraging features and the improvements in operational performance registered in the experiments are discussed at the end.
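The region intersection idea in the abstract above can be illustrated with a minimal sketch. The abstract does not specify the coefficient factor or the exact matching rule, so this example swaps in a plain intersection-over-union ratio between two voxel sets; the function names and the `(x, y, t)` voxel encoding are assumptions made for illustration only.

```python
def region_intersection_score(volume_a, volume_b):
    """Overlap ratio (intersection over union) of two voxel sets.

    Each volume is an iterable of (x, y, t) tuples marking occupied voxels.
    This is plain IoU, not the paper's coefficient-boosted score.
    """
    a, b = set(volume_a), set(volume_b)
    union = a | b
    if not union:
        return 0.0  # two empty volumes: define the score as zero
    return len(a & b) / len(union)


def make_box(x0, x1, y0, y1, t0, t1):
    """Hypothetical helper: a solid box of voxels, upper bounds exclusive."""
    return {
        (x, y, t)
        for x in range(x0, x1)
        for y in range(y0, y1)
        for t in range(t0, t1)
    }


# A 4x4x4 template volume and a candidate shifted by two voxels along x.
template = make_box(0, 4, 0, 4, 0, 4)
candidate = make_box(2, 6, 0, 4, 0, 4)
score = region_intersection_score(template, candidate)
```

Here the two boxes share 32 voxels out of a union of 96, so the score is one third. A real system would apply this kind of comparison to filtered action silhouettes stacked over time rather than to synthetic boxes.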

    An Efficient Boosted Classifier Tree-Based Feature Point Tracking System for Facial Expression Analysis

    The study of facial movement and expression has been a prominent area of research since the early work of Charles Darwin. The Facial Action Coding System (FACS), developed by Paul Ekman, introduced the first universal method of coding and measuring facial movement. Human-Computer Interaction seeks to make human interaction with computer systems more effective, easier, safer, and more seamless. Facial expression recognition can be broken down into three distinct stages: Facial Feature Localization, Facial Action Recognition, and Facial Expression Classification. The first and most important stage in any facial expression analysis system is the localization of key facial features. Localization must be accurate and efficient to ensure reliable tracking and leave time for computation and comparisons to learned facial models while maintaining real-time performance. Two possible methods for localizing facial features are discussed in this dissertation. The Active Appearance Model is a statistical model describing an object's parameters through the use of both shape and texture models, resulting in appearance. Statistical model-based training for object recognition takes multiple instances of the object class of interest, or positive samples, and multiple negative samples, i.e., images that do not contain objects of interest. Viola and Jones present a highly robust real-time face detection system, and a statistically boosted attentional detection cascade composed of many weak feature detectors. A basic algorithm for the elimination of unnecessary sub-frames while using Viola-Jones face detection is presented to further reduce image search time. A real-time emotion detection system is presented which is capable of identifying seven affective states (agreeing, concentrating, disagreeing, interested, thinking, unsure, and angry) from a near-infrared video stream. The Active Appearance Model is used to place 23 landmark points around key areas of the eyes, brows, and mouth. A prioritized binary decision tree then detects, based on the actions of these key points, whether one of the seven emotional states occurs as frames pass. The completed system runs accurately and achieves a real-time frame rate of approximately 36 frames per second. A novel facial feature localization technique utilizing a nested cascade classifier tree is proposed. A coarse-to-fine search is performed in which the regions of interest are defined by the response of Haar-like features comprising the cascade classifiers. The individual responses of the Haar-like features are also used to activate finer-level searches. A specially cropped training set derived from the Cohn-Kanade AU-Coded database is also developed and tested. Extensions of this research include further testing to verify the novel facial feature localization technique presented for a full 26-point face model, and implementation of a real-time intensity-sensitive automated Facial Action Coding System.
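The Haar-like feature responses mentioned in the abstract above are conventionally computed from an integral image, which makes any rectangle sum a four-lookup operation. The following is a minimal sketch of that standard technique, not the dissertation's nested cascade; the function names and the toy image are assumptions for illustration.

```python
def integral_image(img):
    """Summed-area table with one row and one column of zero padding."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of pixels in a rectangle using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii, top, left, height, width):
    """Two-rectangle Haar-like feature: left half minus right half."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return left_sum - right_sum


# Toy 4x4 image with a bright left half and a dark right half.
image = [
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
]
ii = integral_image(image)
response = two_rect_feature(ii, 0, 0, 4, 4)
```

A strong response such as this one indicates a vertical edge inside the window. In a coarse-to-fine scheme, responses like this could plausibly be thresholded to decide which sub-regions deserve a finer search.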

    Spatial Pyramid Context-Aware Moving Object Detection and Tracking for Full Motion Video and Wide Aerial Motion Imagery

    A robust and fast automatic moving object detection and tracking system is essential to characterize target objects and extract spatial and temporal information for different functionalities, including video surveillance systems, urban traffic monitoring and navigation, and robotics. In this dissertation, I present a collaborative Spatial Pyramid Context-aware moving object detection and Tracking system (SPCT). The proposed visual tracker is composed of one master tracker that usually relies on visual object features and two auxiliary trackers, based on object temporal motion information, that are called dynamically to assist the master tracker. SPCT utilizes image spatial context at different levels to make the video tracking system resistant to occlusion and background noise and to improve target localization accuracy and robustness. We chose a pre-selected set of seven-channel complementary features, including RGB color, intensity, and a spatial pyramid of HoG, to encode object color, shape, and spatial layout information. We exploit the integral histogram as a building block to meet the demands of real-time performance. A novel fast algorithm is presented to accurately evaluate spatially weighted local histograms in constant time complexity using an extension of the integral histogram method. Different techniques are explored to efficiently compute the integral histogram on GPU architecture and are applied to fast spatio-temporal median computations and 3D face reconstruction texturing. We propose a multi-component framework based on semantic fusion of motion information with a projected building footprint map to significantly reduce the false alarm rate in urban scenes with many tall structures. The experiments on the extensive VOTC2016 benchmark dataset and aerial video confirm that combining complementary tracking cues in an intelligent fusion framework enables persistent tracking for Full Motion Video and Wide Aerial Motion Imagery. Comment: PhD Dissertation (162 pages)
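The integral histogram that the abstract above uses as a building block can be sketched in a few lines. This is the basic unweighted method on the CPU, not the dissertation's spatially weighted extension or its GPU implementation; the function names, the bin mapping, and the toy image are assumptions for illustration.

```python
def integral_histogram(img, num_bins, max_value=256):
    """Per-pixel cumulative histograms with zero padding on the top and left.

    ih[y][x] holds the histogram of all pixels above and to the left of (y, x).
    """
    h, w = len(img), len(img[0])
    ih = [[[0] * num_bins for _ in range(w + 1)] for _ in range(h + 1)]
    for y in range(h):
        row_hist = [0] * num_bins  # histogram of the current row so far
        for x in range(w):
            row_hist[img[y][x] * num_bins // max_value] += 1
            above = ih[y][x + 1]
            ih[y + 1][x + 1] = [above[b] + row_hist[b] for b in range(num_bins)]
    return ih


def region_histogram(ih, top, left, height, width):
    """Histogram of a rectangle from four lookups per bin, independent of its area."""
    bottom, right = top + height, left + width
    num_bins = len(ih[0][0])
    return [
        ih[bottom][right][b] - ih[top][right][b]
        - ih[bottom][left][b] + ih[top][left][b]
        for b in range(num_bins)
    ]


# Toy 3x4 grayscale image quantized into 4 intensity bins.
image = [
    [0, 64, 128, 255],
    [10, 70, 130, 200],
    [250, 20, 90, 140],
]
ih = integral_histogram(image, num_bins=4)
full = region_histogram(ih, 0, 0, 3, 4)
patch = region_histogram(ih, 1, 1, 2, 2)
```

Once the table is built, the cost of a query depends only on the number of bins, which is what makes exhaustive sliding-window histogram comparison feasible for tracking.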

    Real-time embedded eye detection system

    The detection of a person’s eyes is a basic task in applications as important as iris recognition in biometric identification or fatigue detection in driving assistance systems. Current commercial and research systems use software frameworks that require a dedicated computer, whose power consumption, size, and price are significantly large. This paper presents a hardware-based embedded solution for eye detection in real time. From an algorithmic point of view, the popular Viola-Jones approach has been redesigned to enable a highly parallel, single-pass image-processing implementation. Synthesized and implemented in an All-Programmable System-on-Chip (AP SoC), this proposal allows us to process more than 88 frames per second (fps), with the classifier taking less than 2 ms per image. Experimental validation has been successfully carried out in an iris recognition system that works with walking subjects. In this case, the prototype module includes a CMOS digital imaging sensor providing 16-Mpixel images, and it outputs a stream of detected eyes as 640 × 480 images. Experiments to determine the accuracy of the proposed system in terms of eye detection are performed on the CASIA-Iris-distance V4 database. Significantly, they show that the accuracy in terms of eye detection is 100%. This work has been partially developed within the project RTI2018-099522-B-C4X, funded by the Gobierno de España and FEDER funds, and the ARMORI project (CEIATECH-10) funded by the University of Málaga. Portions of the research in this paper use the CASIA-Iris V4 collected by the Chinese Academy of Sciences - Institute of Automation (CASIA).

    Robust Vehicle Detection and Distance Estimation Under Challenging Lighting Conditions

    Avoiding the high computational costs and calibration issues involved in stereo-vision-based algorithms, this paper proposes real-time monocular-vision-based techniques for simultaneous vehicle detection and inter-vehicle distance estimation, in which the performance and robustness of the system remain competitive, even for highly challenging benchmark datasets. This paper develops a collision warning system that detects vehicles ahead and identifies safety distances to assist a distracted driver prior to the occurrence of an imminent crash. We introduce adaptive global Haar-like features for vehicle detection, tail-light segmentation, virtual symmetry detection, and inter-vehicle distance estimation, as well as an efficient single-sensor multifeature fusion technique to enhance the accuracy and robustness of our algorithm. The proposed algorithm is able to detect vehicles ahead both day and night and at both short and long range. Experimental results under various weather and lighting conditions (including sunny, rainy, foggy, or snowy) show that the proposed algorithm outperforms state-of-the-art algorithms.

    Non-Verbal Feedback on User Interest Based on Gaze Direction and Head Pose


    Texture-based Tracking in mm-wave Images

    Current tracking methods rely on color-, intensity-, and edge-based features to compute a description of an image region. These approaches are not well suited to low-quality images such as mm-wave data from full-body scanners. In order to perform tracking in such challenging grayscale images, we propose several enhancements and extensions to the Visual Tracking Decomposition (VTD) by Kwon and Lee. A novel region descriptor, which uses texture-based features, is presented and integrated into VTD. We improve VTD by adding a sophisticated weighting scheme for observations, better motion models, and a more realistic approach to sampling and interaction. Our method not only outperforms VTD on mm-wave data but also achieves comparable results on normal-quality images. We are confident that our region descriptor can easily be extended to other kinds of features and applications such that tracking can be performed on a large variety of image data, especially low-resolution, low-illumination, and noisy images.