8,966 research outputs found

    Shape Representations Using Nested Descriptors

    Shape representation is a core problem in computer vision. It can be argued that shape is the most central representational problem in the field, since unlike texture or color, shape alone can support perceptual tasks such as image matching, object detection and object categorization. This dissertation introduces a new shape representation called the nested descriptor. A nested descriptor represents shape both globally and locally by pooling salient scaled and oriented complex gradients in a large nested support set. We show that this nesting property induces a nested correlation structure that enables a new local distance function called the nesting distance, which provides a provably robust similarity function for image matching. Furthermore, the nesting property suggests an elegant flower-like normalization strategy called the log-spiral difference. We show that this normalization enables a compact binary representation and is equivalent to a form of bottom-up saliency. This suggests that the representational power of nested descriptors comes from representing salient edges, which makes a fundamental connection between the saliency and local feature descriptor literatures. In this dissertation, we introduce three examples of shape representation using nested descriptors: nested shape descriptors for imagery, nested motion descriptors for video and nested pooling for activities. We show evaluation results for these representations that demonstrate state-of-the-art performance on image matching, wide-baseline stereo and activity recognition tasks.
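    The abstract does not give the exact construction, but the pool-over-nested-supports idea can be illustrated with a short sketch. Everything below (function names, the choice of radii, the simple difference-based binarization standing in for the log-spiral difference) is a hypothetical illustration, not the dissertation's implementation:

        import numpy as np
        from scipy import ndimage

        def complex_gradient(image, sigma):
            """Oriented complex gradient dx + i*dy at a given smoothing scale."""
            image = np.asarray(image, dtype=float)
            dx = ndimage.gaussian_filter(image, sigma, order=(0, 1))
            dy = ndimage.gaussian_filter(image, sigma, order=(1, 0))
            return dx + 1j * dy

        def nested_descriptor(image, center, radii=(4, 8, 16, 32), sigma=1.0):
            """Pool complex gradients over nested circular supports (hypothetical sketch)."""
            g = complex_gradient(image, sigma)
            ys, xs = np.mgrid[0:g.shape[0], 0:g.shape[1]]
            dist = np.hypot(ys - center[0], xs - center[1])
            pooled = []
            for r in radii:
                support = dist <= r              # nested: each disc contains the previous one
                pooled.append(g[support].sum())  # pool gradients over the support
            # Difference-based normalization (a stand-in for the log-spiral
            # difference), binarized into a compact code as the abstract describes.
            diffs = np.diff(np.abs(pooled))
            return (diffs > 0).astype(np.uint8)

    With a binary code like this, matching reduces to a Hamming distance, e.g. np.count_nonzero(d1 != d2).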

    Spatial Pyramid Context-Aware Moving Object Detection and Tracking for Full Motion Video and Wide Aerial Motion Imagery

    A robust and fast automatic moving object detection and tracking system is essential to characterize target objects and extract spatial and temporal information for applications including video surveillance, urban traffic monitoring and navigation, and robotics. In this dissertation, I present a collaborative Spatial Pyramid Context-aware detection and Tracking (SPCT) system. The proposed visual tracker is composed of one master tracker, which relies on visual object features, and two auxiliary trackers based on object temporal motion information, which are called dynamically to assist the master tracker. SPCT utilizes image spatial context at different levels to make the video tracking system resistant to occlusion and background noise and to improve target localization accuracy and robustness. We chose a pre-selected set of seven complementary feature channels, including RGB color, intensity and a spatial pyramid of HoG, to encode object color, shape and spatial layout information. We exploit the integral histogram as a building block to meet the demands of real-time performance. A novel fast algorithm is presented to accurately evaluate spatially weighted local histograms in constant time complexity using an extension of the integral histogram method. Different techniques are explored to efficiently compute the integral histogram on GPU architectures and are applied to fast spatio-temporal median computation and 3D face reconstruction texturing. We propose a multi-component framework based on semantic fusion of motion information with a projected building footprint map to significantly reduce the false alarm rate in urban scenes with many tall structures. Experiments on the extensive VOTC2016 benchmark dataset and on aerial video confirm that combining complementary tracking cues in an intelligent fusion framework enables persistent tracking for Full Motion Video and Wide Aerial Motion Imagery. (PhD dissertation, 162 pages.)
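    The constant-time local histogram evaluation rests on the standard integral histogram. A minimal sketch of that underlying building block (not the dissertation's spatially weighted extension or its GPU implementation, and with a hypothetical 16-bin intensity layout) might look like:

        import numpy as np

        def integral_histogram(image, n_bins=16):
            """Per-bin integral images: H[y, x, b] = count of bin-b pixels above-left of (y, x)."""
            bins = (image.astype(np.int64) * n_bins) // 256     # uint8 intensity -> bin index
            one_hot = np.eye(n_bins, dtype=np.int64)[bins]      # shape (H, W, n_bins)
            H = one_hot.cumsum(axis=0).cumsum(axis=1)
            return np.pad(H, ((1, 0), (1, 0), (0, 0)))          # zero border simplifies queries

        def region_histogram(H, top, left, bottom, right):
            """Histogram of image[top:bottom, left:right] in O(1), independent of region size."""
            return (H[bottom, right] - H[top, right]
                    - H[bottom, left] + H[top, left])

        img = np.random.default_rng(0).integers(0, 256, (480, 640)).astype(np.uint8)
        H = integral_histogram(img)
        print(region_histogram(H, 100, 100, 200, 300))          # 16-bin histogram of a 100x200 patch

    Each rectangle query costs four lookups per bin regardless of region size, which is what makes real-time sliding-window histogram evaluation feasible.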

    Improving Face Detection with TOF Cameras


    An Evaluation Framework and Database for MoCap-Based Gait Recognition Methods

    As a contribution to reproducible research, this paper presents a framework and a database to improve the development, evaluation and comparison of methods for gait recognition from Motion Capture (MoCap) data. The evaluation framework provides implementation details and source code for state-of-the-art human-interpretable geometric features as well as for our own approaches, in which gait features are learned by a modification of Fisher's Linear Discriminant Analysis with the Maximum Margin Criterion, and by a combination of Principal Component Analysis and Linear Discriminant Analysis. It includes a description and source code of a mechanism for evaluating four class-separability coefficients of the feature space and four rank-based classifier performance metrics. The framework also contains a tool for learning a custom classifier and for classifying a custom query against a custom gallery. We provide an experimental database, along with the source code for its extraction from the general CMU MoCap database.
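    One of the learned-feature baselines the abstract names, PCA followed by LDA, is straightforward to sketch with scikit-learn. The data below is random and merely stands in for real MoCap-derived gait features; the Maximum Margin Criterion modification of Fisher's LDA is not reproduced here:

        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
        from sklearn.pipeline import make_pipeline

        # Random placeholders: one flattened pose-sequence feature vector per
        # gait sample, labeled by subject identity.
        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 300))      # 200 samples, 300 raw dimensions
        y = rng.integers(0, 10, size=200)    # 10 walking subjects

        # PCA first shrinks the space so LDA's within-class scatter matrix is
        # well conditioned; LDA then learns a discriminative gait feature space.
        model = make_pipeline(PCA(n_components=50), LinearDiscriminantAnalysis())
        model.fit(X, y)
        print(model.score(X, y))             # accuracy in the learned space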

    Biologically inspired composite image sensor for deep field target tracking

    The use of nonuniform image sensors in mobile computer vision applications can be an effective solution when computational burden is problematic. Nonuniform image sensors are still in their infancy: they have been neither fully investigated for their unique qualities nor extensively applied in practice. In this dissertation a system has been developed that can perform vision tasks in both the far field and the near field. To accomplish this, a novel image sensor system has been developed. Inspired by the visual systems of both falcons and primates, a composite multi-camera sensor was constructed. The sensor provides an expandable visual range and excellent depth of field, and produces a single compact output image based on the log-polar retinal-cortical mapping that occurs in primates. This mapping provides scale- and rotation-tolerant processing, which in turn supports the mitigation of perspective distortion found in strictly Cartesian sensor systems. Furthermore, the scale-tolerant representation of objects moving on trajectories parallel to the sensor's optical axis allows for fast acquisition and tracking of objects moving at high speed. To investigate how effective this combination would be for object detection and tracking in both the near and far field, the system was tuned for detecting and tracking vehicles from a moving platform. Finally, it was shown that license plate information could easily be captured autonomously by extracting the information contained in the mapped log-polar representation space. The novel composite log-polar deep-field image sensor opens new horizons for computer vision. The current work demonstrates features that can benefit applications beyond high-speed vehicle tracking for driver assistance and license plate capture; envisioned future applications include obstacle detection for high-speed trains, computer-assisted aircraft landing, and computer-assisted spacecraft docking.
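    The log-polar retinal-cortical mapping itself is available off the shelf, for example through OpenCV's warpPolar. A minimal sketch, with a synthetic frame standing in for the composite sensor's output, of how scaling and rotation become translations in the mapped space:

        import cv2
        import numpy as np

        # Synthetic frame standing in for a frame from the composite sensor.
        img = np.zeros((480, 640), np.uint8)
        cv2.circle(img, (320, 240), 100, 255, 3)

        center = (320.0, 240.0)
        # Log-polar (retinal-cortical) mapping: scaling about the center becomes a
        # horizontal shift and rotation becomes a vertical shift, so an object
        # approaching along the optical axis keeps a near-constant footprint.
        cortical = cv2.warpPolar(img, (256, 256), center, 240.0,
                                 cv2.INTER_LINEAR + cv2.WARP_POLAR_LOG)
        cv2.imwrite("cortical.png", cortical)

    This translation-for-scale property is what lets a tracker follow an object approaching at high speed without rescaling its templates.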

    Visual telemetry transmission in marine environment using Robot Operating System platform

    National Technical University of Athens -- Master's Thesis. Interdisciplinary-Interdepartmental Postgraduate Studies Programme (D.P.M.S.) "Automation Systems"

    An Efficient Boosted Classifier Tree-Based Feature Point Tracking System for Facial Expression Analysis

    The study of facial movement and expression has been a prominent area of research since the early work of Charles Darwin. The Facial Action Coding System (FACS), developed by Paul Ekman, introduced the first universal method of coding and measuring facial movement. Human-Computer Interaction seeks to make human interaction with computer systems more effective, easier, safer, and more seamless. Facial expression recognition can be broken down into three distinctive subsections: Facial Feature Localization, Facial Action Recognition, and Facial Expression Classification. The first and most important stage in any facial expression analysis system is the localization of key facial features. Localization must be accurate and efficient to ensure reliable tracking, and it must leave time for computation and comparison against learned facial models while maintaining real-time performance. Two possible methods for localizing facial features are discussed in this dissertation. The Active Appearance Model is a statistical model that describes an object's appearance through both shape and texture models. Statistical model-based training for object recognition takes multiple instances of the object class of interest (positive samples) and multiple negative samples, i.e., images that do not contain objects of interest. Viola and Jones present a highly robust real-time face detection system: a statistically boosted attentional detection cascade composed of many weak feature detectors. A basic algorithm for the elimination of unnecessary sub-frames during Viola-Jones face detection is presented to further reduce image search time. A real-time emotion detection system is presented that is capable of identifying seven affective states (agreeing, concentrating, disagreeing, interested, thinking, unsure, and angry) from a near-infrared video stream. The Active Appearance Model is used to place 23 landmark points around key areas of the eyes, brows, and mouth. A prioritized binary decision tree then detects, based on the actions of these key points, whether one of the seven emotional states occurs as frames pass. The completed system runs accurately and achieves a real-time frame rate of approximately 36 frames per second. A novel facial feature localization technique utilizing a nested cascade classifier tree is proposed. A coarse-to-fine search is performed in which the regions of interest are defined by the responses of the Haar-like features comprising the cascade classifiers; the individual responses of the Haar-like features also activate finer-level searches. A specially cropped training set derived from the Cohn-Kanade AU-Coded database is also developed and tested. Extensions of this research include further testing to verify the novel facial feature localization technique for a full 26-point face model, and implementation of a real-time, intensity-sensitive, automated Facial Action Coding System.
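    The sub-frame elimination idea, skipping search regions far from the last detection, can be sketched with OpenCV's stock Viola-Jones cascade. The margin and fallback logic below are hypothetical simplifications, not the algorithm from the dissertation:

        import cv2

        cascade = cv2.CascadeClassifier(
            cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

        def detect_with_roi(frame_gray, prev_box=None, margin=0.5):
            """Viola-Jones detection that searches only near the previous hit.

            Skipping sub-windows far from the last detection cuts search time
            as long as the face moves smoothly between frames.
            """
            if prev_box is not None:
                x, y, w, h = prev_box
                mx, my = int(w * margin), int(h * margin)
                x0, y0 = max(0, x - mx), max(0, y - my)
                roi = frame_gray[y0:y + h + my, x0:x + w + mx]
                hits = cascade.detectMultiScale(roi, scaleFactor=1.1, minNeighbors=5)
                if len(hits):
                    rx, ry, rw, rh = hits[0]
                    return (x0 + rx, y0 + ry, rw, rh)
            # Fall back to a full-frame search on the first frame or lost track.
            hits = cascade.detectMultiScale(frame_gray, scaleFactor=1.1, minNeighbors=5)
            return tuple(hits[0]) if len(hits) else None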