510 research outputs found

    Probabilistic three-dimensional object tracking based on adaptive depth segmentation

    Object tracking is one of the fundamental topics of computer vision, with diverse applications. The challenges that arise in tracking, such as cluttered scenes, occlusion, complex motion, and illumination variations, have motivated the use of depth information from 3D sensors. However, current 3D trackers are not applicable to unconstrained environments without a priori knowledge. As an important object detection module in tracking, segmentation subdivides an image into its constituent regions. Nevertheless, the existing range segmentation methods in the literature are too slow for real-time use. In this thesis, a 3D object tracking method based on adaptive depth segmentation and particle filtering is presented. The segmentation method, as the bottom-up process, is combined with the particle filter, as the top-down process, to achieve efficient tracking under challenging circumstances. Experimental results on real-world range data demonstrate both the efficiency and the robustness of the tracking algorithm.
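    As a rough illustration of the bottom-up/top-down coupling, the sketch below scores particle-filter hypotheses with a segmentation-derived likelihood. The random-walk motion model and the segmentation_likelihood callable are illustrative assumptions, not the thesis implementation.

```python
import numpy as np

def track_step(particles, segmentation_likelihood, motion_std=2.0):
    """One predict-weight-resample cycle of a particle filter.

    particles: (N, 3) array of hypothesised object states (x, y, depth).
    segmentation_likelihood: callable scoring a state in [0, 1] against
        the current adaptively segmented depth frame (a placeholder here).
    """
    n = len(particles)
    # Top-down prediction: propagate each hypothesis with a random walk.
    particles = particles + np.random.normal(0.0, motion_std, particles.shape)
    # Bottom-up correction: weight each hypothesis by segmentation support.
    weights = np.array([segmentation_likelihood(p) for p in particles])
    weights /= weights.sum() + 1e-12
    # Systematic resampling concentrates particles on well-supported states.
    u = (np.arange(n) + np.random.rand()) / n
    particles = particles[np.searchsorted(np.cumsum(weights), u)]
    return particles, np.full(n, 1.0 / n)
```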

    Automating Manufacturing Surveillance Processes Using External Observers

    An automated assembly system is an integral part of various manufacturing industries, as it reduces production cycle time, resulting in lower costs and a higher rate of production. The modular system design integrates main assembly workstations and parts-feeding machines to build a fully assembled product or a sub-assembly of a larger product. Machine failures within the subsystems and errors in parts loading lead to slower production and a gradual accumulation of parts. Repeated human intervention is required to manually clear jams at varying locations in the subsystems. To increase operator safety and reduce cycle time, visual surveillance plays a critical role in providing real-time alerts of spatiotemporal part irregularities. In this study, surveillance videos are obtained using external observers to conduct spatiotemporal object segmentation within three subsystems: a digital assembly, a linear conveyance system, and a vibratory bowl parts-feeder machine. As the datasets have different anomaly specifications and visual characteristics, we follow a bottom-up architecture for motion-based and appearance-based segmentation using computer vision techniques and deep-learning models. To perform motion-based segmentation, we evaluate deep-learning-based and classical techniques for computing optical flow for real-time moving-object detection. Because local and global methods assume brightness constancy and flow smoothness, they produced fewer detections in the presence of illumination variance and occlusion. Therefore, we utilize RAFT for optical flow and apply its iteratively updated flow field to create a pixel-based object tracker. The tracker distinguishes previous and current moving parts with differently colored segments and simultaneously visualizes the flow field to illustrate movement direction and magnitude. We compare the segmentation performance of the optical-flow-based tracker with a space-time graph neural network (ST-GNN); the ST-GNN achieves higher boundary-mask IoU than the pixel-based tracker. Because the ST-GNN addresses the limited-dataset challenge in our application by learning visual correspondence as a contrastive random walk over palindrome sequences, we adopt it for motion-based segmentation. As the ST-GNN requires a first-frame annotation mask for initialization, we explore appearance-based segmentation methods to enable automatic initialization. We evaluate pixel-based, interactive, and supervised segmentation techniques on the bowl-feeder image dataset. Results show that K-means combined with watershed segmentation and Gaussian blur reduces superpixel over-segmentation and yields segmentations aligned with part boundaries; using watershed segmentation on the bowl-feeder image dataset, 377 of the 476 parts present within the machine were detected and segmented. We find that GLCM and Gabor filters segment dense part regions better than graph-based and entropy-based segmentation, segmenting 467 and 476 parts, respectively, of the 476 parts present within the bowl-feeder. Although manual annotation decreases efficiency, the GrabCut annotation tool generates more accurate segmentation masks than the pre-trained interactive tool; with it, all 216 parts present within the bowl-feeder machine are segmented.
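    A minimal sketch of the K-means-plus-watershed step just described, following the standard OpenCV marker-based watershed recipe; the cluster count, blur kernel, and the darkest-cluster-as-background rule are illustrative assumptions for the bowl-feeder images.

```python
import cv2
import numpy as np

def segment_parts(image_bgr, k=3):
    """Cluster colors with K-means, then split touching parts via watershed."""
    blurred = cv2.GaussianBlur(image_bgr, (5, 5), 0)   # suppress speckle
    pixels = blurred.reshape(-1, 3).astype(np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
    _, labels, centers = cv2.kmeans(pixels, k, None, criteria, 3,
                                    cv2.KMEANS_PP_CENTERS)
    # Assume the darkest cluster is background; everything else is parts.
    background = np.argmin(centers.sum(axis=1))
    fg = (labels.reshape(blurred.shape[:2]) != background).astype(np.uint8) * 255
    # Build watershed markers: sure foreground, sure background, unknown.
    dist = cv2.distanceTransform(fg, cv2.DIST_L2, 5)
    sure_fg = (dist > 0.5 * dist.max()).astype(np.uint8) * 255
    sure_bg = cv2.dilate(fg, np.ones((3, 3), np.uint8), iterations=3)
    unknown = cv2.subtract(sure_bg, sure_fg)
    _, markers = cv2.connectedComponents(sure_fg)
    markers += 1                   # reserve 0 for the unknown region
    markers[unknown == 255] = 0
    return cv2.watershed(image_bgr, markers)   # -1 labels part boundaries
```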
    To ensure segmentation of all parts within the bowl-feeder, we train Detectron2 with data augmentation and find that supervised segmentation outperforms the pixel-based and interactive techniques. To address illumination variance within the datasets, we apply color-based segmentation by converting the images to HSV color space; the value channel is then used with background-subtraction techniques to detect moving bowl-feeder parts in real time. To resolve image registration errors caused by low image resolution, we create the Flex-Sim synthetic dataset, which contains various anomaly instances captured from multiple camera viewpoints. We apply preprocessing methods and affine transformation with RANSAC for robust image registration, and we compare color- and texture-based handcrafted features of the registered images to verify complete alignment. Finally, we apply the PatchCore anomaly detection method, pre-trained on the MVTec industrial dataset, to the Flex-Sim dataset and find that the generated segmentation maps detect the various anomaly instances within it.
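    The affine registration step admits a compact sketch using ORB correspondences and RANSAC; the detector choice, match count, and reprojection threshold are illustrative assumptions, not the pipeline used in the study.

```python
import cv2
import numpy as np

def register_affine_ransac(moving_bgr, fixed_bgr, max_features=1000):
    """Warp `moving_bgr` onto `fixed_bgr` via an ORB + RANSAC affine fit."""
    gray_m = cv2.cvtColor(moving_bgr, cv2.COLOR_BGR2GRAY)
    gray_f = cv2.cvtColor(fixed_bgr, cv2.COLOR_BGR2GRAY)
    orb = cv2.ORB_create(max_features)
    kp_m, des_m = orb.detectAndCompute(gray_m, None)
    kp_f, des_f = orb.detectAndCompute(gray_f, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_m, des_f), key=lambda m: m.distance)
    src = np.float32([kp_m[m.queryIdx].pt for m in matches[:200]])
    dst = np.float32([kp_f[m.trainIdx].pt for m in matches[:200]])
    # RANSAC rejects mismatched features before the affine fit.
    A, _ = cv2.estimateAffine2D(src, dst, method=cv2.RANSAC,
                                ransacReprojThreshold=3.0)
    h, w = fixed_bgr.shape[:2]
    return cv2.warpAffine(moving_bgr, A, (w, h))
```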

    Motion and emotion: Semantic knowledge for Hollywood film indexing

    Ph.D. (Doctor of Philosophy)

    Human-Robot Trust Assessment From Physical Apprehension Signals


    Layered graphical models for tracking partially-occluded moving objects in video (PhD thesis)

    Tracking multiple targets using fixed cameras with non-overlapping views is a challenging problem. One of the challenges is predicting and tracking through occlusions caused by other targets or by fixed objects in the scene. Considerable effort has been devoted to developing appearance models that are robust to partial occlusions, tracking algorithms that cope with short-term loss of observations, and algorithms that learn static occlusion maps. In this thesis we consider scenarios where it is impossible to learn a static occlusion map. This is often the case when the scene consists of both people and large objects whose position is not permanently fixed. These objects may enter, leave or relocate within the scene during a short time span. We call such objects "relocatable objects" or "relocatable occluders." We develop a representation for scenes containing relocatable objects that can cause partial occlusions of people in a camera's field of view. In many practical applications, relocatable objects tend to appear often; therefore, models for them can be learned off-line and stored in a database. We formulate an occluder-centric representation, called a graphical model layer, where a person's motion in the ground plane is defined as a first-order Markov process on activity zones, while image evidence is aggregated in 2D observation regions that are depth-ordered with respect to the occlusion mask of the relocatable object. We represent real-world scenes as a composition of depth-ordered, interacting graphical model layers, and account for image evidence in a way that handles mutual overlap of the observation regions and their occlusion by the relocatable objects. These layers interact: proximate ground-plane zones of different model instances are linked to allow a person to move between the layers, and image evidence is shared between the observation regions of these models. We demonstrate our formulation by tracking low-resolution, partially occluded pedestrians in the vicinity of parked vehicles. In these scenarios, some tracking formulations that rely on part-based person detectors may fail completely. Our pedestrian tracker fares well and compares favorably with state-of-the-art pedestrian detectors, lowering false positives by twenty-nine percent and false negatives by forty-two percent, and with a deformable-contour-based tracker.
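    A toy forward-filtering step conveys the flavor of the occluder-centric layer: the person's ground-plane state is one of a few discrete activity zones, and image evidence is discounted where the relocatable occluder covers an observation region. The zone count, transition matrix, and visibility values below are illustrative, not taken from the thesis.

```python
import numpy as np

# Toy occluder-centric layer: three activity zones, first-order Markov
# transitions, and per-zone visibility given the occluder's mask.
T = np.array([[0.8, 0.2, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]])          # P(zone_t | zone_{t-1})
visibility = np.array([1.0, 0.4, 0.9])   # unoccluded fraction of each
                                         # 2D observation region

def filter_step(belief, evidence):
    """belief: P(zone) from the previous frame; evidence: per-zone
    foreground support measured in the observation regions."""
    predicted = T.T @ belief                              # Markov prediction
    likelihood = visibility * evidence + (1 - visibility) * 0.5
    posterior = predicted * likelihood                    # occlusion-aware
    return posterior / posterior.sum()
```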

    Visual Human Tracking and Group Activity Analysis: A Video Mining System for Retail Marketing

    Thesis (PhD) - Indiana University, Computer Sciences, 2007
    In this thesis we present a system for automatic human tracking and activity recognition from video sequences. The problem of automated analysis of visual information in order to derive descriptors of high-level human activities has intrigued the computer vision community for decades and is considered to be largely unsolved. Part of this interest derives from the vast range of applications in which such a solution may be useful. We attempt to find efficient formulations of these tasks as applied to extracting customer behavior information in a retail marketing context. Based on these formulations, we present a system that visually tracks customers in a retail store and performs a number of activity analysis tasks based on the output from the tracker. For tracking, we introduce new techniques for pedestrian detection, initialization of the body model, and a formulation of temporal tracking as a global trans-dimensional optimization problem. Initial human detection is addressed by a novel method for head detection, which incorporates knowledge of the camera projection model. The initialization of the human body model is addressed by newly developed shape and appearance descriptors. Temporal tracking of customer trajectories is performed by a human body tracking system designed as a Bayesian jump-diffusion filter. This approach demonstrates the ability to overcome model dimensionality ambiguities as people leave and enter the scene. Following tracking, we develop a two-stage group-activity formulation based on ideas from swarming research. For modeling purposes, all moving actors in the scene are viewed as simple agents in a swarm. This makes it possible to define a set of inter-agent interactions that combine into a distance metric used for swarm clustering. In the first stage, shoppers that belong to the same group are identified by deterministically clustering bodies to detect short-term events; in the second stage, events are post-processed to form clusters of group activities with fuzzy memberships. Quantitative analysis of the tracking subsystem shows an improvement over state-of-the-art methods used under similar conditions. Finally, based on the output from the tracker, the activity recognition procedure achieves over 80% correct shopper-group detection, as validated against human-generated ground truth.
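    A condensed sketch of the swarm-style first stage: trajectories that stay close and move alike are merged by hierarchical clustering. The interaction weights and distance cutoff are illustrative stand-ins for the thesis' richer inter-agent interactions, and the fuzzy second stage is omitted.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def group_shoppers(tracks, dist_w=1.0, vel_w=2.0, cutoff=3.0):
    """Cluster trajectories into shopper groups with a swarm-style metric.

    tracks: (N, T, 2) array of ground-plane positions, one row per person.
    """
    pos = tracks.mean(axis=1)                    # mean position per agent
    vel = np.diff(tracks, axis=1).mean(axis=1)   # mean velocity per agent
    n = len(tracks)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            # Agents that stay close and move alike get a small distance.
            d[i, j] = d[j, i] = (dist_w * np.linalg.norm(pos[i] - pos[j])
                                 + vel_w * np.linalg.norm(vel[i] - vel[j]))
    # Condensed upper triangle feeds scipy's hierarchical clustering.
    labels = fcluster(linkage(d[np.triu_indices(n, 1)], method="average"),
                      t=cutoff, criterion="distance")
    return labels                                # group id per shopper
```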

    Robust SLAM and motion segmentation under long-term dynamic large occlusions

    Visual sensors are key to robot perception: they not only support robot localisation but also enable robots to interact with the environment. However, in new environments, robots can fail to distinguish the static and dynamic components of the visual input, and consequently are unable to track objects or localise themselves. Methods often require precise robot proprioception to compensate for camera movement and separate the static background from the visual input. However, robot proprioception, such as IMU or wheel odometry readings, usually suffers from drift accumulation. State-of-the-art methods demonstrate promising performance but either (1) require semantic segmentation, which is unavailable in unknown environments, or (2) treat dynamic components as outliers, which is infeasible when dynamic objects occupy a large proportion of the visual input. This research systematically unifies the camera tracking and multi-object tracking problems in indoor environments by proposing a multi-motion tracking system that enables robots to differentiate the static and dynamic components of the visual input through an understanding of their own movements and actions. Detailed evaluation in both simulation environments and on robotic platforms suggests that the proposed method outperforms state-of-the-art dynamic SLAM methods when the majority of the camera view is occluded by multiple unmodeled objects over a long period of time.
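    For contrast, a naive version of static/dynamic separation fits a single dominant homography to dense optical flow and flags pixels with large residuals. The parameters are illustrative, and this simple scheme breaks down exactly when dynamic objects dominate the view, which is the regime the proposed proprioception-aware system is designed to handle.

```python
import cv2
import numpy as np

def dynamic_mask(prev_gray, curr_gray, thresh=2.0):
    """Flag pixels whose motion is not explained by the dominant homography."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = prev_gray.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)
    pts = np.stack([xs.ravel(), ys.ravel()], axis=1)
    moved = pts + flow.reshape(-1, 2)
    # Fit the dominant (assumed ego-) motion on a random pixel subsample.
    idx = np.random.choice(len(pts), size=min(2000, len(pts)), replace=False)
    H, _ = cv2.findHomography(pts[idx], moved[idx], cv2.RANSAC, 3.0)
    predicted = cv2.perspectiveTransform(pts[None], H)[0]
    residual = np.linalg.norm(moved - predicted, axis=1).reshape(h, w)
    return residual > thresh   # True where motion disagrees with ego-motion
```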

    An original framework for understanding human actions and body language by using deep neural networks

    The evolution of the fields of Computer Vision (CV) and Artificial Neural Networks (ANNs) has enabled the development of efficient automatic systems for analysing people's behaviour. By studying hand movements it is possible to recognize gestures, which people often use to communicate information non-verbally. These gestures can also be used to control or interact with devices without physically touching them. In particular, sign language and semaphoric hand gestures are the two foremost areas of interest due to their importance in Human-Human Communication (HHC) and Human-Computer Interaction (HCI), respectively. The processing of body movements, meanwhile, plays a key role in the action recognition and affective computing fields. The former is essential to understand how people act in an environment, while the latter tries to interpret people's emotions based on their poses and movements; both are essential tasks in many computer vision applications, including event recognition and video surveillance. In this Ph.D. thesis, an original framework for understanding actions and body language is presented. The framework is composed of three main modules: in the first, a method based on Long Short-Term Memory Recurrent Neural Networks (LSTM-RNNs) is proposed for the recognition of sign language and semaphoric hand gestures; the second module presents a solution based on 2D skeletons and two-branch stacked LSTM-RNNs for action recognition in video sequences; finally, the last module provides a solution for basic non-acted emotion recognition using 3D skeletons and Deep Neural Networks (DNNs). The performance of LSTM-RNNs is explored in depth because of their ability to model long-term contextual information in temporal sequences, which makes them suitable for analysing body movements. All the modules were tested on challenging datasets that are well known in the state of the art, showing remarkable results compared with current methods in the literature.
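    A minimal PyTorch sketch of a two-branch stacked LSTM over 2D-skeleton sequences, in the spirit of the second module; the joint count, layer sizes, and late-fusion scheme are illustrative assumptions rather than the thesis architecture.

```python
import torch
import torch.nn as nn

class TwoBranchLSTM(nn.Module):
    """Two-branch stacked LSTM over 2D-skeleton sequences (a sketch)."""

    def __init__(self, n_joints=18, hidden=128, n_classes=10):
        super().__init__()
        in_dim = n_joints * 2                      # (x, y) per joint
        self.pose_branch = nn.LSTM(in_dim, hidden, num_layers=2,
                                   batch_first=True)
        self.motion_branch = nn.LSTM(in_dim, hidden, num_layers=2,
                                     batch_first=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, skeletons):                  # (batch, time, in_dim)
        motion = skeletons[:, 1:] - skeletons[:, :-1]    # frame differences
        _, (h_pose, _) = self.pose_branch(skeletons)
        _, (h_motion, _) = self.motion_branch(motion)
        fused = torch.cat([h_pose[-1], h_motion[-1]], dim=1)
        return self.head(fused)                    # action-class logits
```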

    CGAMES'2009
