6,092 research outputs found

    Classification of Occluded Objects using Fast Recurrent Processing

    Recurrent neural networks are powerful tools for handling incomplete-data problems in computer vision, thanks to their significant generative capabilities. However, the computational demand of these algorithms is too high for real-time operation without specialized hardware or software solutions. In this paper, we propose a framework for augmenting a feedforward network with recurrent processing capabilities without sacrificing much computational efficiency. We assume a mixture model and generate samples of the last hidden layer according to the class decisions of the output layer, modify the hidden layer activity using the samples, and propagate to lower layers. For the visual occlusion problem, the iterative procedure emulates a feedforward-feedback loop, filling in the missing hidden layer activity with meaningful representations. The proposed algorithm is tested on a widely used dataset and shown to achieve a 2× improvement in classification accuracy for occluded objects. When compared to Restricted Boltzmann Machines, our algorithm shows superior performance for occluded object classification.
    Comment: arXiv admin note: text overlap with arXiv:1409.8576 by other author

    Observation-switching linear dynamic systems for tracking humans through unexpected partial occlusions by scene objects

    This paper focuses on the problem of tracking people through occlusions by scene objects. Rather than relying on models of the scene to predict when occlusions will occur, as other researchers have done, this paper proposes a linear dynamic system that switches between two alternative position measurements in order to handle occlusions as they occur. The filter automatically switches from a foot-based measure of position (assuming z = 0) to a head-based position measure (given the person's height) when an occlusion of the person's lower body occurs. No knowledge of the scene or its occluding objects is used. Unlike similar research [2, 14], the approach does not assume a fixed height for people and so is able to track humans through occlusions even when they change height during the occlusion. The approach is evaluated on three furnished scenes containing tables, chairs, desks and partitions. Occlusions range from occlusions of the legs, through occlusions while seated, to near-total occlusions where only the person's head is visible. Results show that the approach provides a significant reduction in false-positive tracks in a multi-camera environment, and more than halves the number of lost tracks in single monocular camera views.
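As a rough illustration of the observation-switching idea, the sketch below runs one step of a very simple random-walk Kalman filter on a 2D ground-plane position and picks whichever candidate measurement (foot-based or head-based) agrees better with the prediction. The selection rule (smallest normalized innovation), the function name `switching_update`, and all noise values are assumptions made for this example; the abstract does not say how the paper's filter decides when to switch.

```python
def switching_update(x, p, z_foot, z_head, r_foot=0.05, r_head=0.2, q=0.1):
    """One predict/update step with a choice between two measurements.

    x      : (x, y) current position estimate on the ground plane
    p      : scalar variance of that estimate
    z_foot : (x, y) position derived from the feet (hypothetical input)
    z_head : (x, y) position derived from the head and the person's height
    r_*, q : assumed measurement and process noise variances
    Returns (new_x, new_p, chosen) with chosen in {"foot", "head"}.
    """
    # Predict: the position is assumed to persist, so only uncertainty grows.
    p = p + q

    def score(z, r):
        # Squared innovation normalized by its variance (assumed switching rule).
        d2 = (z[0] - x[0]) ** 2 + (z[1] - x[1]) ** 2
        return d2 / (p + r)

    if score(z_foot, r_foot) <= score(z_head, r_head):
        z, r, chosen = z_foot, r_foot, "foot"
    else:
        z, r, chosen = z_head, r_head, "head"

    # Standard scalar-gain Kalman update with the selected measurement.
    k = p / (p + r)
    new_x = (x[0] + k * (z[0] - x[0]), x[1] + k * (z[1] - x[1]))
    return new_x, (1 - k) * p, chosen


# Unoccluded frame: both measurements agree, the foot-based one is preferred.
est, var, used = switching_update((0.0, 0.0), 1.0, (0.1, 0.0), (0.15, 0.0))

# Lower body occluded: the foot-based position jumps, so the head-based one is used.
est2, var2, used2 = switching_update((0.0, 0.0), 1.0, (3.0, 0.0), (0.2, 0.0))
```

A full tracker would normally carry velocity in the state and a full covariance matrix; the scalar form is used here only to keep the switching step visible.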

    Feature-based tracking of multiple people for intelligent video surveillance.

    Intelligent video surveillance is the process of performing surveillance tasks automatically with a computer vision system. It involves detecting and tracking people in a video sequence and understanding their behavior. This thesis addresses the problem of detecting and tracking multiple moving people against an unknown background. We propose a feature-based framework for tracking, which requires feature extraction and feature matching. We consider color, size, blob bounding box and motion information as features of people. In our feature-based tracking system, we propose using the Pearson correlation coefficient to match feature vectors with temporal templates. The occlusion problem is handled by histogram backprojection. Our tracking system is fast and free from assumptions about human structure. We implemented the tracking system using Visual C++ and OpenCV and tested it on real-world images and videos. Experimental results suggest that the tracking system achieves good accuracy and can process videos at 10-15 fps.
    Dept. of Computer Science. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2006 .A42. Source: Masters Abstracts International, Volume: 45-01, page: 0347. Thesis (M.Sc.)--University of Windsor (Canada), 2006
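The matching step described in this abstract can be sketched as follows: compute the Pearson correlation coefficient between a detected blob's feature vector and each stored template, and accept the best template if the correlation is high enough. The feature layout, the `best_match` helper, and the 0.9 threshold are assumptions for illustration only; the thesis itself was implemented in Visual C++ with OpenCV and may differ in every detail.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("sequences must have the same length (at least 2)")
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        # A constant vector has no defined correlation; treat it as "no match".
        return 0.0
    return cov / math.sqrt(vx * vy)


def best_match(feature, templates, threshold=0.9):
    """Return the id of the template most correlated with `feature`.

    Returns None when no template exceeds `threshold` (an assumed cutoff).
    """
    best_id, best_r = None, threshold
    for track_id, template in templates.items():
        r = pearson(feature, template)
        if r > best_r:
            best_id, best_r = track_id, r
    return best_id


# Hypothetical feature layout: [mean R, mean G, mean B, blob width, blob height]
templates = {
    "person_a": [200, 30, 40, 20, 60],
    "person_b": [20, 180, 60, 25, 55],
}
matched = best_match([190, 35, 45, 21, 58], templates)
```

Because Pearson correlation is invariant to scaling and offset of a whole vector, features measured in very different units may need normalizing before they are combined into one vector.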

    SAE and ISO standards for warnings and other driver interface elements: a summary

    This document summarizes 8 SAE documents (4 information reports, 3 recommended practices, and 1 standard), 8 ISO documents (5 standards, 2 technical specifications, and 1 technical report), and 3 NCAP documents. Standards and Recommended Practices describe what must (“shall”) and should be. Information Reports generally provide useful information and guidance without requirements or recommendations. The SAE documents include J2395 (message priority), J2396 (definitions and measures for visual behavior), J2399 (ACC characteristics and user interface), J2400 (FCW operating characteristics and user interface), J2802 (blind spot system operating characteristics and user interface), J2808 (Road/LDW system user interface), J2830 (icon comprehension test), and J2831 (recommendations for alphanumeric text messages). The ISO documents include PDTR 12204 (integration of safety warning signals to avoid conflicts), 15005 (dialog management principles and compliance procedures), CD 15006 (specification for auditory information), 15008 (specification and tests for visual information), 16951 (procedure to determine message priority), 17287 (procedure to assess suitability for use while driving), and DTS 15007 (measurement of driver visual behavior).
    Hyundai-Kia America Technical Center
    http://deepblue.lib.umich.edu/bitstream/2027.42/134039/1/103248.pdf
    Description of 103248.pdf : Final repor

    Data Hiding in Digital Video

    With the rapid development of digital multimedia technologies, an old method called steganography has been revisited as a solution for data hiding applications such as digital watermarking and covert communication. Steganography is the art of secret communication using a cover signal, e.g., video, audio, or image, whereas the counter-technique, detecting the existence of such a channel through a statistically trained classifier, is called steganalysis. State-of-the-art data hiding algorithms use features of the cover signal, such as Discrete Cosine Transform (DCT) coefficients, pixel values, and motion vectors, to convey the message to the receiver side. The goal of an embedding algorithm is to maximize the number of bits sent to the decoder side (embedding capacity) with maximum robustness against attacks while keeping the perceptual and statistical distortions (security) low. Data hiding schemes are characterized by these three conflicting requirements: security against steganalysis, robustness against channel-associated and/or intentional distortions, and capacity in terms of the embedded payload. Depending on the application, it is the designer's task to find an optimum trade-off among them. The goal of this thesis is to develop a novel data hiding scheme that establishes a covert channel satisfying statistical and perceptual invisibility, with moderate rate capacity and robustness, to combat steganalysis-based detection. The idea behind the proposed method is to alter Video Object (VO) trajectory coordinates, conveying the message to the receiver side by perturbing the centroid coordinates of the VO. First, the VO is selected by the user and tracked through the frames using a simple region-based search strategy and morphological operations.
After the trajectory coordinates are obtained, the perturbation of the coordinates is implemented through a non-linear embedding function, such as a polar quantizer in which both the magnitude and phase of the motion are used. The perturbations made to the motion magnitude and phase are kept small to preserve the semantic meaning of the object motion trajectory. The proposed method is well suited to video sequences in which VOs have smooth motion trajectories. Examples can be found in sports videos in which the ball is the focus of attention and exhibits various motion types, e.g., rolling on the ground, flying in the air, or being possessed by a player. Different sports video sequences have been tested using the proposed method. The experimental results show that the proposed method achieves both statistical and perceptual invisibility with moderate rate embedding capacity under an AWGN channel with varying noise variances. This achievement is important because the first step in both active and passive steganalysis is detecting the existence of a covert channel. This work makes multiple contributions to the field of data hiding. First, it is the first example of a data hiding method in which the trajectory of a VO is used. Second, it contributes towards improving steganographic security by providing new features: the coordinate location and semantic meaning of the object.
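To make the polar-quantizer idea concrete, here is a minimal sketch that hides two bits in one frame-to-frame motion vector of the object centroid: one bit in the motion magnitude and one in its phase. It uses a simple dithered quantizer (quantization index modulation), which is a stand-in chosen for this example; the thesis's actual embedding function, step sizes, and bit allocation are not given in the abstract, and `MAG_STEP`, `PHASE_STEP`, `embed`, and `extract` are hypothetical names.

```python
import math

MAG_STEP = 2.0              # assumed quantization step for the magnitude (pixels)
PHASE_STEP = math.pi / 16   # assumed step for the phase; divides 2*pi evenly


def qim(value, bit, step, nonneg=False):
    """Move `value` to the nearest point of the lattice that encodes `bit`.

    Bit 0 uses offsets step/4 + k*step, bit 1 uses 3*step/4 + k*step.
    """
    dither = step / 4 if bit == 0 else 3 * step / 4
    k = round((value - dither) / step)
    if nonneg:
        k = max(0, k)  # keep magnitudes positive
    return k * step + dither


def qim_decode(value, step, nonneg=False):
    """Return the bit whose lattice lies closest to `value`."""
    errors = [abs(value - qim(value, b, step, nonneg)) for b in (0, 1)]
    return 0 if errors[0] <= errors[1] else 1


def embed(dx, dy, bit_mag, bit_phase):
    """Perturb motion vector (dx, dy) so it carries two bits."""
    mag = qim(math.hypot(dx, dy), bit_mag, MAG_STEP, nonneg=True)
    phase = qim(math.atan2(dy, dx), bit_phase, PHASE_STEP)
    return mag * math.cos(phase), mag * math.sin(phase)


def extract(dx, dy):
    """Recover (bit_mag, bit_phase) from a possibly noisy motion vector."""
    return (
        qim_decode(math.hypot(dx, dy), MAG_STEP, nonneg=True),
        qim_decode(math.atan2(dy, dx), PHASE_STEP),
    )


# Embed bits (1, 0) into the motion vector (4, 3), then read them back
# after a small additive disturbance standing in for channel noise.
ex, ey = embed(4.0, 3.0, 1, 0)
bits = extract(ex + 0.1, ey - 0.1)
```

With this construction a bit survives as long as the noise moves the magnitude by less than a quarter of `MAG_STEP` and the phase by less than a quarter of `PHASE_STEP`; smaller steps give less visible perturbation but less robustness, which mirrors the capacity, robustness, and security trade-off described in the abstract.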

    Analyzing Structured Scenarios by Tracking People and Their Limbs

    The analysis of human activities is a fundamental problem in computer vision. Though complex, interactions between people and their environment often exhibit a spatio-temporal structure that can be exploited during analysis. This structure can be leveraged to mitigate the effects of missing or noisy visual observations caused, for example, by sensor noise, inaccurate models, or occlusion. Trajectories of people and their hands and feet, often sufficient for recognition of human activities, lead to a natural qualitative spatio-temporal description of these interactions. This work introduces the following contributions to the task of human activity understanding: 1) a framework that efficiently detects and tracks multiple interacting people and their limbs, 2) an event recognition approach that integrates both logical and probabilistic reasoning in analyzing the spatio-temporal structure of multi-agent scenarios, and 3) an effective computational model of the visibility constraints imposed on humans as they navigate through their environment. The tracking framework mixes probabilistic models with deterministic constraints and uses AND/OR search and lazy evaluation to efficiently obtain the globally optimal solution in each frame. Our high-level reasoning framework efficiently and robustly interprets noisy visual observations to deduce the events comprising structured scenarios. This is accomplished by combining First-Order Logic, Allen's Interval Logic, and Markov Logic Networks with an event hypothesis generation process that reduces the size of the ground Markov network. When applied to outdoor one-on-one basketball videos, our framework tracks the players and, guided by the game rules, analyzes their interactions with each other and the ball, annotating the videos with the relevant basketball events that occurred. 
Finally, motivated by studies of spatial behavior, we use a set of features from visibility analysis to represent spatial context in the interpretation of human spatial activities. We demonstrate the effectiveness of our representation on trajectories generated by humans in a virtual environment

    Beyond sensorimotor segregation: On mirror neurons and social affordance space tracking

    Mirror neuron research has come a long way since the early 1990s, and many theorists are now stressing the heterogeneity and complexity of the sensorimotor properties of fronto-parietal circuits. However, core aspects of the initial ‘mirror mechanism’ theory, i.e. the idea of a symmetric encapsulated mirroring function translating sensory action perceptions into motor formats, still appear to be shaping much of the debate. This article challenges the empirical plausibility of the sensorimotor segregation implicit in the original mirror metaphor. It is proposed instead that the teleological organization found in the broader fronto-parietal circuits might be inherently sensorimotor. Thus the idea of an independent ‘purely perceptual’ goal understanding process is questioned. Further, it is hypothesized that the often asymmetric, heterogeneous and contextually modulated mirror and canonical neurons support a function of multisensory mapping and tracking of the perceiving agent's affordance space. Such a shift in the interpretative framework offers a different theoretical handle on how sensorimotor processes might ground various aspects of intentional action choice and social cognition. Under the proposed “social affordance model”, mirror neurons would be seen as dynamic parts of larger circuits that support tracking of currently shared and competing action possibilities. These circuits support action selection processes, but also our understanding of the options and action potentials that we and perhaps others have in the affordance space. In terms of social cognition, ‘mirror’ circuits might thus help us understand not only the intentional actions others are actually performing, but also what they could have done, did not do and might do shortly.

    Representation, space and Hollywood Squares: Looking at things that aren't there anymore

    It has been argued that the human cognitive system is capable of using spatial indexes or oculomotor coordinates to relieve working memory load (Ballard, Hayhoe, Pook & Rao, 1997), track multiple moving items through occlusion (Scholl & Pylyshyn, 1999), or link incompatible cognitive and sensorimotor codes (Bridgeman and Huemer, 1998). Here we examine the use of such spatial information in memory for semantic information. Previous research has often focused on the role of task demands and the level of automaticity in the encoding of spatial location in memory tasks. We present five experiments where location is irrelevant to the task, and participants' encoding of spatial information is measured implicitly by their looking behavior during recall. In a paradigm developed from Spivey and Geng (submitted), participants were presented with pieces of auditory, semantic information as part of an event occurring in one of four regions of a computer screen. In front of a blank grid, they were asked a question relating to one of those facts. Under certain conditions it was found that during the question period participants made significantly more saccades to the empty region of space where the semantic information had been previously presented. Our findings are discussed in relation to previous research on memory and spatial location, the dorsal and ventral streams of the visual system, and the notion of a cognitive-perceptual system using spatial indexes to exploit the stability of the external world.