2,941 research outputs found

    Dynamic texture recognition using time-causal and time-recursive spatio-temporal receptive fields

    This work presents a first evaluation of spatio-temporal receptive fields from a recently proposed time-causal spatio-temporal scale-space framework as primitives for video analysis. We propose a new family of video descriptors based on regional statistics of spatio-temporal receptive field responses and evaluate this approach on the problem of dynamic texture recognition. Our approach generalises a previously used method, based on joint histograms of receptive field responses, from the spatial to the spatio-temporal domain and from object recognition to dynamic texture recognition. The time-recursive formulation enables computationally efficient time-causal recognition. The experimental evaluation demonstrates competitive performance compared to the state of the art. In particular, binary versions of our dynamic texture descriptors achieve improved performance compared to a large range of similar methods using different primitives, either handcrafted or learned from data. Further, our qualitative and quantitative investigation into parameter choices and the use of different sets of receptive fields highlights the robustness and flexibility of our approach. Together, these results support the descriptive power of this family of time-causal spatio-temporal receptive fields, validate our approach for dynamic texture recognition, and point towards the possibility of designing a range of video analysis methods based on these new time-causal spatio-temporal primitives.
    Comment: 29 pages, 16 figures
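
    As a rough illustration of this descriptor family, the sketch below builds a joint histogram of first-order spatio-temporal Gaussian-derivative responses over a video volume. The paper's time-causal, time-recursive kernels are replaced here by ordinary (non-causal) Gaussian derivatives, and the scales and bin count are arbitrary choices, so this is a minimal stand-in rather than the authors' method.

    ```python
    # Hypothetical sketch: joint histogram of spatio-temporal derivative
    # responses as a video descriptor. Ordinary Gaussian derivatives stand
    # in for the paper's time-causal kernels; scales/bins are illustrative.
    import numpy as np
    from scipy.ndimage import gaussian_filter

    def st_receptive_field_histogram(video, sigma_s=2.0, sigma_t=1.5, bins=5):
        """video: (T, H, W) float array of grayscale frames."""
        sigmas = (sigma_t, sigma_s, sigma_s)
        # First-order derivative responses along t, y, and x.
        responses = [gaussian_filter(video, sigmas, order=o)
                     for o in [(1, 0, 0), (0, 1, 0), (0, 0, 1)]]
        # Quantize each response channel into `bins` levels at its quantiles.
        digitized = []
        for r in responses:
            edges = np.quantile(r, np.linspace(0, 1, bins + 1)[1:-1])
            digitized.append(np.digitize(r, edges))
        # Combine the quantized channels into one joint-histogram code.
        codes = digitized[0]
        for d in digitized[1:]:
            codes = codes * bins + d
        hist = np.bincount(codes.ravel(), minlength=bins ** 3).astype(float)
        return hist / hist.sum()
    ```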

    Self-supervised Spatio-temporal Representation Learning for Videos by Predicting Motion and Appearance Statistics

    We address the problem of video representation learning without human-annotated labels. While previous efforts address the problem by designing novel self-supervised tasks using video data, the learned features are merely frame-based, which makes them inapplicable to the many video analysis tasks where spatio-temporal features prevail. In this paper we propose a novel self-supervised approach to learning spatio-temporal features for video representation. Inspired by the success of two-stream approaches in video classification, we propose to learn visual features by regressing both motion and appearance statistics along the spatial and temporal dimensions, given only the input video data. Specifically, we extract statistical concepts (the fastest-motion region and its dominant direction, spatio-temporal color diversity, dominant color, etc.) from simple patterns in both the spatial and temporal domains. Unlike prior puzzle-style tasks that can be hard even for humans to solve, the proposed tasks are consistent with inherent human visual habits and therefore easy to answer. We conduct extensive experiments with C3D to validate the effectiveness of our proposed approach. The experiments show that our approach can significantly improve the performance of C3D when applied to video classification tasks. Code is available at https://github.com/laura-wang/video_repres_mas.
    Comment: CVPR 2019
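
    The motion statistics described above can be pictured with a small sketch like the one below, which computes one plausible regression target: the grid block with the fastest motion and that block's dominant flow direction. The grid size, angle binning, and use of Farneback optical flow are illustrative assumptions, not necessarily the paper's exact design (their released code defines the real targets).

    ```python
    # Hedged sketch of a motion-statistics target: fastest-moving block and
    # its dominant flow direction. Block layout and bin counts are guesses.
    import cv2
    import numpy as np

    def motion_statistics_target(prev_gray, next_gray, grid=4, angle_bins=8):
        flow = cv2.calcOpticalFlowFarneback(
            prev_gray, next_gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)
        mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])  # ang in radians
        h, w = mag.shape
        bh, bw = h // grid, w // grid
        # Find the grid block with the largest mean flow magnitude.
        block_means = np.array(
            [[mag[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw].mean()
              for j in range(grid)] for i in range(grid)])
        bi, bj = np.unravel_index(block_means.argmax(), block_means.shape)
        # Dominant direction inside that block, quantized into angle bins.
        block_ang = ang[bi * bh:(bi + 1) * bh, bj * bw:(bj + 1) * bw]
        hist, _ = np.histogram(block_ang, bins=angle_bins, range=(0, 2 * np.pi))
        return bi * grid + bj, int(hist.argmax())  # (block index, direction bin)
    ```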

    Multimedia and Decision-Making Process

    Multimedia technology has changed the way we use computers, transforming them into something closer to a second person: it has made it possible for us to see, hear, read, feel, and talk to them, reshaping both our use and our understanding of computing. At the same time, multimedia presentation is one of the fastest-growing sectors of the computer industry, with applications in many areas such as training, education, business presentations, merchandising, and communications.
    Keywords: multimedia, decision, studies, mining, architecture

    The Evolution of First Person Vision Methods: A Survey

    The emergence of new wearable technologies such as action cameras and smart glasses has increased the interest of computer vision scientists in the First Person perspective. The field is now attracting the attention and investment of companies aiming to develop commercial devices with First Person Vision recording capabilities, and an increasing demand for methods to process these videos, possibly in real time, is therefore expected. Current approaches present particular combinations of image features and quantitative methods to accomplish specific objectives such as object detection, activity recognition, and user-machine interaction. This paper summarizes the evolution of the state of the art in First Person Vision video analysis between 1997 and 2014, highlighting, among others, the most commonly used features, methods, challenges, and opportunities within the field.
    Comment: First Person Vision, Egocentric Vision, Wearable Devices, Smart Glasses, Computer Vision, Video Analytics, Human-machine Interaction

    Scour Damage to Vermont Bridges

    Scour is by far the primary cause of bridge failures in the United States. Regionally, the vulnerability of bridges to flood damage became evident from the damage to Vermont bridges during Tropical Storm Irene in 2011. Successfully mitigating scour-related problems depends on our ability to reliably estimate scour potential, design effective scour prevention and countermeasures, design safe and economical foundation elements that account for scour potential, and design reliable and economically feasible monitoring systems. This report presents research on two aspects of bridge scour: 1) a system-level analysis of damage observed at Vermont bridges from Tropical Storm Irene, with example case studies describing the bridge damage and the pre-storm condition of the bridges, plus a statistical comparison to non-damaged bridges that identifies significant factors determining bridge vulnerability to storm damage; and 2) the development of a low-cost scour sensor suitable for continuously monitoring scour and redeposition and communicating the readings wirelessly in real time to stakeholders.

    EEG analysis of visually-induced vection in left- and right-handers


    Hybrid Video Stabilization for Mobile Vehicle Detection on SURF in Aerial Surveillance

    Detection of moving vehicles in aerial video sequences is of great importance, with many promising applications in surveillance, intelligent transportation, and public services such as emergency evacuation and police security. However, vehicle detection is a challenging task due to global camera motion, the low resolution of vehicles, and the low contrast between vehicles and the background. In this paper, we present a hybrid method to efficiently detect moving vehicles in aerial videos. Firstly, local feature extraction and matching were performed to estimate the global motion; the Speeded Up Robust Features (SURF) keypoints were shown to be more suitable for this stabilization task. Then, a list of dynamic pixels was obtained and grouped into different moving vehicles by comparing optical flow normals. To enhance detection precision, preprocessing steps such as road extraction were applied to the surveillance system. A quantitative evaluation on real video sequences indicated that the proposed method improves detection performance significantly.
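
    A minimal sketch of the stabilization step described above might look as follows: match SURF keypoints between consecutive frames, fit a RANSAC homography for the global camera motion, and warp the current frame to cancel it. The Hessian threshold and ratio-test constant are guesses, and SURF requires an OpenCV build with the non-free contrib modules enabled.

    ```python
    # Illustrative SURF-based frame stabilization (not the paper's full
    # pipeline). Thresholds are assumptions chosen for the sketch.
    import cv2
    import numpy as np

    def stabilize_frame(prev_gray, curr_gray):
        surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
        kp1, des1 = surf.detectAndCompute(prev_gray, None)
        kp2, des2 = surf.detectAndCompute(curr_gray, None)
        # Ratio-test matching of SURF descriptors between the two frames.
        matcher = cv2.BFMatcher(cv2.NORM_L2)
        matches = [m for m, n in matcher.knnMatch(des2, des1, k=2)
                   if m.distance < 0.7 * n.distance]
        src = np.float32([kp2[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
        dst = np.float32([kp1[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
        # A RANSAC homography models the global motion; warping removes it.
        H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
        h, w = prev_gray.shape
        return cv2.warpPerspective(curr_gray, H, (w, h))
    ```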

    Unsupervised Methods for Camera Pose Estimation and People Counting in Crowded Scenes

    Most visual crowd counting methods rely on training with labeled data to learn a mapping between image features and the number of people in the scene. However, the exact nature of this mapping may change with scene and viewing conditions, limiting the ability of such supervised systems to generalize to novel conditions and thus preventing broad deployment. Here I propose an alternative, unsupervised strategy anchored on a 3D simulation that automatically learns how groups of people appear in the image and adapts to the signal-processing parameters of the current viewing scenario. To implement this 3D strategy, knowledge of the camera parameters is required. Most methods for automatic camera calibration make assumptions about regularities in scene structure or motion patterns, which do not always hold. I propose a novel motion-based approach for recovering camera tilt that does not require tracking. An automatic camera calibration method then allows the implementation of an accurate crowd counting algorithm that reasons in 3D. The system is evaluated on various datasets and compared against state-of-the-art methods.
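
    For intuition about why recovering camera tilt matters for reasoning in 3D, the sketch below shows the standard pinhole ground-plane relation that such calibration unlocks: mapping an image row to a distance along the ground given camera height, tilt, and focal length. This is textbook geometry with illustrative parameter names, not the thesis's motion-based tilt-recovery algorithm itself.

    ```python
    # Pinhole ground-plane geometry: image row -> ground distance, given
    # camera height, downward tilt, and focal length (all assumed known).
    import math

    def row_to_ground_distance(v, v0, f, cam_height, tilt_rad):
        """v: image row of a ground point (larger = lower in the image);
        v0: principal-point row; f: focal length in pixels;
        cam_height: camera height above the ground plane (metres);
        tilt_rad: downward camera tilt from the horizontal (radians)."""
        ray = math.atan2(v - v0, f)      # ray angle below the optical axis
        depression = tilt_rad + ray      # total angle below the horizontal
        if depression <= 0:
            raise ValueError("row maps above the horizon")
        return cam_height / math.tan(depression)

    # Example: a 10 m high camera tilted 30 degrees down, f = 1000 px.
    print(row_to_ground_distance(v=800, v0=540, f=1000.0,
                                 cam_height=10.0,
                                 tilt_rad=math.radians(30)))
    ```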