11 research outputs found

    Unsupervised Segmentation of Action Segments in Egocentric Videos using Gaze

    Unsupervised segmentation of action segments in egocentric videos is a desirable feature in tasks such as activity recognition and content-based video retrieval. Reducing the search space to a finite set of action segments facilitates faster and less noisy matching. However, there exists a substantial gap in machine understanding of natural temporal cuts during a continuous human activity. This work reports on a novel gaze-based approach for segmenting action segments in videos captured using an egocentric camera. Gaze is used to locate the region-of-interest inside a frame. By tracking two simple motion-based parameters inside successive regions-of-interest, we discover a finite set of temporal cuts. We present several results using combinations of the two parameters on a dataset, BRISGAZE-ACTIONS, which contains egocentric videos depicting several daily-living activities. The quality of the temporal cuts is further improved by implementing two entropy measures. Comment: To appear in 2017 IEEE International Conference On Signal and Image Processing Application

    The Evolution of First Person Vision Methods: A Survey

    The emergence of new wearable technologies such as action cameras and smart-glasses has increased the interest of computer vision scientists in the First Person perspective. Nowadays, this field is attracting attention and investment from companies aiming to develop commercial devices with First Person Vision recording capabilities. Due to this interest, an increasing demand for methods to process these videos, possibly in real-time, is expected. Current approaches present particular combinations of different image features and quantitative methods to accomplish specific objectives such as object detection, activity recognition, and user-machine interaction. This paper summarizes the evolution of the state of the art in First Person Vision video analysis between 1997 and 2014, highlighting, among others, the most commonly used features, methods, challenges and opportunities within the field. Comment: First Person Vision, Egocentric Vision, Wearable Devices, Smart Glasses, Computer Vision, Video Analytics, Human-machine Interactio

    Hand contour detection in wearable camera video using an adaptive histogram region of interest

    BACKGROUND: Monitoring hand function at home is needed to better evaluate the effectiveness of rehabilitation interventions. Our objective is to develop wearable computer vision systems for hand function monitoring. The specific aim of this study is to develop an algorithm that can identify hand contours in video from a wearable camera that records the user’s point of view, without the need for markers. METHODS: The two-step image processing approach for each frame consists of: (1) Detecting a hand in the image, and choosing one seed point that lies within the hand. This step is based on a priori models of skin colour. (2) Identifying the contour of the region containing the seed point. This is accomplished by adaptively determining, for each frame, the region within a colour histogram that corresponds to hand colours, and backprojecting the image using the reduced histogram. RESULTS: In four test videos relevant to activities of daily living, the hand detector classification accuracy was 88.3%. The contour detection results were compared to manually traced contours in 97 test frames, and the median F-score was 0.86. CONCLUSION: This algorithm will form the basis for a wearable computer-vision system that can monitor and log the interactions of the hand with its environment
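
    The backprojection step and the F-score comparison described in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it uses a single intensity channel instead of a colour histogram, takes the seed samples as given rather than deriving them from a skin-colour model, and does not reproduce the adaptive choice of the histogram region. All function names and the toy image are assumptions made for illustration.

```python
from collections import Counter

def value_histogram(samples, bins=8, max_value=256):
    """Normalised histogram of quantised pixel values (hypothetical helper)."""
    width = max_value // bins
    counts = Counter(v // width for v in samples)
    return [counts.get(b, 0) / len(samples) for b in range(bins)]

def backproject(image, hist, bins=8, max_value=256):
    """Replace each pixel by the histogram weight of the bin it falls into."""
    width = max_value // bins
    return [[hist[v // width] for v in row] for row in image]

def threshold(prob_map, cutoff=0.5):
    """Binarise the backprojection into a candidate hand mask."""
    return [[1 if p >= cutoff else 0 for p in row] for row in prob_map]

def f_score(predicted, truth):
    """F-score (harmonic mean of precision and recall) of two binary masks."""
    pairs = [(p, t) for prow, trow in zip(predicted, truth)
             for p, t in zip(prow, trow)]
    tp = sum(1 for p, t in pairs if p == 1 and t == 1)
    fp = sum(1 for p, t in pairs if p == 1 and t == 0)
    fn = sum(1 for p, t in pairs if p == 0 and t == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy 3x4 "frame": low values stand in for hand pixels, high values for background.
image = [[200, 210, 25, 30],
         [205, 20, 28, 22],
         [198, 202, 24, 207]]
seed_samples = [25, 28, 20, 24]          # assumed pixels sampled around the seed point
mask = threshold(backproject(image, value_histogram(seed_samples)))

# Hypothetical manually traced mask used only to exercise f_score.
manual = [[0, 0, 1, 1],
          [0, 1, 1, 1],
          [0, 0, 1, 1]]
score = f_score(mask, manual)
```

    In this toy case the mask recovers every low-valued pixel, and the score drops below 1 only because the hypothetical manual mask marks one extra pixel as hand.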

    Computer Vision Algorithms for Mobile Camera Applications

    Wearable and mobile sensors have found widespread use in recent years due to their ever-decreasing cost, ease of deployment and use, and ability to provide continuous monitoring as opposed to sensors installed at fixed locations. Since many smart phones are now equipped with a variety of sensors, including accelerometer, gyroscope, magnetometer, microphone and camera, it has become more feasible to develop algorithms for activity monitoring, guidance and navigation of unmanned vehicles, autonomous driving and driver assistance, by using data from one or more of these sensors. In this thesis, we focus on multiple mobile camera applications, and present lightweight algorithms suitable for embedded mobile platforms. The mobile camera scenarios presented in the thesis are: (i) activity detection and step counting from wearable cameras, (ii) door detection for indoor navigation of unmanned vehicles, and (iii) traffic sign detection from vehicle-mounted cameras. First, we present a fall detection and activity classification system developed for the embedded smart camera platform CITRIC. In our system, the camera platform is worn by the subject, as opposed to static sensors installed at fixed locations in certain rooms; monitoring is therefore not limited to confined areas, and extends to wherever the subject may travel, both indoors and outdoors. Next, we present a real-time smart phone-based fall detection system, wherein we implement camera and accelerometer based fall detection on a Samsung Galaxy S™ 4. We fuse these two sensor modalities to obtain a more robust fall detection system. Then, we introduce a fall detection algorithm with autonomous thresholding using relative entropy within the class of Ali-Silvey distance measures. As another wearable camera application, we present a footstep counting algorithm using a smart phone camera. This algorithm provides a more accurate step count than using only accelerometer data from smart phones and smart watches at various body locations. As a second mobile camera scenario, we study autonomous indoor navigation of unmanned vehicles. A novel approach is proposed to autonomously detect and verify doorway openings by using the Google Project Tango™ platform. The third mobile camera scenario involves vehicle-mounted cameras. More specifically, we focus on traffic sign detection from lower-resolution and noisy videos captured from vehicle-mounted cameras. We present a new method for accurate traffic sign detection, incorporating Aggregate Channel Features and Chain Code Histograms, with the goal of providing much faster training and testing, and comparable or better performance, with respect to deep neural network approaches, without requiring specialized processors. The proposed computer vision algorithms provide promising results for various useful applications despite the limited energy and processing capabilities of mobile devices
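
    The relative entropy mentioned in this abstract (the Kullback-Leibler divergence, one member of the Ali-Silvey class) can be written down in a few lines. The sketch below only computes the measure for two discrete distributions; how the thesis turns it into an autonomous fall-detection threshold is not stated in the abstract and is not attempted here. The function name and the `eps` guard are assumptions.

```python
import math

def relative_entropy(p, q, eps=1e-12):
    """Relative entropy D(p || q) = sum_i p_i * log(p_i / q_i), in nats.

    p and q are discrete distributions over the same bins. Bins with
    p_i == 0 contribute nothing; eps (an assumed guard) avoids division
    by zero when q_i == 0.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of bins")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            total += pi * math.log(pi / max(qi, eps))
    return total

# Two toy feature histograms, e.g. from two different activity classes.
p = [0.5, 0.5]
q = [0.25, 0.75]
d_pq = relative_entropy(p, q)
d_qp = relative_entropy(q, p)
```

    The measure is zero only when the two distributions agree and is not symmetric, so `d_pq` and `d_qp` differ.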

    Rethinking Pen Input Interaction: Enabling Freehand Sketching Through Improved Primitive Recognition

    Online sketch recognition uses machine learning and artificial intelligence techniques to interpret markings made by users via an electronic stylus or pen. The goal of sketch recognition is to understand the intention and meaning of a particular user's drawing. Diagramming applications have been the primary beneficiaries of sketch recognition technology, as it is commonplace for the users of these tools to first create a rough sketch of a diagram on paper before translating it into a machine-understandable model, using computer-aided design tools, which can then be used to perform simulations or other meaningful tasks. Traditional methods for performing sketch recognition can be broken down into three distinct categories: appearance-based, gesture-based, and geometric-based. Although each approach has its advantages and disadvantages, geometric-based methods have proven to be the most generalizable for multi-domain recognition. Tools such as the LADDER symbol description language have been shown to be capable of recognizing sketches from over 30 different domains using generalizable, geometric techniques. The LADDER system is limited, however, in that it uses a low-level recognizer that supports only a few primitive shapes, the building blocks for describing higher-level symbols. Systems which support a larger number of primitive shapes have been shown to have questionable accuracy as the number of primitives increases, or they place constraints on how users must input shapes (e.g. circles can only be drawn in a clockwise motion; rectangles must be drawn starting at the top-left corner). This dissertation allows for a significant growth in the possibility of free-sketch recognition systems, those which place little to no drawing constraints on users. In this dissertation, we describe multiple techniques to recognize upwards of 18 primitive shapes while maintaining high accuracy. We also provide methods for producing confidence values and generating multiple interpretations, and explore the difficulties of recognizing multi-stroke primitives. In addition, we show the need for a standardized data repository for sketch recognition algorithm testing and propose SOUSA (sketch-based online user study application), our online system for performing and sharing user study sketch data. Finally, we will show how the principles we have learned through our work extend to other domains, including activity recognition using trained hand posture cues

    Wearable hand activity recognition for event summarization

    In this paper we develop a first step towards the recognition of hand activity by detecting objects subject to manipulation, and use the results to build a visual summary of events. The motivation is to extract information from hand activity without requiring that the wearer is explicit as in gesture-based interaction. Our method uses simple image measurements within a probabilistic framework and allows real-time implementation. © 2005 IEEE