
    Multitask Learning to Improve Egocentric Action Recognition

    In this work we employ multitask learning to capitalize on the structure that exists in related supervised tasks to train complex neural networks. It allows training a network for multiple objectives in parallel, in order to improve performance on at least one of them by means of a shared representation that is developed to accommodate more information than it would for a single task alone. We employ this idea to tackle action recognition in egocentric videos by introducing additional supervised tasks. We consider learning the verbs and nouns of which action labels consist, and predict coordinates that capture the hand locations and the gaze-based visual saliency for all frames of the input video segments. This forces the network to explicitly focus on cues from secondary tasks that it might otherwise have missed, resulting in improved inference. Our experiments on EPIC-Kitchens and EGTEA Gaze+ show consistent improvements when training with multiple tasks over the single-task baseline. Furthermore, on EGTEA Gaze+ we outperform the state of the art in action recognition by 3.84%. Apart from actions, our method produces accurate hand and gaze estimations as side tasks, without requiring any additional input at test time other than the RGB video clips. Comment: 10 pages, 3 figures, accepted at the 5th Egocentric Perception, Interaction and Computing (EPIC) workshop at ICCV 2019, code repository: https://github.com/georkap/hand_track_classificatio
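
    As a rough sketch of the shared-representation idea described above (not the authors' exact architecture; see the linked repository for that), the PyTorch-style snippet below attaches verb, noun, hand-coordinate and gaze heads to a single shared feature extractor and sums the per-task losses. The layer sizes, class counts and loss weights are illustrative assumptions.

```python
import torch.nn as nn

class MultiTaskNet(nn.Module):
    def __init__(self, in_dim=2048, feat_dim=512, n_verbs=125, n_nouns=352):
        super().__init__()
        # stand-in for a video backbone producing one shared clip representation
        self.backbone = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.verb_head = nn.Linear(feat_dim, n_verbs)   # verb classification
        self.noun_head = nn.Linear(feat_dim, n_nouns)   # noun classification
        self.hand_head = nn.Linear(feat_dim, 4)         # left/right hand (x, y)
        self.gaze_head = nn.Linear(feat_dim, 2)         # gaze saliency point (x, y)

    def forward(self, x):
        shared = self.backbone(x)                       # shared representation
        return (self.verb_head(shared), self.noun_head(shared),
                self.hand_head(shared), self.gaze_head(shared))

def multitask_loss(outputs, targets, weights=(1.0, 1.0, 0.5, 0.5)):
    # classification losses for verbs/nouns, regression losses for hands/gaze
    verb_logits, noun_logits, hand_pred, gaze_pred = outputs
    verb_t, noun_t, hand_t, gaze_t = targets
    ce, mse = nn.CrossEntropyLoss(), nn.MSELoss()
    return (weights[0] * ce(verb_logits, verb_t) + weights[1] * ce(noun_logits, noun_t)
            + weights[2] * mse(hand_pred, hand_t) + weights[3] * mse(gaze_pred, gaze_t))
```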

    Disentangling rodent behaviors to improve automated behavior recognition

    Automated observation and analysis of behavior is important to facilitate progress in many fields of science. Recent developments in deep learning have enabled progress in object detection and tracking, but rodent behavior recognition struggles to exceed 75–80% accuracy for ethologically relevant behaviors. We investigate the main reasons why and distinguish three aspects of behavior dynamics that are difficult to automate. We isolate these aspects in an artificial dataset and reproduce the effects with state-of-the-art behavior recognition models. Having an endless amount of labeled training data with minimal input noise and representative dynamics will enable research to optimize behavior recognition architectures and get closer to human-like recognition performance for behaviors with challenging dynamics.

    Reproducibility via coordinated standardization: A multi-center study in a Shank2 genetic rat model for Autism Spectrum Disorders

    Inconsistent findings between laboratories are hampering scientific progress and are of increasing public concern. Differences in laboratory environment are a known factor contributing to poor reproducibility of findings between research sites, and well-controlled multisite efforts are an important next step to identify the relevant factors needed to reduce variation in study outcome between laboratories. Through harmonization of apparatus, test protocol, and aligned and non-aligned environmental variables, the present study shows that behavioral pharmacological responses in Shank2 knockout (KO) rats, a model of synaptic dysfunction relevant to autism spectrum disorders, were highly replicable across three research centers. All three sites reliably observed a hyperactive and repetitive behavioral phenotype in KO rats compared to their wild-type littermates, as well as a dose-dependent phenotype attenuation following acute injections of a selective mGluR1 antagonist. These results show that reproducibility in preclinical studies can be obtained and emphasize the need for high-quality and rigorous methodologies in scientific research. Considering the observed external validity, the present study also suggests mGluR1 as a potential target for the treatment of autism spectrum disorders.

    Object Detection-Based Location and Activity Classification from Egocentric Videos: A Systematic Analysis

    Egocentric vision has emerged in the daily practice of application domains such as lifelogging, activity monitoring, robot navigation and the analysis of social interactions. Plenty of research focuses on location detection and activity recognition, with applications in the area of Ambient Assisted Living. The basis of this work is the idea that indoor locations and daily activities can be characterized by the presence of specific objects. Objects can be obtained either from laborious human annotations or automatically, using vision-based detectors. We perform a study on the use of object detections as input for location and activity classification and analyze the influence of various detection parameters. We compare our detections against manually provided object labels and show that location classification is affected by detection quality and quantity. Utilizing the temporal structure of the object detections mitigates the consequences of noisy ones. Moreover, we determine that the recognition of activities is related to the presence of specific objects, and that the lack of explicit associations between certain activities and objects hurts classification performance for these activities. Finally, we discuss the outcomes of each task and our method's potential for real-world applications.
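
    A minimal sketch of the underlying idea, assuming a simple bag-of-objects representation: per-frame detections above a confidence threshold are pooled into a fixed-length vector and passed to an off-the-shelf classifier. The object vocabulary, threshold and classifier choice below are assumptions for illustration, not the paper's exact pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

OBJECT_CLASSES = ["fridge", "sink", "sofa", "bed", "tv"]   # illustrative vocabulary

def bag_of_objects(detections, score_thr=0.5):
    """detections: list of (class_name, score) tuples for the frames of one segment."""
    vec = np.zeros(len(OBJECT_CLASSES))
    for name, score in detections:
        if score >= score_thr and name in OBJECT_CLASSES:
            vec[OBJECT_CLASSES.index(name)] += 1.0
    return vec / max(len(detections), 1)        # normalise by number of detections

# toy usage: two segments, each described by its pooled detections
X = np.stack([bag_of_objects([("fridge", 0.9), ("sink", 0.8)]),
              bag_of_objects([("sofa", 0.7), ("tv", 0.95)])])
y = np.array([0, 1])                            # 0 = kitchen, 1 = living room
clf = LogisticRegression().fit(X, y)
print(clf.predict(X))
```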

    Transfer Learning for Rodent Behavior Recognition

    Many behavior recognition systems are trained and tested on single datasets, limiting their application to comparable datasets. While retraining the system with a novel dataset is possible, it involves laborious annotation effort. We propose to minimize the annotation effort by reusing the knowledge obtained from previous datasets and adapting the recognition system to the novel data. To this end, we investigate the use of transfer learning in the context of rodent behavior recognition. Specifically, we look at two transfer learning methods that take different approaches and examine the implications of their respective assumptions on synthetic data. We further illustrate their performance in transferring a rat action classifier to a mouse action classifier. The performance results in the transfer task are promising: classification accuracy improves substantially with only very few labeled examples from the novel dataset.
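
    One common transfer-learning route, shown purely as an assumed example (the paper's two methods are not reproduced here), is to reuse the feature extractor trained on the source (rat) data and fit only a new output layer on the few labeled target (mouse) examples. All sizes and class counts below are hypothetical.

```python
import torch
import torch.nn as nn

# stand-in for a classifier pre-trained on the source (rat) dataset
source_model = nn.Sequential(
    nn.Linear(64, 256), nn.ReLU(),      # feature extractor learned on rat data
    nn.Linear(256, 13))                 # output layer for the rat action labels

def adapt_to_target(model, n_target_classes, lr=1e-3):
    for p in model.parameters():        # freeze everything learned on the source data
        p.requires_grad = False
    # replace the output layer so it matches the target (mouse) label set
    model[-1] = nn.Linear(model[-1].in_features, n_target_classes)
    optimizer = torch.optim.Adam(model[-1].parameters(), lr=lr)
    return model, optimizer

mouse_model, opt = adapt_to_target(source_model, n_target_classes=9)
```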

    Action Detection from Egocentric Videos in Daily Living Scenarios

    We are researching the use of egocentric vision in the area of Human Action Recognition. Inspired by recent advances in activity recognition from video using deep learning, we investigate the detection performance of Long Short-Term Memory networks on an elementary set of Activities of Daily Living, based on the detected objects in the scene.
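
    As an illustrative sketch only (not the authors' implementation), an LSTM of this kind can consume a sequence of per-frame object-presence vectors and emit one set of action logits per clip; the object count, hidden size and number of ADL classes below are assumptions.

```python
import torch
import torch.nn as nn

class ObjectSequenceLSTM(nn.Module):
    def __init__(self, n_objects=80, hidden=128, n_actions=10):
        super().__init__()
        self.lstm = nn.LSTM(n_objects, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_actions)

    def forward(self, x):                # x: (batch, frames, n_objects) presence vectors
        _, (h, _) = self.lstm(x)         # h: (num_layers, batch, hidden)
        return self.out(h[-1])           # one set of action logits per clip

logits = ObjectSequenceLSTM()(torch.rand(2, 16, 80))   # two clips of 16 frames each
```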

    Where Am I? Comparing CNN and LSTM for Location Classification in Egocentric Videos

    Egocentric vision is a technology used in a variety of fields such as life-logging, sports recording and robot navigation. Plenty of research work focuses on location detection and activity recognition, with applications in the area of Ambient Assisted Living. The basis of this work is the idea that locations can be characterized by the presence of specific objects. Our objective is the recognition of locations in egocentric videos that mainly consist of indoor house scenes. We perform an extensive comparison between Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) based classification methods that aim to find the in-house location by classifying the detected objects, which are extracted with a state-of-the-art object detector. We show that location classification is affected by the quality of the detected objects, i.e. the false detections among the correct ones in a series of frames, but this effect can be greatly limited by taking into account the temporal structure of the information using an LSTM. Finally, we discuss the potential for useful real-world applications.
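
    To make the contrast concrete, the toy snippet below shows the frame-level alternative: each frame is classified on its own and the per-frame decisions are majority-voted, so a few false detections can still be outvoted, whereas the LSTM route instead carries a recurrent state across frames. The scores and class layout are fabricated for illustration and are not taken from the paper's experiments.

```python
import numpy as np

def majority_vote(per_frame_scores):
    """per_frame_scores: (frames, n_locations) scores from a frame-level classifier."""
    frame_labels = per_frame_scores.argmax(axis=1)     # decide each frame on its own
    return np.bincount(frame_labels).argmax()          # most frequent label wins

scores = np.tile([0.1, 0.9], (10, 1))    # 10 frames, location 1 dominant
scores[3] = scores[7] = [0.8, 0.2]       # two frames corrupted by false detections
print(majority_vote(scores))             # -> 1
```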