
    VideoGraph: Recognizing Minutes-Long Human Activities in Videos

    Many human activities take minutes to unfold. To represent them, related works opt for statistical pooling, which neglects the temporal structure. Others opt for convolutional methods, such as CNN and Non-Local. While successful in learning temporal concepts, these fall short of modeling minutes-long temporal dependencies. We propose VideoGraph, a method that achieves the best of both worlds: representing minutes-long human activities and learning their underlying temporal structure. VideoGraph learns a graph-based representation for human activities. The graph, with its nodes and edges, is learned entirely from video datasets, making VideoGraph applicable to problems without node-level annotation. The result is improvements over related works on two benchmarks: Epic-Kitchen and Breakfast. In addition, we demonstrate that VideoGraph is able to learn the temporal structure of human activities in minutes-long videos.
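    As a rough illustration of the idea, the following is a minimal PyTorch sketch, not the authors' implementation: the class name VideoGraphSketch and all layer sizes are assumptions, and the real method is considerably richer. It shows how snippet features could be soft-assigned to a set of learned latent graph nodes, with a small temporal convolution modeling structure over node activations rather than over raw frames, so minutes-long videos are summarized compactly.

        import torch
        import torch.nn as nn

        class VideoGraphSketch(nn.Module):
            # Hypothetical simplification: latent graph nodes are learned
            # end-to-end, so no node-level annotation is required.
            def __init__(self, feat_dim=512, num_nodes=32, num_classes=10):
                super().__init__()
                self.nodes = nn.Parameter(torch.randn(num_nodes, feat_dim))
                self.temporal = nn.Conv1d(num_nodes, num_nodes,
                                          kernel_size=3, padding=1)
                self.classifier = nn.Linear(num_nodes, num_classes)

            def forward(self, segments):
                # segments: (batch, time, feat_dim) features of short snippets
                # sampled across a minutes-long video.
                sim = torch.einsum('btf,nf->btn', segments, self.nodes)
                attn = sim.softmax(dim=-1)       # soft-assign snippets to nodes
                node_act = attn.transpose(1, 2)  # (batch, num_nodes, time)
                node_act = torch.relu(self.temporal(node_act))  # mix over time
                return self.classifier(node_act.mean(dim=-1))   # pool over time

        model = VideoGraphSketch()
        print(model(torch.randn(2, 64, 512)).shape)  # torch.Size([2, 10])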

    An Imperceptible Method to Monitor Human Activity by Using Sensor Data with CNN and Bi-directional LSTM

    Deep learning (DL) algorithms have substantially increased research in recognizing day-to-day human activities. Methods for recognizing human activities via DL are only useful if they work well in real-time applications. The activities of elderly people need to be monitored to detect any abnormalities in their health and to suggest a healthy lifestyle based on their day-to-day activities. Most existing approaches use videos or static photographs to recognize activities; these methods make individuals anxious about being monitored. To address this limitation, we utilize the cognitive capabilities of DL algorithms and feed the proposed model sensor data, collected from a smart-home dataset, to recognize the activities of elderly people without intruding on their privacy. Early DL models for human activity recognition took single-sensor data as input, which is static and falls short in recognizing dynamic, multi-sensor data. In this research we propose a DL architecture that blends a deep Convolutional Neural Network (CNN) with a Bi-directional Long Short-Term Memory (Bi-LSTM) network, replacing human intervention by automatically extracting features from multifunctional sensing devices to reliably recognize activities. Throughout the investigation we utilized Tulum, a benchmark dataset containing logs of sensor data. We show that our methodology outperforms prior work, achieving an accuracy of 98.76% and an F1 score of 0.98.
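    To make the CNN + Bi-LSTM blend concrete, here is a minimal PyTorch sketch; the abstract does not give the exact configuration, so the class name CnnBiLstmHAR, the layer sizes, and the input shape are all assumptions. The 1-D CNN extracts local motifs from multi-sensor windows, and the Bi-LSTM then models their ordering in both directions before classification.

        import torch
        import torch.nn as nn

        class CnnBiLstmHAR(nn.Module):
            def __init__(self, num_sensors=16, num_classes=12):
                super().__init__()
                # 1-D CNN over the time axis of multi-sensor readings.
                self.cnn = nn.Sequential(
                    nn.Conv1d(num_sensors, 64, kernel_size=5, padding=2),
                    nn.ReLU(),
                    nn.MaxPool1d(2),
                )
                # Bidirectional LSTM over the CNN feature sequence.
                self.bilstm = nn.LSTM(64, 64, batch_first=True,
                                      bidirectional=True)
                self.classifier = nn.Linear(2 * 64, num_classes)

            def forward(self, x):
                # x: (batch, time, num_sensors) window of sensor readings.
                h = self.cnn(x.transpose(1, 2))          # (batch, 64, time/2)
                out, _ = self.bilstm(h.transpose(1, 2))  # (batch, time/2, 128)
                return self.classifier(out[:, -1])       # last-step features

        model = CnnBiLstmHAR()
        print(model(torch.randn(8, 128, 16)).shape)  # torch.Size([8, 12])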

    Retrieving, annotating and recognizing human activities in web videos

    Recent efforts in computer vision tackle the problem of human activity understanding in video sequences. Traditionally, these algorithms require annotated video data to learn models. In this work, we introduce a novel data collection framework to take advantage of the large amount of video data available on the web. We use this framework to retrieve videos of human activities and to build training and evaluation datasets for computer vision algorithms. We rely on Amazon Mechanical Turk workers to obtain high-accuracy annotations. An agglomerative clustering technique makes it possible to achieve reliable and consistent annotations for the temporal localization of human activities in videos. Using two datasets, Olympics Sports and our novel Daily Human Activities dataset, we show that our collection/annotation framework can produce robust annotations of human activities in large amounts of video data.
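    As a sketch of how agglomerative clustering might consolidate several workers' temporal annotations: interval endpoints from multiple workers can be clustered, with each cluster's median taken as the consensus segment. The abstract does not specify the actual procedure, so the distance threshold, linkage, and example timestamps below are assumptions for illustration only.

        import numpy as np
        from sklearn.cluster import AgglomerativeClustering

        # Hypothetical data: workers mark (start, end) seconds for the same
        # activity; near-duplicate votes should merge into one segment.
        annotations = np.array([
            [12.0, 30.5], [11.5, 31.0], [12.3, 29.8],  # votes, instance A
            [55.0, 70.2], [54.6, 71.0],                # votes, instance B
        ])

        clusterer = AgglomerativeClustering(
            n_clusters=None, distance_threshold=10.0, linkage='average'
        )
        labels = clusterer.fit_predict(annotations)

        # The per-cluster median gives the consensus temporal localization.
        for cluster in np.unique(labels):
            start, end = np.median(annotations[labels == cluster], axis=0)
            print(f'consensus segment: {start:.1f}s - {end:.1f}s')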