57,954 research outputs found

    A Causal And-Or Graph Model for Visibility Fluent Reasoning in Tracking Interacting Objects

    Tracking humans that are interacting with other subjects or the environment remains unsolved in visual tracking, because the visibility of the humans of interest in videos is unknown and may vary over time. In particular, it is still difficult for state-of-the-art human trackers to recover complete human trajectories in crowded scenes with frequent human interactions. In this work, we consider the visibility status of a subject as a fluent variable, whose changes are mostly attributable to the subject's interactions with the surroundings, e.g., crossing behind another object, entering a building, or getting into a vehicle. We introduce a Causal And-Or Graph (C-AOG) to represent the cause-effect relations between an object's visibility fluent and its activities, and develop a probabilistic graph model to jointly reason about visibility fluent changes (e.g., from visible to invisible) and track humans in videos. We formulate this joint task as an iterative search for a feasible causal graph structure that enables fast search algorithms, e.g., dynamic programming. We apply the proposed method to challenging video sequences to evaluate its ability to estimate visibility fluent changes and to track subjects of interest over time. Comparative results demonstrate that our method outperforms alternative trackers and can recover complete trajectories of humans in complicated scenarios with frequent human interactions.
    Comment: accepted by CVPR 201
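    The "iterative search of a feasible causal graph structure" via dynamic programming can be pictured, in a very reduced form, as a Viterbi-style pass over a per-frame visibility state. The three states, the scoring scheme, and the function below are illustrative assumptions for this sketch, not the paper's actual C-AOG formulation.

    ```python
    # Minimal sketch (not the authors' implementation): dynamic programming
    # over a hypothetical per-frame visibility fluent with three states.
    STATES = ["visible", "occluded", "contained"]  # e.g. hidden inside a vehicle

    def best_fluent_path(emission, transition):
        """emission[t][s]: log-score of state s at frame t;
        transition[p][s]: log-score of switching from state p to s."""
        T = len(emission)
        score = [dict(emission[0])]  # best score ending in each state
        back = []                    # back-pointers for traceback
        for t in range(1, T):
            score.append({})
            back.append({})
            for s in STATES:
                prev, val = max(
                    ((p, score[t - 1][p] + transition[p][s]) for p in STATES),
                    key=lambda x: x[1],
                )
                score[t][s] = val + emission[t][s]
                back[t - 1][s] = prev
        # Trace back the highest-scoring fluent sequence.
        last = max(STATES, key=lambda s: score[T - 1][s])
        path = [last]
        for t in range(T - 2, -1, -1):
            path.append(back[t][path[-1]])
        return list(reversed(path))
    ```

    With per-frame emission scores favouring visible, then occluded, then visible again, the recovered path switches fluents accordingly, which is the kind of visibility reasoning the abstract describes at a much higher level.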

    NTU RGB+D 120: A Large-Scale Benchmark for 3D Human Activity Understanding

    Research on depth-based human activity analysis has achieved outstanding performance and demonstrated the effectiveness of 3D representations for action recognition. The existing depth-based and RGB+D-based action recognition benchmarks have a number of limitations, including the lack of large-scale training samples, a realistic number of distinct class categories, diversity in camera views, varied environmental conditions, and variety of human subjects. In this work, we introduce a large-scale dataset for RGB+D human action recognition, collected from 106 distinct subjects and containing more than 114 thousand video samples and 8 million frames. The dataset covers 120 different action classes, including daily, mutual, and health-related activities. We evaluate the performance of a series of existing 3D activity analysis methods on this dataset, and show the advantage of applying deep learning methods to 3D-based human action recognition. Furthermore, we investigate a novel one-shot 3D activity recognition problem on our dataset, and propose a simple yet effective Action-Part Semantic Relevance-aware (APSR) framework for this task, which yields promising results for recognizing the novel action classes. We believe the introduction of this large-scale dataset will enable the community to apply, adapt, and develop various data-hungry learning techniques for depth-based and RGB+D-based human activity understanding. [The dataset is available at: http://rose1.ntu.edu.sg/Datasets/actionRecognition.asp]
    Comment: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
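    The one-shot setting mentioned above reduces, at its simplest, to matching an embedding of the test sample against one exemplar embedding per novel class. The cosine-similarity matcher below is a hypothetical stand-in for this sketch; the APSR framework itself additionally weights body parts by their semantic relevance to the action.

    ```python
    # Hedged sketch: one-shot action recognition as nearest-neighbour matching
    # in an embedding space. Class names and vectors are illustrative.
    import math

    def cosine(a, b):
        """Cosine similarity between two equal-length vectors."""
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)

    def one_shot_classify(query, exemplars):
        """exemplars: {class_name: embedding}, one sample per novel class;
        returns the class whose exemplar is closest to the query."""
        return max(exemplars, key=lambda c: cosine(query, exemplars[c]))
    ```

    A query embedding close to the "waving" exemplar is assigned that class even though no other training samples for it exist, which is the essence of the one-shot protocol.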

    Fostering reflection in the training of speech-receptive action

    This article discusses opportunities and problems in training communicative skills by fostering reflection on one's own speech-receptive action, and the use of computer-supported learning environments for this purpose. Most frameworks for communication training focus on fostering observable speech-productive action (i.e., speaking); the individual cognitive processes underlying speech-receptive action (hearing and understanding utterances) are often neglected, because speech-receptive action is difficult to access in a communicative situation and fostering its individual processes is very time-consuming. The central learning principle, reflection on one's own communicative action, is discussed from different perspectives. Against the background of these reflection models, the computer-supported learning environment CaiMan© is presented and described, and seven success factors for integrating software into the training of soft skills are derived from empirical research on this learning environment. The article concludes with two empirical studies examining opportunities to foster reflection.

    Online real-time crowd behavior detection in video sequences

    Automatically detecting events in crowded scenes is a challenging task in computer vision. A number of offline approaches have been proposed for crowd behavior detection; however, the offline assumption limits their application in real-world video surveillance systems. In this paper, we propose an online, real-time method for detecting events in crowded video sequences. The proposed approach combines visual feature extraction with image segmentation and works without the need for a training phase. A quantitative experimental evaluation has been carried out on multiple publicly available video sequences, containing data from various crowd scenarios and different types of events, to demonstrate the effectiveness of the approach.
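    A training-free online detector of the kind this abstract describes can be sketched, in a deliberately crude form, as thresholding a running motion-energy statistic frame by frame. The feature (mean absolute frame difference) and the threshold rule below are illustrative assumptions for this sketch, not the authors' feature-extraction-plus-segmentation pipeline.

    ```python
    # Minimal sketch: online, training-free event flagging. A frame is flagged
    # when its motion energy exceeds the running mean by k standard deviations.
    def detect_events(frames, k=3.0, warmup=5):
        """frames: list of 2-D intensity grids (lists of rows);
        returns indices of frames flagged as events."""
        energies, events = [], []
        prev = frames[0]
        for i, frame in enumerate(frames[1:], start=1):
            # Crude motion-energy feature: mean absolute frame difference.
            energy = sum(
                abs(a - b)
                for row_a, row_b in zip(frame, prev)
                for a, b in zip(row_a, row_b)
            ) / (len(frame) * len(frame[0]))
            if len(energies) >= warmup:  # only decide once statistics exist
                mean = sum(energies) / len(energies)
                std = (sum((e - mean) ** 2 for e in energies) / len(energies)) ** 0.5
                if energy > mean + k * std + 1e-9:
                    events.append(i)
            energies.append(energy)
            prev = frame
        return events
    ```

    Because the threshold adapts from the frames seen so far, no separate training phase is needed, which mirrors the online, training-free property claimed in the abstract.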