5,417 research outputs found
NTU RGB+D 120: A Large-Scale Benchmark for 3D Human Activity Understanding
Research on depth-based human activity analysis achieved outstanding
performance and demonstrated the effectiveness of 3D representation for action
recognition. The existing depth-based and RGB+D-based action recognition
benchmarks have a number of limitations, including the lack of large-scale
training samples, realistic number of distinct class categories, diversity in
camera views, varied environmental conditions, and variety of human subjects.
In this work, we introduce a large-scale dataset for RGB+D human action
recognition, which is collected from 106 distinct subjects and contains more
than 114 thousand video samples and 8 million frames. This dataset contains 120
different action classes including daily, mutual, and health-related
activities. We evaluate the performance of a series of existing 3D activity
analysis methods on this dataset, and show the advantage of applying deep
learning methods for 3D-based human action recognition. Furthermore, we
investigate a novel one-shot 3D activity recognition problem on our dataset,
and a simple yet effective Action-Part Semantic Relevance-aware (APSR)
framework is proposed for this task, which yields promising results for
recognition of the novel action classes. We believe the introduction of this
large-scale dataset will enable the community to apply, adapt, and develop
various data-hungry learning techniques for depth-based and RGB+D-based human
activity understanding. [The dataset is available at:
http://rose1.ntu.edu.sg/Datasets/actionRecognition.asp]Comment: IEEE Transactions on Pattern Analysis and Machine Intelligence
(TPAMI
Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey
Knowledge Graphs (KGs) play a pivotal role in advancing various AI
applications, with the semantic web community's exploration into multi-modal
dimensions unlocking new avenues for innovation. In this survey, we carefully
review over 300 articles, focusing on KG-aware research in two principal
aspects: KG-driven Multi-Modal (KG4MM) learning, where KGs support multi-modal
tasks, and Multi-Modal Knowledge Graph (MM4KG), which extends KG studies into
the MMKG realm. We begin by defining KGs and MMKGs, then explore their
construction progress. Our review includes two primary task categories:
KG-aware multi-modal learning tasks, such as Image Classification and Visual
Question Answering, and intrinsic MMKG tasks like Multi-modal Knowledge Graph
Completion and Entity Alignment, highlighting specific research trajectories.
For most of these tasks, we provide definitions, evaluation benchmarks, and
additionally outline essential insights for conducting relevant research.
Finally, we discuss current challenges and identify emerging trends, such as
progress in Large Language Modeling and Multi-modal Pre-training strategies.
This survey aims to serve as a comprehensive reference for researchers already
involved in or considering delving into KG and multi-modal learning research,
offering insights into the evolving landscape of MMKG research and supporting
future work.Comment: Ongoing work; 41 pages (Main Text), 55 pages (Total), 11 Tables, 13
Figures, 619 citations; Paper list is available at
https://github.com/zjukg/KG-MM-Surve
A Survey of Quantum-Cognitively Inspired Sentiment Analysis Models
Quantum theory, originally proposed as a physical theory to describe the
motions of microscopic particles, has been applied to various non-physics
domains involving human cognition and decision-making that are inherently
uncertain and exhibit certain non-classical, quantum-like characteristics.
Sentiment analysis is a typical example of such domains. In the last few years,
by leveraging the modeling power of quantum probability (a non-classical
probability stemming from quantum mechanics methodology) and deep neural
networks, a range of novel quantum-cognitively inspired models for sentiment
analysis have emerged and performed well. This survey presents a timely
overview of the latest developments in this fascinating cross-disciplinary
area. We first provide a background of quantum probability and quantum
cognition at a theoretical level, analyzing their advantages over classical
theories in modeling the cognitive aspects of sentiment analysis. Then, recent
quantum-cognitively inspired models are introduced and discussed in detail,
focusing on how they approach the key challenges of the sentiment analysis
task. Finally, we discuss the limitations of the current research and highlight
future research directions
Recommended from our members
Exploiting Concepts In Videos For Video Event Detection
Video event detection is the task of searching videos for events of interest to a user where an event is a complex activity which is localized in time and space. The video event detection problem has gained more importance as the amount of online video is increasing by more than 300 hours every minute on Youtube alone.
In this thesis, we tackle three major video event detection problems: video event detection with exemplars (VED-ex), where a large number of example videos are associated with queries; video event detection with few exemplars (VED-ex_few), in which only a small number of example videos are associated with queries; and zero-shot video event detection (VED-zero), where no exemplar videos are associated with queries.
We first define a new way of describing videos concisely, one that is built around using query-independent concepts (e.g., a fixed set of concepts for all queries) with a space-efficient representation. Using query-independent concepts enables us to learn a retrieval model for any query without requiring a new set of concepts. Our space-efficient representation helps reduce the amount of time required to train/test a retrieval model and the amount of space to store video representations on disk.
When the number of example videos associated with a query decreases, the retrieval accuracy decreases as well. We present a method that incorporates multiple one-exemplar models into video event detection aiming at improving retrieval accuracies when there are few exemplars available. By incorporating multiple one-exemplar models into video event detection with few exemplars, we are able to obtain significant improvements in terms of mean average precision compared to the case of a monolithic model.
Having no exemplar videos associated with queries makes the video event detection problem more challenging as we cannot train a retrieval model using example videos. It is also more realistic since compiling a number of example videos might be costly. We tackle this problem by providing a new and effective zero-shot video event detection model that exploits dependencies of concepts in videos. Our dependency work uses a Markov Random Field (MRF) based retrieval model and assumes three dependency settings: 1) full independence, where each concept is considered independently; 2) spatial dependence, where the co-occurrence of two concepts in the same video frame is treated as important; and 3) temporal dependence, where having concepts co-occur in consecutive frames is treated as important. Our MRF based retrieval model improves retrieval accuracies significantly compared to the common bag-of-concepts approach with an independence assumption
- …