Search CORE

5,417 research outputs found

Understand video event by exploiting semantic and temporal information for classification and retrieval

Author: Li Chao
Publication venue: 'University of Queensland Library'
Publication date: 21/05/2018
Field of study

NTU RGB+D 120: A Large-Scale Benchmark for 3D Human Activity Understanding

Author: Duan Ling-Yu
Kot Alex C.
Liu Jun
Perez Mauricio
Shahroudy Amir
Wang Gang
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 10/06/2019
Field of study

Research on depth-based human activity analysis achieved outstanding performance and demonstrated the effectiveness of 3D representation for action recognition. The existing depth-based and RGB+D-based action recognition benchmarks have a number of limitations, including the lack of large-scale training samples, realistic number of distinct class categories, diversity in camera views, varied environmental conditions, and variety of human subjects. In this work, we introduce a large-scale dataset for RGB+D human action recognition, which is collected from 106 distinct subjects and contains more than 114 thousand video samples and 8 million frames. This dataset contains 120 different action classes including daily, mutual, and health-related activities. We evaluate the performance of a series of existing 3D activity analysis methods on this dataset, and show the advantage of applying deep learning methods for 3D-based human action recognition. Furthermore, we investigate a novel one-shot 3D activity recognition problem on our dataset, and a simple yet effective Action-Part Semantic Relevance-aware (APSR) framework is proposed for this task, which yields promising results for recognition of the novel action classes. We believe the introduction of this large-scale dataset will enable the community to apply, adapt, and develop various data-hungry learning techniques for depth-based and RGB+D-based human activity understanding. [The dataset is available at: http://rose1.ntu.edu.sg/Datasets/actionRecognition.asp]Comment: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI

arXiv.org e-Print Archive

Chalmers Research

Identifying Rare and Subtle Behaviors: A Weakly Supervised Joint Topic Model

Author: Gong SG
Hospedales TM
Li J
Xiang T
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/12/2011
Field of study

Crossref

Queen Mary Research Online

Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey

Author: Chen Huajun
Chen Jiaoyan
Chen Xiang
Chen Zhuo
Fang Yin
Geng Yuxia
Guo Lingbing
Li Jiaqi
Li Qian
Liu Xiaoze
Pan Jeff Z.
Zhang Ningyu
Zhang Wen
Zhang Yichi
Zhu Yushan
Publication venue
Publication date: 26/02/2024
Field of study

Knowledge Graphs (KGs) play a pivotal role in advancing various AI applications, with the semantic web community's exploration into multi-modal dimensions unlocking new avenues for innovation. In this survey, we carefully review over 300 articles, focusing on KG-aware research in two principal aspects: KG-driven Multi-Modal (KG4MM) learning, where KGs support multi-modal tasks, and Multi-Modal Knowledge Graph (MM4KG), which extends KG studies into the MMKG realm. We begin by defining KGs and MMKGs, then explore their construction progress. Our review includes two primary task categories: KG-aware multi-modal learning tasks, such as Image Classification and Visual Question Answering, and intrinsic MMKG tasks like Multi-modal Knowledge Graph Completion and Entity Alignment, highlighting specific research trajectories. For most of these tasks, we provide definitions, evaluation benchmarks, and additionally outline essential insights for conducting relevant research. Finally, we discuss current challenges and identify emerging trends, such as progress in Large Language Modeling and Multi-modal Pre-training strategies. This survey aims to serve as a comprehensive reference for researchers already involved in or considering delving into KG and multi-modal learning research, offering insights into the evolving landscape of MMKG research and supporting future work.Comment: Ongoing work; 41 pages (Main Text), 55 pages (Total), 11 Tables, 13 Figures, 619 citations; Paper list is available at https://github.com/zjukg/KG-MM-Surve

arXiv.org e-Print Archive

A Survey of Quantum-Cognitively Inspired Sentiment Analysis Models

Author: Li Qiuchi
Liu Yaochen
Song Dawei
Wang Benyou
Zhang Yazhou
Publication venue
Publication date: 06/06/2023
Field of study

Quantum theory, originally proposed as a physical theory to describe the motions of microscopic particles, has been applied to various non-physics domains involving human cognition and decision-making that are inherently uncertain and exhibit certain non-classical, quantum-like characteristics. Sentiment analysis is a typical example of such domains. In the last few years, by leveraging the modeling power of quantum probability (a non-classical probability stemming from quantum mechanics methodology) and deep neural networks, a range of novel quantum-cognitively inspired models for sentiment analysis have emerged and performed well. This survey presents a timely overview of the latest developments in this fascinating cross-disciplinary area. We first provide a background of quantum probability and quantum cognition at a theoretical level, analyzing their advantages over classical theories in modeling the cognitive aspects of sentiment analysis. Then, recent quantum-cognitively inspired models are introduced and discussed in detail, focusing on how they approach the key challenges of the sentiment analysis task. Finally, we discuss the limitations of the current research and highlight future research directions

arXiv.org e-Print Archive

Recommended from our members

Exploiting Concepts In Videos For Video Event Detection

Author: Can Ethem
Publication venue: ScholarWorks@UMass Amherst
Publication date: 09/11/2015
Field of study

Video event detection is the task of searching videos for events of interest to a user where an event is a complex activity which is localized in time and space. The video event detection problem has gained more importance as the amount of online video is increasing by more than 300 hours every minute on Youtube alone. In this thesis, we tackle three major video event detection problems: video event detection with exemplars (VED-ex), where a large number of example videos are associated with queries; video event detection with few exemplars (VED-ex_few), in which only a small number of example videos are associated with queries; and zero-shot video event detection (VED-zero), where no exemplar videos are associated with queries. We first define a new way of describing videos concisely, one that is built around using query-independent concepts (e.g., a fixed set of concepts for all queries) with a space-efficient representation. Using query-independent concepts enables us to learn a retrieval model for any query without requiring a new set of concepts. Our space-efficient representation helps reduce the amount of time required to train/test a retrieval model and the amount of space to store video representations on disk. When the number of example videos associated with a query decreases, the retrieval accuracy decreases as well. We present a method that incorporates multiple one-exemplar models into video event detection aiming at improving retrieval accuracies when there are few exemplars available. By incorporating multiple one-exemplar models into video event detection with few exemplars, we are able to obtain significant improvements in terms of mean average precision compared to the case of a monolithic model. Having no exemplar videos associated with queries makes the video event detection problem more challenging as we cannot train a retrieval model using example videos. It is also more realistic since compiling a number of example videos might be costly. We tackle this problem by providing a new and effective zero-shot video event detection model that exploits dependencies of concepts in videos. Our dependency work uses a Markov Random Field (MRF) based retrieval model and assumes three dependency settings: 1) full independence, where each concept is considered independently; 2) spatial dependence, where the co-occurrence of two concepts in the same video frame is treated as important; and 3) temporal dependence, where having concepts co-occur in consecutive frames is treated as important. Our MRF based retrieval model improves retrieval accuracies significantly compared to the common bag-of-concepts approach with an independence assumption

ScholarWorks@UMass Amherst