Search CORE

9,692 research outputs found

TRECVid 2011 Experiments at Dublin City University

Author: Foley Colum
Guo Jinlin
Gurrin Cathal
Hopfgartner Frank
Scott David
Smeaton Alan F.
Publication venue
Publication date: 01/01/2012
Field of study

This year the iAd-DCU team participated in three of the assigned TRECVid 2011 tasks; Semantic Indexing (SIN), Interactive Known-Item Search (KIS) and Multimedia Event Detection (MED). For the SIN task we presented three full runs using global features, local features and fusion of global, local features and relationships between concepts respectively. The evaluation results show that local features achieve better performance, with marginal gains found when introducing global features and relationships between concepts. With regard to our KIS submission, similar to our 2010 KIS experiments, we have implemented an iPad interface to a KIS video search tool. The aim of this year’s experimentation was to evaluate different display methodologies for KIS interaction. For this work, we integrate a clustering element for keyframes, which operates over MPEG-7 features using k-means clustering. In addition, we employ concept detection, not simply for search, but as a means of choosing most representative keyframes for ranked items. For our experiments we compare the baseline non-clustering system to a clustering system on a topic by topic basis. Finally, for the first time this year the iAd group at DCU has been involved in the MED Task. Two techniques are compared, employing low-level features directly and using concepts as intermediate representations. Evaluation results show promising initial results when performing event detection using concepts as intermediate representations

Irish Universities

DCU Online Research Access Service

Enlighten

DAP3D-Net: Where, What and How Actions Occur in Videos?

Author: Liu Li
Shao Ling
Zhou Yi
Publication venue
Publication date: 10/02/2016
Field of study

Action parsing in videos with complex scenes is an interesting but challenging task in computer vision. In this paper, we propose a generic 3D convolutional neural network in a multi-task learning manner for effective Deep Action Parsing (DAP3D-Net) in videos. Particularly, in the training phase, action localization, classification and attributes learning can be jointly optimized on our appearancemotion data via DAP3D-Net. For an upcoming test video, we can describe each individual action in the video simultaneously as: Where the action occurs, What the action is and How the action is performed. To well demonstrate the effectiveness of the proposed DAP3D-Net, we also contribute a new Numerous-category Aligned Synthetic Action dataset, i.e., NASA, which consists of 200; 000 action clips of more than 300 categories and with 33 pre-defined action attributes in two hierarchical levels (i.e., low-level attributes of basic body part movements and high-level attributes related to action motion). We learn DAP3D-Net using the NASA dataset and then evaluate it on our collected Human Action Understanding (HAU) dataset. Experimental results show that our approach can accurately localize, categorize and describe multiple actions in realistic videos

arXiv.org e-Print Archive

Crossref

Exemplar Based Deep Discriminative and Shareable Feature Learning for Scene Image Classification

Author: Shuai Bing
Wang Gang
Yang Qingxiong
Zhao Lifan
Zuo Zhen
Publication venue
Publication date: 01/01/2015
Field of study

In order to encode the class correlation and class specific information in image representation, we propose a new local feature learning approach named Deep Discriminative and Shareable Feature Learning (DDSFL). DDSFL aims to hierarchically learn feature transformation filter banks to transform raw pixel image patches to features. The learned filter banks are expected to: (1) encode common visual patterns of a flexible number of categories; (2) encode discriminative information; and (3) hierarchically extract patterns at different visual levels. Particularly, in each single layer of DDSFL, shareable filters are jointly learned for classes which share the similar patterns. Discriminative power of the filters is achieved by enforcing the features from the same category to be close, while features from different categories to be far away from each other. Furthermore, we also propose two exemplar selection methods to iteratively select training data for more efficient and effective learning. Based on the experimental results, DDSFL can achieve very promising performance, and it also shows great complementary effect to the state-of-the-art Caffe features.Comment: Pattern Recognition, Elsevier, 201

arXiv.org e-Print Archive

DR-NTU (Digital Repository of NTU)

Learning to Select Pre-Trained Deep Representations with Bayesian Evidence Framework

Author: Choi Seungjin
Han Bohyung
Jang Taewoong
Kim Yong-Deok
Publication venue
Publication date: 24/04/2016
Field of study

We propose a Bayesian evidence framework to facilitate transfer learning from pre-trained deep convolutional neural networks (CNNs). Our framework is formulated on top of a least squares SVM (LS-SVM) classifier, which is simple and fast in both training and testing, and achieves competitive performance in practice. The regularization parameters in LS-SVM is estimated automatically without grid search and cross-validation by maximizing evidence, which is a useful measure to select the best performing CNN out of multiple candidates for transfer learning; the evidence is optimized efficiently by employing Aitken's delta-squared process, which accelerates convergence of fixed point update. The proposed Bayesian evidence framework also provides a good solution to identify the best ensemble of heterogeneous CNNs through a greedy algorithm. Our Bayesian evidence framework for transfer learning is tested on 12 visual recognition datasets and illustrates the state-of-the-art performance consistently in terms of prediction accuracy and modeling efficiency.Comment: Appearing in CVPR-2016 (oral presentation

arXiv.org e-Print Archive

Crossref

포항공과대학교

LCrowdV: Generating Labeled Videos for Simulation-based Crowd Behavior Learning

Author: A Chan
A. Bruderlin
B Solmaz
B Ulicny
B Zhou
D Helbing
F Lamarche
F Zhu
G Antonini
G Le Bon
Hans J. Eysenck
J Barraquand
J James
J van den Berg
J Xu
K Zhang
KK Reddy
L Pervin
Mehdi Moussaïd
R Geraerts
S Ali
S Ali
S Curtis
T Li
X Song
X Wang
Y Tsuduki
Publication venue
Publication date: 04/07/2016
Field of study

We present a novel procedural framework to generate an arbitrary number of labeled crowd videos (LCrowdV). The resulting crowd video datasets are used to design accurate algorithms or training models for crowded scene understanding. Our overall approach is composed of two components: a procedural simulation framework for generating crowd movements and behaviors, and a procedural rendering framework to generate different videos or images. Each video or image is automatically labeled based on the environment, number of pedestrians, density, behavior, flow, lighting conditions, viewpoint, noise, etc. Furthermore, we can increase the realism by combining synthetically-generated behaviors with real-world background videos. We demonstrate the benefits of LCrowdV over prior lableled crowd datasets by improving the accuracy of pedestrian detection and crowd behavior classification algorithms. LCrowdV would be released on the WWW

arXiv.org e-Print Archive

Crossref