7 research outputs found

Temporal Localization of Fine-Grained Actions in Videos by Domain Transfer from Web Images

Author: Graves A.
Kiros R.
Krizhevsky A.
Over P.
Simonyan K.
Srivastava N.
Sutskever I.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 04/08/2015

We address the problem of fine-grained action localization from temporally untrimmed web videos. We assume that only weak video-level annotations are available for training. The goal is to use these weak labels to identify temporal segments corresponding to the actions, and learn models that generalize to unconstrained web videos. We find that web images queried by action names serve as well-localized highlights for many actions, but are noisily labeled. To solve this problem, we propose a simple yet effective method that takes weak video labels and noisy image labels as input, and generates localized action frames as output. This is achieved by cross-domain transfer between video frames and web images, using pre-trained deep convolutional neural networks. We then use the localized action frames to train action recognition models with long short-term memory networks. We collect a fine-grained sports action data set FGA-240 of more than 130,000 YouTube videos. It has 240 fine-grained actions under 85 sports activities. Convincing results are shown on the FGA-240 data set, as well as the THUMOS 2014 localization data set with untrimmed training videos.

Comment: Camera ready version for ACM Multimedia 201

arXiv.org e-Print Archive

CiteSeerX

Crossref

Learning to recommend descriptive tags for questions in social forums

Author: Chang Pi-Chuan
Huang Yuchi
Jialie Shen
Liqiang Nie
Mori Yasuhide Hironobu
Sood Sanjay
Tat-Seng Chua
Wu Wei
Xiangyu Wang
Xu Zhichen
Yi-Liang Zhao
Zhou Dengyong
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2014

10.1145/2559157; ACM Transactions on Information Systems; 321-ATIS

Crossref

Institutional Knowledge at Singapore Management University

ScholarBank@NUS

The Effects of Multiple Query Evidences on Social Image Retrieval

Author: CHENG Zhiyong
MIAO Haiyan
SHEN Jialie
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/07/2016

Institutional Knowledge at Singapore Management University

VIRAL TOPIC PREDICTION AND DESCRIPTION IN MICROBLOG SOCIAL NETWORKS

Author: BIAN JINGWEN
Publication date: 21/08/2015

Ph.D; DOCTOR OF PHILOSOPHY

ScholarBank@NUS

Entity Extraction from Unstructured Data on the Web

Author: Huynh Tan Dat
Publication venue: 'University of Queensland Library'
Publication date: 19/12/2014

A large number of web pages contain information about entities in lists where the lists are represented in textual form. Textual lists contain implicit records of entities. However, the field values of such records cannot easily be separated or extracted by automatic processes. This, therefore, remains a challenging research problem. Previous studies relied mainly on probabilistic graph-based models to capture the attributes and the likely structures of implicit records in a list. However, an important limitation of existing methods is that the structures of the records in input lists were implicitly encoded via manually created training data. This thesis aims to investigate novel techniques to automatically acquire information about entities from implicit records embedded in textual lists on the web. This thesis introduces a self-supervised learning framework which exploits both existing data in a knowledge base and the structural similarity between sequences in lists to build an extraction model automatically. In the proposed framework, initial labels for candidate field values are created and assigned to generate label sequences. Then, the structure of implici

CiteSeerX

University of Queensland eSpace

Transfer tagging from image to video

Author: Huang Zi
Shen Heng Tao
Yang Yang
Yang Yi
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2011

Massive amounts of web video data are now emerging on the Internet. To achieve effective and efficient video retrieval, it is critical to automatically assign semantic keywords to the videos via content analysis. However, most existing video tagging methods suffer from a lack of tagged training videos due to the high labor cost of manual tagging. Inspired by the observation that there is much more well-labeled data in other, related types of media (e.g. images), in this paper we study how to build a "cross-media tunnel" to transfer external tag knowledge from image to video. Meanwhile, the intrinsic data structures of both image and video spaces are well explored for inferring tags. We propose a Cross-Media Tag Transfer (CMTT) paradigm which is able to: 1) transfer tag knowledge between image and video by minimizing their distribution difference; 2) infer tags by revealing the underlying manifold structures embedded within both image and video spaces. We also learn an explicit mapping function to handle unseen videos. Experimental results have been reported and analyzed to illustrate the superiority of our proposal. Copyright 2011 ACM

OPUS - University of Technology Sydney

University of Queensland eSpace

Effective transfer tagging from image to video

Author: Shen HT
Yang Y
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/05/2013

Recent years have witnessed a great explosion of user-generated videos on the Web. In order to achieve effective and efficient video search, it is critical for modern video search engines to associate videos with semantic keywords automatically. Most existing video tagging methods can hardly achieve reliable performance due to a deficiency of training data. Abundant well-tagged data are, however, available in other relevant types of media (e.g., images). In this article, we propose a novel video tagging framework, termed Cross-Media Tag Transfer (CMTT), which utilizes the abundance of well-tagged images to facilitate video tagging. Specifically, we build a cross-media tunnel to transfer knowledge from images to videos. To this end, an optimal kernel space, in which the distribution distance between images and videos is minimized, is found to tackle the domain-shift problem. A novel cross-media video tagging model is proposed to infer tags by exploring the intrinsic local structures of both labeled and unlabeled data, and learn reliable video classifiers. An efficient algorithm is designed to optimize the proposed model in an iterative, alternating way. Extensive experiments illustrate the superiority of our proposal compared to the state-of-the-art algorithms. © 2013 ACM

OPUS - University of Technology Sydney
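
Note on the "distribution distance between images and videos" mentioned in the last abstract: the abstract does not say which distance the authors use. As an illustration only, the sketch below computes one common choice, a biased estimate of the squared maximum mean discrepancy (MMD) under a Gaussian kernel. The function names, the kernel, the `gamma` parameter, and the toy feature vectors are all assumptions made for this example and are not taken from the paper.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def rbf_kernel(x: Vector, y: Vector, gamma: float = 1.0) -> float:
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mean_kernel(xs: Sequence[Vector], ys: Sequence[Vector], gamma: float) -> float:
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(images: Sequence[Vector], videos: Sequence[Vector],
                gamma: float = 1.0) -> float:
    """Biased estimate of squared MMD between two sets of feature vectors.

    A value near zero means the two samples look alike in the kernel space;
    larger values indicate a bigger gap between the two domains.
    """
    return (mean_kernel(images, images, gamma)
            + mean_kernel(videos, videos, gamma)
            - 2.0 * mean_kernel(images, videos, gamma))


# Toy, made-up feature vectors (hypothetical, for illustration only).
image_feats = [[0.0, 0.0], [1.0, 0.0]]
near_video_feats = [[0.1, 0.0], [1.1, 0.0]]
far_video_feats = [[5.0, 5.0], [6.0, 5.0]]

near_gap = mmd_squared(image_feats, near_video_feats)
far_gap = mmd_squared(image_feats, far_video_feats)
```

In this toy setup `far_gap` comes out larger than `near_gap`, which is the behaviour a domain-shift measure should have. A method along the lines the abstract describes would search for a kernel space that makes such a gap small; that search is not shown here.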