Search CORE

4,930 research outputs found

Temporal Localization of Fine-Grained Actions in Videos by Domain Transfer from Web Images

Author: Graves A.
Kiros R.
Krizhevsky A.
Over P.
Simonyan K.
Srivastava N.
Sutskever I.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 04/08/2015
Field of study

We address the problem of fine-grained action localization from temporally untrimmed web videos. We assume that only weak video-level annotations are available for training. The goal is to use these weak labels to identify temporal segments corresponding to the actions, and learn models that generalize to unconstrained web videos. We find that web images queried by action names serve as well-localized highlights for many actions, but are noisily labeled. To solve this problem, we propose a simple yet effective method that takes weak video labels and noisy image labels as input, and generates localized action frames as output. This is achieved by cross-domain transfer between video frames and web images, using pre-trained deep convolutional neural networks. We then use the localized action frames to train action recognition models with long short-term memory networks. We collect a fine-grained sports action data set FGA-240 of more than 130,000 YouTube videos. It has 240 fine-grained actions under 85 sports activities. Convincing results are shown on the FGA-240 data set, as well as the THUMOS 2014 localization data set with untrimmed training videos.Comment: Camera ready version for ACM Multimedia 201

arXiv.org e-Print Archive

CiteSeerX

Crossref

Gaze Embeddings for Zero-Shot Image Classification

Author: Akata Zeynep
Bulling Andreas
Karessli Nour
Schiele Bernt
Publication venue
Publication date: 01/01/2017
Field of study

Zero-shot image classification using auxiliary information, such as attributes describing discriminative object properties, requires time-consuming annotation by domain experts. We instead propose a method that relies on human gaze as auxiliary information, exploiting that even non-expert users have a natural ability to judge class membership. We present a data collection paradigm that involves a discrimination task to increase the information content obtained from gaze data. Our method extracts discriminative descriptors from the data and learns a compatibility function between image and gaze using three novel gaze embeddings: Gaze Histograms (GH), Gaze Features with Grid (GFG) and Gaze Features with Sequence (GFS). We introduce two new gaze-annotated datasets for fine-grained image classification and show that human gaze data is indeed class discriminative, provides a competitive alternative to expert-annotated attributes, and outperforms other baselines for zero-shot image classification

arXiv.org e-Print Archive

Crossref

CISPA – Helmholtz-Zentrum für Informationssicherheit

MPG.PuRe

UvA-DARE

International Migration, Integration and Social Cohesion online publications

Compositional Model based Fisher Vector Coding for Image Classification

Author: Hengel Anton van den
Liu Lingqiao
Shen Chunhua
Shen Heng Tao
Wang Chao
Wang Lei
Wang Peng
Publication venue
Publication date: 01/01/2017
Field of study

Deriving from the gradient vector of a generative model of local features, Fisher vector coding (FVC) has been identified as an effective coding method for image classification. Most, if not all, FVC implementations employ the Gaussian mixture model (GMM) to depict the generation process of local features. However, the representative power of the GMM could be limited because it essentially assumes that local features can be characterized by a fixed number of feature prototypes and the number of prototypes is usually small in FVC. To handle this limitation, in this paper we break the convention which assumes that a local feature is drawn from one of few Gaussian distributions. Instead, we adopt a compositional mechanism which assumes that a local feature is drawn from a Gaussian distribution whose mean vector is composed as the linear combination of multiple key components and the combination weight is a latent random variable. In this way, we can greatly enhance the representative power of the generative model of FVC. To implement our idea, we designed two particular generative models with such a compositional mechanism.Comment: Fixed typos. 16 pages. Appearing in IEEE T. Pattern Analysis and Machine Intelligence (TPAMI

arXiv.org e-Print Archive

Adelaide Research & Scholarship

Research Online