68 research outputs found

    Identifying Rare and Subtle Behaviors: A Weakly Supervised Joint Topic Model


    Transductive Multi-View Zero-Shot Learning


    Weakly Supervised Learning of Objects, Attributes and Their Associations

    The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-10605-2_31

    Learning Multimodal Latent Attributes

    Abstract—The rapid development of social media sharing has created a huge demand for automatic media classification and annotation techniques. Attribute learning has emerged as a promising paradigm for bridging the semantic gap and addressing data sparsity by transferring attribute knowledge in object recognition and relatively simple action classification. In this paper, we address the task of attribute learning for understanding multimedia data with sparse and incomplete labels. In particular, we focus on videos of social group activities, which are particularly challenging and topical examples of this task because of their multi-modal content and their complex and unstructured nature relative to the density of annotations. To solve this problem, we (1) introduce the concept of a semi-latent attribute space, expressing user-defined and latent attributes in a unified framework, and (2) propose a novel scalable probabilistic topic model for learning multi-modal semi-latent attributes, which dramatically reduces the requirement for an exhaustive, accurate attribute ontology and expensive annotation effort. We show that our framework is able to exploit latent attributes to outperform contemporary approaches on a variety of realistic multimedia sparse-data learning tasks, including multi-task learning, learning with label noise, N-shot transfer learning and, importantly, zero-shot learning.
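    The semi-latent attribute space can be illustrated with off-the-shelf components. Below is a minimal sketch, assuming bag-of-words count features: user-defined attributes are scored by supervised classifiers, latent attributes are discovered as unsupervised topics, and the two are concatenated into one unified representation. This stands in for the paper's own multi-modal topic model, and every name and parameter here is illustrative.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression

def semi_latent_attributes(X, user_attr_labels, n_latent=10):
    """X: (n_samples, n_features) non-negative count features;
    user_attr_labels: (n_samples, n_user_attrs) binary labels, possibly sparse or noisy.
    Illustrative stand-in for the paper's model, not its actual algorithm."""
    # User-defined attributes: one binary classifier per annotated attribute.
    user_scores = np.column_stack([
        LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)[:, 1]
        for y in user_attr_labels.T
    ])
    # Latent attributes: topics discovered without attribute supervision.
    lda = LatentDirichletAllocation(n_components=n_latent, random_state=0)
    latent_scores = lda.fit_transform(X)
    # The unified semi-latent representation: user-defined + latent dimensions.
    return np.hstack([user_scores, latent_scores])
```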

    Finding Rare Classes: Active Learning with Generative and Discriminative Models


    Bayesian Joint Modelling for Object Localisation in Weakly Labelled Images

    Abstract—We address the problem of localisation of objects as bounding boxes in images and videos with weak labels. This weakly supervised object localisation problem has been tackled in the past using discriminative models where each object class is localised independently from other classes. In this paper, a novel framework based on Bayesian joint topic modelling is proposed, which differs significantly from the existing ones in that: (1) All foreground object classes are modelled jointly in a single generative model that encodes multiple object co-existence so that “explaining away” inference can resolve ambiguity and lead to better learning and localisation. (2) Image backgrounds are shared across classes to better learn varying surroundings and “push out” objects of interest. (3) Our model can be learned with a mixture of weakly labelled and unlabelled data, allowing the large volume of unlabelled images on the Internet to be exploited for learning. Moreover, the Bayesian formulation enables the exploitation of various types of prior knowledge to compensate for the limited supervision offered by weakly labelled data, as well as Bayesian domain adaptation for transfer learning. Extensive experiments on the PASCAL VOC, ImageNet and YouTube-Object videos datasets demonstrate the effectiveness of our Bayesian joint model for weakly supervised object localisation.
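    The joint topic model itself is too involved for a short sketch, but the localisation step the abstract implies is simple to show. A hedged sketch, assuming inference has already produced a per-patch responsibility map for a foreground object topic; the grid resolution and threshold are illustrative assumptions, not the paper's values.

```python
import numpy as np

def box_from_responsibilities(resp, threshold=0.5):
    """resp: (H, W) per-patch probabilities that a patch belongs to the
    foreground object topic. Returns (x_min, y_min, x_max, y_max) in grid
    coordinates, or None if no patch is confidently foreground."""
    ys, xs = np.nonzero(resp > threshold)
    if len(xs) == 0:
        return None
    # Tightest box around all confidently-foreground patches.
    return (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))
```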

    Transductive Multi-view Embedding for Zero-Shot Recognition and Annotation

    Abstract. Most existing zero-shot learning approaches exploit transfer learning via an intermediate-level semantic representation such as visual attributes or semantic word vectors. Such a semantic representation is shared between an annotated auxiliary dataset and a target dataset with no annotation. A projection from a low-level feature space to the semantic space is learned from the auxiliary dataset and is applied without adaptation to the target dataset. In this paper we identify an inherent limitation with this approach. That is, due to having disjoint and potentially unrelated classes, the projection functions learned from the auxiliary dataset/domain are biased when applied directly to the target dataset/domain. We call this problem the projection domain shift problem and propose a novel framework, transductive multi-view embedding, to solve it. It is ‘transductive’ in that unlabelled target data points are explored for projection adaptation, and ‘multi-view’ in that both the low-level feature (view) and multiple semantic representations (views) are embedded to rectify the projection shift. We demonstrate through extensive experiments that our framework (1) rectifies the projection shift between the auxiliary and target domains, (2) exploits the complementarity of multiple semantic representations, (3) achieves state-of-the-art recognition results on image and video benchmark datasets, and (4) enables novel cross-view annotation tasks.
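    The multi-view embedding can be sketched with standard tools. A minimal sketch, assuming just two views for brevity: low-level features X and the (biased) semantic projections S, both computed on the unlabelled target data and aligned transductively with CCA. The paper's actual embedding is richer, so all names and dimensions here are assumptions.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

def transductive_embed(X_target, S_target, n_dims=10):
    """X_target: (n, d_feat) low-level features of unlabelled target samples;
    S_target: (n, d_sem) semantic projections from the auxiliary-trained model.
    n_dims must not exceed either view's dimensionality."""
    cca = CCA(n_components=n_dims)
    # Fitting on target data only is what makes the adaptation transductive.
    Zx, Zs = cca.fit_transform(X_target, S_target)
    # Fuse the embedded views into a single shared-space representation.
    return np.hstack([Zx, Zs])
```

    Zero-shot recognition would then embed class prototypes the same way and match test points to them by, for example, cosine similarity in the shared space.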

    Semantic Regularisation for Recurrent Image Annotation

    The "CNN-RNN" design pattern is increasingly widely applied in a variety of image annotation tasks including multi-label classification and captioning. Existing models use the weakly semantic CNN hidden layer or its transform as the image embedding that provides the interface between the CNN and RNN. This leaves the RNN overstretched with two jobs: predicting the visual concepts and modelling their correlations for generating structured annotation output. Importantly this makes the end-to-end training of the CNN and RNN slow and ineffective due to the difficulty of back propagating gradients through the RNN to train the CNN. We propose a simple modification to the design pattern that makes learning more effective and efficient. Specifically, we propose to use a semantically regularised embedding layer as the interface between the CNN and RNN. Regularising the interface can partially or completely decouple the learning problems, allowing each to be more effectively trained and jointly training much more efficient. Extensive experiments show that state-of-the art performance is achieved on multi-label classification as well as image captioning