
    Prompting Visual-Language Models for Dynamic Facial Expression Recognition

    This paper presents DFER-CLIP, a novel visual-language model based on CLIP and designed for in-the-wild Dynamic Facial Expression Recognition (DFER). The proposed DFER-CLIP consists of a visual part and a textual part. For the visual part, a temporal model consisting of several Transformer encoders is introduced on top of the CLIP image encoder to extract temporal facial expression features, and the final feature embedding is obtained as a learnable "class" token. For the textual part, we use as inputs textual descriptions of the facial behaviour related to the classes (facial expressions) we are interested in recognising; those descriptions are generated using large language models such as ChatGPT. This is in contrast to works that use only the class names, and it captures the relationships between the expressions more accurately. Alongside the textual descriptions, we introduce a learnable token that helps the model learn relevant context information for each expression during training. Extensive experiments demonstrate the effectiveness of the proposed method and show that DFER-CLIP achieves state-of-the-art results compared with current supervised DFER methods on the DFEW, FERV39k, and MAFW benchmarks.
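    As a rough illustration of the visual branch described above, the PyTorch sketch below runs a small Transformer encoder with a learnable "class" token over pre-extracted per-frame CLIP embeddings. The dimensions, depth, and names are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): temporal modelling over
# pre-extracted CLIP frame embeddings with a learnable "class" token.
import torch
import torch.nn as nn

class TemporalHead(nn.Module):
    def __init__(self, dim=512, depth=2, heads=8, max_frames=32):
        super().__init__()
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))         # learnable class token
        self.pos_embed = nn.Parameter(torch.zeros(1, max_frames + 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, frame_feats):                                    # (B, T, dim) CLIP features
        B, T, _ = frame_feats.shape
        x = torch.cat([self.cls_token.expand(B, -1, -1), frame_feats], dim=1)
        x = self.encoder(x + self.pos_embed[:, : T + 1])
        return x[:, 0]                                                 # video-level embedding

# Example: 4 videos, 16 frames each, 512-d CLIP features per frame.
video_emb = TemporalHead()(torch.randn(4, 16, 512))                    # -> (4, 512)
```

    At recognition time, this video-level embedding would be matched (e.g. by cosine similarity, as in CLIP) against the text encoder's embeddings of the LLM-generated class descriptions together with the learnable context token.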

    TARN: Temporal Attentive Relation Network for Few-Shot and Zero-Shot Action Recognition

    In this paper we propose a novel Temporal Attentive Relation Network (TARN) for the problems of few-shot and zero-shot action recognition. At the heart of our network is a meta-learning approach that learns to compare representations of variable temporal length, that is, either two videos of different lengths (in the case of few-shot action recognition) or a video and a semantic representation such as a word vector (in the case of zero-shot action recognition). In contrast to other works in few-shot and zero-shot action recognition, we a) utilise attention mechanisms to perform temporal alignment, and b) learn a deep distance measure on the aligned representations at the video-segment level. We adopt an episode-based training scheme and train our network in an end-to-end manner. The proposed method does not require any fine-tuning in the target domain or the maintenance of additional representations, as is the case with memory networks. Experimental results show that the proposed architecture outperforms the state of the art in few-shot action recognition and achieves competitive results in zero-shot action recognition.
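    The sketch below illustrates the two ingredients highlighted above: attention-based temporal alignment between two segment sequences of different lengths, followed by a learned comparison score on the aligned representations. It is a simplified stand-in, not the paper's architecture; the dimensions and the scoring network are assumptions.

```python
# Minimal sketch (not the paper's implementation): align a query video's
# segment features to a support sequence with attention, then score the
# aligned pair with a small learned comparison network.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentiveRelation(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.compare = nn.Sequential(                  # learned distance on aligned segments
            nn.Linear(2 * dim, dim), nn.ReLU(),
            nn.Linear(dim, 1))

    def forward(self, query, support):                 # query: (Tq, dim), support: (Ts, dim)
        attn = F.softmax(query @ support.t() / query.shape[-1] ** 0.5, dim=-1)  # (Tq, Ts)
        aligned = attn @ support                       # support re-expressed per query segment
        pair = torch.cat([query, aligned], dim=-1)     # (Tq, 2*dim)
        return self.compare(pair).mean()               # scalar relation score

# Example episode: one query video (7 segments) vs. one support video (5 segments).
score = AttentiveRelation()(torch.randn(7, 256), torch.randn(5, 256))
```

    In the zero-shot setting, the support sequence would be replaced by a semantic representation (e.g. a word vector projected into the segment feature space).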

    Gesture-based Object Recognition using Histograms of Guiding Strokes

    Sadeghipour A, Morency L-P, Kopp S. Gesture-based Object Recognition using Histograms of Guiding Strokes. In: Bowden R, Collomosse J, Mikolajczyk K, eds. Proceedings of the British Machine Vision Conference. BMVA Press; 2012: 44.1-44.11.

    Learning Grimaces by Watching TV

    Unlike computer vision systems, which require explicit supervision, humans can learn facial expressions by observing the people in their environment. In this paper, we look at how similar capabilities could be developed in machine vision. As a starting point, we consider the problem of relating facial expressions to objectively measurable events occurring in videos. In particular, we consider a gameshow in which contestants play to win significant sums of money. We extract the events affecting the game and the corresponding facial expressions objectively and automatically from the videos, obtaining large quantities of labelled data for our study. We also develop, using benchmarks such as FER and SFEW 2.0, state-of-the-art deep neural networks for facial expression recognition, showing that pre-training on face verification data can be highly beneficial for this task. Then, we extend these models to use facial expressions to predict events in videos and to learn nameable expressions from them. The dataset and emotion recognition models are available at http://www.robots.ox.ac.uk/~vgg/data/facevalue. Comment: British Machine Vision Conference (BMVC) 2016.
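    The transfer recipe alluded to above (pre-training on face verification, then training for expression recognition) can be sketched as follows. The backbone, weight file, and class count here are placeholders for illustration, not the paper's networks or data.

```python
# Illustrative sketch only: fine-tuning a face-verification-pretrained backbone
# for facial expression recognition. The ResNet-50 stand-in and the weight path
# are hypothetical; the actual networks are those described in the paper.
import torch
import torch.nn as nn
from torchvision.models import resnet50

NUM_EXPRESSIONS = 7                           # e.g. the basic-expression classes used by FER benchmarks

backbone = resnet50(weights=None)
# backbone.load_state_dict(torch.load("face_verification_pretrained.pth"))   # hypothetical checkpoint
backbone.fc = nn.Linear(backbone.fc.in_features, NUM_EXPRESSIONS)            # new expression head

optimizer = torch.optim.SGD(backbone.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

# One fine-tuning step on a (dummy) batch of face crops and expression labels.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, NUM_EXPRESSIONS, (8,))
loss = criterion(backbone(images), labels)
loss.backward()
optimizer.step()
```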

    Using illumination estimated from silhouettes to carve surface details on visual hull

    This paper deals with the problems of scene illumination estimation and shape recovery from an image sequence of a smooth, textureless object. A novel method is introduced that exploits surface points estimated from the silhouettes to recover the scene illumination. These surface points are acquired by a dual-space approach and filtered according to their rank errors. The selected surface points allow a direct closed-form solution for the illumination. In the mesh-evolution step, an algorithm for optimizing the visual-hull mesh is developed. It evolves the mesh by iteratively estimating both the surface normal and the depth that maximize the photometric consistency across the sequence. Compared with previous work, which optimizes the mesh by estimating the surface normal only, the proposed method shows better convergence and can recover finer surface details, especially when concavities are deep and sharp.
    The British Machine Vision Conference (BMVC) 2008, Leeds, U.K., 1-4 September 2008. In Proceedings of the British Machine Vision Conference, 2008, v. 2, p. 895-90
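    The closed-form illumination step described above can be illustrated with a simple least-squares formulation: given surface points with normals and their observed intensities, a Lambertian, single-distant-light model reduces illumination estimation to a linear system. These modelling assumptions are made here for illustration and are not necessarily the paper's exact lighting model.

```python
# Minimal sketch: closed-form illumination from oriented surface points,
# assuming Lambertian shading under a single distant light (an illustrative
# simplification of the closed-form solution mentioned in the abstract).
import numpy as np

def estimate_light(normals, intensities):
    """normals: (N, 3) unit surface normals; intensities: (N,) observed shading.
    Solves intensities ~= normals @ l for the albedo-scaled light vector l."""
    l, *_ = np.linalg.lstsq(normals, intensities, rcond=None)
    strength = np.linalg.norm(l)
    return l / strength, strength              # light direction, albedo * light intensity

# Synthetic check: recover a known light from noiseless Lambertian shading.
rng = np.random.default_rng(0)
n = rng.normal(size=(100, 3))
n /= np.linalg.norm(n, axis=1, keepdims=True)
true_dir = np.array([0.3, 0.5, 0.81])
true_dir /= np.linalg.norm(true_dir)
lit = n @ true_dir > 0                         # keep only points facing the light
direction, strength = estimate_light(n[lit], 0.8 * (n[lit] @ true_dir))
```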

    One-Shot Learning for Semantic Segmentation

    Low-shot learning methods for image classification support learning from sparse data. We extend these techniques to support dense semantic image segmentation. Specifically, we train a network that, given a small set of annotated images, produces parameters for a Fully Convolutional Network (FCN). We use this FCN to perform dense pixel-level prediction on a test image for the new semantic class. Our architecture shows a 25% relative meanIoU improvement over the best baseline methods for one-shot segmentation on unseen classes in the PASCAL VOC 2012 dataset and is at least 3 times faster. Comment: To appear in the proceedings of the British Machine Vision Conference (BMVC) 2017. The code is available at https://github.com/lzzcd001/OSLS
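    The core idea (a conditioning branch that turns the annotated support image into parameters applied densely to the query) can be sketched as below. The pooling scheme, dimensions, and the 1x1-classifier parameterisation are illustrative assumptions, not the released implementation linked above.

```python
# Minimal sketch (not the released code): a conditioning branch maps a support
# image's masked features to the weights of a 1x1 pixel classifier, which is
# then applied to dense features of the query image.
import torch
import torch.nn as nn

class OneShotSegHead(nn.Module):
    def __init__(self, feat_dim=256):
        super().__init__()
        self.to_params = nn.Linear(feat_dim, feat_dim + 1)     # weight vector + bias

    def forward(self, support_feat, support_mask, query_feat):
        # support_feat/query_feat: (B, C, H, W); support_mask: (B, 1, H, W) in {0, 1}
        masked = support_feat * support_mask
        pooled = masked.sum(dim=(2, 3)) / support_mask.sum(dim=(2, 3)).clamp(min=1.0)
        params = self.to_params(pooled)                         # (B, C + 1)
        w, b = params[:, :-1], params[:, -1]
        logits = torch.einsum("bchw,bc->bhw", query_feat, w) + b[:, None, None]
        return logits                                           # per-pixel foreground score

# Example with random tensors standing in for an FCN backbone's dense features.
head = OneShotSegHead()
logits = head(torch.randn(2, 256, 32, 32),
              (torch.rand(2, 1, 32, 32) > 0.5).float(),
              torch.randn(2, 256, 32, 32))
```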