6,576 research outputs found

Detecting carried objects in short video sequences

Authors: Damen D; Hogg DC
Publication date: 01/01/2008

DDLSTM: Dual-Domain LSTM for Cross-Dataset Action Recognition

Authors: Damen Dima; Perrett Toby
Publication date: 18/04/2019

Domain alignment in convolutional networks aims to learn the degree of layer-specific feature alignment beneficial to the joint learning of source and target datasets. While increasingly popular in convolutional networks, there have been no previous attempts to achieve domain alignment in recurrent networks. Similar to spatial features, both source and target domains are likely to exhibit temporal dependencies that can be jointly learnt and aligned. In this paper we introduce Dual-Domain LSTM (DDLSTM), an architecture that is able to learn temporal dependencies from two domains concurrently. It performs cross-contaminated batch normalisation on both input-to-hidden and hidden-to-hidden weights, and learns the parameters for cross-contamination, for both single-layer and multi-layer LSTM architectures. We evaluate DDLSTM on frame-level action recognition using three datasets, taking a pair at a time, and report an average increase in accuracy of 3.5%. The proposed DDLSTM architecture outperforms standard, fine-tuned, and batch-normalised LSTMs. Comment: To appear in CVPR 201
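The abstract does not spell out how cross-contaminated batch normalisation is computed, so the following is only a rough sketch of the general idea: each domain is normalised with statistics that blend its own batch with the other domain's batch. The function name `cross_domain_batch_norm`, the fixed mixing weight `alpha`, and the linear blending of mean and variance are all assumptions made for illustration; in the paper the mixing parameters are learnt, and the exact formulation may differ.

```python
from statistics import fmean, pvariance
from typing import List, Sequence, Tuple


def cross_domain_batch_norm(
    source: Sequence[float],
    target: Sequence[float],
    alpha: float,
    eps: float = 1e-5,
) -> Tuple[List[float], List[float]]:
    """Normalise two 1-D batches with statistics blended across domains.

    alpha = 1.0 -> each domain uses only its own statistics.
    alpha = 0.5 -> both domains share the same blended statistics.
    Linear blending of mean and variance is an assumption of this sketch.
    """
    if not 0.5 <= alpha <= 1.0:
        raise ValueError("alpha is expected in [0.5, 1.0]")

    # Per-domain batch statistics.
    mu_s, mu_t = fmean(source), fmean(target)
    var_s, var_t = pvariance(source, mu_s), pvariance(target, mu_t)

    # Blend each domain's statistics with the other domain's.
    mu_s_mix = alpha * mu_s + (1.0 - alpha) * mu_t
    mu_t_mix = alpha * mu_t + (1.0 - alpha) * mu_s
    var_s_mix = alpha * var_s + (1.0 - alpha) * var_t
    var_t_mix = alpha * var_t + (1.0 - alpha) * var_s

    src_norm = [(x - mu_s_mix) / (var_s_mix + eps) ** 0.5 for x in source]
    tgt_norm = [(x - mu_t_mix) / (var_t_mix + eps) ** 0.5 for x in target]
    return src_norm, tgt_norm


# With alpha = 1.0 this reduces to ordinary per-domain batch normalisation.
src_norm, tgt_norm = cross_domain_batch_norm(
    [1.0, 2.0, 3.0], [10.0, 12.0, 14.0], alpha=1.0
)
```

In a recurrent setting, a normalisation of this kind would presumably be applied separately to the input-to-hidden and hidden-to-hidden pre-activations, as the abstract describes, with `alpha` treated as a trainable parameter rather than a constant.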

arXiv.org e-Print Archive

Explore Bristol Research

Attribute Multiset Grammars for Global Explanations of Activities

Authors: Damen Dima; Hogg David
Publication venue: 'British Machine Vision Association and Society for Pattern Recognition'
Publication date: 01/01/2009

Crossref

Explore Bristol Research

Demand Patterns and Employment Structures an Aggregate Analysis

Authors: Joep Damen; Ronald Schettkat

Research Papers in Economics

Action Recognition from Single Timestamp Supervision in Untrimmed Videos

Authors: Damen Dima; Fidler Sanja; Moltisanti Davide
Publication date: 09/04/2019

Recognising actions in videos relies on labelled supervision during training, typically the start and end times of each action instance. This supervision is not only subjective, but also expensive to acquire. Weak video-level supervision has been successfully exploited for recognition in untrimmed videos; however, it is challenged when the number of different actions in training videos increases. We propose a method that is supervised by single timestamps located around each action instance, in untrimmed videos. We replace expensive action bounds with sampling distributions initialised from these timestamps. We then use the classifier's response to iteratively update the sampling distributions. We demonstrate that these distributions converge to the location and extent of discriminative action segments. We evaluate our method on three datasets for fine-grained recognition, with increasing number of different actions per video, and show that single timestamps offer a reasonable compromise between recognition performance and labelling effort, performing comparably to full temporal supervision. Our update method improves top-1 test accuracy by up to 5.4% across the evaluated datasets. Comment: CVPR 201

arXiv.org e-Print Archive

Crossref

Explore Bristol Research

Who's Better? Who's Best? Pairwise Deep Ranking for Skill Determination

Authors: Damen Dima; Doughty Hazel; Mayol-Cuevas Walterio
Publication date: 29/03/2018

We present a method for assessing skill from video, applicable to a variety of tasks, ranging from surgery to drawing and rolling pizza dough. We formulate the problem as pairwise (who's better?) and overall (who's best?) ranking of video collections, using supervised deep ranking. We propose a novel loss function that learns discriminative features when a pair of videos exhibit variance in skill, and learns shared features when a pair of videos exhibit comparable skill levels. Results demonstrate our method is applicable across tasks, with the percentage of correctly ordered pairs of videos ranging from 70% to 83% for four datasets. We demonstrate the robustness of our approach via sensitivity analysis of its parameters. We see this work as an effort toward the automated organization of how-to video collections and overall, generic skill determination in video. Comment: CVPR 201
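The evaluation metric mentioned in this abstract, the percentage of correctly ordered pairs, can be illustrated with a minimal sketch. The function names, the toy scores, and the hinge-style `margin_ranking_loss` below are illustrative assumptions, not the paper's actual loss function, which the abstract describes only at a high level.

```python
from typing import Dict, Sequence, Tuple


def pairwise_accuracy(
    scores: Dict[str, float], pairs: Sequence[Tuple[str, str]]
) -> float:
    """Fraction of (better, worse) pairs whose predicted scores agree with the label."""
    if not pairs:
        raise ValueError("need at least one annotated pair")
    correct = sum(1 for better, worse in pairs if scores[better] > scores[worse])
    return correct / len(pairs)


def margin_ranking_loss(
    score_better: float, score_worse: float, margin: float = 1.0
) -> float:
    """Generic hinge-style ranking loss (an assumption, not the paper's loss).

    The loss is zero once the better video outscores the worse one by `margin`.
    """
    return max(0.0, margin - (score_better - score_worse))


# Hypothetical skill scores for three videos and annotated (better, worse) pairs.
scores = {"a": 0.9, "b": 0.4, "c": 0.6}
pairs = [("a", "b"), ("a", "c"), ("b", "c")]
accuracy = pairwise_accuracy(scores, pairs)  # two of the three pairs are ordered correctly
```

A metric of this form depends only on the relative ordering of scores, so it is insensitive to their absolute scale.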

arXiv.org e-Print Archive

Crossref

Explore Bristol Research