Search CORE

4 research outputs found

Model and Feature Selection in Hidden Conditional Random Fields with Group Regularization

Authors: A. Krogh, A. Quattoni, C. Bishop, C. Zhu, H. Bauschke, J. Huang, L. Gorelick, L. Wang
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

Proceedings of: 8th International Conference on Hybrid Artificial Intelligence Systems (HAIS 2013). Salamanca, September 11-13, 2013. Sequence classification is an important problem in computer vision, speech analysis, and computational biology. This paper presents a new training strategy for the Hidden Conditional Random Field sequence classifier that incorporates model and feature selection. The standard Lasso regularization employed in the estimation of model parameters is replaced by overlapping group-L1 regularization. Depending on the configuration of the overlapping groups, model selection, feature selection, or both are performed. The sequence classifiers trained in this way have better predictive performance, as confirmed by applying the proposed method to a human action recognition task. This work was supported in part by Projects MINECO TEC2012-37832-C02-01, CICYT TEC2011-28626-C02-02, CAM CONTEXTS (S2009/TIC-1485). Publicad
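
The contrast between Lasso and overlapping group-L1 regularization mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common group-lasso form of the penalty (a sum of Euclidean norms over index groups that may share parameters); the abstract does not give the paper's exact formulation, and the function names, weights, and groups below are hypothetical.

```python
import math

def lasso_penalty(weights, lam):
    """Standard L1 (Lasso) penalty: lam times the sum of absolute values."""
    return lam * sum(abs(w) for w in weights)

def group_l1_penalty(weights, groups, lam):
    """Group-L1 penalty in its assumed group-lasso form: lam times the sum,
    over groups, of the Euclidean norm of the weights in each group.
    Groups are lists of indices and are allowed to overlap."""
    return lam * sum(
        math.sqrt(sum(weights[i] ** 2 for i in group)) for group in groups
    )

# Hypothetical parameter vector and overlapping groups (index 1 and index 2
# each belong to two groups).
weights = [3.0, 4.0, 0.0, 0.0]
groups = [[0, 1], [1, 2], [2, 3]]

print(lasso_penalty(weights, 1.0))
print(group_l1_penalty(weights, groups, 1.0))
```

Under this assumed form, a group contributes nothing to the penalty only when all of its weights are zero, so the penalty favors switching off whole groups at once. How the groups are laid out (for example per feature or per hidden state) then determines whether features, model components, or both are pruned.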

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Carlos III de Madrid e-Archivo

With a Little Help from my Temporal Context: Multimodal Egocentric Action Recognition

Authors: Damen Dima, Huh Jaesung, Kazakos Vangelis, Nagrani Arsha, Zisserman Andrew
Publication date: 01/11/2021

In egocentric videos, actions occur in quick succession. We capitalise on the action's temporal context and propose a method that learns to attend to surrounding actions in order to improve recognition performance. To incorporate the temporal context, we propose a transformer-based multimodal model that ingests video and audio as input modalities, with an explicit language model providing action sequence context to enhance the predictions. We test our approach on the EPIC-KITCHENS and EGTEA datasets, reporting state-of-the-art performance. Our ablations showcase the advantage of utilising temporal context, as well as of incorporating the audio input modality and a language model to rescore predictions. Code and models at: https://github.com/ekazakos/MTCN. Comment: Accepted at BMVC 202

arXiv.org e-Print Archive

Oxford University Research Archive

Explore Bristol Research

Audio-Visual Egocentric Action Recognition

Author: Kazakos Evangelos
Publication date: 21/06/2022

Explore Bristol Research