Boosted Multiple Kernel Learning for First-Person Activity Recognition
Activity recognition from first-person (egocentric) videos has recently
gained attention due to the increasing ubiquity of wearable cameras. There
has been a surge of efforts adapting existing feature descriptors and designing
new descriptors for first-person videos. An effective activity recognition
system requires selection and use of complementary features and appropriate
kernels for each feature. In this study, we propose a data-driven framework for
first-person activity recognition which effectively selects and combines
features and their respective kernels during training. Our experimental
results show that the use of Multiple Kernel Learning (MKL) and Boosted MKL
for first-person activity recognition improves results over the
state of the art. In addition, these techniques enable the expansion of the
framework with new features in an efficient and convenient way.

Comment: First published in the Proceedings of the 25th European Signal
Processing Conference (EUSIPCO 2017), published by EURASIP.
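
The core ingredient of MKL-style approaches like this one is a weighted
combination of per-feature kernels feeding a kernel classifier. Below is a
minimal Python sketch of that idea using scikit-learn's precomputed-kernel
SVM; the feature views, kernel weights, and gamma values are illustrative
stand-ins, not the quantities the paper learns during boosted training.

import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

def combined_kernel(views_a, views_b, weights, gammas):
    # Weighted sum of one RBF kernel per feature view.
    K = np.zeros((views_a[0].shape[0], views_b[0].shape[0]))
    for Xa, Xb, w, g in zip(views_a, views_b, weights, gammas):
        K += w * rbf_kernel(Xa, Xb, gamma=g)
    return K

rng = np.random.default_rng(0)
# Two synthetic "feature views" standing in for video descriptors
# (e.g., a motion descriptor and an appearance descriptor).
train_views = [rng.normal(size=(60, 16)), rng.normal(size=(60, 32))]
test_views = [rng.normal(size=(20, 16)), rng.normal(size=(20, 32))]
y_train = rng.integers(0, 2, size=60)

weights, gammas = [0.6, 0.4], [0.1, 0.05]   # illustrative, not learned
K_train = combined_kernel(train_views, train_views, weights, gammas)
K_test = combined_kernel(test_views, train_views, weights, gammas)

clf = SVC(kernel="precomputed").fit(K_train, y_train)
print(clf.predict(K_test).shape)            # (20,)

In the boosted variant, the per-kernel weights would be chosen round by
round according to classification error rather than fixed up front.
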
Semi-Supervised First-Person Activity Recognition in Body-Worn Video
Body-worn cameras are now commonly used for logging daily life, sports, and
law enforcement activities, creating a large volume of archived footage. This
paper studies the problem of classifying frames of footage according to the
activity of the camera-wearer with an emphasis on application to real-world
police body-worn video. Real-world datasets pose a different set of challenges
from existing egocentric vision datasets: the amount of footage of different
activities is unbalanced, the data contains personally identifiable
information, and in practice it is difficult to provide substantial training
footage for a supervised approach. We address these challenges by extracting
features based exclusively on motion information and then segmenting the video
footage with a semi-supervised classification algorithm. On publicly available
datasets, our method achieves results comparable to, if not better than,
supervised and/or deep learning methods using a fraction of the training data.
It also shows promising results on real-world police body-worn video.
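
The semi-supervised setting described here labels every frame from only a
handful of annotated ones. A minimal sketch of that workflow in Python,
using scikit-learn's LabelSpreading as a stand-in for the paper's
classifier and synthetic per-frame motion descriptors in place of real
optical-flow features:

import numpy as np
from sklearn.semi_supervised import LabelSpreading

rng = np.random.default_rng(1)
# Synthetic per-frame motion descriptors for two activities.
X = np.vstack([rng.normal(0, 1, (100, 12)), rng.normal(3, 1, (100, 12))])
y = np.array([0] * 100 + [1] * 100)

# Keep only a handful of labels; -1 marks unlabelled frames.
y_partial = np.full_like(y, -1)
labelled = rng.choice(200, size=10, replace=False)
y_partial[labelled] = y[labelled]

model = LabelSpreading(kernel="rbf", gamma=0.5).fit(X, y_partial)
print((model.transduction_ == y).mean())  # fraction of frames recovered

With well-separated motion features, even ten labels are enough for the
label information to propagate across the similarity graph to all frames.
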
Multi-modal Egocentric Activity Recognition using Audio-Visual Features
Egocentric activity recognition in first-person videos is of increasing
importance in a variety of applications such as lifelogging, summarization,
assisted living, and activity tracking. Existing methods for this task
interpret various sensor signals using pre-determined weights for each
feature. In this work, we propose a new framework for egocentric activity
recognition based on combining audio-visual features with multi-kernel
learning (MKL) and multi-kernel boosting (MKBoost). For that purpose, grid
optical-flow, virtual-inertia, log-covariance, and cuboid features are first
extracted from the video. The audio signal is characterized using a
"supervector", obtained based on Gaussian mixture modelling of frame-level
features, followed by maximum a posteriori (MAP) adaptation. The extracted
multi-modal features are then adaptively fused by MKL classifiers, in which
both the feature and kernel selection/weighting and the recognition tasks are
performed jointly. The proposed framework was evaluated on a number of
egocentric datasets. The results show that using multi-modal features with
MKL outperforms the existing methods.
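
The GMM supervector used for the audio channel can be sketched compactly:
fit a background GMM on pooled frame-level features, MAP-adapt its means to
one clip via relevance MAP, and stack the adapted means. The sketch below
uses synthetic features in place of real frame-level audio descriptors, and
the relevance factor of 16 is a conventional illustrative choice, not a
value taken from the paper.

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
background = rng.normal(size=(2000, 13))     # e.g., pooled MFCC frames
clip = rng.normal(0.5, 1.0, size=(300, 13))  # frames of one clip

# Universal background model (UBM) fit on pooled frames.
ubm = GaussianMixture(n_components=8, covariance_type="diag",
                      random_state=0).fit(background)

def map_adapted_supervector(frames, ubm, relevance=16.0):
    # Soft counts and first-order statistics under the UBM.
    post = ubm.predict_proba(frames)         # (n_frames, n_components)
    n_k = post.sum(axis=0)                   # soft count per component
    f_k = post.T @ frames                    # (n_components, dim)
    # Relevance MAP: interpolate clip statistics with UBM means.
    alpha = (n_k / (n_k + relevance))[:, None]
    clip_means = f_k / np.maximum(n_k, 1e-8)[:, None]
    means = alpha * clip_means + (1 - alpha) * ubm.means_
    return means.ravel()                     # stacked adapted means

print(map_adapted_supervector(clip, ubm).shape)  # (104,) = 8 * 13

The resulting fixed-length vector summarizes a variable-length clip and can
be fed to the kernel classifiers alongside the visual features.
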
Convolutional Long Short-Term Memory Networks for Recognizing First Person Interactions
We present a novel deep learning approach to interaction recognition from a first-person perspective. The approach uses a pair of convolutional neural networks, whose parameters are shared, to extract frame-level features from successive frames of the video. The frame-level features are then aggregated using a convolutional long short-term memory (ConvLSTM). The final hidden state of the ConvLSTM is used for classification into the respective categories. In our network, the spatio-temporal structure of the input is preserved until the very final processing stage. Experimental results show that our method outperforms the state of the art on the most recent first-person interaction datasets involving complex ego-motion. On UTKinect, it competes with methods that use depth images and skeletal-joint information along with RGB images, and it surpasses previous methods that use only RGB images by more than 20% in recognition accuracy.
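
A minimal PyTorch sketch of the described architecture: one weight-shared
CNN encodes each frame, a hand-rolled ConvLSTM cell aggregates the feature
maps over time, and the final hidden state is pooled and classified. Layer
sizes, class count, and the cell itself are illustrative assumptions, not
the paper's exact configuration.

import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        # One convolution produces all four gates from [input, hidden].
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)
        self.hid_ch = hid_ch

    def forward(self, x, h, c):
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o, g = i.sigmoid(), f.sigmoid(), o.sigmoid(), g.tanh()
        c = f * c + i * g
        h = o * c.tanh()
        return h, c

class InteractionNet(nn.Module):
    def __init__(self, n_classes=6, hid_ch=32):
        super().__init__()
        # Shared (weight-tied) frame encoder applied to every frame.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.cell = ConvLSTMCell(32, hid_ch)
        self.head = nn.Linear(hid_ch, n_classes)

    def forward(self, clip):                 # (batch, time, 3, H, W)
        b, t = clip.shape[:2]
        h = c = None
        for step in range(t):
            x = self.encoder(clip[:, step])  # same weights for each frame
            if h is None:
                h = x.new_zeros(b, self.cell.hid_ch, *x.shape[2:])
                c = torch.zeros_like(h)
            h, c = self.cell(x, h, c)
        # Spatial structure survives until this final pooled classification.
        return self.head(h.mean(dim=(2, 3)))

clip = torch.randn(2, 8, 3, 64, 64)          # two clips of 8 frames
print(InteractionNet()(clip).shape)          # torch.Size([2, 6])

Because the recurrent state is itself a feature map, spatial layout is only
collapsed at the final pooling step, mirroring the abstract's claim that the
spatio-temporal structure is preserved until the last stage.
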