Two-Stream Convolutional Networks for Action Recognition in Videos
We investigate architectures of discriminatively trained deep Convolutional
Networks (ConvNets) for action recognition in video. The challenge is to
capture the complementary information on appearance from still frames and
motion between frames. We also aim to generalise the best performing
hand-crafted features within a data-driven learning framework.
Our contribution is three-fold. First, we propose a two-stream ConvNet
architecture which incorporates spatial and temporal networks. Second, we
demonstrate that a ConvNet trained on multi-frame dense optical flow is able to
achieve very good performance in spite of limited training data. Finally, we
show that multi-task learning, applied to two different action classification
datasets, can be used to increase the amount of training data and improve the
performance on both.
Our architecture is trained and evaluated on the standard video actions
benchmarks of UCF-101 and HMDB-51, where it is competitive with the state of
the art. It also exceeds by a large margin previous attempts to use deep nets
for video classification.
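The late-fusion idea behind the two-stream architecture can be sketched in a few lines. Here tiny random linear maps stand in for the deep spatial and temporal ConvNets, and averaging the class posteriors stands in for the fusion step; all sizes and the stream internals are illustrative assumptions, not the paper's actual networks:

```python
import numpy as np

rng = np.random.default_rng(0)

N_CLASSES = 101  # UCF-101 has 101 action classes

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Tiny random linear maps stand in for the deep ConvNet streams
# (an assumption for illustration only).
def spatial_stream(frame):
    # appearance: class scores from a single still frame
    W = rng.standard_normal((N_CLASSES, frame.size))
    return softmax(W @ frame.ravel())

def temporal_stream(flow_stack):
    # motion: class scores from stacked dense optical-flow fields
    W = rng.standard_normal((N_CLASSES, flow_stack.size))
    return softmax(W @ flow_stack.ravel())

def two_stream_predict(frame, flow_stack):
    # late fusion: average the two streams' class posteriors
    return 0.5 * (spatial_stream(frame) + temporal_stream(flow_stack))

frame = rng.standard_normal((16, 16, 3))      # toy RGB frame
flow = rng.standard_normal((16, 16, 2 * 10))  # 10 flow fields, x/y channels
probs = two_stream_predict(frame, flow)
print(probs.shape)  # (101,)
```

The key design point the abstract emphasises is that the temporal stream consumes multi-frame dense optical flow directly, so motion does not have to be learned from raw pixels.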
Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps
This paper addresses the visualisation of image classification models, learnt
using deep Convolutional Networks (ConvNets). We consider two visualisation
techniques, based on computing the gradient of the class score with respect to
the input image. The first one generates an image, which maximises the class
score [Erhan et al., 2009], thus visualising the notion of the class, captured
by a ConvNet. The second technique computes a class saliency map, specific to a
given image and class. We show that such maps can be employed for weakly
supervised object segmentation using classification ConvNets. Finally, we
establish the connection between the gradient-based ConvNet visualisation
methods and deconvolutional networks [Zeiler et al., 2013].
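The second technique reduces to one gradient computation. As a minimal sketch, a toy linear "classifier" (score_c(x) = w_c · x) stands in for a deep ConvNet, so the gradient of the class score with respect to the input is available in closed form; for a real ConvNet the same quantity comes from a single backpropagation pass:

```python
import numpy as np

rng = np.random.default_rng(1)
SIZE = 8                                    # toy image side length (assumed)
w_c = rng.standard_normal(SIZE * SIZE * 3)  # target-class weights (assumed)
image = rng.standard_normal(SIZE * SIZE * 3)

# d score_c / d image: exactly w_c for the linear stand-in model;
# obtained by backprop for an actual ConvNet.
grad = w_c

# Saliency: maximum absolute gradient over colour channels at each pixel.
saliency = np.abs(grad).reshape(SIZE, SIZE, 3).max(axis=2)
print(saliency.shape)  # (8, 8)
```

Pixels with large saliency values are those whose perturbation most affects the class score, which is what makes the map usable as a weak segmentation cue.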
Learning Local Feature Aggregation Functions with Backpropagation
This paper introduces a family of local feature aggregation functions and a
novel method to estimate their parameters, such that they generate optimal
representations for classification (or any task that can be expressed as a cost
function minimization problem). To achieve that, we compose the local feature
aggregation function with the classifier cost function and we backpropagate the
gradient of this cost function in order to update the local feature aggregation
function parameters. Experiments on synthetic datasets indicate that our method
discovers parameters that model the class-relevant information in addition to
the local feature space. Further experiments on a variety of motion and visual
descriptors, both on image and video datasets, show that our method outperforms
other state-of-the-art local feature aggregation functions, such as Bag of
Words, Fisher Vectors and VLAD, by a large margin.
Comment: In Proceedings of the 25th European Signal Processing Conference
(EUSIPCO 2017).
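The composition trick described above can be sketched end to end. The aggregation function below is a soft Bag-of-Words with a learnable codebook (one member of the family; an assumption for illustration), a fixed logistic classifier plays the cost function, and a numerical gradient stands in for backpropagation:

```python
import numpy as np

rng = np.random.default_rng(2)

def aggregate(X, C):
    # soft Bag-of-Words: soft-assign n local descriptors X (n, d) to the
    # learnable codebook C (k, d), then average the assignments -> (k,)
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
    A = np.exp(-d2)
    A = A / A.sum(axis=1, keepdims=True)
    return A.mean(axis=0)

def cost(C, X, w, y):
    # logistic cost of a fixed linear classifier on the aggregated vector
    z = w @ aggregate(X, C)
    return np.log1p(np.exp(-y * z))

def num_grad(f, C, eps=1e-5):
    # numerical gradient of the composed cost w.r.t. the codebook
    # (stands in for backpropagation in this sketch)
    g = np.zeros_like(C)
    for i in np.ndindex(*C.shape):
        Cp, Cm = C.copy(), C.copy()
        Cp[i] += eps
        Cm[i] -= eps
        g[i] = (f(Cp) - f(Cm)) / (2 * eps)
    return g

X = rng.standard_normal((20, 2))   # 20 toy local descriptors
w = rng.standard_normal(4)         # fixed classifier weights
y = 1.0                            # class label in {-1, +1}
C = rng.standard_normal((4, 2))    # codebook: the aggregation parameters

before = cost(C, X, w, y)
for _ in range(60):
    C -= 0.05 * num_grad(lambda Cc: cost(Cc, X, w, y), C)
after = cost(C, X, w, y)
print(after < before)
```

Because the classifier cost is differentiated through the aggregation function, the codebook moves to wherever the class-relevant structure of the descriptors lies, rather than just covering the descriptor space.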
PHT-bot: Deep-Learning based system for automatic risk stratification of COPD patients based upon signs of Pulmonary Hypertension
Chronic Obstructive Pulmonary Disease (COPD) is a leading cause of morbidity
and mortality worldwide. Identifying those at highest risk of deterioration
would allow more effective distribution of preventative and surveillance
resources. Secondary pulmonary hypertension is a manifestation of advanced
COPD, which can be reliably diagnosed by the main Pulmonary Artery (PA) to
Ascending Aorta (Ao) ratio. In effect, a PA diameter to Ao diameter ratio of
greater than 1 has been demonstrated to be a reliable marker of increased
pulmonary arterial pressure. Although clinically valuable and readily
visualized, the manual assessment of the PA and the Ao diameters is time
consuming and under-reported. The present study describes a non-invasive method
to measure the diameters of both the Ao and the PA from contrast-enhanced chest
Computed Tomography (CT). The solution applies deep learning techniques in
order to select the correct axial slice to measure, and to segment both
arteries. The system achieves test Pearson correlation coefficient scores of
93% for the Ao and 92% for the PA. To the best of our knowledge, it is the
first such fully automated solution.
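The clinical rule the pipeline automates fits in a few lines once the two diameters are available (the described system derives them from CT segmentations; the function names and error handling below are illustrative only):

```python
# PA:Ao ratio rule from the abstract: a main Pulmonary Artery to Ascending
# Aorta diameter ratio greater than 1 marks elevated pulmonary arterial
# pressure. Function names and input handling are illustrative assumptions.
def pa_ao_ratio(pa_diameter_mm: float, ao_diameter_mm: float) -> float:
    if ao_diameter_mm <= 0:
        raise ValueError("aorta diameter must be positive")
    return pa_diameter_mm / ao_diameter_mm

def elevated_pressure_flag(pa_mm: float, ao_mm: float) -> bool:
    # True when the ratio exceeds the 1.0 threshold cited in the abstract
    return pa_ao_ratio(pa_mm, ao_mm) > 1.0

print(elevated_pressure_flag(33.0, 30.0))  # ratio 1.1 -> True
print(elevated_pressure_flag(27.0, 30.0))  # ratio 0.9 -> False
```

The deep-learning contribution is upstream of this rule: selecting the correct axial slice and segmenting both arteries so the diameters can be measured automatically.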
EmbraceNet for Activity: A Deep Multimodal Fusion Architecture for Activity Recognition
Human activity recognition using multiple sensors has been a challenging but
promising task in recent decades. In this paper, we propose a deep multimodal
fusion model for activity recognition based on the recently proposed feature
fusion architecture named EmbraceNet. Our model processes each sensor data
independently, combines the features with the EmbraceNet architecture, and
post-processes the fused feature to predict the activity. In addition, we
propose additional processes to boost the performance of our model. We submit
the results obtained from our proposed model to the SHL recognition challenge
with the team name "Yonsei-MCML."
Comment: Accepted in HASCA at ACM UbiComp/ISWC 2019; won 2nd place in the
SHL Recognition Challenge 2019.
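The fusion step at the heart of the EmbraceNet architecture can be sketched for two toy modalities. Assuming each sensor's features have already been mapped ("docked") to a common dimensionality, EmbraceNet samples, per feature dimension, which modality contributes that component; the modality names, sizes, and equal selection probabilities below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

def embrace(features, probs, rng):
    # features: m modality vectors of equal length d (already docked);
    # probs: per-modality selection probabilities.
    # For each of the d dimensions, sample which modality contributes it,
    # so the fused vector mixes modalities element-wise.
    F = np.stack(features)                 # (m, d)
    m, d = F.shape
    choice = rng.choice(m, size=d, p=probs)
    return F[choice, np.arange(d)]

f_accel = rng.standard_normal(6)  # toy accelerometer feature (assumed)
f_gyro = rng.standard_normal(6)   # toy gyroscope feature (assumed)
fused = embrace([f_accel, f_gyro], probs=[0.5, 0.5], rng=rng)
print(fused.shape)  # (6,)
```

Because each fused component comes from exactly one modality, the model is trained to tolerate any modality dropping out, which suits noisy multi-sensor activity data.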
Deep interpretable architecture for plant diseases classification
Recently, many works have been inspired by the success of deep learning in
computer vision for plant diseases classification. Unfortunately, these
end-to-end deep classifiers lack transparency which can limit their adoption in
practice. In this paper, we propose a new trainable visualization method for
plant diseases classification based on a Convolutional Neural Network (CNN)
architecture composed of two deep classifiers. The first one is named Teacher
and the second one Student. This architecture leverages multitask learning
to train the Teacher and the Student jointly. Then, the communicated
representation between the Teacher and the Student is used as a proxy to
visualize the most important image regions for classification. This new
architecture produces sharper visualization than the existing methods in plant
diseases context. All experiments are conducted on the PlantVillage dataset,
which contains 54,306 plant images.
Comment: 10 pages, 8 figures, submitted to Signal Processing Algorithms,
Architectures, Arrangements and Applications (SPA2019),
https://github.com/Tahedi1/Teacher_Student_Architectur
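The visualization step alone can be loosely sketched. Assume, for illustration only, that the representation the Teacher communicates to the Student is a low-resolution spatial map; upsampling it to image size then highlights the regions driving classification. The map shape, nearest-neighbour upsampling, and min-max normalisation are all assumptions, not the paper's actual design:

```python
import numpy as np

rng = np.random.default_rng(4)

def upsample_nearest(m, factor):
    # repeat each cell of the low-res map to reach image resolution
    return np.repeat(np.repeat(m, factor, axis=0), factor, axis=1)

communicated = rng.random((7, 7))             # toy communicated map (assumed)
heatmap = upsample_nearest(communicated, 32)  # -> 224x224 visualization
heatmap = (heatmap - heatmap.min()) / (heatmap.max() - heatmap.min())
print(heatmap.shape)  # (224, 224)
```

The point of making the visualization trainable is that the communicated map is optimised jointly with the classification task, instead of being computed post hoc as in gradient-based methods.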
Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition
In this work we present a framework for the recognition of natural scene
text. Our framework does not require any human-labelled data, and performs word
recognition on the whole image holistically, departing from the character based
recognition systems of the past. The deep neural network models at the centre
of this framework are trained solely on data produced by a synthetic text
generation engine -- synthetic data that is highly realistic and sufficient to
replace real data, giving us infinite amounts of training data. This excess of
data exposes new possibilities for word recognition models, and here we
consider three models, each one "reading" words in a different way: via 90k-way
dictionary encoding, character sequence encoding, and bag-of-N-grams encoding.
In the scenarios of language based and completely unconstrained text
recognition we greatly improve upon state-of-the-art performance on standard
datasets, using our fast, simple machinery and requiring zero data-acquisition
costs.
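The three word-"reading" targets can be made concrete with a toy lexicon and alphabet (the paper's dictionary has roughly 90k words; the sizes and padding label below are illustrative stand-ins):

```python
# Toy stand-ins for the paper's training targets.
LEXICON = ["cat", "dog", "door"]   # stand-in for the ~90k-word dictionary
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
MAX_LEN = 5
NULL = -1  # padding label for positions past the word's end (assumed)

def dictionary_encoding(word):
    # one class per lexicon word: a single N-way classification target
    return LEXICON.index(word)

def char_sequence_encoding(word):
    # one character class per position, padded with a null label
    return [ALPHABET.index(c) for c in word] + [NULL] * (MAX_LEN - len(word))

def bag_of_ngrams_encoding(word, n=2):
    # the set of character n-grams the word contains
    return {word[i:i + n] for i in range(len(word) - n + 1)}

print(dictionary_encoding("dog"))      # 1
print(char_sequence_encoding("cat"))   # [2, 0, 19, -1, -1]
print(bag_of_ngrams_encoding("door"))  # {'do', 'oo', 'or'}
```

The three encodings trade constraint for flexibility: dictionary encoding assumes a closed vocabulary, the character sequence handles unconstrained words, and the N-gram bag sits in between by composing words from sub-word units.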