Two-Stream Convolutional Networks for Action Recognition in Videos
We investigate architectures of discriminatively trained deep Convolutional
Networks (ConvNets) for action recognition in video. The challenge is to
capture the complementary information on appearance from still frames and
motion between frames. We also aim to generalise the best performing
hand-crafted features within a data-driven learning framework.
Our contribution is three-fold. First, we propose a two-stream ConvNet
architecture which incorporates spatial and temporal networks. Second, we
demonstrate that a ConvNet trained on multi-frame dense optical flow is able to
achieve very good performance in spite of limited training data. Finally, we
show that multi-task learning, applied to two different action classification
datasets, can be used to increase the amount of training data and improve the
performance on both.
Our architecture is trained and evaluated on the standard video actions
benchmarks of UCF-101 and HMDB-51, where it is competitive with the state of
the art. It also exceeds by a large margin previous attempts to use deep nets
for video classification.
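The late-fusion idea behind the two-stream architecture can be sketched in a few lines. Here tiny random linear maps stand in for the deep spatial and temporal ConvNets, and averaging the class posteriors stands in for the fusion step; all sizes and the stream internals are illustrative assumptions, not the paper's actual networks:

```python
import numpy as np

rng = np.random.default_rng(0)

N_CLASSES = 101  # UCF-101 has 101 action classes

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Tiny random linear maps stand in for the deep ConvNet streams
# (an assumption for illustration only).
def spatial_stream(frame):
    # appearance: class scores from a single still frame
    W = rng.standard_normal((N_CLASSES, frame.size))
    return softmax(W @ frame.ravel())

def temporal_stream(flow_stack):
    # motion: class scores from stacked dense optical-flow fields
    W = rng.standard_normal((N_CLASSES, flow_stack.size))
    return softmax(W @ flow_stack.ravel())

def two_stream_predict(frame, flow_stack):
    # late fusion: average the two streams' class posteriors
    return 0.5 * (spatial_stream(frame) + temporal_stream(flow_stack))

frame = rng.standard_normal((16, 16, 3))      # toy RGB frame
flow = rng.standard_normal((16, 16, 2 * 10))  # 10 flow fields, x/y channels
probs = two_stream_predict(frame, flow)
print(probs.shape)  # (101,)
```

The key design point the abstract emphasises is that the temporal stream consumes multi-frame dense optical flow directly, so motion does not have to be learned from raw pixels.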
Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps
This paper addresses the visualisation of image classification models, learnt
using deep Convolutional Networks (ConvNets). We consider two visualisation
techniques, based on computing the gradient of the class score with respect to
the input image. The first one generates an image, which maximises the class
score [Erhan et al., 2009], thus visualising the notion of the class, captured
by a ConvNet. The second technique computes a class saliency map, specific to a
given image and class. We show that such maps can be employed for weakly
supervised object segmentation using classification ConvNets. Finally, we
establish the connection between the gradient-based ConvNet visualisation
methods and deconvolutional networks [Zeiler et al., 2013].
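The second technique reduces to one gradient computation. As a minimal sketch, a toy linear "classifier" (score_c(x) = w_c · x) stands in for a deep ConvNet, so the gradient of the class score with respect to the input is available in closed form; for a real ConvNet the same quantity comes from a single backpropagation pass:

```python
import numpy as np

rng = np.random.default_rng(1)
SIZE = 8                                    # toy image side length (assumed)
w_c = rng.standard_normal(SIZE * SIZE * 3)  # target-class weights (assumed)
image = rng.standard_normal(SIZE * SIZE * 3)

# d score_c / d image: exactly w_c for the linear stand-in model;
# obtained by backprop for an actual ConvNet.
grad = w_c

# Saliency: maximum absolute gradient over colour channels at each pixel.
saliency = np.abs(grad).reshape(SIZE, SIZE, 3).max(axis=2)
print(saliency.shape)  # (8, 8)
```

Pixels with large saliency values are those whose perturbation most affects the class score, which is what makes the map usable as a weak segmentation cue.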
Learning Local Feature Aggregation Functions with Backpropagation
This paper introduces a family of local feature aggregation functions and a
novel method to estimate their parameters, such that they generate optimal
representations for classification (or any task that can be expressed as a cost
function minimization problem). To achieve that, we compose the local feature
aggregation function with the classifier cost function and we backpropagate the
gradient of this cost function in order to update the local feature aggregation
function parameters. Experiments on synthetic datasets indicate that our method
discovers parameters that model the class-relevant information in addition to
the local feature space. Further experiments on a variety of motion and visual
descriptors, both on image and video datasets, show that our method outperforms
other state-of-the-art local feature aggregation functions, such as Bag of
Words, Fisher Vectors and VLAD, by a large margin.
Comment: In Proceedings of the 25th European Signal Processing Conference
(EUSIPCO 2017).
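The composition trick described above can be sketched end to end. The aggregation function below is a soft Bag-of-Words with a learnable codebook (one member of the family; an assumption for illustration), a fixed logistic classifier plays the cost function, and a numerical gradient stands in for backpropagation:

```python
import numpy as np

rng = np.random.default_rng(2)

def aggregate(X, C):
    # soft Bag-of-Words: soft-assign n local descriptors X (n, d) to the
    # learnable codebook C (k, d), then average the assignments -> (k,)
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
    A = np.exp(-d2)
    A = A / A.sum(axis=1, keepdims=True)
    return A.mean(axis=0)

def cost(C, X, w, y):
    # logistic cost of a fixed linear classifier on the aggregated vector
    z = w @ aggregate(X, C)
    return np.log1p(np.exp(-y * z))

def num_grad(f, C, eps=1e-5):
    # numerical gradient of the composed cost w.r.t. the codebook
    # (stands in for backpropagation in this sketch)
    g = np.zeros_like(C)
    for i in np.ndindex(*C.shape):
        Cp, Cm = C.copy(), C.copy()
        Cp[i] += eps
        Cm[i] -= eps
        g[i] = (f(Cp) - f(Cm)) / (2 * eps)
    return g

X = rng.standard_normal((20, 2))   # 20 toy local descriptors
w = rng.standard_normal(4)         # fixed classifier weights
y = 1.0                            # class label in {-1, +1}
C = rng.standard_normal((4, 2))    # codebook: the aggregation parameters

before = cost(C, X, w, y)
for _ in range(60):
    C -= 0.05 * num_grad(lambda Cc: cost(Cc, X, w, y), C)
after = cost(C, X, w, y)
print(after < before)
```

Because the classifier cost is differentiated through the aggregation function, the codebook moves to wherever the class-relevant structure of the descriptors lies, rather than just covering the descriptor space.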
PHT-bot: Deep-Learning based system for automatic risk stratification of COPD patients based upon signs of Pulmonary Hypertension
Chronic Obstructive Pulmonary Disease (COPD) is a leading cause of morbidity
and mortality worldwide. Identifying those at highest risk of deterioration
would allow more effective distribution of preventative and surveillance
resources. Secondary pulmonary hypertension is a manifestation of advanced
COPD, which can be reliably diagnosed by the main Pulmonary Artery (PA) to
Ascending Aorta (Ao) ratio. In effect, a PA diameter to Ao diameter ratio of
greater than 1 has been demonstrated to be a reliable marker of increased
pulmonary arterial pressure. Although clinically valuable and readily
visualized, the manual assessment of the PA and the Ao diameters is time
consuming and under-reported. The present study describes a non-invasive method
to measure the diameters of both the Ao and the PA from contrast-enhanced chest
Computed Tomography (CT). The solution applies deep learning techniques in
order to select the correct axial slice to measure, and to segment both
arteries. The system achieves test Pearson correlation coefficient scores of
93% for the Ao and 92% for the PA. To the best of our knowledge, it is the
first such fully automated solution.
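The clinical rule the pipeline automates fits in a few lines once the two diameters are available (the described system derives them from CT segmentations; the function names and error handling below are illustrative only):

```python
# PA:Ao ratio rule from the abstract: a main Pulmonary Artery to Ascending
# Aorta diameter ratio greater than 1 marks elevated pulmonary arterial
# pressure. Function names and input handling are illustrative assumptions.
def pa_ao_ratio(pa_diameter_mm: float, ao_diameter_mm: float) -> float:
    if ao_diameter_mm <= 0:
        raise ValueError("aorta diameter must be positive")
    return pa_diameter_mm / ao_diameter_mm

def elevated_pressure_flag(pa_mm: float, ao_mm: float) -> bool:
    # True when the ratio exceeds the 1.0 threshold cited in the abstract
    return pa_ao_ratio(pa_mm, ao_mm) > 1.0

print(elevated_pressure_flag(33.0, 30.0))  # ratio 1.1 -> True
print(elevated_pressure_flag(27.0, 30.0))  # ratio 0.9 -> False
```

The deep-learning contribution is upstream of this rule: selecting the correct axial slice and segmenting both arteries so the diameters can be measured automatically.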
EmbraceNet for Activity: A Deep Multimodal Fusion Architecture for Activity Recognition
Human activity recognition using multiple sensors has been a challenging but
promising task in recent decades. In this paper, we propose a deep multimodal
fusion model for activity recognition based on the recently proposed feature
fusion architecture named EmbraceNet. Our model processes each sensor data
independently, combines the features with the EmbraceNet architecture, and
post-processes the fused feature to predict the activity. In addition, we
propose additional processes to boost the performance of our model. We submit
the results obtained from our proposed model to the SHL recognition challenge
with the team name "Yonsei-MCML."
Comment: Accepted in HASCA at ACM UbiComp/ISWC 2019; won 2nd place in the
SHL Recognition Challenge 2019.
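The fusion step at the heart of the EmbraceNet architecture can be sketched for two toy modalities. Assuming each sensor's features have already been mapped ("docked") to a common dimensionality, EmbraceNet samples, per feature dimension, which modality contributes that component; the modality names, sizes, and equal selection probabilities below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

def embrace(features, probs, rng):
    # features: m modality vectors of equal length d (already docked);
    # probs: per-modality selection probabilities.
    # For each of the d dimensions, sample which modality contributes it,
    # so the fused vector mixes modalities element-wise.
    F = np.stack(features)                 # (m, d)
    m, d = F.shape
    choice = rng.choice(m, size=d, p=probs)
    return F[choice, np.arange(d)]

f_accel = rng.standard_normal(6)  # toy accelerometer feature (assumed)
f_gyro = rng.standard_normal(6)   # toy gyroscope feature (assumed)
fused = embrace([f_accel, f_gyro], probs=[0.5, 0.5], rng=rng)
print(fused.shape)  # (6,)
```

Because each fused component comes from exactly one modality, the model is trained to tolerate any modality dropping out, which suits noisy multi-sensor activity data.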
Deep interpretable architecture for plant diseases classification
Recently, many works have been inspired by the success of deep learning in
computer vision for plant diseases classification. Unfortunately, these
end-to-end deep classifiers lack transparency which can limit their adoption in
practice. In this paper, we propose a new trainable visualization method for
plant diseases classification based on a Convolutional Neural Network (CNN)
architecture composed of two deep classifiers. The first one is named Teacher
and the second one Student. This architecture leverages multitask learning
to train the Teacher and the Student jointly. Then, the communicated
representation between the Teacher and the Student is used as a proxy to
visualize the most important image regions for classification. This new
architecture produces sharper visualization than the existing methods in plant
diseases context. All experiments are conducted on the PlantVillage dataset,
which contains 54,306 plant images.
Comment: 10 pages, 8 figures, submitted to Signal Processing Algorithms,
Architectures, Arrangements and Applications (SPA2019),
https://github.com/Tahedi1/Teacher_Student_Architectur
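The visualization step alone can be loosely sketched. Assume, for illustration only, that the representation the Teacher communicates to the Student is a low-resolution spatial map; upsampling it to image size then highlights the regions driving classification. The map shape, nearest-neighbour upsampling, and min-max normalisation are all assumptions, not the paper's actual design:

```python
import numpy as np

rng = np.random.default_rng(4)

def upsample_nearest(m, factor):
    # repeat each cell of the low-res map to reach image resolution
    return np.repeat(np.repeat(m, factor, axis=0), factor, axis=1)

communicated = rng.random((7, 7))             # toy communicated map (assumed)
heatmap = upsample_nearest(communicated, 32)  # -> 224x224 visualization
heatmap = (heatmap - heatmap.min()) / (heatmap.max() - heatmap.min())
print(heatmap.shape)  # (224, 224)
```

The point of making the visualization trainable is that the communicated map is optimised jointly with the classification task, instead of being computed post hoc as in gradient-based methods.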
Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition
In this work we present a framework for the recognition of natural scene
text. Our framework does not require any human-labelled data, and performs word
recognition on the whole image holistically, departing from the character based
recognition systems of the past. The deep neural network models at the centre
of this framework are trained solely on data produced by a synthetic text
generation engine -- synthetic data that is highly realistic and sufficient to
replace real data, giving us infinite amounts of training data. This excess of
data exposes new possibilities for word recognition models, and here we
consider three models, each one "reading" words in a different way: via 90k-way
dictionary encoding, character sequence encoding, and bag-of-N-grams encoding.
In the scenarios of language based and completely unconstrained text
recognition we greatly improve upon state-of-the-art performance on standard
datasets, using our fast, simple machinery and requiring zero data-acquisition
costs.
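The three word-"reading" targets can be made concrete with a toy lexicon and alphabet (the paper's dictionary has roughly 90k words; the sizes and padding label below are illustrative stand-ins):

```python
# Toy stand-ins for the paper's training targets.
LEXICON = ["cat", "dog", "door"]   # stand-in for the ~90k-word dictionary
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
MAX_LEN = 5
NULL = -1  # padding label for positions past the word's end (assumed)

def dictionary_encoding(word):
    # one class per lexicon word: a single N-way classification target
    return LEXICON.index(word)

def char_sequence_encoding(word):
    # one character class per position, padded with a null label
    return [ALPHABET.index(c) for c in word] + [NULL] * (MAX_LEN - len(word))

def bag_of_ngrams_encoding(word, n=2):
    # the set of character n-grams the word contains
    return {word[i:i + n] for i in range(len(word) - n + 1)}

print(dictionary_encoding("dog"))      # 1
print(char_sequence_encoding("cat"))   # [2, 0, 19, -1, -1]
print(bag_of_ngrams_encoding("door"))  # {'do', 'oo', 'or'}
```

The three encodings trade constraint for flexibility: dictionary encoding assumes a closed vocabulary, the character sequence handles unconstrained words, and the N-gram bag sits in between by composing words from sub-word units.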