51 research outputs found
Exploiting Local Features from Deep Networks for Image Retrieval
Deep convolutional neural networks have been successfully applied to image
classification tasks. When these same networks have been applied to image
retrieval, the assumption has been made that the last layers would give the
best performance, as they do in classification. We show that for instance-level
image retrieval, lower layers often perform better than the last layers in
convolutional neural networks. We present an approach for extracting
convolutional features from different layers of the networks, and adopt VLAD
encoding to encode features into a single vector for each image. We investigate
the effect of different layers and scales of input images on the performance of
convolutional features using the recent deep networks OxfordNet and GoogLeNet.
Experiments demonstrate that intermediate layers or higher layers with finer
scales produce better results for image retrieval, compared to the last layer.
When using compressed 128-D VLAD descriptors, our method obtains
state-of-the-art results and outperforms other VLAD and CNN based approaches on
two out of three test datasets. Our work provides guidance for transferring
deep networks trained on image classification to image retrieval tasks. Comment: CVPR DeepVision Workshop 2015.
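As a rough illustration of the VLAD encoding step described above, the sketch below aggregates local descriptors taken from an intermediate convolutional layer into a single image vector. The codebook size and the power/L2 normalization are common defaults, not necessarily the authors' exact settings, and feature extraction is assumed to have already produced an (N, D) array of local descriptors per image.

```python
# VLAD encoding over local CNN features -- a minimal sketch, not the paper's
# exact pipeline. Each row of `descriptors` is one spatial position of an
# intermediate conv layer's feature map.
import numpy as np
from sklearn.cluster import KMeans

def train_codebook(corpus_descriptors: np.ndarray, k: int = 64) -> KMeans:
    """Learn a k-word visual codebook from descriptors pooled over many images."""
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit(corpus_descriptors)

def vlad_encode(descriptors: np.ndarray, codebook: KMeans) -> np.ndarray:
    """Sum residuals to the nearest centroid, then power- and L2-normalize."""
    k, d = codebook.n_clusters, descriptors.shape[1]
    assignments = codebook.predict(descriptors)
    vlad = np.zeros((k, d))
    for i in range(k):
        members = descriptors[assignments == i]
        if len(members):
            vlad[i] = (members - codebook.cluster_centers_[i]).sum(axis=0)
    vlad = vlad.ravel()
    vlad = np.sign(vlad) * np.sqrt(np.abs(vlad))   # power normalization
    return vlad / (np.linalg.norm(vlad) + 1e-12)   # L2 normalization
```

The compressed 128-D descriptors mentioned in the abstract would typically be obtained by PCA-reducing this k*d-dimensional vector.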
Video Understanding with Deep Networks
Video understanding is one of the fundamental problems in computer vision. Videos add a temporal component to the image recognition task, through which motion and other cues can also be exploited. Encouraged by the success of deep convolutional neural networks (CNNs) on image classification, we extend deep convolutional networks to video understanding by modeling both spatial and temporal information.
To effectively utilize deep networks, we need a comprehensive understanding of convolutional neural networks. We first study the network on the domain of image retrieval. We show that for instance-level image retrieval, lower layers often perform better than the last layers in convolutional neural networks. We present an approach for extracting convolutional features from different layers of the networks and adopt VLAD encoding to encode features into a single vector for each image. Our work provides guidance for transferring deep convolutional networks to other tasks.
We then propose and evaluate several deep neural network architectures to combine image information across a video over longer time periods than previously attempted. We propose two methods capable of handling full length videos. The first method explores various convolutional temporal feature pooling architectures, examining the various design choices which need to be made when adapting a CNN for this task. The second proposed method explicitly models the video as an ordered sequence of frames. For this purpose, we employ a recurrent neural network that uses Long Short-Term Memory (LSTM) cells which are connected to the output of the underlying CNN.
Next, we propose a multitask learning model, ActionFlowNet, to train a single-stream network directly from raw pixels that jointly estimates optical flow while recognizing actions with convolutional neural networks, capturing both appearance and motion in a single model. Experiments show that our model effectively learns video representations from motion information in unlabeled videos.
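To make the multitask idea concrete, here is a minimal sketch of a two-head network with a joint loss in the spirit of ActionFlowNet; the module names, the 1x1 flow head, and the loss weighting are illustrative assumptions, not the paper's exact architecture.

```python
# Hypothetical two-head multitask model: shared encoder, action logits plus
# per-pixel optical flow, trained with a weighted joint loss.
import torch.nn as nn
import torch.nn.functional as F

class TwoHeadNet(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int, num_classes: int):
        super().__init__()
        self.backbone = backbone                     # shared encoder over raw frames
        self.action_head = nn.Linear(feat_dim, num_classes)
        self.flow_head = nn.Conv2d(feat_dim, 2, kernel_size=1)  # per-pixel (u, v)

    def forward(self, frames):
        fmap = self.backbone(frames)                 # (B, feat_dim, H, W) feature map
        logits = self.action_head(fmap.mean(dim=(2, 3)))
        return logits, self.flow_head(fmap)

def multitask_loss(logits, flow, labels, flow_target, flow_weight=0.1):
    # Flow targets can come from a classical estimator on unlabeled video,
    # so the motion term needs no action labels.
    return F.cross_entropy(logits, labels) + flow_weight * F.l1_loss(flow, flow_target)
```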
While recent deep models for videos show improvement by incorporating optical flow or aggregating high-level appearance across frames, they focus on modeling either long-term temporal relations or short-term motion. We propose Temporal Difference Networks (TDN) that model both long-term relations and short-term motion in videos. We leverage a simple but effective motion representation, the difference of CNN features, and jointly model motion at multiple scales in a single CNN.
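The core motion representation is simple enough to sketch directly: short-term motion as the difference of consecutive frames' CNN feature maps, pooled into a clip descriptor. The pooling and per-scale handling below are illustrative; the actual TDN fuses these signals inside the network.

```python
# Sketch of feature-difference motion, assuming per-frame feature maps have
# already been extracted at several scales (layers) for one clip.
import torch

def motion_descriptor(scales: list[torch.Tensor]) -> torch.Tensor:
    """scales: per-scale feature maps, each of shape (T, C_s, H_s, W_s).
    Short-term motion = difference of consecutive feature maps; the differences
    are pooled over time and space and concatenated across scales."""
    parts = []
    for fm in scales:
        diff = fm[1:] - fm[:-1]                        # (T-1, C, H, W) temporal differences
        parts.append(diff.abs().mean(dim=(0, 2, 3)))   # pool to a (C,) vector
    return torch.cat(parts)
```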
Beyond Short Snippets: Deep Networks for Video Classification
Convolutional neural networks (CNNs) have been extensively applied for image
recognition problems giving state-of-the-art results on recognition, detection,
segmentation and retrieval. In this work we propose and evaluate several deep
neural network architectures to combine image information across a video over
longer time periods than previously attempted. We propose two methods capable
of handling full length videos. The first method explores various convolutional
temporal feature pooling architectures, examining the various design choices
which need to be made when adapting a CNN for this task. The second proposed
method explicitly models the video as an ordered sequence of frames. For this
purpose we employ a recurrent neural network that uses Long Short-Term Memory
(LSTM) cells which are connected to the output of the underlying CNN. Our best
networks exhibit significant performance improvements over previously published
results on the Sports-1M dataset (73.1% vs. 60.9%) and the UCF-101 dataset,
both with (88.6% vs. 88.0%) and without (82.6% vs. 72.8%) additional optical
flow information.
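A minimal sketch of the second method's structure, per-frame CNN features feeding an LSTM whose final state drives classification, is shown below; the hidden size, the single LSTM layer, and classifying from the last state are assumptions for illustration, not the paper's exact configuration.

```python
# Hypothetical CNN + LSTM video classifier over full-length frame sequences.
import torch.nn as nn

class CNNLSTMClassifier(nn.Module):
    def __init__(self, frame_encoder: nn.Module, feat_dim: int,
                 hidden: int = 512, num_classes: int = 101):
        super().__init__()
        self.encoder = frame_encoder                  # maps one frame to a feat_dim vector
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, video):                         # video: (B, T, C, H, W)
        b, t = video.shape[:2]
        feats = self.encoder(video.flatten(0, 1))     # encode all frames: (B*T, feat_dim)
        _, (h, _) = self.lstm(feats.view(b, t, -1))   # run the sequence through the LSTM
        return self.classifier(h[-1])                 # logits from the final hidden state
```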
Modeling and analysis of a deep learning pipeline for cloud based video analytics.
Video analytics systems based on deep learning are becoming the basis of many widespread applications, including smart-city services for people and traffic monitoring. These systems require massive amounts of labeled data and long training times to fine-tune hyper-parameters for object classification. We propose a cloud-based video analytics system built upon an optimally tuned deep learning model to classify objects from video streams. Hyper-parameters, including the learning rate, momentum, activation function, and optimization algorithm, are tuned through a mathematical model for efficient analysis of video streams. The system is capable of enhancing its own training data by applying transformations, including rotation, flip, and skew, to the input dataset, making it more robust and self-adaptive. An in-memory distributed training mechanism rapidly incorporates a large number of distinguishing features from the training dataset, enabling the system to perform object classification with minimal human assistance and external support. The system is validated through an object classification case study on an 8-node cloud using a 100 GB dataset comprising 88,432 video frames. Extensive experimentation reveals an accuracy of 0.97 and a precision of 0.96 after 6.8 hours of training. The system is scalable, robust to classification errors, and can be customized for a range of real-life situations.
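As an illustration of the self-augmentation step (rotation, flip, and skew), the sketch below produces transformed variants of a single frame with Pillow; the angle and shear factor are made-up parameters, since the abstract does not specify them.

```python
# Hypothetical augmentation of one video frame: rotation, horizontal flip, skew.
from PIL import Image

def augment(frame: Image.Image, angle: float = 15.0, shear: float = 0.2):
    yield frame.rotate(angle, expand=True)                   # rotation
    yield frame.transpose(Image.Transpose.FLIP_LEFT_RIGHT)   # horizontal flip
    w, h = frame.size
    # Affine skew: each output pixel (x, y) samples input (x + shear*y, y).
    yield frame.transform((w + int(shear * h), h), Image.Transform.AFFINE,
                          (1, shear, 0, 0, 1, 0))
```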
Image Retrieval using Multi-scale CNN Features Pooling
In this paper, we address the problem of image retrieval by learning image
representations based on the activations of a Convolutional Neural Network. We
present an end-to-end trainable network architecture that exploits a novel
multi-scale local pooling based on NetVLAD and a triplet mining procedure based
on sample difficulty to obtain an effective image representation. Extensive
experiments show that our approach is able to reach state-of-the-art results on
three standard datasets. Comment: Accepted at ICMR 2020.
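The difficulty-based triplet mining can be illustrated with a common semi-hard rule: for each anchor-positive pair, choose the hardest negative that is still farther than the positive. This is one standard instantiation, not necessarily the authors' exact procedure, and the margin is an assumed value.

```python
# Sketch of semi-hard triplet mining over a batch of L2-normalized embeddings.
import torch
import torch.nn.functional as F

def semi_hard_triplet_loss(emb: torch.Tensor, labels: torch.Tensor,
                           margin: float = 0.3) -> torch.Tensor:
    """emb: (B, D) embeddings; labels: (B,) image/landmark identities."""
    dist = torch.cdist(emb, emb)                       # (B, B) pairwise distances
    losses = []
    for a in range(len(emb)):
        pos = (labels == labels[a]).nonzero().flatten()
        neg = (labels != labels[a]).nonzero().flatten()
        for p in pos:
            if p == a:
                continue
            d_ap = dist[a, p]
            harder = neg[dist[a, neg] > d_ap]          # semi-hard negatives
            if len(harder):
                d_an = dist[a, harder].min()
                losses.append(F.relu(d_ap - d_an + margin))
    return torch.stack(losses).mean() if losses else emb.sum() * 0.0
```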
OpenContrails: Benchmarking Contrail Detection on GOES-16 ABI
Contrails (condensation trails) are line-shaped ice clouds caused by aircraft
and are likely the largest contributor to aviation-induced climate change.
Contrail avoidance is potentially an inexpensive way to significantly reduce
the climate impact of aviation. An automated contrail detection system is an
essential tool to develop and evaluate contrail avoidance systems. In this
paper, we present a human-labeled dataset named OpenContrails to train and
evaluate contrail detection models based on GOES-16 Advanced Baseline Imager
(ABI) data. We propose and evaluate a contrail detection model that
incorporates temporal context for improved detection accuracy. The
human-labeled dataset and the contrail detection outputs are publicly available on
Google Cloud Storage at gs://goes_contrails_dataset
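One simple way to give a per-pixel detector temporal context, sketched below, is to stack a short window of consecutive ABI frames along the channel axis before segmentation; the window length is an assumption, and the paper's model may integrate time differently.

```python
# Hypothetical temporal-context input: concatenate a frame with its recent
# predecessors along the channel axis for a segmentation model.
import numpy as np

def stack_temporal_context(frames: np.ndarray, t: int, window: int = 3) -> np.ndarray:
    """frames: (T, H, W, C) images ordered in time. Returns frame t concatenated
    with its window-1 predecessors, shape (H, W, C*window), clamped at t=0."""
    idx = [max(0, t - k) for k in range(window - 1, -1, -1)]
    return np.concatenate([frames[i] for i in idx], axis=-1)
```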
A scalable system to measure contrail formation on a per-flight basis
Persistent contrails make up a large fraction of aviation's contribution to
global warming. We describe a scalable, automated detection and matching (ADM)
system to determine from satellite data whether a flight has made a persistent
contrail. The ADM system compares flight segments to contrails detected by a
computer vision algorithm running on images from the GOES-16 Advanced Baseline
Imager. We develop a 'flight matching' algorithm and use it to label each
flight segment as a 'match' or 'non-match'. We perform this analysis on 1.6
million flight segments, yielding an analysis of which flights make persistent
contrails that is several orders of magnitude larger than any previous work.
We assess the agreement between our labels and available prediction models
based on weather forecasts. Shifting air traffic to avoid regions of contrail
formation has been proposed as a possible mitigation with the potential for
very low cost/ton-CO2e. Our findings suggest that imperfections in these
prediction models increase this cost/ton by about an order of magnitude.
Contrail avoidance is a cost-effective climate change mitigation even with this
factor taken into account, but our results quantify the need for more accurate
contrail prediction methods and establish a benchmark for future development. Comment: 25 pages, 6 figures.
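A toy version of the matching rule conveys the flavor of the ADM system: a flight segment counts as a 'match' if a detected contrail passes within distance and heading tolerances of it. The thresholds, the midpoint test, and the omission of advection by winds are all simplifying assumptions.

```python
# Illustrative flight-segment-to-contrail matching, not the production system.
import math

def haversine_km(lat1, lon1, lat2, lon2):
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def is_match(segment_midpoint, segment_heading_deg, contrail_points,
             contrail_heading_deg, max_dist_km=20.0, max_heading_diff_deg=15.0):
    """segment_midpoint: (lat, lon); contrail_points: list of (lat, lon)."""
    d = min(haversine_km(*segment_midpoint, *p) for p in contrail_points)
    dh = abs((segment_heading_deg - contrail_heading_deg + 180) % 360 - 180)
    return d <= max_dist_km and dh <= max_heading_diff_deg
```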
An Epidemiological Study of Concomitant Use of Chinese Medicine and Antipsychotics in Schizophrenic Patients: Implication for Herb-Drug Interaction
Background: Herb-drug interactions are an important issue in drug safety and clinical practice. The aim of this epidemiological study was to characterize associations of clinical outcomes with concomitant herbal and antipsychotic use in patients with schizophrenia. Methods and Findings: In this retrospective, cross-sectional study, 1795 patients with schizophrenia randomly selected from 17 psychiatric hospitals in China were interviewed face-to-face using a structured questionnaire. Association analyses were conducted to examine relationships between Chinese medicine (CM) use and demographic variables, clinical variables, antipsychotic medication mode, and clinical outcomes. The prevalence of concomitant CM and antipsychotic treatment was 36.4% [95% confidence interval (95% CI) 34.2%-38.6%]. Patients using concomitant CM had a significantly greater chance of improved outcomes than non-users (61.1% vs. 34.3%, OR = 3.44, 95% CI 2.80-4.24). However, a small but significant number of patients treated concomitantly with CM had a greater risk of worse outcomes (7.2% vs. 4.4%, OR = 2.06, 95% CI 2.06-4.83). Significant predictors of concomitant CM treatment-associated outcomes were residence in urban areas, paranoid psychosis, and more than 3 months of CM use. Herbal medicine regimens containing Radix Bupleuri, Fructus Gardenia, Fructus Schisandrae, Radix Rehmanniae, Akebia Caulis, and Semen Plantaginis in concomitant use with quetiapine, clozapine, and olanzapine were associated with nearly 60% of the risk of adverse outcomes. Conclusions: Concomitant herbal and antipsychotic treatment could produce either beneficial or adverse clinical effects in the schizophrenic population. Potential herb-drug pharmacokinetic interactions need to be further evaluated. © 2011 Zhang et al.
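For readers unfamiliar with the statistics reported above, the sketch below shows how an odds ratio and its Wald 95% CI are derived from a 2x2 exposure/outcome table; the counts are invented for illustration and are not the study's data.

```python
# Odds ratio with Wald 95% confidence interval from a 2x2 table.
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """a: exposed with outcome, b: exposed without, c: unexposed with, d: unexposed without."""
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo, hi = (math.exp(math.log(or_) + s * z * se_log) for s in (-1, 1))
    return or_, lo, hi

print(odds_ratio_ci(400, 254, 394, 747))  # hypothetical counts -> OR about 2.99
```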