51 research outputs found
CNN Features off-the-shelf: an Astounding Baseline for Recognition
Recent results indicate that the generic descriptors extracted from the
convolutional neural networks are very powerful. This paper adds to the
mounting evidence that this is indeed the case. We report on a series of
experiments conducted for different recognition tasks using the publicly
available code and model of the OverFeat network which was trained to perform
object classification on ILSVRC13. We use features extracted from the OverFeat
network as a generic image representation to tackle the diverse range of
recognition tasks of object image classification, scene recognition, fine
grained recognition, attribute detection and image retrieval applied to a
diverse set of datasets. We selected these tasks and datasets as they gradually
move further away from the original task and data the OverFeat network was
trained to solve. Astonishingly, we report consistent superior results compared
to the highly tuned state-of-the-art systems in all the visual classification
tasks on various datasets. For instance retrieval, it consistently outperforms
low-memory-footprint methods except on the sculptures dataset. The results are
achieved using a linear SVM classifier (or distance in case of retrieval)
applied to a feature representation of size 4096 extracted from a layer in the
net. The representations are further modified using simple augmentation
techniques e.g. jittering. The results strongly suggest that features obtained
from deep learning with convolutional nets should be the primary candidate in
most visual recognition tasks.
Comment: version 3 revisions: 1) Added results using feature processing and
data augmentation 2) Referring to most recent efforts of using CNN for
different visual recognition tasks 3) updated text/captio
Persistent Evidence of Local Image Properties in Generic ConvNets
Supervised training of a convolutional network for object classification
should make explicit any information related to the class of objects and
disregard any auxiliary information associated with the capture of the image or
the variation within the object class. Does this happen in practice? Although
this seems to pertain to the very final layers in the network, if we look at
earlier layers we find that this is not the case. Surprisingly, strong spatial
information is implicit. This paper addresses this, in particular, exploiting
the image representation at the first fully connected layer, i.e. the global
image descriptor which has been recently shown to be most effective in a range
of visual recognition tasks. We empirically demonstrate evidence for the
finding in the contexts of four different tasks: 2d landmark detection, 2d
object keypoints prediction, estimation of the RGB values of input image, and
recovery of semantic label of each pixel. We base our investigation on a simple
framework with ridge regression, shared across these tasks, and show results
which all support our insight. Such spatial information can be used for
computing correspondence of landmarks to a good accuracy, but should
potentially be useful for improving the training of the convolutional nets for
classification purposes
Uniform Seismic Hazard Spectra of Sanandaj, Iran
This paper presents uniform seismic hazard spectra for the city of Sanandaj, Iran. Sanandaj is the administrative center of Kurdistan province, in which more than 500,000 people live. A collected catalogue, containing both historical and instrumental events and covering the period from the 10th century BC to the year 2006, is used. The seismic sources and the seismotectonic model of the region are then modeled within a radius of 200 km, and a recurrence relationship is established. After elimination of the aftershocks and foreshocks, the main earthquakes were used to calculate the seismic parameters. For this purpose the method proposed by Kijko [2000] was employed, accounting for uncertainty in magnitude and for the incompleteness of the earthquake catalogue. Sanandaj and its vicinity were meshed as a grid of 8 vertical lines by 10 horizontal lines, and the calculations were performed using the attenuation relationship of Ambraseys et al. [1996]. The calculations assume a Poisson distribution and cover four hazard levels. Seismic hazard assessment is then carried out for each grid point using SEISRISK III [1987]. The probabilistic occurrence of earthquakes in the area is presented as horizontal spectral acceleration maps for probabilities of occurrence of 2% and 10% in 50 years
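The hazard levels of 2% and 10% in 50 years can be read as mean return periods under the Poisson assumption mentioned in the abstract. The following is a minimal sketch, assuming the standard relation P = 1 - exp(-t / T) between the probability of occurrence P, the exposure time t, and the return period T. The function name is illustrative and is not taken from the paper or from SEISRISK III.

```python
import math


def return_period(prob_occurrence: float, exposure_years: float) -> float:
    """Mean return period T (years) such that P = 1 - exp(-t / T).

    Assumes earthquake occurrence follows a Poisson process.
    """
    if not 0.0 < prob_occurrence < 1.0:
        raise ValueError("prob_occurrence must lie strictly between 0 and 1")
    if exposure_years <= 0.0:
        raise ValueError("exposure_years must be positive")
    return -exposure_years / math.log(1.0 - prob_occurrence)


if __name__ == "__main__":
    for p in (0.10, 0.02):
        # 10% in 50 years is roughly 475 years; 2% in 50 years is roughly 2475 years.
        print(f"{p:.0%} in 50 years -> T = {return_period(p, 50.0):.1f} years")
```

Under this relation the two quoted levels correspond to return periods of roughly 475 and 2475 years.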
Unsupervised Contact Learning for Humanoid Estimation and Control
This work presents a method for contact state estimation using fuzzy
clustering to learn contact probability for full, six-dimensional humanoid
contacts. The data required for training is solely from proprioceptive sensors
- endeffector contact wrench sensors and inertial measurement units (IMUs) -
and the method is completely unsupervised. The resulting cluster means are used
to efficiently compute the probability of contact in each of the six
endeffector degrees of freedom (DoFs) independently. This clustering-based
contact probability estimator is validated in a kinematics-based base state
estimator in a simulation environment with realistic added sensor noise for
locomotion over rough, low-friction terrain on which the robot is subject to
foot slip and rotation. The proposed base state estimator which utilizes these
six DoF contact probability estimates is shown to perform considerably better
than that which determines kinematic contact constraints purely based on
measured normal force.
Comment: Submitted to the IEEE International Conference on Robotics and
Automation (ICRA) 201
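To illustrate how cluster means can yield a contact probability, here is a minimal sketch that assumes the standard fuzzy c-means membership formula applied to a single degree of freedom. The abstract does not specify the exact clustering variant, so the formula, the fuzzifier value, and the example cluster means are assumptions made for illustration only.

```python
def fuzzy_memberships(x: float, centers: list[float], m: float = 2.0) -> list[float]:
    """Fuzzy c-means membership of a scalar measurement x in each cluster.

    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)), where d_i = |x - center_i|.
    The memberships are non-negative and sum to 1.
    """
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")
    dists = [abs(x - c) for c in centers]
    # A measurement that coincides with a cluster mean belongs fully to it.
    for i, d in enumerate(dists):
        if d == 0.0:
            return [1.0 if j == i else 0.0 for j in range(len(centers))]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]


if __name__ == "__main__":
    # Hypothetical cluster means for one wrench component:
    # index 0 = "no contact", index 1 = "contact" (values are made up).
    means = [0.0, 100.0]
    no_contact, contact = fuzzy_memberships(75.0, means)
    print(f"P(contact) = {contact:.2f}")
```

Treating the membership in the "contact" cluster as a probability, and repeating this per degree of freedom, gives six independent estimates in the spirit of the abstract.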
ADSORPTION AND DECOMPOSITION OF HCOOH ON POTASSIUM-PROMOTED RH(111) SURFACES
Evidence is mounting that ConvNets are the best representation learning method for recognition. In the common scenario, a ConvNet is trained on a large labeled dataset and the activations of the feed-forward units at a certain layer of the network are used as a generic representation of an input image. Recent studies have shown this form of representation to be astoundingly effective for a wide range of recognition tasks. This paper thoroughly investigates the transferability of such representations w.r.t. several factors. These include parameters for training the network, such as its architecture, and parameters of feature extraction. We further show that different visual recognition tasks can be categorically ordered based on their distance from the source task. We then show interesting results indicating a clear correlation between the performance of tasks and their distance from the source task, conditioned on the proposed factors. Furthermore, by optimizing these factors, we achieve state-of-the-art performance on 16 visual recognition tasks.
Image Retrieval using Multi-scale CNN Features Pooling
In this paper, we address the problem of image retrieval by learning an image
representation based on the activations of a Convolutional Neural Network. We
present an end-to-end trainable network architecture that exploits a novel
multi-scale local pooling based on NetVLAD and a triplet mining procedure based
on sample difficulty to obtain an effective image representation. Extensive
experiments show that our approach is able to reach state-of-the-art results on
three standard datasets.
Comment: Accepted at ICMR 202
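As a rough illustration of triplet mining by difficulty, the sketch below assumes a margin-based triplet loss on Euclidean distances and picks the hardest negative, meaning the one closest to the anchor. The abstract does not describe the actual mining procedure, so the selection rule, the margin value, and all function names are assumptions.

```python
import math


def euclidean(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin: float = 0.1) -> float:
    """Hinge-style triplet loss: max(0, d(a, p) - d(a, n) + margin)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def hardest_negative(anchor, negatives):
    """Return the negative descriptor closest to the anchor."""
    return min(negatives, key=lambda n: euclidean(anchor, n))


if __name__ == "__main__":
    anchor, positive = [0.0, 0.0], [0.0, 1.0]
    negatives = [[3.0, 4.0], [0.0, 1.2]]
    hard = hardest_negative(anchor, negatives)
    print(hard, triplet_loss(anchor, positive, hard, margin=0.5))
```

Easy negatives that already sit farther than the margin contribute zero loss, which is why mining schemes tend to favour difficult samples.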
Nested Invariance Pooling and RBM Hashing for Image Instance Retrieval
The goal of this work is the computation of very compact binary hashes for image instance retrieval. Our approach has two novel contributions. The first one is Nested Invariance Pooling (NIP), a method inspired by i-theory, a mathematical theory for computing group invariant transformations with feed-forward neural networks. NIP is able to produce compact and well-performing descriptors with visual representations extracted from convolutional neural networks. We specifically incorporate scale, translation and rotation invariances, but the scheme can be extended to arbitrary sets of transformations. We also show that using moments of increasing order throughout nesting is important. The NIP descriptors are then hashed to the target code size (32-256 bits) with a Restricted Boltzmann Machine with a novel batch-level regularization scheme specifically designed for the purpose of hashing (RBMH). A thorough empirical evaluation against the state of the art shows that the results obtained both with the NIP descriptors and the NIP+RBMH hashes are consistently outstanding across a wide range of datasets
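Compact binary codes such as these are commonly compared with the Hamming distance. The sketch below shows that comparison step only, assuming each hash is stored as a Python integer; it does not implement NIP or RBMH, and the function names are illustrative.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two binary codes stored as integers."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query: int, database: list[int]) -> list[int]:
    """Indices of database codes sorted from nearest to farthest from the query."""
    return sorted(range(len(database)), key=lambda i: hamming_distance(query, database[i]))


if __name__ == "__main__":
    # Toy 4-bit codes; real codes in the abstract are 32 to 256 bits long.
    database = [0b0000, 0b1110, 0b1100]
    print(rank_by_hamming(0b1111, database))  # [1, 2, 0]
```

Because the comparison reduces to an XOR and a bit count, ranking stays cheap even for large databases.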