Representation Learning by Learning to Count
We introduce a novel method for representation learning that uses an
artificial supervision signal based on counting visual primitives. This
supervision signal is obtained from an equivariance relation, which does not
require any manual annotation. We relate transformations of images to
transformations of the representations. More specifically, we look for the
representation that satisfies such relation rather than the transformations
that match a given representation. In this paper, we use two image
transformations in the context of counting: scaling and tiling. The first
transformation exploits the fact that the number of visual primitives should be
invariant to scale. The second transformation allows us to equate the total
number of visual primitives in each tile to that in the whole image. These two
transformations are combined in one constraint and used to train a neural
network with a contrastive loss. The proposed task produces representations
that perform on par with or exceed the state of the art in transfer learning
benchmarks. Comment: ICCV 2017 (oral)
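The counting constraint described in the abstract above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `phi` is a placeholder "counter" (total intensity) rather than a trained network, `downscale_sum` is a sum-pooling stand-in for the scaling transformation chosen so that the toy counter satisfies the constraint exactly, and the hinge-style contrastive term is one common form that the abstract does not spell out.

```python
# Toy sketch of a counting constraint: the count of the downscaled image
# should equal the sum of the counts of its four tiles.
# All function names are hypothetical and not taken from the paper.

def tiles_2x2(img):
    """Split a 2D list with even height and width into four equal tiles."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[row[c:c + w] for row in img[r:r + h]]
            for r in (0, h) for c in (0, w)]

def downscale_sum(img):
    """Halve the resolution by summing each 2x2 block (stand-in for scaling)."""
    return [[img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]
             for c in range(0, len(img[0]), 2)]
            for r in range(0, len(img), 2)]

def phi(img):
    """Placeholder counter: total intensity. A real model would be learned."""
    return float(sum(sum(row) for row in img))

def counting_loss(x, y, margin=1.0):
    """Constraint error for x plus a hinge term involving a second image y."""
    tile_sum = sum(phi(t) for t in tiles_2x2(x))
    same = (phi(downscale_sum(x)) - tile_sum) ** 2   # should be close to 0
    diff = (phi(downscale_sum(y)) - tile_sum) ** 2   # should exceed the margin
    return same + max(0.0, margin - diff)
```

The hinge term is what keeps a learned counter from collapsing to the trivial solution of outputting zero for every image, since that solution would satisfy the first term alone.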
Learning to count with deep object features
Learning to count is a learning strategy that has been recently proposed in
the literature for dealing with problems where estimating the number of object
instances in a scene is the final objective. In this framework, the task of
learning to detect and localize individual object instances is seen as a harder
task that can be evaded by casting the problem as that of computing a
regression value from hand-crafted image features. In this paper we explore the
features that are learned when training a counting convolutional neural network
in order to understand their underlying representation. To this end we define a
counting problem for MNIST data and show that the internal representation of
the network is able to classify digits in spite of the fact that no direct
supervision was provided for them during training. We also present preliminary
results about a deep network that is able to count the number of pedestrians in
a scene. Comment: This paper has been accepted at the Deep Vision Workshop at CVPR 201
FCN-rLSTM: Deep Spatio-Temporal Neural Networks for Vehicle Counting in City Cameras
In this paper, we develop deep spatio-temporal neural networks to
sequentially count vehicles from low quality videos captured by city cameras
(citycams). Citycam videos have low resolution, low frame rate, high occlusion
and large perspective, making most existing methods lose their efficacy. To
overcome limitations of existing methods and incorporate the temporal
information of traffic video, we design a novel FCN-rLSTM network to jointly
estimate vehicle density and vehicle count by connecting fully convolutional
neural networks (FCN) with long short term memory networks (LSTM) in a residual
learning fashion. Such design leverages the strengths of FCN for pixel-level
prediction and the strengths of LSTM for learning complex temporal dynamics.
The residual learning connection reformulates the vehicle count regression as
learning residual functions with reference to the sum of densities in each
frame, which significantly accelerates the training of networks. To preserve
feature map resolution, we propose a Hyper-Atrous combination to integrate
atrous convolution in FCN and combine feature maps of different convolution
layers. FCN-rLSTM enables refined feature representation and a novel end-to-end
trainable mapping from pixels to vehicle count. We extensively evaluated the
proposed method on different counting tasks with three datasets, with
experimental results demonstrating its effectiveness and robustness. In
particular, FCN-rLSTM reduces the mean absolute error (MAE) from 5.31 to 4.21
on TRANCOS, and reduces the MAE from 2.74 to 1.53 on WebCamT. The training process
is accelerated by 5 times on average. Comment: Accepted by International Conference on Computer Vision (ICCV), 201
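The residual formulation and the MAE metric mentioned in the abstract above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `frame_count` and `mean_absolute_error` are hypothetical helper names, and the residual is passed in as a plain number, whereas in the paper it would come from the LSTM.

```python
# Sketch: count = sum of the per-pixel density map + a residual correction.
# The residual value is supplied by hand here (assumption).

def frame_count(density_map, residual):
    """Vehicle count for one frame as density sum plus residual."""
    return sum(sum(row) for row in density_map) + residual

def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and true counts."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

density = [[0.5, 0.25],
           [0.25, 1.0]]
count = frame_count(density, residual=0.5)
```

Because the density sum is already a reasonable count estimate, the residual only has to model a small correction.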
Speaker recognition with hybrid features from a deep belief network
Learning representations from audio data has shown advantages over handcrafted features such as mel-frequency cepstral coefficients (MFCCs) in many audio applications. In most representation learning approaches, connectionist systems have been used to learn and extract latent features from fixed-length data. In this paper, we propose an approach that combines the learned features and the MFCC features for the speaker recognition task and can be applied to audio scripts of different lengths. In particular, we study the use of features from different levels of a deep belief network (DBN) for quantizing the audio data into vectors of audio word counts. These vectors represent audio scripts of different lengths in a form that makes it easier to train a classifier. We show experimentally that the audio word count vectors generated from a mixture of DBN features at different layers give better performance than the MFCC features. We can also achieve further improvement by combining the audio word count vectors and the MFCC features.
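The idea of turning recordings of different lengths into fixed-length audio word count vectors, as described in the abstract above, can be sketched as follows. The codebook, the two-dimensional frame features, and both function names are assumptions made for illustration; the abstract does not say how the codebook is built, and in the paper the frame features would come from DBN layers.

```python
# Sketch: quantize each frame to its nearest codeword and count occurrences.
# The result has one entry per codeword, regardless of recording length.

def nearest_codeword(frame, codebook):
    """Index of the codeword with the smallest squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda k: sum((f - c) ** 2
                                 for f, c in zip(frame, codebook[k])))

def audio_word_counts(frames, codebook):
    """Histogram of codeword assignments over all frames of a recording."""
    counts = [0] * len(codebook)
    for frame in frames:
        counts[nearest_codeword(frame, codebook)] += 1
    return counts

codebook = [[0.0, 0.0], [1.0, 1.0]]                 # hypothetical codewords
short_clip = [[0.1, 0.0], [0.9, 1.0]]               # 2 frames
long_clip = [[0.1, 0.0], [0.2, 0.1], [0.9, 1.0],
             [1.1, 0.8], [0.0, 0.3]]                # 5 frames
```

Both clips map to vectors of length `len(codebook)`, which is what allows a standard classifier to be trained on recordings of different durations.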
Learning Contact-Rich Manipulation Skills with Guided Policy Search
Autonomous learning of object manipulation skills can enable robots to
acquire rich behavioral repertoires that scale to the variety of objects found
in the real world. However, current motion skill learning methods typically
restrict the behavior to a compact, low-dimensional representation, limiting
its expressiveness and generality. In this paper, we extend a recently
developed policy search method \cite{la-lnnpg-14} and use it to learn a range
of dynamic manipulation behaviors with highly general policy representations,
without using known models or example demonstrations. Our approach learns a set
of trajectories for the desired motion skill by using iteratively refitted
time-varying linear models, and then unifies these trajectories into a single
control policy that can generalize to new situations. To enable this method to
run on a real robot, we introduce several improvements that reduce the sample
count and automate parameter selection. We show that our method can acquire
fast, fluent behaviors after only minutes of interaction time, and can learn
robust controllers for complex tasks, including putting together a toy
airplane, stacking tight-fitting lego blocks, placing wooden rings onto
tight-fitting pegs, inserting a shoe tree into a shoe, and screwing bottle caps
onto bottles.
Classification of ductile cast iron specimens: A machine learning approach
In this paper, an automatic procedure based on a machine learning approach is proposed to classify ductile cast iron specimens according to the American Society for Testing and Materials guidelines. The mechanical properties of a specimen are strongly influenced by the peculiar morphology of its graphite elements. Useful characteristics (the features) are extracted from the specimens’ images; these characteristics describe the shape, distribution, and size of the graphite particles in the specimen, the nodularity, and the nodule count. Principal component analysis is used to provide a more efficient representation of these data. Support vector machines are trained to classify the data through sequential binary classification steps. Numerical analysis is performed on a significant number of images, providing robust results even in the presence of dust, scratches, and measurement noise.
The impact of polyhedron learning assisted by Edpuzzle in improving students' mathematical representation
Mathematical representation is an ability that students must have. This research aims to determine whether eighth-grade students at Al-Qona'ah Islamic Junior High School improve their mathematical representation skills after using the Edpuzzle application to learn how to construct polyhedrons. It used a quantitative, quasi-experimental design with control and experimental classes. The research population comprised 60 students (23 males and 37 females). The study found that students' mathematical representation abilities did not improve with the aid of the Edpuzzle application on the polyhedron topic, due to the need for more learning support facilities when using the application. This finding is supported by the t-test, which yielded t-count < t-table (-29.2936 < -1.67155), so that H0 is accepted. The n-gain calculation yielded a total n-gain score of 0.03, which is low, indicating that there was no progress in learning with the Edpuzzle application on the polyhedron topic. Learning with Edpuzzle therefore did not have an impact on improving students' mathematical representation.
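For readers unfamiliar with the n-gain score reported in the abstract above, the following sketch uses the commonly cited normalized-gain definition, (post - pre) / (max - pre), with the usual low/medium/high cut-offs at 0.3 and 0.7. The abstract does not state which formula or thresholds the authors used, so both are assumptions, as are the function names and the example scores.

```python
# Sketch of the normalized gain (n-gain) and its usual categories.
# Formula and thresholds are the commonly used ones (assumption).

def n_gain(pre, post, max_score=100.0):
    """Fraction of the possible improvement that was actually achieved."""
    return (post - pre) / (max_score - pre)

def gain_category(g):
    """Map an n-gain value to the conventional low/medium/high label."""
    if g >= 0.7:
        return "high"
    if g >= 0.3:
        return "medium"
    return "low"

example = n_gain(pre=50.0, post=75.0)   # hypothetical class averages
reported = gain_category(0.03)          # the score given in the abstract
```

Under these thresholds, the reported score of 0.03 falls in the "low" category, which matches the abstract's description.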