965 research outputs found
Improving Bags-of-Words model for object categorization
In the past decade, Bags-of-Words (BOW) models have become popular for the task of object recognition, owing to their good performance and simplicity. Some of the most effective recent methods for computer-based object recognition work by detecting and extracting local image features, before quantizing them according to a codebook rule such as k-means clustering, and classifying these with conventional classifiers such as Support Vector Machines and Naive Bayes.
In this thesis, a Spatial Object Recognition Framework is presented that consists of the four main contributions of the research.
The first contribution, frequent keypoint pattern discovery, works by combining pairs and triplets of frequent keypoints in order to discover intermediate representations for object classes. Based on the same frequent keypoints principle, algorithms for locating the region-of-interest in training images is then discussed.
Extensions to the successful Spatial Pyramid Matching scheme, in order to better capture spatial relationships, are then proposed. The pairs frequency histogram and shapes frequency histogram work by capturing more redefined spatial information between local image features.
Finally, alternative techniques to Spatial Pyramid Matching for capturing spatial information are presented. The proposed techniques, variations of binned log-polar histograms, divides the image into grids of different scale and different orientation. Thus captures the distribution of image features both in distance and orientation explicitly.
Evaluations on the framework are focused on several recent and popular datasets, including image retrieval, object recognition, and object categorization. Overall, while the effectiveness of the framework is limited in some of the datasets, the proposed contributions are nevertheless powerful improvements of the BOW model
Structured learning of metric ensembles with application to person re-identification
Matching individuals across non-overlapping camera networks, known as person
re-identification, is a fundamentally challenging problem due to the large
visual appearance changes caused by variations of viewpoints, lighting, and
occlusion. Approaches in literature can be categoried into two streams: The
first stream is to develop reliable features against realistic conditions by
combining several visual features in a pre-defined way; the second stream is to
learn a metric from training data to ensure strong inter-class differences and
intra-class similarities. However, seeking an optimal combination of visual
features which is generic yet adaptive to different benchmarks is a unsoved
problem, and metric learning models easily get over-fitted due to the scarcity
of training data in person re-identification. In this paper, we propose two
effective structured learning based approaches which explore the adaptive
effects of visual features in recognizing persons in different benchmark data
sets. Our framework is built on the basis of multiple low-level visual features
with an optimal ensemble of their metrics. We formulate two optimization
algorithms, CMCtriplet and CMCstruct, which directly optimize evaluation
measures commonly used in person re-identification, also known as the
Cumulative Matching Characteristic (CMC) curve.Comment: 16 pages. Extended version of "Learning to Rank in Person
Re-Identification With Metric Ensembles", at
http://www.cv-foundation.org/openaccess/content_cvpr_2015/html/Paisitkriangkrai_Learning_to_Rank_2015_CVPR_paper.html.
arXiv admin note: text overlap with arXiv:1503.0154
Going Deeper into Action Recognition: A Survey
Understanding human actions in visual data is tied to advances in
complementary research areas including object recognition, human dynamics,
domain adaptation and semantic segmentation. Over the last decade, human action
analysis evolved from earlier schemes that are often limited to controlled
environments to nowadays advanced solutions that can learn from millions of
videos and apply to almost all daily activities. Given the broad range of
applications from video surveillance to human-computer interaction, scientific
milestones in action recognition are achieved more rapidly, eventually leading
to the demise of what used to be good in a short time. This motivated us to
provide a comprehensive review of the notable steps taken towards recognizing
human actions. To this end, we start our discussion with the pioneering methods
that use handcrafted representations, and then, navigate into the realm of deep
learning based approaches. We aim to remain objective throughout this survey,
touching upon encouraging improvements as well as inevitable fallbacks, in the
hope of raising fresh questions and motivating new research directions for the
reader
- …