Shape Completion using 3D-Encoder-Predictor CNNs and Shape Synthesis
We introduce a data-driven approach to complete partial 3D shapes through a
combination of volumetric deep neural networks and 3D shape synthesis. From a
partially-scanned input shape, our method first infers a low-resolution -- but
complete -- output. To this end, we introduce a 3D-Encoder-Predictor Network
(3D-EPN) which is composed of 3D convolutional layers. The network is trained
to predict and fill in missing data, and operates on an implicit surface
representation that encodes both known and unknown space. This allows us to
predict global structure in unknown areas at high accuracy. We then correlate
these intermediary results with 3D geometry from a shape database at test time.
In a final pass, we propose a patch-based 3D shape synthesis method that
imposes the 3D geometry from these retrieved shapes as constraints on the
coarsely-completed mesh. This synthesis process enables us to reconstruct
fine-scale detail and generate high-resolution output while respecting the
global mesh structure obtained by the 3D-EPN. Although our 3D-EPN outperforms
state-of-the-art completion methods, the main contribution of our work lies in
the combination of a data-driven shape predictor and analytic 3D shape
synthesis. In our results, we show extensive evaluations on a newly-introduced
shape completion benchmark for both real-world and synthetic data.
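The abstract's key input encoding is an implicit volumetric representation that marks both known and unknown space, so the network can reason about unobserved regions. A minimal sketch of such an encoding (my own toy construction, not the authors' actual distance-field format) might look like:

```python
import numpy as np

def encode_partial_scan(occ, known):
    """Encode a partially-observed voxel grid as a 2-channel volume.

    occ:   (D, H, W) bool array -- voxels occupied in the partial scan.
    known: (D, H, W) bool array -- voxels actually observed by the scanner.
    Returns a (2, D, H, W) float array: channel 0 is occupancy, channel 1
    marks known space, so unknown regions are represented explicitly.
    """
    vol = np.zeros((2,) + occ.shape, dtype=np.float32)
    vol[0] = occ.astype(np.float32)
    vol[1] = known.astype(np.float32)
    return vol

# Toy example: an 8^3 grid where the scanner saw only the front half (z < 4).
occ = np.zeros((8, 8, 8), dtype=bool)
occ[2:6, 2:6, 2:6] = True            # a cube, partly in unseen space
known = np.zeros_like(occ)
known[:, :, :4] = True
vol = encode_partial_scan(occ & known, known)
print(vol.shape)                      # (2, 8, 8, 8)
print(int(vol[1].sum()))              # 256 of 512 voxels are known
```

A 3D convolutional encoder-predictor such as the 3D-EPN would consume a volume of this shape and regress a complete low-resolution occupancy grid.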
A Pose-Sensitive Embedding for Person Re-Identification with Expanded Cross Neighborhood Re-Ranking
Person re-identification is a challenging retrieval task that requires
matching a person's acquired image across non-overlapping camera views. In this
paper we propose an effective approach that incorporates both the fine and
coarse pose information of the person to learn a discriminative embedding. In
contrast to the recent direction of explicitly modeling body parts or
correcting for misalignment based on these, we show that a rather
straightforward inclusion of acquired camera view and/or the detected joint
locations into a convolutional neural network helps to learn a very effective
representation. To increase retrieval performance, re-ranking techniques based
on computed distances have recently gained much attention. We propose a new
unsupervised and automatic re-ranking framework that achieves state-of-the-art
re-ranking performance. We show that, in contrast to current
state-of-the-art re-ranking methods, our approach does not require computing
new rank lists for each image pair (e.g., based on reciprocal neighbors) and
performs well using a simple direct rank-list comparison, or even just
the already-computed Euclidean distances between the images. We show that
both our learned representation and our re-ranking method achieve
state-of-the-art performance on a number of challenging surveillance image and
video datasets.
The code is available online at:
https://github.com/pse-ecn/pose-sensitive-embedding
Comment: CVPR 2018; v2 (fixes, added new results on the PRW dataset)
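The claim that re-ranking can work on already-computed Euclidean distances can be illustrated with a simplified neighborhood-aggregation rule, loosely inspired by the expanded cross neighborhood idea (this toy formulation is my own, not the paper's exact definition):

```python
import numpy as np

def ecn_distance(dist, k=2):
    """Simplified neighborhood re-ranking: the new distance between i and j
    aggregates i's distances to j's top-k neighbors and vice versa, using
    only the already-computed pairwise distances.

    dist: (n, n) symmetric distance matrix with zeros on the diagonal.
    """
    n = dist.shape[0]
    nbrs = np.argsort(dist, axis=1)[:, 1:k + 1]   # top-k neighbors, self excluded
    out = np.zeros_like(dist)
    for i in range(n):
        for j in range(n):
            out[i, j] = dist[i, nbrs[j]].sum() + dist[j, nbrs[i]].sum()
    return out

# Toy gallery: two tight pairs of 1-D embeddings.
x = np.array([0.0, 0.1, 1.0, 1.1])
dist = np.abs(x[:, None] - x[None, :])
re_dist = ecn_distance(dist)
```

Because the aggregation is symmetric in i and j, the re-ranked distance matrix stays symmetric, and samples sharing neighbors are pulled together without building per-pair rank lists.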
Multi-Scale Colour Completed Local Binary Patterns for Scene and Event Sport Image Categorisation
The Local Binary Pattern (LBP) texture descriptor and some of its variants have been used successfully for texture classification and for a few other tasks such as face recognition, facial expression recognition, and texture segmentation. However, these descriptors have rarely been used for image categorisation because they are computed on the gray-level image and are only invariant to monotonic lighting variations at the gray level. They ignore colour information despite its key role in distinguishing objects and natural scenes. In this paper, we enhance the Completed Local Binary Pattern (CLBP), an LBP variant with impressive performance on texture classification. We propose five multi-scale colour CLBP (CCLBP) descriptors by incorporating five different kinds of colour information into the original CLBP. Using the Oliva and Torralba (OT8) and Event sport datasets, our results attest to the superiority of the proposed CCLBP descriptors over the original CLBP for image categorisation.
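The core move of incorporating colour into an LBP-family descriptor can be sketched by computing codes per colour channel and concatenating the histograms. This minimal example uses only the basic LBP sign component, not the full CLBP (which also encodes magnitude and centre-pixel information), so it is an illustration of the idea rather than the paper's descriptor:

```python
import numpy as np

def lbp_codes(channel):
    """Basic 8-neighbour LBP sign codes for the interior pixels of one channel."""
    c = channel[1:-1, 1:-1]
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(c, dtype=np.int32)
    for bit, (dy, dx) in enumerate(shifts):
        nbr = channel[1 + dy:channel.shape[0] - 1 + dy,
                      1 + dx:channel.shape[1] - 1 + dx]
        codes |= (nbr >= c).astype(np.int32) << bit   # sign of the difference
    return codes

def colour_lbp_histogram(img):
    """Concatenate per-channel LBP histograms; img is (H, W, 3)."""
    hists = [np.bincount(lbp_codes(img[..., ch]).ravel(), minlength=256)
             for ch in range(3)]
    return np.concatenate(hists)

img = np.random.default_rng(0).random((10, 12, 3))
hist = colour_lbp_histogram(img)
print(hist.shape)   # (768,) -- 256 bins per colour channel
```

Each interior pixel contributes exactly one code per channel, so the histogram mass equals 3 x (H-2) x (W-2); a multi-scale variant would repeat this at several neighbourhood radii and concatenate further.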
Contextual Bag-Of-Visual-Words and ECOC-Rank for Retrieval and Multi-class Object Recognition
Master's final project at UPC, carried out in collaboration with the Dept. of Applied Mathematics and Analysis, Universitat de Barcelona.
Multi-class object categorization is an important line of research in the Computer Vision
and Pattern Recognition fields. An artificial intelligent system can interact with its environment only if it can distinguish among a set of cases, instances, situations, objects, etc. The world is inherently multi-class, and thus the efficiency
of a system can be determined by its accuracy in discriminating among a set of cases.
A recently applied procedure in the literature is the Bag-Of-Visual-Words (BOVW).
This methodology is inspired by natural language processing, where a
sentence is characterised by its word frequencies. Analogously, in the pattern recognition
domain, an object is described by the frequency of appearance of its parts.
However, a general drawback of this method is that the dictionary construction
does not take into account geometrical information about object parts. In order to
include part relations in the BOVW model, we propose the Contextual BOVW
(C-BOVW), where the dictionary construction is guided by a geometrically-based
merging procedure. As a result, objects are described as sentences in which geometrical
information is implicitly considered.
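The plain BOVW "sentence" that C-BOVW builds on can be sketched in a few lines: quantise each local descriptor to its nearest visual word and histogram the word counts. (The geometrical merging step that distinguishes C-BOVW is not reproduced here; this is only the baseline representation the abstract assumes.)

```python
import numpy as np

def bovw_histogram(descriptors, codebook):
    """Quantise local descriptors to their nearest visual word and count
    word frequencies -- the plain BOVW 'sentence' for one object/image.

    descriptors: (n, d) local features; codebook: (k, d) visual words.
    """
    # squared Euclidean distance from every descriptor to every word
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)
    return np.bincount(words, minlength=codebook.shape[0])

codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
descriptors = np.array([[0.1, 0.0], [0.9, 0.1], [0.0, 0.8],
                        [0.05, 0.05], [1.0, 0.0]])
hist = bovw_histogram(descriptors, codebook)
print(hist)   # [2 2 1]
```

The drawback the abstract points out is visible here: the histogram records only which words occur, not where the corresponding parts sit relative to each other.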
In order to extend the proposed system to the multi-class case, we use the
Error-Correcting Output Codes (ECOC) framework. State-of-the-art multi-class
techniques are frequently defined as an ensemble of binary classifiers. In this sense, the ECOC framework, based on error-correcting principles, has proven to be a powerful tool, able to classify a large number of classes while correcting classification errors produced by the individual learners.
In our case, the C-BOVW sentences are learnt by means of an ECOC configuration, obtaining high discriminative power. Moreover, we use the ECOC outputs produced by the new methodology to rank classes. In some situations, more than
one label is required in order to work with multiple hypotheses and find similar cases, such
as in the well-known retrieval problems. In this sense, we also include contextual
and semantic information to modify the ECOC outputs, defining an ECOC-rank methodology. By altering the ECOC output values based on the adjacency of
classes in feature space and on class relations derived from ontologies, we report a significant improvement in class-retrieval problems.
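The ECOC decoding-and-ranking idea can be sketched with a tiny one-vs-all coding matrix: each class has a codeword, the binary classifiers produce a real-valued output vector, and classes are ranked by how close their codewords are to that vector. (The coding matrix, scores, and L1 decoding rule below are illustrative choices, not the thesis's specific configuration.)

```python
import numpy as np

# One-vs-all coding matrix for 4 classes (rows = class codewords,
# columns = binary problems).
M = np.array([[ 1, -1, -1, -1],
              [-1,  1, -1, -1],
              [-1, -1,  1, -1],
              [-1, -1, -1,  1]], dtype=float)

def ecoc_rank(binary_outputs, coding=M):
    """Rank classes by L1 distance between the binary classifiers' output
    vector and each class codeword (smaller distance = better match).
    Returning the full ranking, not just the top class, is what an
    ECOC-rank style retrieval setting needs."""
    d = np.abs(coding - binary_outputs[None, :]).sum(axis=1)
    return np.argsort(d), d

outputs = np.array([0.8, -0.9, -0.2, -0.7])   # hypothetical classifier scores
rank, dists = ecoc_rank(outputs)
print(rank)   # [0 2 3 1] -- class 0's codeword (+1,-1,-1,-1) is closest
```

Modifying the decoded distances `dists` with class-adjacency or ontology terms before the final `argsort` is where a re-ranking step like ECOC-rank would plug in.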
PRSNet: A Masked Self-Supervised Learning Pedestrian Re-Identification Method
In recent years, self-supervised learning has attracted widespread academic
attention and addressed many key issues in computer vision. The current
research focus is on how to construct a good pretext task that lets the
network learn higher-level semantic information from images, so that
model inference on the downstream task is accelerated by pre-training. Existing
feature extraction networks are pre-trained on the ImageNet dataset and cannot
extract the fine-grained information in pedestrian images well, and existing contrastive
self-supervised pretext tasks may destroy the original properties of pedestrian
images. To address these problems, this paper designs a mask-reconstruction pretext task to obtain a
highly robust pre-trained model and applies it to the pedestrian
re-identification task. The network is optimised
with an improved centroid-based triplet loss in which the masked image is
added as an additional sample to the loss calculation, so that after
training the network copes better with pedestrian matching in practical
applications. This method achieves about 5% higher mAP on the Market-1501
and CUHK03 datasets than existing self-supervised pedestrian
re-identification methods, and about 1% higher Rank-1 accuracy; ablation
experiments demonstrate the feasibility of the method. Our
model code is available at https://github.com/ZJieX/prsnet
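The centroid-based triplet loss mentioned above can be sketched as follows: the anchor is compared against class centroids rather than individual samples, and the masked copy of an image can simply be appended to the positive set. (This is a minimal generic formulation with assumed margin and toy features, not PRSNet's exact loss.)

```python
import numpy as np

def centroid_triplet_loss(anchor, pos_feats, neg_feats, margin=0.3):
    """Triplet loss computed against class centroids rather than single
    samples: pull the anchor toward the positive-class centroid and push
    it from the negative-class centroid by at least `margin`.

    pos_feats may include the feature of a masked copy of the anchor
    image as an extra positive, as the abstract describes.
    """
    c_pos = pos_feats.mean(axis=0)
    c_neg = neg_feats.mean(axis=0)
    d_pos = np.linalg.norm(anchor - c_pos)
    d_neg = np.linalg.norm(anchor - c_neg)
    return max(0.0, d_pos - d_neg + margin)

anchor = np.array([0.0, 0.0])
pos = np.array([[0.1, 0.0], [0.0, 0.1]])    # e.g. another view + masked copy
neg = np.array([[1.0, 1.0], [1.0, 0.8]])
loss = centroid_triplet_loss(anchor, pos, neg)
print(loss)   # 0.0 -- the positives are already much closer than the negatives
```

Averaging into a centroid makes the target less sensitive to any single noisy sample, including the artificially masked one.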