Joint calibration of Ensemble of Exemplar SVMs
We present a method for calibrating the Ensemble of Exemplar SVMs model.
Unlike the standard approach, which calibrates each SVM independently, our
method optimizes their joint performance as an ensemble. We formulate joint
calibration as a constrained optimization problem and devise an efficient
optimization algorithm to find its global optimum. The algorithm dynamically
discards parts of the solution space that cannot contain the optimum early on,
making the optimization computationally feasible. We experiment with EE-SVM
trained on state-of-the-art CNN descriptors. Results on the ILSVRC 2014 and
PASCAL VOC 2007 datasets show that (i) our joint calibration procedure
outperforms independent calibration on the task of classifying windows as
belonging to an object class or not; and (ii) this improved window classifier
leads to better performance on the object detection task.
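The difference between the two calibration regimes can be sketched on toy data. Below, independent calibration fixes each exemplar's offset from its own scores alone, while joint calibration searches all offsets against the ensemble's window-classification accuracy; a simple coordinate search stands in for the paper's global branch-and-bound solver, and all data and shapes are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: window scores from 3 exemplar SVMs over 200 windows.
# Positive windows (label 1) get a per-exemplar score boost.
n_ex, n_win = 3, 200
labels = rng.integers(0, 2, n_win)
scores = rng.normal(0, 1, (n_ex, n_win)) + labels * rng.uniform(0.5, 2.0, (n_ex, 1))

def ensemble_accuracy(offsets):
    """Classify a window by the max calibrated exemplar response."""
    calibrated = scores + offsets[:, None]
    pred = (calibrated.max(axis=0) > 0).astype(int)
    return (pred == labels).mean()

# Independent calibration: each offset chosen from that exemplar's scores alone.
indep = np.array([-np.median(scores[i]) for i in range(n_ex)])

# Joint calibration (coordinate search standing in for the paper's exact
# solver): optimize all offsets against the shared ensemble objective.
joint = indep.copy()
for _ in range(5):
    for i in range(n_ex):
        cands = joint[i] + np.linspace(-2, 2, 41)   # includes the current value
        accs = [ensemble_accuracy(np.where(np.arange(n_ex) == i, c, joint))
                for c in cands]
        joint[i] = cands[int(np.argmax(accs))]

acc_indep = ensemble_accuracy(indep)
acc_joint = ensemble_accuracy(joint)
```

Because the current offset is always among the candidates, each sweep can only improve (or preserve) ensemble accuracy, mirroring why optimizing the joint objective dominates per-exemplar calibration.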
Indexing ensembles of exemplar-SVMs with rejecting taxonomies
Ensembles of Exemplar-SVMs have been used for a wide variety of tasks, such as object detection, segmentation, label transfer and mid-level feature learning. To make this technique effective, though, a large collection of classifiers is needed, which often makes the evaluation phase prohibitively expensive. To overcome this issue, we exploit the joint distribution of exemplar classifier scores to build a taxonomy that indexes each Exemplar-SVM and enables fast evaluation of the whole ensemble. We experiment with the Pascal 2007 benchmark on the task of object detection and on a simple segmentation task, in order to verify the robustness of our indexing data structure against the standard ensemble. We also introduce a rejection strategy to discard irrelevant image patches, allowing more efficient access to the data.
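The indexing idea can be illustrated with a one-level taxonomy: group the exemplar weight vectors, represent each group by a mean classifier, and evaluate a patch only against the exemplars under the best-scoring group, rejecting the patch outright if no group clears a threshold. Everything here (the random weights, the grouping by projection) is a hypothetical stand-in for the paper's score-distribution-driven taxonomy:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical ensemble: 64 exemplar-SVM weight vectors in a 32-d feature space.
W = rng.normal(size=(64, 32))

# One-level taxonomy: split exemplars into 8 groups (here by a random
# projection, purely for illustration) and store each group's mean classifier.
k = 8
order = np.argsort(W @ rng.normal(size=32))
groups = np.array_split(order, k)
nodes = np.stack([W[g].mean(axis=0) for g in groups])

def evaluate(x, reject=-np.inf):
    """Score a patch via the taxonomy: visit only exemplars under the best
    internal node; reject the patch early if no node clears the threshold."""
    node_scores = nodes @ x
    best = int(np.argmax(node_scores))
    if node_scores[best] < reject:
        return None, 0                      # early rejection: no exemplar visited
    members = groups[best]
    return float((W[members] @ x).max()), len(members)

x = rng.normal(size=32)
score, n_eval = evaluate(x)
```

Only 8 of the 64 exemplars are evaluated per patch here; the rejection threshold additionally lets clearly irrelevant patches skip even that work.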
Clothing Co-Parsing by Joint Image Segmentation and Labeling
This paper aims at developing an integrated system of clothing co-parsing, in
order to jointly parse a set of clothing images (unsegmented but annotated with
tags) into semantic configurations. We propose a data-driven framework
consisting of two phases of inference. The first phase, referred to as "image
co-segmentation", iterates to extract consistent regions on images and jointly
refines the regions over all images by employing the exemplar-SVM (E-SVM)
technique [23]. In the second phase (i.e. "region co-labeling"), we construct a
multi-image graphical model by taking the segmented regions as vertices, and
incorporate several contexts of clothing configuration (e.g., item location and
mutual interactions). The joint label assignment can be solved using the
efficient Graph Cuts algorithm. In addition to evaluating our framework on the
Fashionista dataset [30], we construct a dataset called CCP consisting of 2098
high-resolution street fashion photos to demonstrate the performance of our
system. We achieve 90.29% / 88.23% segmentation accuracy and 65.52% / 63.89%
recognition rate on the Fashionista and the CCP datasets, respectively, which
are superior to state-of-the-art methods. Comment: 8 pages, 5 figures, CVPR 201
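The region co-labeling phase minimizes an energy with per-region (unary) label costs plus smoothness terms between adjacent regions. A minimal sketch of that energy on a toy chain graph is below; iterated conditional modes is used here as a simple stand-in for the Graph Cuts solver the paper employs, and all costs are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical co-labeling instance: 10 regions, 3 clothing labels, a chain
# graph of adjacent regions (the paper builds a multi-image graphical model).
n, L = 10, 3
unary = rng.normal(size=(n, L))              # per-region label costs (lower = better)
edges = [(i, i + 1) for i in range(n - 1)]   # adjacent regions interact
lam = 0.5                                    # Potts smoothness weight

def energy(lab):
    e = unary[np.arange(n), lab].sum()
    e += lam * sum(lab[i] != lab[j] for i, j in edges)
    return e

# Iterated conditional modes (stand-in for Graph Cuts): sweep the regions,
# greedily relabelling each one given its neighbours' current labels.
lab = unary.argmin(axis=1)
for _ in range(10):
    for i in range(n):
        costs = unary[i].copy()
        for a, b in edges:
            if a == i:
                costs += lam * (np.arange(L) != lab[b])
            if b == i:
                costs += lam * (np.arange(L) != lab[a])
        lab[i] = int(np.argmin(costs))
```

Each local relabelling includes the current label among its candidates, so the energy never increases; Graph Cuts additionally guarantees strong optima for this class of pairwise energies.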
Advances in detecting object classes and their semantic parts
Object classes are central to computer vision and have been the focus of substantial
research in the last fifteen years. This thesis addresses the tasks of localizing entire
objects in images (object class detection) and localizing their semantic parts (part detection).
We present four contributions, two for each task. The first two improve
existing object class detection techniques by using context and calibration. The other
two contributions explore semantic part detection in weakly-supervised settings.
First, the thesis presents a technique for predicting properties of objects in an image
based on its global appearance only. We demonstrate the method by predicting three
properties: aspect of appearance, location in the image and class membership. Overall,
the technique makes multi-component object detectors faster and improves their
performance.
The second contribution is a method for calibrating the popular Ensemble of Exemplar-
SVM object detector. Unlike the standard approach, which calibrates each Exemplar-
SVM independently, our technique optimizes their joint performance as an ensemble.
We devise an efficient optimization algorithm to find the global optimal solution of the
calibration problem. This leads to better object detection performance compared to
using independent calibration.
The third contribution is a technique for training part-based models of object classes
using data sourced from the web. We learn rich models incrementally; our models encompass
the appearance of parts and their spatial arrangement on the object, specific to
each viewpoint. Importantly, our method does not require any part location annotation, which is
one of the main obstacles to training many part detectors.
Finally, the last contribution is a study on whether semantic object parts emerge in
Convolutional Neural Networks trained for higher-level tasks, such as image classification.
While previous efforts studied this matter by visual inspection only, we perform
an extensive quantitative analysis based on ground-truth part location annotations. This
provides a more conclusive answer to the question.
Deep Exemplar 2D-3D Detection by Adapting from Real to Rendered Views
This paper presents an end-to-end convolutional neural network (CNN) for
2D-3D exemplar detection. We demonstrate that the ability to adapt the features
of natural images to better align with those of CAD rendered views is critical
to the success of our technique. We show that the adaptation can be learned by
compositing rendered views of textured object models on natural images. Our
approach can be naturally incorporated into a CNN detection pipeline and
extends the accuracy and speed benefits from recent advances in deep learning
to 2D-3D exemplar detection. We applied our method to two tasks: instance
detection, where we evaluated on the IKEA dataset, and object category
detection, where we outperform Aubry et al. for "chair" detection on a subset
of the Pascal VOC dataset. Comment: To appear in CVPR 201
Advances in Monocular Exemplar-based Human Body Pose Analysis: Modeling, Detection and Tracking
This thesis contributes to the analysis of human body pose from image sequences acquired with a single camera. This topic has a wide range of potential applications in video surveillance, video games and biomedical applications. Exemplar-based techniques have been successful; however, their accuracy depends on the similarity of the camera viewpoint and the scene properties between the training and test images. Given a training dataset captured with a reduced number of fixed cameras parallel to the ground, three possible scenarios with increasing levels of difficulty have been identified and analyzed: 1) a static camera parallel to the ground, 2) a fixed surveillance camera with a considerably different viewing angle, and 3) a video sequence captured with a moving camera, or simply a single static image.
Exemplar codes for facial attributes and tattoo recognition
When implementing real-world computer vision systems, researchers can use mid-level representations as a tool to adjust the trade-off between accuracy and efficiency. Unfortunately, existing mid-level representations that improve accuracy tend to decrease efficiency, or are specifically tailored to work well within one pipeline or vision problem at the exclusion of others. We introduce a novel, efficient mid-level representation that improves classification efficiency without sacrificing accuracy. Our Exemplar Codes are based on linear classifiers and probability normalization from extreme value theory. We apply Exemplar Codes to two problems: facial attribute extraction and tattoo classification. In these settings, our Exemplar Codes are competitive with the state of the art and offer efficiency benefits, making it possible to achieve high accuracy even on commodity hardware with a low computational budget.
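The extreme-value normalization can be sketched as follows: model the upper tail of the non-match score distribution and map a raw margin to the probability that a non-match would fall below it. This sketch uses an exponential peaks-over-threshold fit as a simplified stand-in for the Weibull tail fit typical of EVT-based score calibration; the scores are synthetic:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical raw SVM margins: many non-match scores plus one test score.
nonmatch = rng.normal(-1.0, 0.5, 5000)
test_score = 0.8

def evt_normalize(score, tail_scores, tail_frac=0.05):
    """Turn a raw margin into a probability by modelling only the upper tail
    of the non-match scores (exponential peaks-over-threshold, a simplified
    stand-in for the Weibull fit used in EVT score normalization)."""
    u = np.quantile(tail_scores, 1 - tail_frac)   # tail threshold
    exceed = tail_scores[tail_scores > u] - u
    scale = exceed.mean()                         # MLE for the exponential scale
    if score <= u:
        return 0.0
    return 1.0 - np.exp(-(score - u) / scale)     # P(tail exceedance < score)

p = evt_normalize(test_score, nonmatch)
```

Because only the tail is modelled, the normalization is cheap and robust to the bulk of the non-match distribution, which is what makes such codes efficient mid-level features.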
MOON: A Mixed Objective Optimization Network for the Recognition of Facial Attributes
Attribute recognition, particularly facial, extracts many labels for each
image. While some multi-task vision problems can be decomposed into separate
tasks and stages, e.g., training independent models for each task, for a
growing set of problems joint optimization across all tasks has been shown to
improve performance. We show that for deep convolutional neural network (DCNN)
facial attribute extraction, multi-task optimization is better. Unfortunately,
it can be difficult to apply joint optimization to DCNNs when training data is
imbalanced, and re-balancing multi-label data directly is structurally
infeasible, since adding/removing data to balance one label will change the
sampling of the other labels. This paper addresses the multi-label imbalance
problem by introducing a novel mixed objective optimization network (MOON) with
a loss function that mixes multiple task objectives with domain adaptive
re-weighting of propagated loss. Experiments demonstrate that not only does
MOON advance the state of the art in facial attribute recognition, but it also
outperforms independently trained DCNNs using the same data. When using facial
attributes for the LFW face recognition task, we show that our balanced (domain
adapted) network outperforms the unbalanced trained network.Comment: Post-print of manuscript accepted to the European Conference on
Computer Vision (ECCV) 2016
http://link.springer.com/chapter/10.1007%2F978-3-319-46454-1_
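The domain-adaptive re-weighting idea can be sketched as a per-attribute weighted binary cross-entropy: each label's positives and negatives are re-weighted so they contribute as if drawn from a target distribution, without physically re-sampling the multi-label data. This is a simplified sketch of the loss-mixing idea, not MOON's exact recipe, and all rates and logits are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical multi-label batch: 4 facial attributes with very different
# base rates in the source data, re-targeted to a balanced distribution.
source_rate = np.array([0.9, 0.5, 0.1, 0.02])   # attribute frequency in the data
target_rate = np.full(4, 0.5)                   # desired balanced distribution
y = (rng.random((256, 4)) < source_rate).astype(float)
logits = rng.normal(size=(256, 4))

def moon_bce(logits, y, src, tgt):
    """Binary cross-entropy with per-attribute re-weighting: positives and
    negatives of each label are scaled so the loss behaves as if the batch
    had been drawn from the target label distribution (a sketch of the
    domain-adaptive re-weighting idea, not MOON's exact formulation)."""
    p = 1.0 / (1.0 + np.exp(-logits))
    w_pos = tgt / src                   # up-weight rare positives
    w_neg = (1 - tgt) / (1 - src)       # up-weight rare negatives
    w = y * w_pos + (1 - y) * w_neg
    loss = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    return (w * loss).mean()

loss = moon_bce(logits, y, source_rate, target_rate)
```

Re-weighting the propagated loss sidesteps the structural problem the abstract describes: adding or removing images to balance one attribute would unbalance the others, whereas per-label weights leave the sampling untouched.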