A region-centered topic model for object discovery and category-based image segmentation
Latent topic models have become a popular paradigm in many computer vision applications due to their ability to discover semantics in visual content without supervision. Relying on the Bag-of-Words representation, they consider images as mixtures of latent topics that generate visual words according to topic-specific distributions. However, the performance of these methods is still limited by the way they account for the spatial distribution of visual words and, more importantly, by the appearance distributions currently in use. In this paper, we propose a novel region-centered latent topic model with two main contributions: first, an improved spatial context model that accounts for inter-topic, inter-region influences; and second, an advanced region-based appearance distribution built on the Kernel Logistic Regressor. Notably, both contributions are seamlessly integrated into the model, so that all parameters are estimated concurrently by a unified inference process. Furthermore, the model has been extended to work in both unsupervised and supervised modes. In unsupervised mode, our results improve on those of previous latent topic models by 30%. In supervised mode, where discriminative approaches are preponderant, our results come close to those of discriminative state-of-the-art methods.
This work has been partially supported by the project AFICUS, co-funded by the Spanish Ministry of Industry, Trade and Tourism and the European Fund for Regional Development (Ref. TSI-020110-2009-103), and by National Grant TEC2011-26807 of the Spanish Ministry of Science and Innovation.
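The region-based appearance distribution rests on kernel logistic regression, which yields a per-region probability distribution over categories rather than a hard label. As a minimal sketch of that building block (not the paper's actual model or inference), the snippet below approximates a kernel logistic regressor with an RBF feature map followed by logistic regression; the descriptor dimensionality, category count, and data are illustrative placeholders.

```python
# Minimal sketch of a kernel logistic regressor over region descriptors,
# approximated via an RBF feature map (Nystroem) + logistic regression.
# Descriptors and labels are synthetic, not the paper's features.
import numpy as np
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))          # 200 regions, 64-d appearance descriptors
y = rng.integers(0, 4, size=200)        # 4 hypothetical object categories

klr = make_pipeline(
    Nystroem(kernel="rbf", gamma=0.1, n_components=100, random_state=0),
    LogisticRegression(max_iter=1000),
)
klr.fit(X, y)
probs = klr.predict_proba(X[:5])        # per-region category distributions
print(probs.round(3))
```

The soft per-region distributions are what make such a component easy to fold into a latent topic model, where region appearance must be probabilistic.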
Clothing Co-Parsing by Joint Image Segmentation and Labeling
This paper aims at developing an integrated system for clothing co-parsing that jointly parses a set of clothing images (unsegmented but annotated with tags) into semantic configurations. We propose a data-driven framework consisting of two phases of inference. The first phase, referred to as "image co-segmentation", iterates to extract consistent regions in images and jointly refines the regions over all images using the exemplar-SVM (E-SVM) technique [23]. In the second phase (i.e., "region co-labeling"), we construct a multi-image graphical model by taking the segmented regions as vertices, and incorporate several contexts of clothing configuration (e.g., item location and mutual interactions). The joint label assignment can be solved with the efficient Graph Cuts algorithm. In addition to evaluating our framework on the Fashionista dataset [30], we construct a dataset called CCP, consisting of 2098 high-resolution street fashion photos, to demonstrate the performance of our system. We achieve 90.29% / 88.23% segmentation accuracy and 65.52% / 63.89% recognition rate on the Fashionista and CCP datasets, respectively, which is superior to state-of-the-art methods.
Comment: 8 pages, 5 figures, CVPR 2014
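The exemplar-SVM step in phase one trains one linear classifier per positive region exemplar against a large pool of negatives, then uses its scores to find visually consistent regions in other images. Below is a hedged sketch of that idea under synthetic data; the feature dimensionality, class weights, and score threshold are illustrative assumptions, not values from the paper.

```python
# Exemplar-SVM sketch: one linear SVM per positive exemplar vs. many
# negatives, with a heavy weight on the single positive (as in E-SVM).
# Data and dimensions are synthetic placeholders.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
negatives = rng.normal(size=(500, 128))       # background/region negatives
exemplar = rng.normal(size=(1, 128)) + 2.0    # one positive region descriptor

X = np.vstack([exemplar, negatives])
y = np.array([1] + [0] * len(negatives))

esvm = LinearSVC(C=1.0, class_weight={1: 100.0, 0: 0.01}, max_iter=10000)
esvm.fit(X, y)

candidates = rng.normal(size=(10, 128))
scores = esvm.decision_function(candidates)   # higher = more exemplar-like
print(scores.round(2))
```

In the full framework, each exemplar's detections feed back into the co-segmentation iteration before region co-labeling runs on the graphical model.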
Object Matching in Distributed Video Surveillance Systems by LDA-Based Appearance Descriptors
Establishing correspondences among object instances is still challenging in multi-camera surveillance systems, especially when the cameras' fields of view are non-overlapping. Spatiotemporal constraints can help solve the correspondence problem but still leave a wide margin of uncertainty. One way to reduce this uncertainty is to use appearance information about the moving objects in the site. In this paper, we present the preliminary results of a new method that can capture salient appearance characteristics at each camera node in the network. A Latent Dirichlet Allocation (LDA) model is created and maintained at each node in the camera network, and each object is encoded in terms of the LDA bag-of-words appearance model. The encoded appearance is then used to establish probable matches across cameras. Preliminary experiments are conducted on a dataset of 20 individuals, and a comparison against Madden's I-MCHR is reported.
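The core mechanic, encoding each object's visual-word histogram as an LDA topic mixture and matching mixtures across cameras, can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's system: the codebook size, topic count, histograms, and the choice of Hellinger distance are all assumptions.

```python
# Sketch: match objects across two cameras by comparing LDA topic
# mixtures of their bag-of-words appearance histograms.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)
# Visual-word histograms: rows = object observations, cols = codebook bins.
counts_cam_a = rng.poisson(2.0, size=(20, 100))
counts_cam_b = rng.poisson(2.0, size=(20, 100))

lda = LatentDirichletAllocation(n_components=8, random_state=0)
lda.fit(np.vstack([counts_cam_a, counts_cam_b]))

theta_a = lda.transform(counts_cam_a)   # per-object topic mixtures, camera A
theta_b = lda.transform(counts_cam_b)   # per-object topic mixtures, camera B

def hellinger(p, q):
    """Hellinger distance between two topic distributions."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

# For each object at camera A, pick the closest candidate at camera B.
matches = [int(np.argmin([hellinger(a, b) for b in theta_b])) for a in theta_a]
print(matches)
```

In a deployed system these appearance distances would be fused with the spatiotemporal constraints mentioned above rather than used alone.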
Planogram Compliance Checking Based on Detection of Recurring Patterns
In this paper, a novel method for automatic planogram compliance checking in retail chains is proposed that requires no product template images for training. The product layout is extracted from an input image by means of unsupervised recurring pattern detection and matched, via graph matching, against the expected product layout specified by a planogram to measure the level of compliance. A divide-and-conquer strategy is employed to improve speed: the input image is divided into several regions based on the planogram, and recurring patterns are detected in each region separately and then merged to estimate the product layout. Experimental results on real data have verified the efficacy of the proposed method. Compared with a template-based method, the proposed method achieves higher accuracies over a wide range of products.
Comment: Accepted by MM (IEEE Multimedia Magazine) 201
3D Robotic Sensing of People: Human Perception, Representation and Activity Recognition
The robots are coming. Their presence will eventually bridge the digital-physical divide and dramatically impact human life by taking over tasks where our current society has shortcomings (e.g., search and rescue, elderly care, and child education). Human-centered robotics (HCR) is a vision to address how robots can coexist with humans and help people live safer, simpler and more independent lives.
As humans, we have a remarkable ability to perceive the world around us, perceive people, and interpret their behaviors. Endowing robots with these critical capabilities in highly dynamic human social environments is a significant but very challenging problem in practical human-centered robotics applications.
This research focuses on robotic sensing of people: how robots can perceive and represent humans and understand their behaviors, primarily through 3D robotic vision. In this dissertation, I begin with a broad perspective on human-centered robotics, discussing its real-world applications and significant challenges. I then introduce a real-time perception system, based on the concept of Depth of Interest, that detects and tracks multiple individuals using a color-depth camera installed on a moving robotic platform. In addition, I discuss human representation approaches based on local spatio-temporal features, including new “CoDe4D” features that incorporate both color and depth information, a new “SOD” descriptor that efficiently quantizes 3D visual features, and the novel AdHuC features, which can represent the activities of multiple individuals. Several new algorithms to recognize human activities are also discussed: the RG-PLSA model, which discovers activity patterns without supervision; the MC-HCRF model, which explicitly investigates certainty in latent temporal patterns; and the FuzzySR model, which segments continuous data into events and probabilistically recognizes human activities. Cognition models built on the recognition results are also implemented for decision making, allowing robotic systems to react to human activities. Finally, I conclude with a discussion of future directions that will accelerate the upcoming technological revolution of human-centered robotics.
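One plausible reading of the Depth of Interest idea is that person detection is restricted to a band of depths in each color-depth frame, which cheaply prunes the search space before tracking. Whether this matches the dissertation's actual formulation is an assumption; the depth frame and band limits below are synthetic.

```python
# Hedged sketch loosely inspired by "Depth of Interest": keep only pixels
# inside a hypothetical depth band as candidates for person detection.
import numpy as np

rng = np.random.default_rng(0)
depth = rng.uniform(0.5, 8.0, size=(480, 640))   # meters, fake depth frame

near, far = 1.0, 3.5                             # hypothetical band of interest
mask = (depth >= near) & (depth <= far)          # candidate pixels for detection

print(f"candidate pixels: {mask.sum()} of {mask.size}")
```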