1,575 research outputs found
Object recognition and retrieval by context dependent similarity kernels
The success of kernel methods, including support vector machines (SVMs), strongly depends on the design of appropriate kernels. While kernels were initially designed to handle fixed-length data, their extension to unordered, variable-length data has become necessary for real pattern recognition problems such as object recognition and bioinformatics. We focus in this paper on object recognition using a new type of kernel referred to as "context-dependent". Objects, seen as constellations of local features (interest points, regions, etc.), are matched by minimizing an energy function mixing (1) a fidelity term which measures the quality of feature matching, (2) a neighborhood criterion which captures the object geometry and (3) a regularization term. We will show that the fixed-point of this energy is a "context-dependent" kernel ("CDK") which also satisfies the Mercer condition. Experiments conducted on object recognition show that when plugging our kernel into SVMs, we clearly outperform SVMs with "context-free" kernels.
A Performance Evaluation of Exact and Approximate Match Kernels for Object Recognition
Local features have repeatedly shown their effectiveness for object recognition in recent years, and they have consequently become the preferred descriptor for this type of problem. The correspondence problem is traditionally approached with exact or approximate techniques. In this paper we are interested in methods that solve the correspondence problem by defining a kernel function that makes it possible to use local features as input to a support vector machine. We single out the match kernel, an exact approach, and the pyramid match kernel, which instead uses an approximate strategy. We present a thorough experimental evaluation of the two methods on three different databases. Results show that the exact method performs consistently better than the approximate one, especially for the object identification task, when training on a decreasing number of images. Based on these findings and on the computational cost of each approach, we suggest some criteria for choosing between the two kernels given the application at hand.
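As an illustration of what an exact match kernel over local features can look like, here is a minimal Python sketch. It assumes the common sum-max form, in which each descriptor in one image is paired with its best match in the other and the two directions are averaged. The abstract does not spell out the formula, so this form, the Gaussian local similarity, the `gamma` parameter, and the toy descriptors are all assumptions made for illustration.

```python
import math


def rbf(x, y, gamma=1.0):
    """Gaussian (RBF) similarity between two local descriptors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def match_kernel(set_a, set_b, gamma=1.0):
    """Symmetric sum-max match between two variable-length descriptor sets."""

    def one_way(src, dst):
        # Average, over src, of the best similarity found in dst.
        return sum(max(rbf(s, d, gamma) for d in dst) for s in src) / len(src)

    return 0.5 * (one_way(set_a, set_b) + one_way(set_b, set_a))


# Toy 2-D "descriptors" (hypothetical data); real descriptors would be
# higher-dimensional vectors extracted around interest points.
image_a = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.5)]
image_b = [(0.0, 0.1), (1.1, 1.0)]

print(match_kernel(image_a, image_a))  # identical sets score 1.0
print(match_kernel(image_a, image_b))
```

Because of the max operation, a kernel of this form is not guaranteed to be positive semi-definite in general, and its cost grows with the product of the two set sizes. That cost is one plausible reason to weigh it against an approximate alternative such as the pyramid match kernel.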
Orientation covariant aggregation of local descriptors with embeddings
Image search systems based on local descriptors typically achieve orientation
invariance by aligning the patches on their dominant orientations. Albeit
successful, this choice introduces too much invariance because it does not
guarantee that the patches are rotated consistently. This paper introduces an
aggregation strategy of local descriptors that achieves this covariance
property by jointly encoding the angle in the aggregation stage in a continuous
manner. It is combined with an efficient monomial embedding to provide a
codebook-free method to aggregate local descriptors into a single vector
representation. Our strategy is also compatible and employed with several
popular encoding methods, in particular bag-of-words, VLAD and the Fisher
vector. Our geometric-aware aggregation strategy is effective for image search,
as shown by experiments performed on standard benchmarks for image and
particular object retrieval, namely Holidays and Oxford buildings. Comment: European Conference on Computer Vision (2014)
A Framework for Image Segmentation Using Shape Models and Kernel Space Shape Priors
©2008 IEEE. DOI: 10.1109/TPAMI.2007.70774. Segmentation involves separating an object from the background in a given image. The use of image information alone often leads to poor segmentation results due to the presence of noise, clutter or occlusion. The introduction of shape priors in the geometric active contour (GAC) framework has proved to be an effective way to ameliorate some of these problems. In this work, we propose a novel segmentation method combining image information with prior shape knowledge, using level-sets. Following the work of Leventon et al., we propose to revisit the use of PCA to introduce prior knowledge about shapes in a more robust manner. We utilize kernel PCA (KPCA) and show that this method outperforms linear PCA by allowing only those shapes that are close enough to the training data. In our segmentation framework, shape knowledge and image information are encoded into two energy functionals entirely described in terms of shapes. This consistent description makes it possible to take full advantage of the kernel PCA methodology and leads to promising segmentation results.
In particular, our shape-driven segmentation technique allows for the simultaneous encoding of multiple types of shapes, and offers a convincing level of robustness with respect to noise, occlusions, or smearing.
Parametric kernels for structured data analysis
Structured representation of input physical patterns as a set of local features has been useful for a variety of robotics and human-computer interaction (HCI) applications. It enables a stable understanding of variable inputs. However, this representation does not fit conventional machine learning algorithms and distance metrics because they assume vector inputs. Learning from input patterns with variable structure is thus challenging. To address this problem, I propose a general and systematic method to design distance metrics between structured inputs that can be used in conventional learning algorithms. Based on the observation that the geometric distributions of local features over the physical patterns are stable across similar inputs, this is done by combining the local similarities and the conformity of the geometric relationship between local features. The resulting distance metrics, called "parametric kernels", are positive semi-definite and require almost linear time to compute. To demonstrate the general applicability and the efficacy of this approach, I designed and applied parametric kernels to handwritten character recognition, on-line face recognition, and object detection from laser range finder sensor data. Parametric kernels achieve recognition rates competitive with state-of-the-art approaches in these tasks.
Comparative Evaluation of Action Recognition Methods via Riemannian Manifolds, Fisher Vectors and GMMs: Ideal and Challenging Conditions
We present a comparative evaluation of various techniques for action
recognition while keeping as many variables as possible controlled. We employ
two categories of Riemannian manifolds: symmetric positive definite matrices
and linear subspaces. For both categories we use their corresponding nearest
neighbour classifiers, kernels, and recent kernelised sparse representations.
We compare against traditional action recognition techniques based on Gaussian
mixture models and Fisher vectors (FVs). We evaluate these action recognition
techniques under ideal conditions, as well as their sensitivity in more
challenging conditions (variations in scale and translation). Despite recent
advancements for handling manifolds, manifold based techniques obtain the
lowest performance and their kernel representations are more unstable in the
presence of challenging conditions. The FV approach obtains the highest
accuracy under ideal conditions. Moreover, FV best deals with moderate scale
and translation changes.
Recognizing flu-like symptoms from videos
© 2014 Hue Thi et al.; licensee BioMed Central Ltd. Background: Vision-based surveillance and monitoring is a potential alternative for early detection of respiratory disease outbreaks in urban areas, complementing molecular diagnostics and alert systems based on hospital and doctor visits. Visible actions representing typical flu-like symptoms include sneezing and coughing, which are associated with changing patterns of hand-to-head distances, among others. The technical difficulties lie in the high complexity and large variation of those actions, as well as numerous similar background actions such as scratching the head, cell phone use, eating, drinking and so on. Results: In this paper, we make a first attempt at the challenging problem of recognizing flu-like symptoms from videos. Since there was no related dataset available, we created a new public health dataset for action recognition that includes two major flu-like symptom related actions (sneeze and cough) and a number of background actions. We also developed a suitable novel algorithm by introducing two types of Action Matching Kernels, where both types aim to integrate two aspects of local features, namely the space-time layout and the Bag-of-Words representations. In particular, we show that the Pyramid Match Kernel and Spatial Pyramid Matching are both special cases of our proposed kernels. In addition to experiments on standard testbeds, the proposed algorithm is also evaluated on the new sneeze and cough set. Empirically, we observe that our approach achieves competitive performance compared to the state of the art, while recognition on the new public health dataset is shown to be a non-trivial task even with a simple, single-person, unobstructed view. Conclusions: Our sneeze and cough video dataset and newly developed action recognition algorithm are the first of their kind and aim to kick-start the field of action recognition of flu-like symptoms from videos.
It will be challenging but necessary in future developments to consider more complex real-life scenarios of detecting these actions simultaneously from multiple persons in possibly crowded environments.
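To make the reference to Spatial Pyramid Matching more concrete, the following is a minimal Python sketch of the standard spatial pyramid match score over quantized local features. It is not the Action Matching Kernels of the paper, whose definition the abstract does not give. It is a 2-D simplification that assumes positions normalized to [0, 1), integer visual-word labels, and the usual level weights of 1/2^L for the coarsest level and 1/2^(L-l+1) for level l. The function names and toy data are hypothetical.

```python
from collections import Counter


def cell_histogram(features, level):
    """Count (cell_x, cell_y, word) occurrences on a 2^level x 2^level grid."""
    cells = 2 ** level
    hist = Counter()
    for x, y, word in features:
        cx = min(int(x * cells), cells - 1)
        cy = min(int(y * cells), cells - 1)
        hist[(cx, cy, word)] += 1
    return hist


def intersection(h1, h2):
    """Histogram intersection: number of features matched bin by bin."""
    return sum(min(count, h2[key]) for key, count in h1.items())


def spatial_pyramid_match(feats_a, feats_b, max_level=2):
    """Weighted sum of histogram intersections over grid levels 0..max_level."""
    matches = [
        intersection(cell_histogram(feats_a, lvl), cell_histogram(feats_b, lvl))
        for lvl in range(max_level + 1)
    ]
    # Coarsest level gets weight 1/2^L; finer levels get 1/2^(L - l + 1).
    score = matches[0] / 2 ** max_level
    for lvl in range(1, max_level + 1):
        score += matches[lvl] / 2 ** (max_level - lvl + 1)
    return score


# Toy features (hypothetical): (x, y, visual_word) with x, y in [0, 1).
clip_a = [(0.1, 0.1, 0), (0.6, 0.6, 1), (0.9, 0.2, 0)]
clip_b = [(0.2, 0.2, 0), (0.7, 0.9, 1)]

print(spatial_pyramid_match(clip_a, clip_b))  # 1.5
print(spatial_pyramid_match(clip_a, clip_a))  # 3.0, the number of features
```

Matches found in finer cells receive larger weights, so features that agree in both visual word and location contribute more than features that only share a word. A space-time variant for video would add a temporal coordinate to the grid in the same way.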