1,203 research outputs found

    A Review of Codebook Models in Patch-Based Visual Object Recognition

    No full text
    The codebook model-based approach, while ignoring any structural aspect in vision, nonetheless provides state-of-the-art performances on current datasets. The key role of a visual codebook is to provide a way to map the low-level features into a fixed-length vector in histogram space to which standard classifiers can be directly applied. The discriminative power of such a visual codebook determines the quality of the codebook model, whereas the size of the codebook controls the complexity of the model. Thus, the construction of a codebook is an important step which is usually done by cluster analysis. However, clustering is a process that retains regions of high density in a distribution and it follows that the resulting codebook need not have discriminant properties. This is also recognised as a computational bottleneck of such systems. In our recent work, we proposed a resource-allocating codebook, to constructing a discriminant codebook in a one-pass design procedure that slightly outperforms more traditional approaches at drastically reduced computing times. In this review we survey several approaches that have been proposed over the last decade with their use of feature detectors, descriptors, codebook construction schemes, choice of classifiers in recognising objects, and datasets that were used in evaluating the proposed methods

    Orientation covariant aggregation of local descriptors with embeddings

    Get PDF
    Image search systems based on local descriptors typically achieve orientation invariance by aligning the patches on their dominant orientations. Albeit successful, this choice introduces too much invariance because it does not guarantee that the patches are rotated consistently. This paper introduces an aggregation strategy of local descriptors that achieves this covariance property by jointly encoding the angle in the aggregation stage in a continuous manner. It is combined with an efficient monomial embedding to provide a codebook-free method to aggregate local descriptors into a single vector representation. Our strategy is also compatible and employed with several popular encoding methods, in particular bag-of-words, VLAD and the Fisher vector. Our geometric-aware aggregation strategy is effective for image search, as shown by experiments performed on standard benchmarks for image and particular object retrieval, namely Holidays and Oxford buildings.Comment: European Conference on Computer Vision (2014

    Characterizing Objects in Images using Human Context

    Get PDF
    Humans have an unmatched capability of interpreting detailed information about existent objects by just looking at an image. Particularly, they can effortlessly perform the following tasks: 1) Localizing various objects in the image and 2) Assigning functionalities to the parts of localized objects. This dissertation addresses the problem of aiding vision systems accomplish these two goals. The first part of the dissertation concerns object detection in a Hough-based framework. To this end, the independence assumption between features is addressed by grouping them in a local neighborhood. We study the complementary nature of individual and grouped features and combine them to achieve improved performance. Further, we consider the challenging case of detecting small and medium sized household objects under human-object interactions. We first evaluate appearance based star and tree models. While the tree model is slightly better, appearance based methods continue to suffer due to deficiencies caused by human interactions. To this end, we successfully incorporate automatically extracted human pose as a form of context for object detection. The second part of the dissertation addresses the tedious process of manually annotating objects to train fully supervised detectors. We observe that videos of human-object interactions with activity labels can serve as weakly annotated examples of household objects. Since such objects cannot be localized only through appearance or motion, we propose a framework that includes human centric functionality to retrieve the common object. Designed to maximize data utility by detecting multiple instances of an object per video, the framework achieves performance comparable to its fully supervised counterpart. The final part of the dissertation concerns localizing functional regions or affordances within objects by casting the problem as that of semantic image segmentation. To this end, we introduce a dataset involving human-object interactions with strong i.e. pixel level and weak i.e. clickpoint and image level affordance annotations. We propose a framework that utilizes both forms of weak labels and demonstrate that efforts for weak annotation can be further optimized using human context

    Adapting pedestrian detectors to new domains: A comprehensive review.

    Get PDF
    Successful detection and localisation of pedestrians is an important goal in computer vision which is a core area in Artificial Intelligence. State-of-the-art pedestrian detectors proposed in literature have reached impressive performance on certain datasets. However, it has been pointed out that these detectors tend not to perform very well when applied to specific scenes that differ from the training datasets in some ways. Due to this, domain adaptation approaches have recently become popular in order to adapt existing detectors to new domains to improve the performance in those domains. There is a real need to review and analyse critically the state-of-the-art domain adaptation algorithms, especially in the area of object and pedestrian detection. In this paper, we survey the most relevant and important state-of-the-art results for domain adaptation for image and video data, with a particular focus on pedestrian detection. Related areas to domain adaptation are also included in our review and we make observations and draw conclusions from the representative papers and give practical recommendations on which methods should be preferred in different situations that practitioners may encounter in real-life

    Regularization Based Iterative Point Match Weighting for Accurate Rigid Transformation Estimation

    Get PDF
    Feature extraction and matching (FEM) for 3D shapes finds numerous applications in computer graphics and vision for object modeling, retrieval, morphing, and recognition. However, unavoidable incorrect matches lead to inaccurate estimation of the transformation relating different datasets. Inspired by AdaBoost, this paper proposes a novel iterative re-weighting method to tackle the challenging problem of evaluating point matches established by typical FEM methods. Weights are used to indicate the degree of belief that each point match is correct. Our method has three key steps: (i) estimation of the underlying transformation using weighted least squares, (ii) penalty parameter estimation via minimization of the weighted variance of the matching errors, and (iii) weight re-estimation taking into account both matching errors and information learnt in previous iterations. A comparative study, based on real shapes captured by two laser scanners, shows that the proposed method outperforms four other state-of-the-art methods in terms of evaluating point matches between overlapping shapes established by two typical FEM methods, resulting in more accurate estimates of the underlying transformation. This improved transformation can be used to better initialize the iterative closest point algorithm and its variants, making 3D shape registration more likely to succeed
    • …
    corecore