    Handcrafted and learning-based tie point features: comparison using the EuroSDR RPAS benchmark datasets

    The identification of accurate and reliable image correspondences is fundamental for Structure-from-Motion (SfM) photogrammetry. Alongside handcrafted detectors and descriptors, recent machine learning-based approaches have shown promising results for tie point extraction, demonstrating matching success under strong perspective and illumination changes and a general increase in tie point multiplicity. Several methods based on convolutional neural networks (CNNs) have recently been proposed, but few tests have yet been performed under real photogrammetric conditions and, in particular, on full-resolution aerial and RPAS image blocks that require rotationally invariant features. The research reported here compares two handcrafted methods (Metashape local features and RootSIFT) and two learning-based methods (LFNet and Key.Net) using the previously unused EuroSDR RPAS benchmark datasets. Analysis is conducted with DJI Zenmuse P1 imagery acquired at Wards Hill quarry in Northumberland, UK. The research first extracts keypoints using the aforementioned methods, before importing them into COLMAP for incremental reconstruction. The image coordinates of signalised ground control points (GCPs) and independent checkpoints (CPs) are automatically detected using an OpenCV algorithm, and then triangulated for comparison with accurate geometric ground truth. The tests showed that learning-based local features can outperform traditional methods in terms of geometric accuracy, but several issues remain: few deep learning local features are trained to be rotation invariant, significant computational resources are required for large-format imagery, and performance degrades on repetitive patterns.
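    RootSIFT, one of the handcrafted methods compared above, is a simple post-processing of standard SIFT descriptors. A minimal sketch with OpenCV follows; the image path is a placeholder and the default SIFT parameters are an assumption, not the paper's actual configuration:

```python
import cv2
import numpy as np

def extract_rootsift(image_path):
    """Detect SIFT keypoints and convert descriptors to RootSIFT
    (L1-normalise each descriptor, then take the element-wise square root,
    so Euclidean distance approximates the Hellinger kernel)."""
    image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(image, None)
    if descriptors is None:
        return keypoints, None
    descriptors /= (descriptors.sum(axis=1, keepdims=True) + 1e-7)
    descriptors = np.sqrt(descriptors)
    return keypoints, descriptors

# Placeholder file name, for illustration only.
keypoints, descriptors = extract_rootsift("aerial_block_001.tif")
```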

    Trademark image retrieval by local features

    The challenge of abstract trademark image retrieval as a test of machine vision algorithms has attracted considerable research interest in the past decade. Current operational trademark retrieval systems involve manual annotation of the images (the current ‘gold standard’). Accordingly, current systems require a substantial amount of time and labour to access, and are therefore expensive to operate. This thesis focuses on the development of algorithms that mimic aspects of human visual perception in order to retrieve similar abstract trademark images automatically. A significant category of trademark images are highly stylised, comprising a collection of distinctive graphical elements that often include geometric shapes. Therefore, in order to compare the similarity of such images, the principal aim of this research has been to develop a method for solving the partial matching and shape perception problem. There are few useful techniques for partial shape matching in the context of trademark retrieval, because existing techniques tend not to support multi-component retrieval. When this work was initiated, most trademark image retrieval systems represented images by means of global features, which are not suited to solving the partial matching problem. Instead, the author has investigated the use of local image features as a means of finding similarities between trademark images that only partially match in terms of their subcomponents. During the course of this work, it was established that the Harris and Chabat detectors could potentially perform well enough to serve as the basis for local feature extraction in trademark image retrieval. Early findings in this investigation indicated that the well-established SIFT (Scale Invariant Feature Transform) local features, based on the Harris detector, could serve as an adequate underlying local representation for matching trademark images. Few researchers have used mechanisms based on human perception for trademark image retrieval, implying that the shape representations utilised in the past to solve this problem do not necessarily reflect the shapes contained in these images, as characterised by human perception. In response, a practical approach to trademark image retrieval by perceptual grouping has been developed, based on defining meta-features that are calculated from the spatial configurations of SIFT local image features. This new technique measures certain visual properties of the appearance of images containing multiple graphical elements and supports perceptual grouping by exploiting the non-accidental properties of their configuration. Our validation experiments indicated that we were indeed able to capture and quantify the differences in the global arrangement of sub-components evident when comparing stylised images in terms of their visual appearance properties. Such visual appearance properties, measured using 17 of the proposed meta-features, include relative sub-component proximity, similarity, rotation and symmetry. Similar work on meta-features, based on the above Gestalt proximity, similarity, and simplicity groupings of local features, had not been reported in the computer vision literature at the time this work was undertaken. We adopted relevance feedback to allow the visual appearance properties of relevant and non-relevant images returned in response to a query to be determined by example.
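    The abstract does not define the 17 meta-features, so the following is a purely illustrative sketch of one proximity-style statistic over SIFT keypoint configurations; the function name and the normalisation by image diagonal are assumptions, not the thesis's definitions:

```python
import cv2
import numpy as np

def proximity_meta_feature(gray_image):
    """Illustrative proximity statistic over SIFT keypoint locations:
    mean nearest-neighbour distance, normalised by the image diagonal.
    (A stand-in, not one of the thesis's actual 17 meta-features.)"""
    sift = cv2.SIFT_create()
    keypoints = sift.detect(gray_image, None)
    pts = np.array([kp.pt for kp in keypoints])
    if len(pts) < 2:
        return 0.0
    # Pairwise distances between keypoints; mask out zero self-distances.
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    diagonal = np.hypot(*gray_image.shape[:2])
    return float(d.min(axis=1).mean() / diagonal)
```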
Since limited training data is available when constructing a relevance classifier from user-supplied relevance feedback, the intrinsically non-parametric machine learning algorithm ID3 (Iterative Dichotomiser 3) was selected to construct decision trees by means of dynamic rule induction. We believe this approach to capturing high-level visual concepts, encoded by means of meta-features specified by example through relevance feedback and decision tree classification, in support of flexible trademark image retrieval, to be wholly novel. The retrieval performance of the above system was compared with two other state-of-the-art trademark image retrieval systems: Artisan, developed by Eakins (Eakins et al., 1998), and a system developed by Jiang (Jiang et al., 2006). Using relevance feedback, our system achieves higher average normalised precision than either of the systems developed by Eakins or Jiang. However, while our trademark image query and database set is based on an image dataset used by Eakins, we employed different numbers of images. It was not possible to access the same query set and image database used in the evaluation of Jiang's trademark image retrieval system. Despite these differences in evaluation methodology, our approach would appear to have the potential to improve retrieval effectiveness.
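ID3 induces decision trees by recursively choosing the attribute with the highest information gain. A minimal stand-in is sketched below using scikit-learn's CART with the entropy criterion (not a literal ID3 implementation), trained on hypothetical feedback data: 17-dimensional meta-feature vectors with user-assigned relevant/non-relevant labels.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical relevance-feedback data: each row holds 17 meta-feature
# values for one returned image; 1 = marked relevant, 0 = non-relevant.
rng = np.random.default_rng(0)
X_feedback = rng.random((40, 17))
y_feedback = np.array([1] * 20 + [0] * 20)

# Entropy-based splits make this a practical stand-in for ID3's
# information-gain rule induction.
tree = DecisionTreeClassifier(criterion="entropy", min_samples_leaf=2)
tree.fit(X_feedback, y_feedback)

# Rank candidate images by predicted probability of relevance.
X_candidates = rng.random((100, 17))
relevance = tree.predict_proba(X_candidates)[:, 1]
ranking = np.argsort(-relevance)
```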

    Contributions to the Completeness and Complementarity of Local Image Features

    Doctoral thesis in Informatics Engineering presented to the Faculdade de Ciências e Tecnologia da Universidade de Coimbra. Local image feature detection (or extraction, if we want to use a more semantically correct term) is a central and extremely active research topic in the field of computer vision. Reliable solutions to prominent problems such as matching, content-based image retrieval, object (class) recognition, and symmetry detection often make use of local image features. It is widely accepted that a good local feature detector is one that efficiently retrieves distinctive, accurate, and repeatable features in the presence of a wide variety of photometric and geometric transformations. However, these requirements are not always the most important; in fact, not all applications require the same properties from a local feature detector. We can distinguish three broad categories of applications according to the required properties. The first category includes applications in which the semantic meaning of a particular type of feature is exploited. For instance, edge or even ridge detection can be used to identify blood vessels in medical images or watercourses in aerial images. Another example in this category is the use of blob extraction to identify blob-like organisms in microscopic images. A second category includes tasks such as matching, tracking, and registration, which mainly require distinctive, repeatable, and accurate features. Finally, a third category comprises applications such as object (class) recognition, image retrieval, scene classification, and image compression. For this category, it is crucial that features preserve the most informative image content (a robust image representation), while requirements such as repeatability and accuracy are of less importance. Our research work is mainly focused on the problem of providing a robust image representation through the use of local features. The limited number of feature types that a local feature extractor responds to might be insufficient to provide such a robust image representation. It is fundamental to analyze the completeness of local features, i.e., the amount of image information they preserve, as well as the often neglected complementarity between sets of features. The major contributions of this work come in the form of two substantially different local feature detectors aimed at providing considerably robust image representations. The first algorithm is an information theoretic-based keypoint extractor that responds to complementary local structures that are salient (highly informative) within the image context. This method represents a new paradigm in local feature extraction, as it introduces context-awareness principles. The second algorithm extracts Stable Salient Shapes, a novel type of region obtained through a feature-driven detection of Maximally Stable Extremal Regions (MSER). This method provides compact and robust image representations and overcomes some of the major shortcomings of MSER detection. We empirically validate the methods by investigating the repeatability, accuracy, completeness, and complementarity of the proposed features on standard benchmarks. In light of these results, we discuss the applicability of both methods.
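    The second contribution builds on MSER detection. For reference, a baseline MSER extraction with OpenCV is sketched below; the feature-driven selection that yields Stable Salient Shapes is the thesis's contribution and is not reproduced here, and the image path and threshold values are placeholders:

```python
import cv2

# Baseline MSER extraction (delta, min_area, max_area are illustrative).
image = cv2.imread("example.png", cv2.IMREAD_GRAYSCALE)  # placeholder path
mser = cv2.MSER_create(5, 60, 14400)
regions, bounding_boxes = mser.detectRegions(image)
print(f"{len(regions)} extremal regions detected")
```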

    Matching sets of features for efficient retrieval and recognition

    Thesis (Ph.D.) -- Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006. Includes bibliographical references (p. 145-153). In numerous domains it is useful to represent a single example by the collection of local features or parts that comprise it. In computer vision in particular, local image features are a powerful way to describe images of objects and scenes. Their stability under variable image conditions is critical for success in a wide range of recognition and retrieval applications. However, many conventional similarity measures and machine learning algorithms assume vector inputs. Comparing and learning from images represented by sets of local features is therefore challenging, since each set may vary in cardinality and its elements lack a meaningful ordering. In this thesis I present computationally efficient techniques to handle comparisons, learning, and indexing with examples represented by sets of features. The primary goal of this research is to design and demonstrate algorithms that can effectively accommodate this useful representation in a way that scales with both the representation size and the number of images available for indexing or learning. I introduce the pyramid match algorithm, which efficiently forms an implicit partial matching between two sets of feature vectors. The matching has a linear time complexity, naturally forms a Mercer kernel, and is robust to clutter or outlier features, a critical advantage for handling images with variable backgrounds, occlusions, and viewpoint changes. I provide bounds on the expected error relative to the optimal partial matching. For very large databases, even extremely efficient pairwise comparisons may not offer adequately responsive query times. I show how to perform sub-linear time retrievals under the matching measure with randomized hashing techniques, even when input sets have varying numbers of features. My results are focused on several important vision tasks, including applications to content-based image retrieval, discriminative classification for object recognition, kernel regression, and unsupervised learning of categories. I show how the dramatic increase in performance enables accurate and flexible image comparisons to be made on large-scale data sets, and removes the need to artificially limit the number of local descriptions used per image when learning visual categories. By Kristen Lorraine Grauman. Ph.D.
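    The core of the pyramid match is a multi-resolution histogram intersection in which matches first formed at coarser levels receive geometrically smaller weights. A simplified sketch for 2-D features in [0, 1) follows; the thesis operates on high-dimensional feature sets with further refinements, so this low-dimensional version only illustrates the weighting scheme:

```python
import numpy as np

def pyramid_match(X, Y, num_levels=5):
    """Simplified pyramid match score between two feature sets.

    X, Y: arrays of shape (n, 2) with values in [0, 1). Histogram
    intersections are computed on grids whose cells double in size each
    level; matches newly formed at level i are weighted 1 / 2**i, so
    coarse matches count less than fine ones."""
    score, prev_intersection = 0.0, 0.0
    for level in range(num_levels + 1):
        bins = 2 ** (num_levels - level)  # finest grid first
        hist_x, _, _ = np.histogram2d(X[:, 0], X[:, 1],
                                      bins=bins, range=[[0, 1], [0, 1]])
        hist_y, _, _ = np.histogram2d(Y[:, 0], Y[:, 1],
                                      bins=bins, range=[[0, 1], [0, 1]])
        intersection = np.minimum(hist_x, hist_y).sum()
        # Only matches first appearing at this coarser level get the
        # discounted weight 1 / 2**level.
        score += (intersection - prev_intersection) / 2 ** level
        prev_intersection = intersection
    return score

# Two feature sets of different cardinality still compare directly.
rng = np.random.default_rng(0)
set_a, set_b = rng.random((30, 2)), rng.random((45, 2))
print(pyramid_match(set_a, set_b))
```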

    Long-Term Localization for Self-Driving Cars

    Long-term localization is hard due to changing conditions, while relative localization within time sequences is much easier. To achieve long-term localization in a sequential setting, such as for self-driving cars, relative localization should be used to the fullest extent whenever possible. This thesis presents solutions and insights both for long-term sequential visual localization and for localization using global navigation satellite systems (GNSS), which push us closer to the goal of accurate and reliable localization for self-driving cars. It addresses the question: how can we achieve accurate and robust, yet cost-effective, long-term localization for self-driving cars? Starting from this question, the thesis explores how existing sensor suites for advanced driver-assistance systems (ADAS) can be used most efficiently, and how landmarks in maps can be recognized and used for localization even after severe changes in appearance. The findings show that:
    * State-of-the-art ADAS sensors are insufficient to meet the requirements for localization of a self-driving car in less than ideal conditions; GNSS and visual localization are identified as areas to improve.
    * Highly accurate relative localization with no convergence delay is possible by using time-relative GNSS observations with a single-band receiver and no base stations.
    * Sequential semantic localization is identified as a promising focus for further research, based on a benchmark study comparing state-of-the-art visual localization methods in challenging autonomous driving scenarios, including day-to-night and seasonal changes.
    * A novel sequential semantic localization algorithm improves accuracy while significantly reducing map size compared to traditional methods based on matching of local image features.
    * Improvements to semantic segmentation in challenging conditions can be made efficiently by automatically generating pixel correspondences between images from a multitude of conditions and enforcing a consistency constraint during training.
    * A segmentation algorithm with automatically defined, more fine-grained classes improves localization performance.
    * The performance advantage seen in single-image localization for modern local image features, compared to traditional ones, is all but erased when considering sequential data with odometry, encouraging future research to focus more on sequential localization rather than pure single-image localization.