309 research outputs found

    Trademark image retrieval by local features

    The challenge of abstract trademark image retrieval as a test of machine vision algorithms has attracted considerable research interest in the past decade. Current operational trademark retrieval systems involve manual annotation of the images (the current ‘gold standard’). Accordingly, current systems require a substantial amount of time and labour to access, and are therefore expensive to operate. This thesis focuses on the development of algorithms that mimic aspects of human visual perception in order to retrieve similar abstract trademark images automatically. A significant category of trademark images is highly stylised, comprising a collection of distinctive graphical elements that often include geometric shapes. Therefore, in order to compare the similarity of such images, the principal aim of this research has been to develop a method for solving the partial matching and shape perception problem. Few useful techniques exist for partial shape matching in the context of trademark retrieval, because existing techniques tend not to support multi-component retrieval. When this work was initiated, most trademark image retrieval systems represented images by means of global features, which are not suited to solving the partial matching problem. Instead, the author has investigated the use of local image features as a means of finding similarities between trademark images that only partially match in terms of their subcomponents. During the course of this work, it has been established that the Harris and Chabat detectors could potentially perform sufficiently well to serve as the basis for local feature extraction in trademark image retrieval. Early findings in this investigation indicated that the well-established SIFT (Scale Invariant Feature Transform) local features, based on the Harris detector, could potentially serve as an adequate underlying local representation for matching trademark images. Few researchers have used mechanisms based on human perception for trademark image retrieval, implying that the shape representations utilised in the past to solve this problem do not necessarily reflect the shapes contained in these images, as characterised by human perception. In response, a practical approach to trademark image retrieval by perceptual grouping has been developed, based on defining meta-features that are calculated from the spatial configurations of SIFT local image features. This new technique measures certain visual properties of the appearance of images containing multiple graphical elements and supports perceptual grouping by exploiting the non-accidental properties of their configuration. Our validation experiments indicated that we were indeed able to capture and quantify the differences in the global arrangement of sub-components evident when comparing stylised images in terms of their visual appearance properties. Such visual appearance properties, measured using 17 of the proposed meta-features, include relative sub-component proximity, similarity, rotation and symmetry. Similar work on meta-features, based on the above Gestalt proximity, similarity, and simplicity groupings of local features, had not been reported in the computer vision literature at the time of undertaking this work. We decided to adopt relevance feedback to allow the visual appearance properties of relevant and non-relevant images returned in response to a query to be determined by example.
Since limited training data is available when constructing a relevance classifier by means of user-supplied relevance feedback, the intrinsically non-parametric machine learning algorithm ID3 (Iterative Dichotomiser 3) was selected to construct decision trees by means of dynamic rule induction. We believe this approach to capturing high-level visual concepts, encoded by means of meta-features specified by example through relevance feedback and decision tree classification, supports flexible trademark image retrieval and is wholly novel. The retrieval performance of the above system was compared with that of two other state-of-the-art trademark image retrieval systems: Artisan, developed by Eakins (Eakins et al., 1998), and a system developed by Jiang (Jiang et al., 2006). Using relevance feedback, our system achieves higher average normalised precision than either of the systems developed by Eakins or Jiang. However, while our trademark image query and database set is based on an image dataset used by Eakins, we employed different numbers of images. It was not possible to access the same query set and image database used in the evaluation of Jiang's trademark image retrieval system. Despite these differences in evaluation methodology, our approach would appear to have the potential to improve retrieval effectiveness.
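As a concrete illustration of the pipeline described above, the sketch below detects SIFT keypoints, derives two toy spatial meta-features from their configuration, and trains an entropy-based decision tree from user relevance judgements. It is a minimal sketch only, assuming OpenCV ≥ 4.4 (for cv2.SIFT_create) and scikit-learn; the two meta-features and the feedback format are illustrative stand-ins, not the 17 meta-features or the ID3 rule-induction procedure developed in the thesis.

```python
# Illustrative sketch only: SIFT keypoints, two toy spatial "meta-features"
# (mean pairwise keypoint distance and orientation spread), and an
# entropy-based decision tree standing in for ID3-style rule induction.
# Assumes OpenCV >= 4.4 (cv2.SIFT_create) and scikit-learn.
import cv2
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def meta_features(image_path):
    """Compute toy meta-features from the spatial layout of SIFT keypoints."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    kps = cv2.SIFT_create().detect(img, None)
    if len(kps) < 2:
        return np.zeros(2)
    pts = np.array([kp.pt for kp in kps])            # keypoint positions
    angles = np.deg2rad([kp.angle for kp in kps])    # keypoint orientations
    dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    proximity = dists[np.triu_indices(len(pts), k=1)].mean()  # ~relative proximity
    # Circular spread of orientations as a crude "similarity of rotation" cue.
    rotation_spread = 1.0 - np.abs(np.exp(1j * angles).mean())
    return np.array([proximity, rotation_spread])

def relevance_classifier(feedback):
    """feedback: list of (image_path, is_relevant) pairs supplied by the user."""
    X = np.array([meta_features(p) for p, _ in feedback])
    y = np.array([int(r) for _, r in feedback])
    # criterion="entropy" gives ID3-like information-gain splits.
    return DecisionTreeClassifier(criterion="entropy").fit(X, y)
```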

    Fault-Tolerant Vision for Vehicle Guidance in Agriculture


    Vehicle make and model recognition for intelligent transportation monitoring and surveillance.

    Vehicle Make and Model Recognition (VMMR) has evolved into a significant subject of study due to its importance in numerous Intelligent Transportation Systems (ITS), such as autonomous navigation, traffic analysis, traffic surveillance and security systems. A highly accurate and real-time VMMR system significantly reduces the overhead cost of resources otherwise required. The VMMR problem is a multi-class classification task with a peculiar set of issues and challenges, such as multiplicity and inter- and intra-make ambiguity among various vehicle makes and models, which need to be solved in an efficient and reliable manner to achieve a highly robust VMMR system. In this dissertation, facing the growing importance of make and model recognition of vehicles, we present a VMMR system that provides very high accuracy rates and is robust to several challenges. We demonstrate that the VMMR problem can be addressed by locating discriminative parts where the most significant appearance variations occur in each category, and by learning expressive appearance descriptors. Given these insights, we consider two data-driven frameworks: a Multiple-Instance Learning-based (MIL) system using hand-crafted features and an extended application of deep neural networks using MIL. Our approach requires only image-level class labels, and the discriminative parts of each target class are selected in a fully unsupervised manner without any use of part annotations or segmentation masks, which may be costly to obtain. This advantage makes our system more intelligent, scalable, and applicable to other fine-grained recognition tasks. We constructed a dataset with 291,752 images representing 9,170 different vehicles to validate and evaluate our approach. Experimental results demonstrate that localizing parts and distinguishing their discriminative power for categorization improves the performance of fine-grained categorization. Extensive experiments conducted using our approaches yield superior results for images that were occluded, under low illumination, captured from partial camera views, or even from non-frontal views, all available in our real-world VMMR dataset. The approaches presented herewith provide a highly accurate VMMR system for real-time applications in realistic environments. We also validate our system with a significant application of VMMR to ITS that involves automated vehicular surveillance. We show that our application can provide law enforcement agencies with efficient tools to search for a specific vehicle type, make, or model, and to track the path of a given vehicle using the positions of multiple cameras.
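The core of the MIL formulation above (image-level labels only, with the most discriminative part driving the prediction) can be sketched as follows. This is a minimal sketch assuming PyTorch; the tiny encoder, tensor shapes and max-pooling choice are illustrative assumptions rather than the dissertation's actual architecture.

```python
# Minimal multiple-instance learning (MIL) sketch for make/model recognition:
# each image is a bag of candidate part crops; only the image-level label is
# known, so instance scores are max-pooled into a bag prediction.
# Assumes PyTorch; the backbone and tensor shapes are illustrative choices.
import torch
import torch.nn as nn

class MILClassifier(nn.Module):
    def __init__(self, num_classes, feat_dim=256):
        super().__init__()
        # Tiny instance encoder standing in for a CNN feature extractor.
        self.encoder = nn.Sequential(
            nn.Flatten(start_dim=2),          # (B, N, C, H, W) -> (B, N, C*H*W)
            nn.LazyLinear(feat_dim), nn.ReLU(),
        )
        self.instance_scores = nn.Linear(feat_dim, num_classes)

    def forward(self, bags):
        # bags: (batch, n_instances, channels, height, width) part crops.
        feats = self.encoder(bags)                    # (B, N, feat_dim)
        scores = self.instance_scores(feats)          # (B, N, num_classes)
        # Max over instances: the most discriminative part decides the class.
        return scores.max(dim=1).values               # (B, num_classes)

# Training uses ordinary cross-entropy on image-level labels only.
model = MILClassifier(num_classes=10)
bags = torch.randn(4, 8, 3, 64, 64)       # 4 images, 8 candidate parts each
loss = nn.CrossEntropyLoss()(model(bags), torch.randint(0, 10, (4,)))
```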

    Machine Intelligence for Advanced Medical Data Analysis: Manifold Learning Approach

    In the current work, linear and non-linear manifold learning techniques, specifically Principal Component Analysis (PCA) and Laplacian Eigenmaps, are studied in detail. Their applications in medical image and shape analysis are investigated. In the first contribution, a manifold learning-based multi-modal image registration technique is developed, which results in a unified intensity system through intensity transformation between the reference and sensed images. The transformation eliminates intensity variations in multi-modal medical scans and hence facilitates employing well-studied mono-modal registration techniques. The method can be used for registering multi-modal images with full and partial data. Next, a manifold learning-based scale-invariant global shape descriptor is introduced. The proposed descriptor benefits from the capability of Laplacian Eigenmaps in dealing with high-dimensional data by introducing an exponential weighting scheme. It eliminates the limitations tied to the well-known cotangent weighting scheme, namely dependency on triangular mesh representation and high intra-class quality of 3D models. Finally, a novel descriptive model for diagnostic classification of pulmonary nodules is presented. The descriptive model benefits from structural differences between benign and malignant nodules for automatic and accurate prediction of a candidate nodule. It extracts concise and discriminative features automatically from the 3D surface structure of a nodule, using the spectral features studied in the previous work combined with a point cloud-based deep learning network. Extensive experiments have been conducted and have shown that the proposed algorithms based on manifold learning outperform several state-of-the-art methods. Advanced computational techniques combining manifold learning and deep networks can play a vital role in effective healthcare delivery by providing a framework for several fundamental tasks in image and shape processing, namely registration, classification, and detection of features of interest.
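For reference, the two embeddings named above can be exercised side by side with standard tooling. The snippet below is a minimal sketch assuming scikit-learn, whose SpectralEmbedding implements Laplacian Eigenmaps; the toy point set and parameters are illustrative and unrelated to the medical data used in the thesis.

```python
# Sketch of the two embeddings discussed above on a toy point set:
# linear PCA vs. Laplacian Eigenmaps (scikit-learn's SpectralEmbedding).
# Dataset and parameters are illustrative, not those used in the thesis.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import SpectralEmbedding

rng = np.random.default_rng(0)
# Toy swiss-roll-like surface standing in for high-dimensional shape data.
t = 3 * np.pi * (1 + 2 * rng.random(500))
X = np.column_stack([t * np.cos(t), 20 * rng.random(500), t * np.sin(t)])

pca_coords = PCA(n_components=2).fit_transform(X)                # linear embedding
lap_coords = SpectralEmbedding(n_components=2,
                               n_neighbors=10).fit_transform(X)  # Laplacian Eigenmaps
print(pca_coords.shape, lap_coords.shape)   # (500, 2) (500, 2)
```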

    3D automatic target recognition for missile platforms

    The quest for military Automatic Target Recognition (ATR) procedures arises from the demand to reduce collateral damage and fratricide. Although missiles with two-dimensional ATR capabilities do exist, future Light Detection and Ranging (LIDAR) missiles with three-dimensional (3D) ATR abilities should significantly improve a missile's effectiveness in complex battlefields, because 3D ATR can encode the target's underlying structure and thus reinforce target recognition. However, current military-grade 3D ATR, and the computer vision algorithms applied militarily for object recognition, do not offer optimal solutions in the context of an ATR-capable LIDAR-based missile, primarily due to the computational and memory (storage) constraints that missiles impose. Therefore, this research initially introduces a 3D descriptor taxonomy for the Local and the Global descriptor domains, which makes the processing cost of each potential option explicit. Through these taxonomies, the optimum missile-oriented descriptor per domain is identified, which further pinpoints the research route for this thesis. In terms of 3D descriptors that are suitable for missiles, the contributions of this thesis are a 3D Global-based descriptor and four 3D Local-based descriptors, namely the SURF Projection Recognition (SPR), the Histogram of Distances (HoD), the processing-efficient variant HoD-S and the binary variant B-HoD. These are challenged against current state-of-the-art 3D descriptors on standard commercial datasets, as well as on highly credible simulated air-to-ground missile engagement scenarios that consider various platform parameters and nuisances, including simulated scale change and atmospheric disturbances. The results obtained over the different datasets showed an outstanding computational improvement, on average 19 times faster than state-of-the-art techniques in the literature, while maintaining, and on some occasions improving, the detection rate, with a minimum of 90% of targets correctly classified.
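To make the descriptor idea concrete, the sketch below implements a simplified histogram-of-distances style global signature for a point cloud: pairwise distances are normalised and binned into a fixed-length vector that is cheap to compute and compare. This is an assumption-laden reading of the HoD family rather than the thesis's actual formulation; the bin count, pair sampling and normalisation are illustrative choices.

```python
# Simplified histogram-of-distances style descriptor for a 3D point cloud:
# normalise pairwise point distances and bin them into a fixed-length,
# scale-robust signature. Bin count and sampling are illustrative choices.
import numpy as np

def histogram_of_distances(points, n_bins=32, n_pairs=20000, seed=0):
    """points: (N, 3) array of 3D coordinates; returns an (n_bins,) descriptor."""
    rng = np.random.default_rng(seed)
    n = len(points)
    i = rng.integers(0, n, n_pairs)           # random point pairs keep the cost low,
    j = rng.integers(0, n, n_pairs)           # as needed on constrained hardware
    d = np.linalg.norm(points[i] - points[j], axis=1)
    d = d[d > 0]
    d /= d.max()                               # normalise for scale robustness
    hist, _ = np.histogram(d, bins=n_bins, range=(0.0, 1.0))
    return hist / hist.sum()                   # compare descriptors with e.g. L1 / chi2

cloud = np.random.rand(5000, 3)                # stand-in for a segmented LIDAR target
print(histogram_of_distances(cloud)[:5])
```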

    Visual grasp point localization, classification and state recognition in robotic manipulation of cloth: an overview

    © This manuscript version is made available under the CC-BY-NC-ND 4.0 license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
    Cloth manipulation by robots is gaining popularity among researchers because of its relevance, mainly (but not only) in domestic and assistive robotics. The required science and technologies are beginning to mature enough for the challenges posed by the manipulation of soft materials, and many contributions have appeared in recent years. This survey provides a systematic review of existing techniques for the basic perceptual tasks of grasp point localization, state estimation and classification of cloth items, from the perspective of their manipulation by robots. This choice is grounded in the fact that any manipulative action requires instructing the robot where to grasp, and most garment-handling activities depend on the correct recognition of the type to which the particular cloth item belongs and of its state. The high inter- and intra-class variability of garments, the continuous nature of the possible deformations of cloth and the evident difficulties in predicting their localization and extension on the garment piece are challenges that have encouraged researchers to provide a plethora of methods to confront such problems, with some promising results. The present review constitutes a first effort to furnish a structured framework of these works, with the aim of helping future contributors to gain both insight and perspective on the subject.

    3D Shape Descriptor-Based Facial Landmark Detection: A Machine Learning Approach

    Facial landmark detection on 3D human faces has had numerous applications in the literature, such as establishing point-to-point correspondence between 3D face models, which is itself a key step for a wide range of applications like 3D face detection and authentication, matching, reconstruction, and retrieval, to name a few. Two groups of approaches, namely knowledge-driven and data-driven approaches, have been employed for facial landmarking in the literature. Knowledge-driven techniques are the traditional approaches that have been widely used to locate landmarks on human faces. In these approaches, a user with sufficient knowledge and experience usually defines the features to be extracted as the landmarks. Data-driven techniques, on the other hand, take advantage of machine learning algorithms to detect prominent features on 3D face models. Besides their key advantages, each category of these techniques has limitations that prevent it from generating the most reliable results. In this work we propose to combine the strengths of the two approaches to detect facial landmarks in a more efficient and precise way. The suggested approach consists of two phases. First, some salient features of the faces are extracted using expert systems. Afterwards, these points are used as the initial control points in the well-known Thin Plate Spline (TPS) technique to deform the input face towards a reference face model. Second, by exploring and utilizing multiple machine learning algorithms, another group of landmarks is extracted. The data-driven landmark detection step is performed in a supervised manner, providing an information-rich set of training data in which a set of local descriptors is computed and used to train the algorithm. We then use the detected landmarks for establishing point-to-point correspondence between 3D human faces, mainly using an improved version of the Iterative Closest Point (ICP) algorithm. Furthermore, we propose to use the detected landmarks for 3D face matching applications.
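Since the correspondence step above relies on ICP-style alignment, a minimal rigid ICP sketch is included below for orientation: nearest-neighbour matching alternated with an SVD-based rigid fit. It assumes only NumPy and SciPy and is a generic textbook variant, not the improved ICP developed in this work; convergence checks and outlier handling are omitted.

```python
# Minimal rigid ICP sketch (nearest neighbours + SVD alignment), the kind of
# step that detected landmarks would initialise. NumPy/SciPy only; the
# iteration count and convergence handling are deliberately simplified.
import numpy as np
from scipy.spatial import cKDTree

def icp(source, target, iterations=30):
    """Align source (N,3) to target (M,3); returns transformed source, R, t."""
    src = source.copy()
    tree = cKDTree(target)
    R_total, t_total = np.eye(3), np.zeros(3)
    for _ in range(iterations):
        _, idx = tree.query(src)               # closest target point per source point
        matched = target[idx]
        mu_s, mu_t = src.mean(0), matched.mean(0)
        H = (src - mu_s).T @ (matched - mu_t)  # cross-covariance of centred sets
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1, 1, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
        R = Vt.T @ D @ U.T                     # best rotation (Kabsch)
        t = mu_t - R @ mu_s
        src = src @ R.T + t                    # apply the incremental transform
        R_total, t_total = R @ R_total, R @ t_total + t
    return src, R_total, t_total
```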

    Graph matching using position coordinates and local features for image analysis

    Finding the correspondences between two images is a crucial problem in the fields of computer vision and pattern recognition. It is relevant to a wide range of purposes, from object recognition applications in the areas of biometrics, document analysis and shape analysis, to multiple-view geometry applications such as pose recovery, structure from motion, and localization and mapping. Most existing techniques approach this problem either by using local image features or by using point-set registration methods (or a mixture of both). In the former, a sparse set of features is first extracted from the images and then characterized in the form of descriptor vectors using local image evidence; features are associated according to the similarity between their descriptors. In the latter, the feature sets are regarded as point sets, which are associated using non-linear optimization techniques; these are iterative procedures that estimate the correspondence and alignment parameters in alternating steps. Graphs are representations that account for binary relations between features. Bringing binary relations into the correspondence problem often leads to the so-called graph matching problem. A number of methods exist in the literature aimed at finding approximate solutions to different instances of the graph matching problem, which in most cases is NP-hard. Part of our work is devoted to investigating the benefits of cross-bin measures for comparing local image features. The main body of this thesis is devoted to formulating both the image feature association and the point-set registration problems as instances of the graph matching problem. In all cases we propose approximate algorithms to solve these problems and compare them against a number of existing methods belonging to different areas, such as outlier rejectors, point-set registration methods and other graph matching methods. The experiments show that in most cases the proposed methods outperform the others. Occasionally the proposed methods either share the best performance with some competing method or obtain slightly worse results; in these cases, the proposed methods usually exhibit lower computational times.
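To illustrate the kind of problem being solved, the following is a generic spectral graph-matching sketch in the spirit of approximate methods from the literature, not one of the algorithms proposed in the thesis: candidate correspondences are scored by descriptor similarity (unary terms) and pairwise-distance consistency (binary terms), and the leading eigenvector of the affinity matrix is greedily discretised into a one-to-one matching. Names and parameters are illustrative.

```python
# Generic spectral graph-matching sketch (not the thesis's algorithms):
# unary terms from descriptor similarity, binary terms from pairwise-distance
# consistency, leading eigenvector greedily discretised. Suitable for small
# graphs only; NumPy only.
import numpy as np

def spectral_match(pts1, desc1, pts2, desc2, sigma=0.2):
    n1, n2 = len(pts1), len(pts2)
    cand = [(i, j) for i in range(n1) for j in range(n2)]   # all candidate pairs
    M = np.zeros((len(cand), len(cand)))
    for a, (i, j) in enumerate(cand):
        M[a, a] = np.exp(-np.linalg.norm(desc1[i] - desc2[j]))   # unary affinity
        for b, (k, l) in enumerate(cand):
            if a != b and i != k and j != l:
                d1 = np.linalg.norm(pts1[i] - pts1[k])           # edge length, graph 1
                d2 = np.linalg.norm(pts2[j] - pts2[l])           # edge length, graph 2
                M[a, b] = np.exp(-((d1 - d2) ** 2) / sigma**2)   # binary consistency
    vals, vecs = np.linalg.eigh(M)
    x = np.abs(vecs[:, -1])                  # principal eigenvector (relaxed solution)
    matches, used1, used2 = [], set(), set()
    for a in np.argsort(-x):                 # greedy one-to-one discretisation
        i, j = cand[a]
        if i not in used1 and j not in used2:
            matches.append((i, j))
            used1.add(i); used2.add(j)
    return matches
```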

    Enhancing low-level features with mid-level cues

    Local features have become an essential tool in visual recognition. Much of the progress in computer vision over the past decade has built on simple, local representations such as SIFT or HOG. SIFT in particular shifted the paradigm in feature representation. Subsequent works have often focused on improving either computational efficiency, or invariance properties. This thesis belongs to the latter group. Invariance is a particularly relevant aspect if we intend to work with dense features. The traditional approach to sparse matching is to rely on stable interest points, such as corners, where scale and orientation can be reliably estimated, enforcing invariance; dense features need to be computed on arbitrary points. Dense features have been shown to outperform sparse matching techniques in many recognition problems, and form the bulk of our work. In this thesis we present strategies to enhance low-level, local features with mid-level, global cues. We devise techniques to construct better features, and use them to handle complex ambiguities, occlusions and background changes. To deal with ambiguities, we explore the use of motion to enforce temporal consistency with optical flow priors. We also introduce a novel technique to exploit segmentation cues, and use it to extract features invariant to background variability. For this, we downplay image measurements most likely to belong to a region different from that where the descriptor is computed. In both cases we follow the same strategy: we incorporate mid-level, "big picture" information into the construction of local features, and proceed to use them in the same manner as we would the baseline features. We apply these techniques to different feature representations, including SIFT and HOG, and use them to address canonical vision problems such as stereo and object detection, demonstrating that the introduction of global cues yields consistent improvements. We prioritize solutions that are simple, general, and efficient. Our main contributions are as follows: (a) An approach to dense stereo reconstruction with spatiotemporal features, which unlike existing works remains applicable to wide baselines. (b) A technique to exploit segmentation cues to construct dense descriptors invariant to background variability, such as occlusions or background motion. (c) A technique to integrate bottom-up segmentation with recognition efficiently, amenable to sliding window detectors.
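As a toy illustration of the background-downplaying idea described above, the sketch below attenuates gradient magnitudes by a soft segmentation mask before accumulating a local orientation histogram, so that pixels likely to belong to a different region contribute little to the descriptor. It is a simplified stand-in assuming NumPy only, not the segmentation-aware SIFT/HOG variants developed in the thesis.

```python
# Sketch of the "downplay background measurements" idea: gradient magnitudes
# are attenuated by a soft segmentation mask before building an orientation
# histogram around a point, so occluders / background motion contribute less.
import numpy as np

def masked_orientation_histogram(gray, mask, cx, cy, radius=16, n_bins=8):
    """gray: (H,W) image; mask: (H,W) soft foreground weights in [0,1]."""
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy) * mask                 # attenuate likely-background pixels
    ang = np.arctan2(gy, gx) % (2 * np.pi)
    y0, y1 = max(cy - radius, 0), cy + radius
    x0, x1 = max(cx - radius, 0), cx + radius
    m = mag[y0:y1, x0:x1].ravel()
    a = ang[y0:y1, x0:x1].ravel()
    hist, _ = np.histogram(a, bins=n_bins, range=(0, 2 * np.pi), weights=m)
    return hist / (hist.sum() + 1e-8)             # normalised local descriptor

img = np.random.rand(128, 128)
seg = np.ones((128, 128)); seg[:, 64:] = 0.1      # toy mask: right half ~background
print(masked_orientation_histogram(img, seg, cx=60, cy=60))
```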

    Remote sensing image fusion on 3D scenarios: A review of applications for agriculture and forestry

    Three-dimensional (3D) image mapping of real-world scenarios has a great potential to provide the user with a more accurate scene understanding. This will enable, among others, unsupervised automatic sampling of meaningful material classes from the target area for adaptive semi-supervised deep learning techniques. This path is already being taken by recent and fast-developing research in computational fields; however, some issues related to computationally expensive processes in the integration of multi-source sensing data remain. Recent studies focused on Earth observation and characterization are enhanced by the proliferation of Unmanned Aerial Vehicles (UAVs) and sensors able to capture massive datasets with a high spatial resolution. In this scope, many approaches have been presented for 3D modeling, remote sensing, image processing and mapping, and multi-source data fusion. This survey aims to present a summary of previous work according to the most relevant contributions to the reconstruction and analysis of 3D models of real scenarios using multispectral, thermal and hyperspectral imagery. The surveyed applications are focused on agriculture and forestry, since these fields concentrate most applications and are widely studied. Many challenges are currently being overcome by recent methods based on the reconstruction of multi-sensorial 3D scenarios. In parallel, the processing of large image datasets has recently been accelerated by General-Purpose Graphics Processing Unit (GPGPU) approaches, which are also summarized in this work. Finally, some open issues and future research directions are presented.
Funding: European Commission (1381202-GEU, PYC20-RE-005-UJA, IEG-2021); Junta de Andalucia (1381202-GEU, PYC20-RE-005-UJA, IEG-2021); Instituto de Estudios Gienneses; Spanish Government (UIDB/04033/2020); DATI-Digital Agriculture Technologies; Portuguese Foundation for Science and Technology (1381202-GEU, FPU19/0010).
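The central operation in fusing 2D imagery onto a 3D scenario is projecting each 3D point into the image plane of a calibrated sensor and sampling the corresponding band value. The sketch below shows that operation with NumPy under the usual pinhole model; the camera matrix, band image and point cloud are placeholders, and the real pipelines reviewed in the survey handle occlusion, resampling and georeferencing far more carefully.

```python
# Core operation behind fusing 2D imagery onto a 3D scene: project points
# through a pinhole camera and sample the band value at each projection.
# The camera matrix, image and point cloud here are synthetic placeholders.
import numpy as np

def fuse_band_onto_points(points, band_image, P):
    """points: (N,3); band_image: (H,W) e.g. thermal raster; P: (3,4) camera matrix."""
    homog = np.hstack([points, np.ones((len(points), 1))])   # homogeneous coordinates
    proj = homog @ P.T                                       # project to image plane
    uv = proj[:, :2] / proj[:, 2:3]                          # perspective divide
    u, v = np.round(uv[:, 0]).astype(int), np.round(uv[:, 1]).astype(int)
    h, w = band_image.shape
    valid = (proj[:, 2] > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    values = np.full(len(points), np.nan)                    # NaN where not visible
    values[valid] = band_image[v[valid], u[valid]]           # nearest-pixel sampling
    return values                                            # per-point band value
```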