15 research outputs found

    Rotation-invariant features for multi-oriented text detection in natural images.

    Texts in natural scenes carry rich semantic information, which can be used to assist a wide range of applications, such as object recognition, image/video retrieval, mapping/navigation, and human-computer interaction. However, most existing systems are designed to detect and recognize horizontal (or near-horizontal) texts. Due to the increasing popularity of mobile-computing devices and applications, detecting texts of varying orientations from natural images under less controlled conditions has become an important but challenging task. In this paper, we propose a new algorithm to detect texts of varying orientations. Our algorithm is based on a two-level classification scheme and two sets of features specially designed for capturing the intrinsic characteristics of texts. To better evaluate the proposed method and compare it with the competing algorithms, we generate a comprehensive dataset with various types of texts in diverse real-world scenes. We also propose a new evaluation protocol, which is more suitable for benchmarking algorithms for detecting texts in varying orientations. Experiments on benchmark datasets demonstrate that our system compares favorably with the state-of-the-art algorithms when handling horizontal texts and achieves significantly enhanced performance on variant texts in complex natural scenes.

    Extremal Regions Detection Guided by Maxima of Gradient Magnitude


    Image Matching based on Curvilinear Regions


    Localizing Polygonal Objects in Man-Made Environments

    Object detection is a significant challenge in Computer Vision and has received a lot of attention in the field. One such challenge addressed in this thesis is the detection of polygonal objects, which are prevalent in man-made environments. Shape analysis is an important cue to detect these objects. We propose a contour-based object detection framework to deal with the related challenges, including how to efficiently detect polygonal shapes and how to exploit them for object detection. First, we propose an efficient component tree segmentation framework for stable region extraction and a multi-resolution line segment detection algorithm, which form the bases of our detection framework. Our component tree segmentation algorithm explores the optimal threshold for each branch of the component tree, and achieves a significant improvement over image thresholding segmentation, and comparable performance to more sophisticated methods but at only a fraction of the computation time. Our line segment detector overcomes several inherent limitations of the Hough transform, and achieves comparable performance to the state-of-the-art line segment detectors. However, our approach can better capture dominant structures and is more stable against low-quality imaging conditions. Second, we propose a global shape analysis measurement for simple polygon detection and use it to develop an approach for real-time landing site detection in unconstrained man-made environments. Since the task of detecting landing sites must be performed in a few seconds or less, existing methods are often limited to simple local intensity and edge variation cues. By contrast, we show how to efficiently take into account the potential sites' global shape, which is a critical cue in man-made scenes. Our method relies on a component tree segmentation algorithm and a new shape regularity measure to look for polygonal regions in video sequences.
In this way we enforce both temporal consistency and geometric regularity, resulting in reliable and consistent detections. Third, we propose a generic contour grouping based object detection approach by exploring promising cycles in a line fragment graph. Previous contour-based methods are limited to using additive scoring functions. In this thesis, we propose an approximate search approach that eliminates this restriction. Given a weighted line fragment graph, we prune its cycle space by removing cycles containing weak nodes or weak edges, until the upper bound of the cycle space is less than the threshold defined by the cyclomatic number. Object contours are then detected as maximally scoring elementary circuits in the pruned cycle space. Furthermore, we propose another more efficient algorithm, which reconstructs the graph by grouping the strongest edges iteratively until the number of cycles reaches the upper bound. Our approximate search approaches can be used with any cycle scoring function. Moreover, unlike other contour grouping based approaches, our approach does not rely on a greedy strategy for finding multiple candidates and is capable of finding multiple candidates sharing common line fragments. We demonstrate that our approach significantly outperforms the state-of-the-art.
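
    The abstract above uses the cyclomatic number of the line fragment graph as a pruning threshold. As a rough illustration only: for an undirected graph with V nodes, E edges, and C connected components, the cyclomatic number is E - V + C. The sketch below computes it with a small union-find. The function name and the edge-list input are assumptions made for this example and are not taken from the thesis, and the sketch does not implement the thesis's pruning or scoring.

```python
def cyclomatic_number(num_nodes, edges):
    """Return E - V + C for an undirected graph given as an edge list.

    num_nodes: number of nodes, labelled 0 .. num_nodes - 1 (assumed layout).
    edges: iterable of (u, v) pairs, one per undirected edge.
    """
    edges = list(edges)
    parent = list(range(num_nodes))

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = num_nodes
    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            # The edge joins two components, so the count drops by one.
            parent[root_u] = root_v
            components -= 1

    return len(edges) - num_nodes + components


# A square with one diagonal has two independent cycles.
square_with_diagonal = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
print(cyclomatic_number(4, square_with_diagonal))
```

    The value counts independent cycles, so a tree or forest yields zero.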

    A Comparison of Affine Region Detectors


    Recognition and matching in the presence of deformation and lighting change

    Natural images of objects and scenes show a fascinating amount of variability due to different factors like lighting and viewpoint change, occlusion, articulation and non-rigid deformation. There are certain cases like recognition of specular objects and images with arbitrary deformations where existing techniques do not perform well. For image deformation, we propose a method for faster keypoint matching with histogram descriptors and a completely deformation-invariant representation. We also propose a method for improving specular object recognition. Histograms are a powerful statistical representation for keypoint matching and content-based image retrieval. The earth mover's distance (EMD) is an important perceptually meaningful metric for comparing histograms, but it suffers from high (O(n^3 log n)) computational complexity. We propose a novel linear time algorithm for approximating EMD with the weighted L1 norm of the wavelet transform of the difference histogram. We prove that the resulting wavelet EMD metric is equivalent to EMD. We experimentally show that wavelet EMD is a good approximation to EMD, has similar performance, but requires much less computation. We also give a fast algorithm for the best partial EMD match between two histograms. Images of non-planar objects can undergo a large non-linear deformation due to a viewpoint change. Complex deformations occur in images of non-rigid objects, for example, in medical image sequences. We propose using the contour tree as a novel framework invariant to arbitrary deformations for representing and comparing images. It represents all the deformation-invariant information in an image. Lighting changes greatly affect the appearance of specular objects and make recognition much more difficult than for Lambertian objects. In model-based recognition of specular objects, an important constraint is that the estimated lighting should be non-negative everywhere.
We propose a new method to enforce this constraint and explore its usefulness in specular object recognition, using the spherical harmonic representation of lighting. The new method is faster as well as more accurate than previous methods. Experiments on both synthetic and real data indicate that the constraint can improve recognition of specular objects by better separating the correct and incorrect models.
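
    The idea of approximating EMD by a weighted L1 norm of the wavelet transform of the difference histogram can be sketched for the one-dimensional case as follows. This is a minimal illustration, not the thesis's implementation: the choice of the Haar wavelet, the function name, the power-of-two length restriction, and in particular the per-level weights 2^(1.5 * level) are assumptions made for this example. The result is a relative score, not a calibrated EMD value.

```python
import numpy as np


def haar_wavelet_emd(p, q):
    """Weighted L1 norm of the Haar detail coefficients of p - q.

    p, q: 1-D histograms of equal length, a power of two (assumed here).
    The per-level weights are an illustrative assumption.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.ndim != 1 or p.shape != q.shape:
        raise ValueError("histograms must be 1-D and of equal length")
    n = p.size
    if n < 2 or n & (n - 1):
        raise ValueError("histogram length must be a power of two")

    approx = p - q          # difference histogram
    total = 0.0
    level = 1               # level 1 is the finest scale
    while approx.size > 1:
        even, odd = approx[0::2], approx[1::2]
        detail = (even - odd) / np.sqrt(2.0)   # orthonormal Haar details
        approx = (even + odd) / np.sqrt(2.0)   # coarser approximation
        # Coarser levels describe mass moved over longer distances,
        # so they receive larger weights.
        total += 2.0 ** (1.5 * level) * np.abs(detail).sum()
        level += 1
    return total


p = np.array([1, 0, 0, 0, 0, 0, 0, 0])
near = np.array([0, 1, 0, 0, 0, 0, 0, 0])
far = np.array([0, 0, 0, 0, 1, 0, 0, 0])
print(haar_wavelet_emd(p, near), haar_wavelet_emd(p, far))
```

    Each pass halves the array, so the total work is linear in the histogram length. The final approximation coefficient is ignored, which is harmless when both histograms have the same total mass.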

    Online Structured Learning for Real-Time Computer Vision Gaming Applications

    In recent years computer vision has played an increasingly important role in the development of computer games, and it now features as one of the core technologies for many gaming platforms. The work in this thesis addresses three problems in real-time computer vision, all of which are motivated by their potential application to computer games. We first present an approach for real-time 2D tracking of arbitrary objects. In common with recent research in this area we incorporate online learning to provide an appearance model which is able to adapt to the target object and its surrounding background during tracking. However, our approach moves beyond the standard framework of tracking using binary classification and instead integrates tracking and learning in a more principled way through the use of structured learning. As well as providing a more powerful framework for adaptive visual object tracking, our approach also outperforms state-of-the-art tracking algorithms on standard datasets. Next we consider the task of keypoint-based object tracking. We take the traditional pipeline of matching keypoints followed by geometric verification and show how this can be embedded into a structured learning framework in order to provide principled adaptivity to a given environment. We also propose an approximation method allowing us to take advantage of recently developed binary image descriptors, meaning our approach is suitable for real-time application even on low-powered portable devices. Experimentally, we clearly see the benefit that online adaptation using structured learning can bring to this problem. Finally, we present an approach for approximately recovering the dense 3D structure of a scene which has been mapped by a simultaneous localisation and mapping system. Our approach is guided by the constraints of the low-powered portable hardware we are targeting, and we develop a system which coarsely models the scene using a small number of planes.
To achieve this, we frame the task as a structured prediction problem and introduce online learning into our approach to provide adaptivity to a given scene. This allows us to use relatively simple multi-view information coupled with online learning of appearance to efficiently produce coarse reconstructions of a scene.

    Detection of Counterfeit Coins and Assessment of Coin Qualities.

    Due to the proliferation of fake money these days, detection of counterfeit coins with high accuracy is in strong demand, yet not much research has been conducted in this field. The objective of this thesis is to introduce modern computer vision techniques and machine intelligence to differentiate real coins from fake ones with high precision, based on visual aspects. To that end, a high-resolution scanning device, the IBIX Trax, is deployed to sample the coin images. On top of that, three visual aspects are thoroughly inspected, namely lettering, images and texture. Six features are extracted from the lettering, i.e. stroke width, contour smoothness, lettering height, lettering width, relative angle, and relative distance. As for classification, a hierarchical clustering method, max-spacing K-clustering, is adopted. Our experimental results show that the fake coins and real ones are totally separable based on these features. As for images, we propose a novel shape feature, angle-distance. After images are segmented, a vector of size 360×1 is deployed to represent each shape. For classification, a dissimilarity measurement is used to quantify the difference between two shapes. The results show it can recognize the fake coins successfully. As for texture, a cutting-edge feature, the maximally stable extremal region (MSER), is adopted to automatically detect the holes and indents on the coin surface. Parameters associated with this feature are adjusted in the experiments. The detection results show this feature can be used as an indicator for assessing the qualities of coins.
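
    The abstract does not define the angle-distance feature beyond its 360-entry length. One plausible reading, offered purely as an assumption, is a centroid-distance signature with one entry per degree. The sketch below follows that reading. The function names, the use of the farthest contour point per one-degree bin, and the mean absolute difference used as the dissimilarity are all hypothetical choices for illustration and are not confirmed by the thesis.

```python
import numpy as np


def angle_distance_signature(contour):
    """Return a length-360 vector: per-degree distance from the centroid.

    contour: (N, 2) array of (x, y) boundary points of a segmented shape.
    Entry k holds the largest centroid distance among points whose polar
    angle falls in [k, k + 1) degrees; empty bins stay at zero.
    This definition is an assumption, not the thesis's stated feature.
    """
    contour = np.asarray(contour, dtype=float)
    offsets = contour - contour.mean(axis=0)
    distances = np.hypot(offsets[:, 0], offsets[:, 1])
    angles = np.degrees(np.arctan2(offsets[:, 1], offsets[:, 0])) % 360.0
    bins = np.floor(angles).astype(int) % 360  # guard against a 360.0 edge
    signature = np.zeros(360)
    np.maximum.at(signature, bins, distances)  # keep the max per bin
    return signature


def signature_dissimilarity(sig_a, sig_b):
    """Mean absolute difference between two signatures (assumed measure)."""
    return float(np.mean(np.abs(sig_a - sig_b)))


# Synthetic check: a circle of radius 5 gives a flat signature.
theta = np.linspace(0.0, 2.0 * np.pi, 3600, endpoint=False)
circle = np.column_stack([10 + 5 * np.cos(theta), 20 + 5 * np.sin(theta)])
sig = angle_distance_signature(circle)
print(sig.shape, sig.min(), sig.max())
```

    A signature of this kind is translation-invariant because it is measured from the centroid. It is not rotation-invariant unless the shapes are aligned first.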

    Modelling Visual Objects Regardless of Depictive Style


    Contributions to the content-based image retrieval using pictorial queries

    Resource description: 2 November 2010. Mass access to digital cameras, personal computers, and the Internet has led to the creation of large volumes of data in digital format. In this context, tools designed to organize information and make it easier to search are becoming increasingly relevant. Images are a particular kind of data that requires specific description and indexing techniques. The area of computer vision devoted to the study of these techniques is called Content-Based Image Retrieval (CBIR). CBIR systems do not use text-based descriptions; instead they rely on features extracted from the images themselves. In contrast to the more than 6,000 languages spoken in the world, descriptions based on visual features are a universal means of expression. Intensive research in the field of CBIR systems has been applied in very diverse areas of knowledge. CBIR applications have thus been developed in connection with medicine, intellectual property protection, journalism, graphic design, Internet information search, the preservation of cultural heritage, etc. One of the important aspects of a CBIR application lies in the design of the user's functions. The user is responsible for formulating the queries from which the image search is carried out. We have focused on those systems in which the query is formulated from a pictorial representation. We propose a taxonomy of query systems composed of four different paradigms: Query-by-Selection, Query-by-Iconic-Composition, Query-by-Sketch, and Query-by-Illustration. Each paradigm incorporates a different level of expressive potential for the user.
From the simple selection of an image to the creation of a color illustration, the user is the one who takes control of the system's input data. Throughout the chapters of this thesis we have analyzed the influence that each query paradigm exerts on the internal processes of a CBIR system. In this way we have also proposed a set of contributions, which we have exemplified from a practical point of view by means of a final application.