
    Real-Time Salient Closed Boundary Tracking via Line Segments Perceptual Grouping

    This paper presents a novel real-time method for tracking salient closed boundaries from video image sequences. This method operates on a set of straight line segments that are produced by line detection. The tracking scheme is coherently integrated into a perceptual grouping framework in which the visual tracking problem is tackled by identifying a subset of these line segments and connecting them sequentially to form a closed boundary with the largest saliency and a certain similarity to the previous one. Specifically, we define a new tracking criterion which combines a grouping cost and an area similarity constraint. The proposed criterion makes the resulting boundary tracking more robust to local minima. To achieve real-time tracking performance, we use Delaunay Triangulation to build a graph model with the detected line segments and then reduce the tracking problem to finding the optimal cycle in this graph. This is solved by our newly proposed closed boundary candidates searching algorithm called "Bidirectional Shortest Path (BDSP)". The efficiency and robustness of the proposed method are tested on real video sequences as well as during a robot arm pouring experiment.
    Comment: 7 pages, 8 figures, The 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2017), submission ID 103
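The reduction from boundary tracking to finding a cycle in a graph can be illustrated with a small sketch. This is not the authors' BDSP algorithm, whose details the abstract does not give. It is a generic stand-in that finds the cheapest closed cycle through one chosen edge by running Dijkstra between the edge's endpoints with that edge removed. The graph layout (a dict of dicts of edge costs) and the function names `shortest_path` and `cheapest_cycle_through` are assumptions made for illustration.

```python
import heapq


def shortest_path(graph, source, target, banned_edge):
    """Dijkstra from source to target, never traversing banned_edge (undirected)."""
    banned = {banned_edge, banned_edge[::-1]}
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            if (u, v) in banned:
                continue
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None, float("inf")
    # Walk the predecessor links back from target to source.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], dist[target]


def cheapest_cycle_through(graph, u, v):
    """Cheapest closed cycle containing edge (u, v): path v -> u plus the edge itself."""
    path, cost = shortest_path(graph, v, u, (u, v))
    if path is None:
        return None, float("inf")
    return path, cost + graph[u][v]


# Hypothetical toy graph: nodes stand for line segments, weights for grouping cost.
toy_graph = {
    "a": {"b": 1.0, "c": 1.0, "d": 5.0},
    "b": {"a": 1.0, "c": 1.0, "d": 5.0},
    "c": {"a": 1.0, "b": 1.0},
    "d": {"a": 5.0, "b": 5.0},
}
cycle, cycle_cost = cheapest_cycle_through(toy_graph, "a", "b")
```

Here `cycle` lists the nodes from `v` back to `u`; the cycle is closed by the edge `(u, v)` itself. A tracker along the lines of the abstract would additionally score each candidate against the previous frame's boundary, which this sketch omits.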

    Development of a text reading system on video images

    Since the early days of computer science, researchers have sought to devise a machine which could automatically read text to help people with visual impairments. The problem of extracting and recognising text on document images has been largely resolved, but reading text from images of natural scenes remains a challenge. Scene text can present uneven lighting, complex backgrounds or perspective and lens distortion; it usually appears as short sentences or isolated words and shows a very diverse set of typefaces. However, video sequences of natural scenes provide a temporal redundancy that can be exploited to compensate for some of these deficiencies. Here we present a complete end-to-end, real-time scene text reading system on video images based on perspective aware text tracking. The main contribution of this work is a system that automatically detects, recognises and tracks text in videos of natural scenes in real-time. The focus of our method is on large text found in outdoor environments, such as shop signs, street names and billboards. We introduce novel efficient techniques for text detection, text aggregation and text perspective estimation. Furthermore, we propose using a set of Unscented Kalman Filters (UKF) to maintain each text region's identity and to continuously track the homography transformation of the text into a fronto-parallel view, thereby being resilient to erratic camera motion and wide baseline changes in orientation. The orientation of each text line is estimated using a method that relies on the geometry of the characters themselves to estimate a rectifying homography. This is done irrespective of the view of the text over a large range of orientations. We also demonstrate a wearable head-mounted device for text reading that encases a camera for image acquisition and a pair of headphones for synthesized speech output. Our system is designed for continuous and unsupervised operation over long periods of time. 
It is completely automatic and features quick failure recovery and interactive text reading. It is also highly parallelised in order to maximize the usage of available processing power and to achieve real-time operation. We show comparative results that improve on the current state of the art when correcting perspective deformation of scene text. The end-to-end system performance is demonstrated on sequences recorded in outdoor scenarios. Finally, we also release a dataset of text tracking videos along with the annotated ground-truth of text regions.
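As a reminder of what a rectifying homography does to image coordinates, the sketch below maps a 2D point through a 3x3 matrix using homogeneous coordinates. It says nothing about how the thesis estimates or tracks the homography. The function name `apply_homography` and the example matrix are assumptions chosen for illustration.

```python
def apply_homography(H, point):
    """Map a 2D point through a 3x3 homography H (nested lists, row-major)."""
    x, y = point
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    # Divide by the homogeneous coordinate to return to the image plane.
    return (xh / w, yh / w)


# Hypothetical matrix: scale by 2, shift x by 1, then divide everything by w = 2.
H_example = [
    [2.0, 0.0, 1.0],
    [0.0, 2.0, 0.0],
    [0.0, 0.0, 2.0],
]
mapped = apply_homography(H_example, (1.0, 1.0))
```

Warping the four corners of a detected text region this way gives the outline of the region in the rectified, fronto-parallel view.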

    An investigation into automated processes for generating focus maps

    The use of geographic information for mobile applications such as wayfinding has increased rapidly, enabling users to view information on their current position in relation to the neighbouring environment. This is due to the ubiquity of small devices like mobile phones, coupled with location finding devices utilising the Global Positioning System. However, such applications are still not attractive to users because of the difficulties in viewing and identifying the details of the immediate surroundings that help users to follow directions along a route. This results from a lack of presentation techniques to highlight the salient features (such as landmarks) among other unique features. Another problem is that since such applications do not provide any eye-catching distinction between information about the region of interest along the route and the background information, users are not tempted to focus and engage with wayfinding applications. Although several approaches have previously been attempted to solve these deficiencies by developing focus maps, such applications still need to be improved in order to provide users with a visually appealing presentation of information to assist them in wayfinding. The primary goal of this research is to investigate the processes involved in generating a visual representation that allows key features in an area of interest to stand out from the background in focus maps for wayfinding users. In order to achieve this, the automated processes in four key areas - spatial data structuring, spatial data enrichment, automatic map generalization and spatial data mining - have been thoroughly investigated by testing existing algorithms and tools. Having identified the gaps that need to be filled in these processes, the research has developed new algorithms and tools in each area through thorough testing and validation. 
Thus, a new triangulation data structure is developed to retrieve the adjacency relationship between polygon features required for data enrichment and automatic map generalization. Further, a new hierarchical clustering algorithm is developed to group polygon features under data enrichment required in the automatic generalization process. In addition, two generalization algorithms for polygon merging are developed for generating a generalized background for focus maps, and finally a decision tree algorithm - C4.5 - is customised for deriving salient features, including the development of a new framework to validate derived landmark saliency in order to improve the representation of focus maps.

    Retinal Fundus Image Registration via Vascular Structure Graph Matching

    Motivated by the observation that a retinal fundus image may contain some unique geometric structures within its vascular trees which can be utilized for feature matching, in this paper we propose a graph-based registration framework called GM-ICP to align pairwise retinal images. First, the retinal vessels are automatically detected and represented as vascular structure graphs. Graph matching is then performed to find global correspondences between vascular bifurcations. Finally, a revised ICP algorithm incorporating a quadratic transformation model is used at the fine level to register vessel shape models. In order to eliminate the incorrect matches from the global correspondence set obtained via graph matching, we propose a structure-based sample consensus (STRUCT-SAC) algorithm. The advantages of our approach are threefold: (1) a globally optimal solution can be achieved with graph matching; (2) our method is invariant to linear geometric transformations; and (3) heavy local feature descriptors are not required. The effectiveness of our method is demonstrated by experiments with 48 pairs of retinal images collected from clinical patients.

    Efficient feature-based image registration by mapping sparsified surfaces

    With advances in digital camera technology, the use of high resolution images and videos has become widespread in modern society. In particular, image and video frame registration is frequently applied in computer graphics and film production. However, conventional registration approaches usually require long computational time for high resolution images and video frames. This hinders the application of the registration approaches in modern industry. In this work, we first propose a new image representation method to accelerate the registration process by triangulating the images effectively. For each high resolution image or video frame, we compute an optimal coarse triangulation which captures the important features of the image. Then, we apply a surface registration algorithm to obtain a registration map which is used to compute the registration of the high resolution image. Experimental results suggest that our overall algorithm is efficient and capable of achieving a high compression rate while the accuracy of the registration is well retained when compared with the conventional grid-based approach. Also, the computational time of the registration is significantly reduced using our triangulation-based approach.

    Extracting Geometric Structures in Images with Delaunay Point Processes

    We introduce Delaunay Point Processes, a framework for the extraction of geometric structures from images. Our approach simultaneously locates and groups geometric primitives (line segments, triangles) to form extended structures (line networks, polygons) for a variety of image analysis tasks. Similarly to traditional point processes, our approach uses Markov Chain Monte Carlo to minimize an energy that balances fidelity to the input image data with geometric priors on the output structures. However, while existing point processes struggle to model structures composed of interconnected components, we propose to embed the point process into a Delaunay triangulation, which provides high-quality connectivity by construction. We further leverage key properties of the Delaunay triangulation to devise a fast Markov Chain Monte Carlo sampler. We demonstrate the flexibility of our approach on a variety of applications, including line network extraction, object contouring, and mesh-based image compression.
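The idea of using Markov Chain Monte Carlo to minimize an energy can be sketched with the standard Metropolis acceptance rule. This is a generic textbook sampler on a toy one-dimensional energy, not the Delaunay-based sampler described in the abstract. The names `metropolis_accept`, `minimise`, `toy_energy` and `toy_propose`, the fixed temperature, and the seed are all assumptions made for illustration.

```python
import math
import random


def metropolis_accept(delta_energy, temperature, u):
    """Metropolis rule: always accept downhill moves, uphill with prob exp(-dE/T).

    u is a uniform draw in [0, 1), passed in so the rule itself is deterministic.
    """
    if delta_energy <= 0:
        return True
    return u < math.exp(-delta_energy / temperature)


def minimise(energy, propose, state, steps, temperature, rng):
    """Run a fixed-temperature Metropolis chain and return the best state seen."""
    best = state
    for _ in range(steps):
        candidate = propose(state, rng)
        delta = energy(candidate) - energy(state)
        if metropolis_accept(delta, temperature, rng.random()):
            state = candidate
            if energy(state) < energy(best):
                best = state
    return best


def toy_energy(x):
    """Hypothetical energy with its minimum at x = 3."""
    return (x - 3) ** 2


def toy_propose(x, rng):
    """Hypothetical proposal: step one unit left or right."""
    return x + rng.choice((-1, 1))


best_state = minimise(toy_energy, toy_propose, 10, 500, 1.0, random.Random(0))
```

In an image-analysis setting the state would be a configuration of geometric primitives and the proposals would add, remove or move them; the acceptance rule stays the same.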