231 research outputs found

    Place and Object Recognition for Real-time Visual Mapping

    Get PDF
    Este trabajo aborda dos de las principales dificultades presentes en los sistemas actuales de localización y creación de mapas de forma simultánea (del inglés Simultaneous Localization And Mapping, SLAM): el reconocimiento de lugares ya visitados para cerrar bucles en la trajectoria y crear mapas precisos, y el reconocimiento de objetos para enriquecer los mapas con estructuras de alto nivel y mejorar la interación entre robots y personas. En SLAM visual, las características que se extraen de las imágenes de una secuencia de vídeo se van acumulando con el tiempo, haciendo más laboriosos dos de los aspectos de la detección de bucles: la eliminación de los bucles incorrectos que se detectan entre lugares que tienen una apariencia muy similar, y conseguir un tiempo de ejecución bajo y factible en trayectorias largas. En este trabajo proponemos una técnica basada en vocabularios visuales y en bolsas de palabras para detectar bucles de manera robusta y eficiente, centrándonos en dos ideas principales: 1) aprovechar el origen secuencial de las imágenes de vídeo, y 2) hacer que todo el proceso pueda funcionar a frecuencia de vídeo. Para beneficiarnos del origen secuencial de las imágenes, presentamos una métrica de similaridad normalizada para medir el parecido entre imágenes e incrementar la distintividad de las detecciones correctas. A su vez, agrupamos los emparejamientos de imágenes candidatas a ser bucle para evitar que éstas compitan cuando realmente fueron tomadas desde el mismo lugar. Finalmente, incorporamos una restricción temporal para comprobar la coherencia entre detecciones consecutivas. La eficiencia se logra utilizando índices inversos y directos y características binarias. Un índice inverso acelera la comparación entre imágenes de lugares, y un índice directo, el cálculo de correspondencias de puntos entre éstas. Por primera vez, en este trabajo se han utilizado características binarias para detectar bucles, dando lugar a una solución viable incluso hasta para decenas de miles de imágenes. Los bucles se verifican comprobando la coherencia de la geometría de las escenas emparejadas. Para ello utilizamos varios métodos robustos que funcionan tanto con una como con múltiples cámaras. Presentamos resultados competitivos y sin falsos positivos en distintas secuencias, con imágenes adquiridas tanto a alta como a baja frecuencia, con cámaras frontales y laterales, y utilizando el mismo vocabulario y la misma configuración. Con descriptores binarios, el sistema completo requiere 22 milisegundos por imagen en una secuencia de 26.300 imágenes, resultando un orden de magnitud más rápido que otras técnicas actuales. Se puede utilizar un algoritmo similar al de reconocimiento de lugares para resolver el reconocimiento de objetos en SLAM visual. Detectar objetos en este contexto es particularmente complicado debido a que las distintas ubicaciones, posiciones y tamaños en los que se puede ver un objeto en una imagen son potencialmente infinitos, por lo que suelen ser difíciles de distinguir. Además, esta complejidad se multiplica cuando la comparación ha de hacerse contra varios objetos 3D. Nuestro esfuerzo en este trabajo está orientado a: 1) construir el primer sistema de SLAM visual que puede colocar objectos 3D reales en el mapa, y 2) abordar los problemas de escalabilidad resultantes al tratar con múltiples objetos y vistas de éstos. En este trabajo, presentamos el primer sistema de SLAM monocular que reconoce objetos 3D, los inserta en el mapa y refina su posición en el espacio 3D a medida que el mapa se va construyendo, incluso cuando los objetos dejan de estar en el campo de visión de la cámara. Esto se logra en tiempo real con modelos de objetos compuestos por información tridimensional y múltiples imágenes representando varios puntos de vista del objeto. Después nos centramos en la escalabilidad de la etapa del reconocimiento de los objetos 3D. Presentamos una técnica rápida para segmentar imágenes en regiones de interés para detectar objetos pequeños o lejanos. Tras ello, proponemos sustituir el modelo de objetos de vistas independientes por un modelado con una única bolsa de palabras de características binarias asociadas a puntos 3D. Creamos también una base de datos que incorpora índices inversos y directos para aprovechar sus ventajas a la hora de recuperar rápidamente tanto objetos candidatos a ser detectados como correspondencias de puntos, tal y como hacían en el caso de la detección de bucles. Los resultados experimentales muestran que nuestro sistema funciona en tiempo real en un entorno de escritorio con cámara en mano y en una habitación con una cámara montada sobre un robot autónomo. Las mejoras en el proceso de reconocimiento obtienen resultados satisfactorios, sin detecciones erróneas y con un tiempo de ejecución medio de 28 milisegundos por imagen con una base de datos de 20 objetos 3D

    Non-overlapping Distributed Tracking System Utilizing Particle Filter

    Get PDF
    Tracking people across multiple cameras is a challenging research area in visual computing, especially when these cameras have non-overlapping field of views. The important task is to associate a current subject with other prior appearances of the same subject across time and space in a camera network. Several known techniques rely on Bayesian approaches to perform the matching task. However, these approaches do not scale well when the dimension of the problem increases; e.g. when the number of subject or possible path increases. The aim of this paper is to propose a unified tracking framework using particle filters to efficiently switch between visual tracking (field of view tracking) and track prediction (non-overlapping region tracking). The particle filter tracking system utilizes a map (known environment) to assist the tracking process when targets leave the field of view of any camera. We implemented and tested this tracking approach in an in-house multiple cameras system as well as using on-line data. Promising results were obtained which suggested the feasibility of such an approach

    Object Tracking

    Get PDF
    Object tracking consists in estimation of trajectory of moving objects in the sequence of images. Automation of the computer object tracking is a difficult task. Dynamics of multiple parameters changes representing features and motion of the objects, and temporary partial or full occlusion of the tracked objects have to be considered. This monograph presents the development of object tracking algorithms, methods and systems. Both, state of the art of object tracking methods and also the new trends in research are described in this book. Fourteen chapters are split into two sections. Section 1 presents new theoretical ideas whereas Section 2 presents real-life applications. Despite the variety of topics contained in this monograph it constitutes a consisted knowledge in the field of computer object tracking. The intention of editor was to follow up the very quick progress in the developing of methods as well as extension of the application

    TermEval: an automatic metric for evaluating terminology translation in MT

    Get PDF
    Terminology translation plays a crucial role in domain-specific machine translation (MT). Preservation of domain-knowledge from source to target is arguably the most concerning factor for the customers in translation industry, especially for critical domains such as medical, transportation, military, legal and aerospace. However, evaluation of terminology translation, despite its huge importance in the translation industry, has been a less examined area in MT research. Term translation quality in MT is usually measured with domain experts, either in academia or industry. To the best of our knowledge, as of yet there is no publicly available solution to automatically evaluate terminology translation in MT. In particular, manual intervention is often needed to evaluate terminology translation in MT, which, by nature, is a time-consuming and highly expensive task. In fact, this is unimaginable in an industrial setting where customised MT systems are often needed to be updated for many reasons (e.g. availability of new training data or leading MT techniques). Hence, there is a genuine need to have a faster and less expensive solution to this problem, which could aid the end-users to instantly identify term translation problems in MT. In this study, we propose an automatic evaluation metric, TermEval, for evaluating terminology translation in MT. To the best of our knowledge, there is no gold-standard dataset available for measuring terminology translation quality in MT. In the absence of gold standard evaluation test set, we semi-automatically create a gold-standard dataset from English--Hindi judicial domain parallel corpus. We trained state-of-the-art phrase-based SMT (PB-SMT) and neural MT (NMT) models on two translation directions: English-to-Hindi and Hindi-to-English, and use TermEval to evaluate their performance on terminology translation over the created gold standard test set. In order to measure the correlation between TermEval scores and human judgments, translations of each source terms (of the gold standard test set) is validated with human evaluator. High correlation between TermEval and human judgements manifests the effectiveness of the proposed terminology translation evaluation metric. We also carry out comprehensive manual evaluation on terminology translation and present our observations

    Video analytics for security systems

    Get PDF
    This study has been conducted to develop robust event detection and object tracking algorithms that can be implemented in real time video surveillance applications. The aim of the research has been to produce an automated video surveillance system that is able to detect and report potential security risks with minimum human intervention. Since the algorithms are designed to be implemented in real-life scenarios, they must be able to cope with strong illumination changes and occlusions. The thesis is divided into two major sections. The first section deals with event detection and edge based tracking while the second section describes colour measurement methods developed to track objects in crowded environments. The event detection methods presented in the thesis mainly focus on detection and tracking of objects that become stationary in the scene. Objects such as baggage left in public places or vehicles parked illegally can cause a serious security threat. A new pixel based classification technique has been developed to detect objects of this type in cluttered scenes. Once detected, edge based object descriptors are obtained and stored as templates for tracking purposes. The consistency of these descriptors is examined using an adaptive edge orientation based technique. Objects are tracked and alarm events are generated if the objects are found to be stationary in the scene after a certain period of time. To evaluate the full capabilities of the pixel based classification and adaptive edge orientation based tracking methods, the model is tested using several hours of real-life video surveillance scenarios recorded at different locations and time of day from our own and publically available databases (i-LIDS, PETS, MIT, ViSOR). The performance results demonstrate that the combination of pixel based classification and adaptive edge orientation based tracking gave over 95% success rate. The results obtained also yield better detection and tracking results when compared with the other available state of the art methods. In the second part of the thesis, colour based techniques are used to track objects in crowded video sequences in circumstances of severe occlusion. A novel Adaptive Sample Count Particle Filter (ASCPF) technique is presented that improves the performance of the standard Sample Importance Resampling Particle Filter by up to 80% in terms of computational cost. An appropriate particle range is obtained for each object and the concept of adaptive samples is introduced to keep the computational cost down. The objective is to keep the number of particles to a minimum and only to increase them up to the maximum, as and when required. Variable standard deviation values for state vector elements have been exploited to cope with heavy occlusion. The technique has been tested on different video surveillance scenarios with variable object motion, strong occlusion and change in object scale. Experimental results show that the proposed method not only tracks the object with comparable accuracy to existing particle filter techniques but is up to five times faster. Tracking objects in a multi camera environment is discussed in the final part of the thesis. The ASCPF technique is deployed within a multi-camera environment to track objects across different camera views. Such environments can pose difficult challenges such as changes in object scale and colour features as the objects move from one camera view to another. Variable standard deviation values of the ASCPF have been utilized in order to cope with sudden colour and scale changes. As the object moves from one scene to another, the number of particles, together with the spread value, is increased to a maximum to reduce any effects of scale and colour change. Promising results are obtained when the ASCPF technique is tested on live feeds from four different camera views. It was found that not only did the ASCPF method result in the successful tracking of the moving object across different views but also maintained the real time frame rate due to its reduced computational cost thus indicating that the method is a potential practical solution for multi camera tracking applications

    Recent Developments in Video Surveillance

    Get PDF
    With surveillance cameras installed everywhere and continuously streaming thousands of hours of video, how can that huge amount of data be analyzed or even be useful? Is it possible to search those countless hours of videos for subjects or events of interest? Shouldn’t the presence of a car stopped at a railroad crossing trigger an alarm system to prevent a potential accident? In the chapters selected for this book, experts in video surveillance provide answers to these questions and other interesting problems, skillfully blending research experience with practical real life applications. Academic researchers will find a reliable compilation of relevant literature in addition to pointers to current advances in the field. Industry practitioners will find useful hints about state-of-the-art applications. The book also provides directions for open problems where further advances can be pursued

    Visual Human Tracking and Group Activity Analysis: A Video Mining System for Retail Marketing

    Get PDF
    Thesis (PhD) - Indiana University, Computer Sciences, 2007In this thesis we present a system for automatic human tracking and activity recognition from video sequences. The problem of automated analysis of visual information in order to derive descriptors of high level human activities has intrigued computer vision community for decades and is considered to be largely unsolved. A part of this interest is derived from the vast range of applications in which such a solution may be useful. We attempt to find efficient formulations of these tasks as applied to the extracting customer behavior information in a retail marketing context. Based on these formulations, we present a system that visually tracks customers in a retail store and performs a number of activity analysis tasks based on the output from the tracker. In tracking we introduce new techniques for pedestrian detection, initialization of the body model and a formulation of the temporal tracking as a global trans-dimensional optimization problem. Initial human detection is addressed by a novel method for head detection, which incorporates the knowledge of the camera projection model.The initialization of the human body model is addressed by newly developed shape and appearance descriptors. Temporal tracking of customer trajectories is performed by employing a human body tracking system designed as a Bayesian jump-diffusion filter. This approach demonstrates the ability to overcome model dimensionality ambiguities as people are leaving and entering the scene. Following the tracking, we developed a two-stage group activity formulation based upon the ideas from swarming research. For modeling purposes, all moving actors in the scene are viewed here as simplistic agents in the swarm. This allows to effectively define a set of inter-agent interactions, which combine to derive a distance metric used in further swarm clustering. This way, in the first stage the shoppers that belong to the same group are identified by deterministically clustering bodies to detect short term events and in the second stage events are post-processed to form clusters of group activities with fuzzy memberships. Quantitative analysis of the tracking subsystem shows an improvement over the state of the art methods, if used under similar conditions. Finally, based on the output from the tracker, the activity recognition procedure achieves over 80% correct shopper group detection, as validated by the human generated ground truth results

    Adaptive detection and tracking using multimodal information

    Get PDF
    This thesis describes work on fusing data from multiple sources of information, and focuses on two main areas: adaptive detection and adaptive object tracking in automated vision scenarios. The work on adaptive object detection explores a new paradigm in dynamic parameter selection, by selecting thresholds for object detection to maximise agreement between pairs of sources. Object tracking, a complementary technique to object detection, is also explored in a multi-source context and an efficient framework for robust tracking, termed the Spatiogram Bank tracker, is proposed as a means to overcome the difficulties of traditional histogram tracking. As well as performing theoretical analysis of the proposed methods, specific example applications are given for both the detection and the tracking aspects, using thermal infrared and visible spectrum video data, as well as other multi-modal information sources

    Automatic road network extraction from high resolution satellite imagery using spectral classification methods

    Get PDF
    Road networks play an important role in a number of geospatial applications, such as cartographic, infrastructure planning and traffic routing software. Automatic and semi-automatic road network extraction techniques have significantly increased the extraction rate of road networks. Automated processes still yield some erroneous and incomplete results and costly human intervention is still required to evaluate results and correct errors. With the aim of improving the accuracy of road extraction systems, three objectives are defined in this thesis: Firstly, the study seeks to develop a flexible semi-automated road extraction system, capable of extracting roads from QuickBird satellite imagery. The second objective is to integrate a variety of algorithms within the road network extraction system. The benefits of using each of these algorithms within the proposed road extraction system, is illustrated. Finally, a fully automated system is proposed by incorporating a number of the algorithms investigated throughout the thesis. CopyrightDissertation (MSc)--University of Pretoria, 2010.Computer Scienceunrestricte
    corecore