7 research outputs found

    SenseCam image localisation using hierarchical SURF trees

    Get PDF
    The SenseCam is a wearable camera that automatically takes photos of the wearer's activities, generating thousands of images per day. Automatically organising these images for efficient search and retrieval is a challenging task, but can be simplified by providing semantic information with each photo, such as the wearer's location during capture time. We propose a method for automatically determining the wearer's location using an annotated image database, described using SURF interest point descriptors. We show that SURF out-performs SIFT in matching SenseCam images and that matching can be done efficiently using hierarchical trees of SURF descriptors. Additionally, by re-ranking the top images using bi-directional SURF matches, location matching performance is improved further

    Multimodal identification of journeys

    Get PDF
    Due to the ubiquity of localisation technology, users now have the ability to keep a record of their own location, as a kind of ‘location diary’. Such a large collection of data can become unmanageable without some way to structure that data to make it useful and searchable. We address this problem of structuring location data by proposing a framework for classifying the data into often-traversed routes. In this work, commonly traversed routes are identified with clusters based on sensed data. Our framework does not rely on any one source of location information, but can fuse data from multimodal localisation sources. We demonstrate the effectiveness of our algorithm by examining the combination of GPS, wireless signal strength readings and image-matching on very challenging data in a variety of environmental conditions. By fusing these three modalities we obtained better performance than any individual or combination of two modalities. As it can be orientated towards the needs and capabilities of the user based on context, this method becomes useful for some ambient assisted living applications

    Organising and structuring a visual diary using visual interest point detectors

    Get PDF
    As wearable cameras become more popular, researchers are increasingly focusing on novel applications to manage the large volume of data these devices produce. One such application is the construction of a Visual Diary from an individual’s photographs. Microsoft’s SenseCam, a device designed to passively record a Visual Diary and cover a typical day of the user wearing the camera, is an example of one such device. The vast quantity of images generated by these devices means that the management and organisation of these collections is not a trivial matter. We believe wearable cameras, such as SenseCam, will become more popular in the future and the management of the volume of data generated by these devices is a key issue. Although there is a significant volume of work in the literature in the object detection and recognition and scene classification fields, there is little work in the area of setting detection. Furthermore, few authors have examined the issues involved in analysing extremely large image collections (like a Visual Diary) gathered over a long period of time. An algorithm developed for setting detection should be capable of clustering images captured at the same real world locations (e.g. in the dining room at home, in front of the computer in the office, in the park, etc.). This requires the selection and implementation of suitable methods to identify visually similar backgrounds in images using their visual features. We present a number of approaches to setting detection based on the extraction of visual interest point detectors from the images. We also analyse the performance of two of the most popular descriptors - Scale Invariant Feature Transform (SIFT) and Speeded Up Robust Features (SURF).We present an implementation of a Visual Diary application and evaluate its performance via a series of user experiments. Finally, we also outline some techniques to allow the Visual Diary to automatically detect new settings, to scale as the image collection continues to grow substantially over time, and to allow the user to generate a personalised summary of their data

    Modeling the environment with egocentric vision systems

    Get PDF
    Cada vez más sistemas autónomos, ya sean robots o sistemas de asistencia, están presentes en nuestro día a día. Este tipo de sistemas interactúan y se relacionan con su entorno y para ello necesitan un modelo de dicho entorno. En función de las tareas que deben realizar, la información o el detalle necesario del modelo varía. Desde detallados modelos 3D para sistemas de navegación autónomos, a modelos semánticos que incluyen información importante para el usuario como el tipo de área o qué objetos están presentes. La creación de estos modelos se realiza a través de las lecturas de los distintos sensores disponibles en el sistema. Actualmente, gracias a su pequeño tamaño, bajo precio y la gran información que son capaces de capturar, las cámaras son sensores incluidos en todos los sistemas autónomos. El objetivo de esta tesis es el desarrollar y estudiar nuevos métodos para la creación de modelos del entorno a distintos niveles semánticos y con distintos niveles de precisión. Dos puntos importantes caracterizan el trabajo desarrollado en esta tesis: - El uso de cámaras con punto de vista egocéntrico o en primera persona ya sea en un robot o en un sistema portado por el usuario (wearable). En este tipo de sistemas, las cámaras son solidarias al sistema móvil sobre el que van montadas. En los últimos años han aparecido muchos sistemas de visión wearables, utilizados para multitud de aplicaciones, desde ocio hasta asistencia de personas. - El uso de sistemas de visión omnidireccional, que se distinguen por su gran campo de visión, incluyendo mucha más información en cada imagen que las cámara convencionales. Sin embargo plantean nuevas dificultades debido a distorsiones y modelos de proyección más complejos. Esta tesis estudia distintos tipos de modelos del entorno: - Modelos métricos: el objetivo de estos modelos es crear representaciones detalladas del entorno en las que localizar con precisión el sistema autónomo. Ésta tesis se centra en la adaptación de estos modelos al uso de visión omnidireccional, lo que permite capturar más información en cada imagen y mejorar los resultados en la localización. - Modelos topológicos: estos modelos estructuran el entorno en nodos conectados por arcos. Esta representación tiene menos precisión que la métrica, sin embargo, presenta un nivel de abstracción mayor y puede modelar el entorno con más riqueza. %, por ejemplo incluyendo el tipo de área de cada nodo, la localización de objetos importantes o el tipo de conexión entre los distintos nodos. Esta tesis se centra en la creación de modelos topológicos con información adicional sobre el tipo de área de cada nodo y conexión (pasillo, habitación, puertas, escaleras...). - Modelos semánticos: este trabajo también contribuye en la creación de nuevos modelos semánticos, más enfocados a la creación de modelos para aplicaciones en las que el sistema interactúa o asiste a una persona. Este tipo de modelos representan el entorno a través de conceptos cercanos a los usados por las personas. En particular, esta tesis desarrolla técnicas para obtener y propagar información semántica del entorno en secuencias de imágen

    A framework for automated landmark recognition in community contributed image corpora

    Get PDF
    Any large library of information requires efficient ways to organise it and methods that allow people to access information efficiently and collections of digital images are no exception. Automatically creating high-level semantic tags based on image content is difficult, if not impossible to achieve accurately. In this thesis a framework is presented that allows for the automatic creation of rich and accurate tags for images with landmarks as the main object. This framework uses state of the art computer vision techniques fused with the wide range of contextual information that is available with community contributed imagery. Images are organised into clusters based on image content and spatial data associated with each image. Based on these clusters different types of classifiers are* trained to recognise landmarks contained within the images in each cluster. A novel hybrid approach is proposed combining these classifiers with an hierarchical matching approach to allow near real-time classification and captioning of images containing landmarks

    A Survey of Augmented Reality

    Get PDF
    © 2015 M. Billinghurst, A. Clark, and G. Lee. This survey summarizes almost 50 years of research and development in the field of Augmented Reality (AR). From early research in the 1960's until widespread availability by the 2010's there has been steady progress towards the goal of being able to seamlessly combine real and virtual worlds. We provide an overview of the common definitions of AR, and show how AR fits into taxonomies of other related technologies. A history of important milestones in Augmented Reality is followed by sections on the key enabling technologies of tracking, display and input devices. We also review design guidelines and provide some examples of successful AR applications. Finally, we conclude with a summary of directions for future work and a review of some of the areas that are currently being researched

    Computer Vision Algorithms for Mobile Camera Applications

    Get PDF
    Wearable and mobile sensors have found widespread use in recent years due to their ever-decreasing cost, ease of deployment and use, and ability to provide continuous monitoring as opposed to sensors installed at fixed locations. Since many smart phones are now equipped with a variety of sensors, including accelerometer, gyroscope, magnetometer, microphone and camera, it has become more feasible to develop algorithms for activity monitoring, guidance and navigation of unmanned vehicles, autonomous driving and driver assistance, by using data from one or more of these sensors. In this thesis, we focus on multiple mobile camera applications, and present lightweight algorithms suitable for embedded mobile platforms. The mobile camera scenarios presented in the thesis are: (i) activity detection and step counting from wearable cameras, (ii) door detection for indoor navigation of unmanned vehicles, and (iii) traffic sign detection from vehicle-mounted cameras. First, we present a fall detection and activity classification system developed for embedded smart camera platform CITRIC. In our system, the camera platform is worn by the subject, as opposed to static sensors installed at fixed locations in certain rooms, and, therefore, monitoring is not limited to confined areas, and extends to wherever the subject may travel including indoors and outdoors. Next, we present a real-time smart phone-based fall detection system, wherein we implement camera and accelerometer based fall-detection on Samsung Galaxy S™ 4. We fuse these two sensor modalities to have a more robust fall detection system. Then, we introduce a fall detection algorithm with autonomous thresholding using relative-entropy within the class of Ali-Silvey distance measures. As another wearable camera application, we present a footstep counting algorithm using a smart phone camera. This algorithm provides more accurate step-count compared to using only accelerometer data in smart phones and smart watches at various body locations. As a second mobile camera scenario, we study autonomous indoor navigation of unmanned vehicles. A novel approach is proposed to autonomously detect and verify doorway openings by using the Google Project Tango™ platform. The third mobile camera scenario involves vehicle-mounted cameras. More specifically, we focus on traffic sign detection from lower-resolution and noisy videos captured from vehicle-mounted cameras. We present a new method for accurate traffic sign detection, incorporating Aggregate Channel Features and Chain Code Histograms, with the goal of providing much faster training and testing, and comparable or better performance, with respect to deep neural network approaches, without requiring specialized processors. Proposed computer vision algorithms provide promising results for various useful applications despite the limited energy and processing capabilities of mobile devices
    corecore