1,370 research outputs found

    Active Perception by Interaction with Other Agents in a Predictive Coding Framework: Application to Internet of Things Environment

    Get PDF
    Predicting the state of an agent\u27s partially-observable environment is a problem of interest in many domains. Typically in the real world, the environment consists of multiple agents, not necessarily working towards a common goal. Though the goal and sensory observation for each agent is unique, one agent might have acquired some knowledge that may benefit the other. In essence, the knowledge base regarding the environment is distributed among the agents. An agent can sample this distributed knowledge base by communicating with other agents. Since an agent is not storing the entire knowledge base, its model can be small and its inference can be efficient and fault-tolerant. However, the agent needs to learn -- when, with whom and what -- to communicate (in general interact) under different situations.This dissertation presents an agent model that actively and selectively communicates with other agents to predict the state of its environment efficiently. Communication is a challenge when the internal models of other agents is unknown and unobservable. The proposed agent learns communication policies as mappings from its belief state to when, with whom and what to communicate. The policies are learned using predictive coding in an online manner, without any reinforcement. The proposed agent model is evaluated on widely-studied applications, such as human activity recognition from multimodal, multisource and heterogeneous sensor data, and transferring knowledge across sensor networks. In the applications, either each sensor or each sensor network is assumed to be monitored by an agent. The recognition accuracy on benchmark datasets is comparable to the state-of-the-art, even though our model has significantly fewer parameters and infers the state in a localized manner. The learned policy reduces number of communications. The agent is tolerant to communication failures and can recognize the reliability of each agent from its communication messages. To the best of our knowledge, this is the first work on learning communication policies by an agent for predicting the state of its environment

    Multi-sensor human action recognition with particular application to tennis event-based indexing

    Get PDF
    The ability to automatically classify human actions and activities using vi- sual sensors or by analysing body worn sensor data has been an active re- search area for many years. Only recently with advancements in both fields and the ubiquitous nature of low cost sensors in our everyday lives has auto- matic human action recognition become a reality. While traditional sports coaching systems rely on manual indexing of events from a single modality, such as visual or inertial sensors, this thesis investigates the possibility of cap- turing and automatically indexing events from multimodal sensor streams. In this work, we detail a novel approach to infer human actions by fusing multimodal sensors to improve recognition accuracy. State of the art visual action recognition approaches are also investigated. Firstly we apply these action recognition detectors to basic human actions in a non-sporting con- text. We then perform action recognition to infer tennis events in a tennis court instrumented with cameras and inertial sensing infrastructure. The system proposed in this thesis can use either visual or inertial sensors to au- tomatically recognise the main tennis events during play. A complete event retrieval system is also presented to allow coaches to build advanced queries, which existing sports coaching solutions cannot facilitate, without an inordi- nate amount of manual indexing. The event retrieval interface is evaluated against a leading commercial sports coaching tool in terms of both usability and efficiency

    Multimodal, Embodied and Location-Aware Interaction

    Get PDF
    This work demonstrates the development of mobile, location-aware, eyes-free applications which utilise multiple sensors to provide a continuous, rich and embodied interaction. We bring together ideas from the fields of gesture recognition, continuous multimodal interaction, probability theory and audio interfaces to design and develop location-aware applications and embodied interaction in both a small-scale, egocentric body-based case and a large-scale, exocentric `world-based' case. BodySpace is a gesture-based application, which utilises multiple sensors and pattern recognition enabling the human body to be used as the interface for an application. As an example, we describe the development of a gesture controlled music player, which functions by placing the device at different parts of the body. We describe a new approach to the segmentation and recognition of gestures for this kind of application and show how simulated physical model-based interaction techniques and the use of real world constraints can shape the gestural interaction. GpsTunes is a mobile, multimodal navigation system equipped with inertial control that enables users to actively explore and navigate through an area in an augmented physical space, incorporating and displaying uncertainty resulting from inaccurate sensing and unknown user intention. The system propagates uncertainty appropriately via Monte Carlo sampling and output is displayed both visually and in audio, with audio rendered via granular synthesis. We demonstrate the use of uncertain prediction in the real world and show that appropriate display of the full distribution of potential future user positions with respect to sites-of-interest can improve the quality of interaction over a simplistic interpretation of the sensed data. We show that this system enables eyes-free navigation around set trajectories or paths unfamiliar to the user for varying trajectory width and context. We demon- strate the possibility to create a simulated model of user behaviour, which may be used to gain an insight into the user behaviour observed in our field trials. The extension of this application to provide a general mechanism for highly interactive context aware applications via density exploration is also presented. AirMessages is an example application enabling users to take an embodied approach to scanning a local area to find messages left in their virtual environment

    Multimodal, Embodied and Location-Aware Interaction

    Get PDF
    This work demonstrates the development of mobile, location-aware, eyes-free applications which utilise multiple sensors to provide a continuous, rich and embodied interaction. We bring together ideas from the fields of gesture recognition, continuous multimodal interaction, probability theory and audio interfaces to design and develop location-aware applications and embodied interaction in both a small-scale, egocentric body-based case and a large-scale, exocentric `world-based' case. BodySpace is a gesture-based application, which utilises multiple sensors and pattern recognition enabling the human body to be used as the interface for an application. As an example, we describe the development of a gesture controlled music player, which functions by placing the device at different parts of the body. We describe a new approach to the segmentation and recognition of gestures for this kind of application and show how simulated physical model-based interaction techniques and the use of real world constraints can shape the gestural interaction. GpsTunes is a mobile, multimodal navigation system equipped with inertial control that enables users to actively explore and navigate through an area in an augmented physical space, incorporating and displaying uncertainty resulting from inaccurate sensing and unknown user intention. The system propagates uncertainty appropriately via Monte Carlo sampling and output is displayed both visually and in audio, with audio rendered via granular synthesis. We demonstrate the use of uncertain prediction in the real world and show that appropriate display of the full distribution of potential future user positions with respect to sites-of-interest can improve the quality of interaction over a simplistic interpretation of the sensed data. We show that this system enables eyes-free navigation around set trajectories or paths unfamiliar to the user for varying trajectory width and context. We demon- strate the possibility to create a simulated model of user behaviour, which may be used to gain an insight into the user behaviour observed in our field trials. The extension of this application to provide a general mechanism for highly interactive context aware applications via density exploration is also presented. AirMessages is an example application enabling users to take an embodied approach to scanning a local area to find messages left in their virtual environment

    Automatic Food Intake Assessment Using Camera Phones

    Get PDF
    Obesity is becoming an epidemic phenomenon in most developed countries. The fundamental cause of obesity and overweight is an energy imbalance between calories consumed and calories expended. It is essential to monitor everyday food intake for obesity prevention and management. Existing dietary assessment methods usually require manually recording and recall of food types and portions. Accuracy of the results largely relies on many uncertain factors such as user\u27s memory, food knowledge, and portion estimations. As a result, the accuracy is often compromised. Accurate and convenient dietary assessment methods are still blank and needed in both population and research societies. In this thesis, an automatic food intake assessment method using cameras, inertial measurement units (IMUs) on smart phones was developed to help people foster a healthy life style. With this method, users use their smart phones before and after a meal to capture images or videos around the meal. The smart phone will recognize food items and calculate the volume of the food consumed and provide the results to users. The technical objective is to explore the feasibility of image based food recognition and image based volume estimation. This thesis comprises five publications that address four specific goals of this work: (1) to develop a prototype system with existing methods to review the literature methods, find their drawbacks and explore the feasibility to develop novel methods; (2) based on the prototype system, to investigate new food classification methods to improve the recognition accuracy to a field application level; (3) to design indexing methods for large-scale image database to facilitate the development of new food image recognition and retrieval algorithms; (4) to develop novel convenient and accurate food volume estimation methods using only smart phones with cameras and IMUs. A prototype system was implemented to review existing methods. Image feature detector and descriptor were developed and a nearest neighbor classifier were implemented to classify food items. A reedit card marker method was introduced for metric scale 3D reconstruction and volume calculation. To increase recognition accuracy, novel multi-view food recognition algorithms were developed to recognize regular shape food items. To further increase the accuracy and make the algorithm applicable to arbitrary food items, new food features, new classifiers were designed. The efficiency of the algorithm was increased by means of developing novel image indexing method in large-scale image database. Finally, the volume calculation was enhanced through reducing the marker and introducing IMUs. Sensor fusion technique to combine measurements from cameras and IMUs were explored to infer the metric scale of the 3D model as well as reduce noises from these sensors
    corecore