    A vision system for mobile maritime surveillance platforms

    Mobile surveillance systems play an important role in minimising security and safety threats in high-risk or hazardous environments. Providing a mobile marine surveillance platform with situational awareness of its environment is important for mission success. An essential part of situational awareness is the ability to detect and subsequently track potential target objects. Typically, the exact type of target objects is unknown, hence detection is addressed as a problem of finding parts of an image that stand out in relation to their surrounding regions or are atypical to the domain. Contrary to existing saliency methods, this thesis proposes the use of a domain-specific visual attention approach for detecting potential regions of interest in maritime imagery. For this, low-level features that are indicative of maritime targets are identified. These features are then evaluated with respect to their local, regional, and global significance. Together with a domain-specific background segmentation technique, the features are combined in a Bayesian classifier to direct visual attention to potential target objects. The maritime environment introduces challenges to the camera system: gusts, wind, swell, or waves can cause the platform to move drastically and unpredictably. Pan-tilt-zoom cameras that are often utilised for surveillance tasks can adjust their orientation to provide a stable view of the target. However, in rough maritime environments this requires high-speed and precise inputs. In contrast, omnidirectional cameras provide a full spherical view, which allows the acquisition and tracking of multiple targets at the same time. However, the target itself only occupies a small fraction of the overall view. This thesis proposes a novel, target-centric approach for image stabilisation. 
A virtual camera is extracted from the omnidirectional view for each target and is adjusted based on the measurements of an inertial measurement unit and an image feature tracker. The combination of these two techniques in a probabilistic framework allows for stabilisation of rotational and translational ego-motion. Furthermore, it has the specific advantage of being robust to loosely calibrated and synchronised hardware, since the fusion of tracking and stabilisation means that tracking uncertainty can be used to compensate for errors in calibration and synchronisation. This then completely eliminates the need for tedious calibration phases and the adverse effects of assembly slippage over time. Finally, this thesis combines the visual attention and omnidirectional stabilisation frameworks and proposes a multi-view tracking system that is capable of detecting potential target objects in the maritime domain. Although the visual attention framework performed well on the benchmark datasets, the evaluation on real-world maritime imagery produced a high number of false positives. An investigation reveals that benchmark datasets are unconsciously influenced by human shot selection, which greatly simplifies the problem of visual attention. Despite the false positives, the tracking approach itself remains robust even when many of them are tracked.
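
The abstract above does not spell out how the Bayesian classifier combines its features. As a minimal sketch only, assuming conditionally independent features (a naive Bayes simplification that the thesis may not use) and hypothetical likelihood values, per-region evidence could be turned into a target posterior like this. The function name and its parameters are illustrative and do not come from the thesis.

```python
import math


def target_posterior(likelihoods_target, likelihoods_background, prior_target=0.1):
    """Posterior P(target | features) under a naive Bayes assumption.

    likelihoods_target[i]     = P(feature_i | target)      (hypothetical values)
    likelihoods_background[i] = P(feature_i | background)  (hypothetical values)
    All likelihoods must be strictly positive, and 0 < prior_target < 1.
    """
    if len(likelihoods_target) != len(likelihoods_background):
        raise ValueError("need one likelihood pair per feature")
    # Work in log space so that many small likelihoods do not underflow.
    log_t = math.log(prior_target) + sum(math.log(p) for p in likelihoods_target)
    log_b = math.log(1.0 - prior_target) + sum(
        math.log(p) for p in likelihoods_background
    )
    # Normalise over the two classes, subtracting the max for stability.
    m = max(log_t, log_b)
    num = math.exp(log_t - m)
    return num / (num + math.exp(log_b - m))
```

A region whose features are equally likely under both classes keeps the prior, while features that favour the target class raise the posterior. Thresholding that posterior is one way attention could be directed to a region.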

    Automatic Food Intake Assessment Using Camera Phones

    Obesity is becoming an epidemic in most developed countries. The fundamental cause of obesity and overweight is an energy imbalance between calories consumed and calories expended. It is essential to monitor everyday food intake for obesity prevention and management. Existing dietary assessment methods usually require manual recording and recall of food types and portions. The accuracy of the results relies largely on uncertain factors such as the user's memory, food knowledge, and portion estimates. As a result, the accuracy is often compromised. Accurate and convenient dietary assessment methods are still lacking and are needed by both the general population and the research community. In this thesis, an automatic food intake assessment method using the cameras and inertial measurement units (IMUs) on smart phones was developed to help people foster a healthy lifestyle. With this method, users use their smart phones before and after a meal to capture images or videos of the meal. The smart phone recognizes food items, calculates the volume of the food consumed, and provides the results to users. The technical objective is to explore the feasibility of image-based food recognition and image-based volume estimation. This thesis comprises five publications that address four specific goals of this work: (1) to develop a prototype system with existing methods in order to review the methods in the literature, find their drawbacks, and explore the feasibility of developing novel methods; (2) based on the prototype system, to investigate new food classification methods that improve the recognition accuracy to a field application level; (3) to design indexing methods for large-scale image databases to facilitate the development of new food image recognition and retrieval algorithms; (4) to develop novel, convenient, and accurate food volume estimation methods using only smart phones with cameras and IMUs. A prototype system was implemented to review existing methods. 
An image feature detector and descriptor were developed, and a nearest neighbor classifier was implemented to classify food items. A credit card marker method was introduced for metric scale 3D reconstruction and volume calculation. To increase recognition accuracy, novel multi-view food recognition algorithms were developed to recognize regular-shape food items. To further increase the accuracy and make the algorithm applicable to arbitrary food items, new food features and new classifiers were designed. The efficiency of the algorithm was increased by developing a novel image indexing method for large-scale image databases. Finally, the volume calculation was enhanced by reducing the marker and introducing IMUs. Sensor fusion techniques that combine measurements from cameras and IMUs were explored to infer the metric scale of the 3D model as well as to reduce noise from these sensors.
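
A marker of known size fixes the metric scale of an otherwise scale-free reconstruction. The sketch below shows only the arithmetic, assuming the card marker is a standard ID-1 bank card (85.60 mm wide per ISO/IEC 7810) whose width has already been measured in the reconstruction's arbitrary units. The function names are hypothetical and not taken from the thesis.

```python
CARD_WIDTH_MM = 85.60  # width of an ISO/IEC 7810 ID-1 card (assumed marker)


def metric_scale(card_width_model_units):
    """Millimetres per model unit, from the card width measured in the 3D model."""
    if card_width_model_units <= 0:
        raise ValueError("card width in model units must be positive")
    return CARD_WIDTH_MM / card_width_model_units


def volume_ml(volume_model_units, card_width_model_units):
    """Convert a volume in cubic model units to millilitres."""
    s = metric_scale(card_width_model_units)
    # Lengths scale by s, so volumes scale by s cubed.
    volume_mm3 = volume_model_units * s ** 3
    return volume_mm3 / 1000.0  # 1 mL = 1 cm^3 = 1000 mm^3
```

Because volume grows with the cube of the scale factor, a 1% error in the measured card width becomes roughly a 3% error in the estimated volume. This sensitivity is one reason a second scale cue, such as an IMU, can be attractive.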

    RD-VIO: Robust Visual-Inertial Odometry for Mobile Augmented Reality in Dynamic Environments

    It is typically challenging for visual or visual-inertial odometry systems to handle dynamic scenes and pure rotation. In this work, we design a novel visual-inertial odometry (VIO) system called RD-VIO to handle both problems. Firstly, we propose an IMU-PARSAC algorithm which can robustly detect and match keypoints in a two-stage process. In the first stage, landmarks are matched with new keypoints using visual and IMU measurements. We collect statistical information from the matching and then use it to guide the intra-keypoint matching in the second stage. Secondly, to handle the problem of pure rotation, we detect the motion type and adapt the deferred-triangulation technique during the data-association process. We treat pure-rotational frames as special subframes. When solving the visual-inertial bundle adjustment, they provide additional constraints on the pure-rotational motion. We evaluate the proposed VIO system on public datasets. Experiments show that the proposed RD-VIO has clear advantages over other methods in dynamic environments.

    Multimodal, Embodied and Location-Aware Interaction

    This work demonstrates the development of mobile, location-aware, eyes-free applications which utilise multiple sensors to provide a continuous, rich and embodied interaction. We bring together ideas from the fields of gesture recognition, continuous multimodal interaction, probability theory and audio interfaces to design and develop location-aware applications and embodied interaction in both a small-scale, egocentric body-based case and a large-scale, exocentric 'world-based' case. BodySpace is a gesture-based application, which utilises multiple sensors and pattern recognition enabling the human body to be used as the interface for an application. As an example, we describe the development of a gesture-controlled music player, which functions by placing the device at different parts of the body. We describe a new approach to the segmentation and recognition of gestures for this kind of application and show how simulated physical model-based interaction techniques and the use of real-world constraints can shape the gestural interaction. GpsTunes is a mobile, multimodal navigation system equipped with inertial control that enables users to actively explore and navigate through an area in an augmented physical space, incorporating and displaying uncertainty resulting from inaccurate sensing and unknown user intention. The system propagates uncertainty appropriately via Monte Carlo sampling and output is displayed both visually and in audio, with audio rendered via granular synthesis. We demonstrate the use of uncertain prediction in the real world and show that appropriate display of the full distribution of potential future user positions with respect to sites-of-interest can improve the quality of interaction over a simplistic interpretation of the sensed data. We show that this system enables eyes-free navigation around set trajectories or paths unfamiliar to the user for varying trajectory width and context. 
We demonstrate the possibility to create a simulated model of user behaviour, which may be used to gain an insight into the user behaviour observed in our field trials. The extension of this application to provide a general mechanism for highly interactive context-aware applications via density exploration is also presented. AirMessages is an example application enabling users to take an embodied approach to scanning a local area to find messages left in their virtual environment.
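
To illustrate the idea of propagating sensing uncertainty with Monte Carlo sampling, the sketch below draws noisy position and heading samples, pushes each one forward along a straight line, and reports the fraction that land inside a site-of-interest. The Gaussian noise model, the constant-speed straight-line motion, and all names and parameters are assumptions made for illustration, not the GpsTunes implementation.

```python
import math
import random


def prob_reaching_site(pos, heading_rad, speed, horizon_s, site, site_radius,
                       pos_sigma, heading_sigma, n_samples=5000, seed=0):
    """Estimate P(future position lies within site_radius of site).

    pos and site are (x, y) tuples in metres; heading is in radians.
    pos_sigma and heading_sigma are assumed Gaussian sensor noise levels.
    """
    rng = random.Random(seed)  # seeded so repeated calls give the same estimate
    hits = 0
    for _ in range(n_samples):
        # Sample one plausible current state from the sensor uncertainty.
        x = pos[0] + rng.gauss(0.0, pos_sigma)
        y = pos[1] + rng.gauss(0.0, pos_sigma)
        h = heading_rad + rng.gauss(0.0, heading_sigma)
        # Propagate the sample forward under constant speed and heading.
        fx = x + speed * horizon_s * math.cos(h)
        fy = y + speed * horizon_s * math.sin(h)
        if math.hypot(fx - site[0], fy - site[1]) <= site_radius:
            hits += 1
    return hits / n_samples
```

The returned fraction summarises the whole cloud of predicted positions rather than a single best guess, which is the kind of quantity that could drive an audio or visual display of uncertainty.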

    Enriching remote labs with computer vision and drones

    165 p. With technological advances, new learning technologies are being developed to contribute to a better learning experience. In particular, remote labs constitute an interesting and practical way to motivate today's students to learn. The student can access the remote lab at any time, and from anywhere, and do the lab work. Despite their many advantages, remote technologies in education create a distance between the student and the teacher. Without the presence of a teacher, students can run into difficulties if no appropriate interventions are made to help them. In this thesis, we aim to enrich an existing remote electronics lab for engineering students called "LaboREM" (for remote Laboratory) in two ways. First, we enable the student to send high-level commands to a mini-drone available in the remote lab facility. The objective is to examine the front panels of electronic measurement instruments using the camera embedded on the drone. Furthermore, we allow remote student-teacher communication using the drone, in case a teacher is present in the remote lab facility. Finally, the drone has to return home when the mission is over and land on a platform that automatically recharges its batteries. Second, we propose an automatic system that estimates the affective state of the student (frustrated/confused/flow) in order to take appropriate interventions to ensure good learning outcomes. For example, if the student is having major difficulties, we can try to give hints or reduce the difficulty level of the lab experiment. We propose to do this using visual cues (head pose estimation and facial expression analysis). Many pieces of evidence on the state of the student can be acquired; however, this evidence is incomplete, sometimes inaccurate, and no single source covers all aspects of the student's state. This is why we propose to fuse the evidence using Dempster-Shafer theory, which allows the fusion of incomplete evidence.
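
Dempster's rule of combination is the standard way to fuse two bodies of evidence in Dempster-Shafer theory. The sketch below applies it to the three affective states named in the abstract. The mass values assigned to the head-pose and facial-expression sources are made up for illustration, and the function name is hypothetical.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps a frozenset of hypotheses to a mass, and the
    masses of each function sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Agreeing evidence supports the intersection of the two sets.
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Disjoint sets contribute to the conflict mass.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalise by the non-conflicting mass.
    return {h: w / (1.0 - conflict) for h, w in combined.items()}


FRAME = frozenset({"frustrated", "confused", "flow"})
# Hypothetical evidence: head pose cannot tell frustrated from confused.
head_pose = {frozenset({"frustrated", "confused"}): 0.6, FRAME: 0.4}
expression = {frozenset({"frustrated"}): 0.5, frozenset({"flow"}): 0.2, FRAME: 0.3}
fused = combine(head_pose, expression)
```

Mass left on the full frame expresses ignorance, which is how an incomplete source can say "I do not know" instead of being forced to spread probability over the individual states.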

    Object-Aware Tracking and Mapping

    Reasoning about geometric properties of digital cameras and optical physics enabled researchers to build methods that localise cameras in 3D space from a video stream, while – often simultaneously – constructing a model of the environment. Related techniques have evolved substantially since the 1980s, leading to increasingly accurate estimations. Traditionally, however, the quality of results is strongly affected by the presence of moving objects, incomplete data, or difficult surfaces – i.e. surfaces that are not Lambertian or lack texture. One insight of this work is that these problems can be addressed by going beyond geometrical and optical constraints, in favour of object level and semantic constraints. Incorporating specific types of prior knowledge in the inference process, such as motion or shape priors, leads to approaches with distinct advantages and disadvantages. After introducing relevant concepts in Chapter 1 and Chapter 2, methods for building object-centric maps in dynamic environments using motion priors are investigated in Chapter 5. Chapter 6 addresses the same problem as Chapter 5, but presents an approach which relies on semantic priors rather than motion cues. To fully exploit semantic information, Chapter 7 discusses the conditioning of shape representations on prior knowledge and the practical application to monocular, object-aware reconstruction systems