
    e-TLD: Event-based Framework for Dynamic Object Tracking

    This paper presents a long-term object tracking framework for a moving event camera under general tracking conditions. A first of its kind for these revolutionary cameras, the tracking framework uses a discriminative representation for the object with online learning, and detects and re-tracks the object when it returns to the field of view. One of the key novelties is an event-based local sliding window technique that tracks reliably in scenes with cluttered and textured backgrounds. In addition, Bayesian bootstrapping is used to assist real-time processing and to boost the discriminative power of the object representation. When the object re-enters the camera's field of view, a data-driven, global sliding window detector locates it for subsequent tracking. Extensive experiments demonstrate the ability of the proposed framework to track and detect arbitrary objects of various shapes and sizes, including dynamic objects such as a human. This is a significant improvement over earlier works that simply track objects as long as they remain visible against simpler backgrounds. Using ground-truth locations for five different objects under three motion settings, namely translation, rotation, and 6-DOF, quantitative measurements are reported for the event-based tracking framework, with critical insights on various performance issues. Finally, a real-time C++ implementation demonstrates tracking under scale, rotation, viewpoint, and occlusion changes in a lab setting. (11 pages, 10 figures)
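
    The paper's discriminative representation is beyond the scope of an abstract, but the local sliding window idea can be illustrated. Below is a minimal sketch, assuming events have already been accumulated into a 2D image and that `templ_score` stands in for the learned object scorer; all names here are hypothetical, not from the paper:

```python
import numpy as np

def local_sliding_window(event_img, prev_bbox, templ_score,
                         search_radius=8, step=2):
    """Scan candidate windows around the last known object location on an
    accumulated event image and return the best-scoring bounding box."""
    x0, y0, w, h = prev_bbox
    best_score, best_bbox = -np.inf, prev_bbox
    for dy in range(-search_radius, search_radius + 1, step):
        for dx in range(-search_radius, search_radius + 1, step):
            x, y = x0 + dx, y0 + dy
            if x < 0 or y < 0:
                continue  # window falls outside the sensor array
            patch = event_img[y:y + h, x:x + w]
            if patch.shape != (h, w):
                continue  # window extends past the image border
            score = templ_score(patch)  # stand-in for the learned scorer
            if score > best_score:
                best_score, best_bbox = score, (x, y, w, h)
    return best_bbox
```

    Searching only a local neighbourhood keeps per-event-slice cost low, which is what makes the approach amenable to the real-time implementation reported above.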

    Event-based Vision: A Survey

    Event cameras are bio-inspired sensors that differ from conventional frame cameras: instead of capturing images at a fixed rate, they asynchronously measure per-pixel brightness changes and output a stream of events that encode the time, location, and sign of each change. Event cameras offer attractive properties compared to traditional cameras: high temporal resolution (on the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low power consumption, and high pixel bandwidth (on the order of kHz), resulting in reduced motion blur. Hence, event cameras have large potential for robotics and computer vision in scenarios that challenge traditional cameras, such as low-latency, high-speed, and high-dynamic-range settings. However, novel methods are required to process the unconventional output of these sensors in order to unlock their potential. This paper provides a comprehensive overview of the emerging field of event-based vision, with a focus on the applications and the algorithms developed to exploit the outstanding properties of event cameras. We present event cameras from their working principle, the actual sensors that are available, and the tasks they have been applied to, from low-level vision (feature detection and tracking, optic flow, etc.) to high-level vision (reconstruction, segmentation, recognition). We also discuss the techniques developed to process events, including learning-based techniques, as well as specialized processors for these novel sensors, such as spiking neural networks. Finally, we highlight the challenges that remain to be tackled and the opportunities that lie ahead in the search for a more efficient, bio-inspired way for machines to perceive and interact with the world.
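
    As a concrete illustration of the output format described above, each event is a tuple of timestamp, pixel location, and polarity (the sign of the brightness change). A minimal sketch of accumulating a time slice of the event stream into an image, a common preprocessing step for frame-based algorithms (the names are illustrative, not from the survey):

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Event:
    t: float       # timestamp (event cameras resolve microseconds)
    x: int         # pixel column
    y: int         # pixel row
    polarity: int  # +1 for a brightness increase, -1 for a decrease

def accumulate(events, width, height, t0, t1):
    """Sum event polarities per pixel over the time window [t0, t1)."""
    frame = np.zeros((height, width), dtype=np.int32)
    for e in events:
        if t0 <= e.t < t1:
            frame[e.y, e.x] += e.polarity
    return frame
```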

    Contemporary Robotics

    This book is a collection of 18 chapters written by internationally recognized experts and well-known professionals in the field. The chapters contribute to diverse facets of contemporary robotics and autonomous systems, and the volume is organized into four thematic parts. The first part is devoted to theoretical issues: development of algorithms for automatic trajectory generation using a redundancy resolution scheme, intelligent algorithms for robotic grasping, a modelling approach for reactive mode handling in flexible manufacturing, and the design of an advanced controller for robot manipulators. The second part deals with different aspects of robot calibration and sensing, including geometric and threshold calibration of a multiple-robot line-vision system, robot-based inline 2D/3D quality monitoring using imaging and laser triangulation, and a study of prospective polymer composite materials for flexible tactile sensors. The third part addresses mobile robots and multi-agent systems, including SLAM of mobile robots based on fusion of odometry and visual data, configuration of a localization system by a team of mobile robots, a generic real-time motion controller for differential mobile robots, control of fuel cells in mobile robots, modelling of omni-directional wheeled robots, a hunter-hybrid tracking environment, and the design of cooperative control in a distributed population-based multi-agent approach. The fourth part presents recent approaches and results in humanoid and bio-inspired robotics: adaptive control of anthropomorphic biped gait, dynamics-based simulation of humanoid robot walking, a controller for the perceptual-motor control dynamics of humans, and a biomimetic approach to controlling mechatronic structures using smart materials.

    An Unsupervised Approach to Modelling Visual Data

    For very large visual datasets, producing expert ground-truth data for training supervised algorithms can represent a substantial human effort. In these situations there is scope for unsupervised approaches that can model collections of images and automatically summarise their content. The primary motivation for this thesis comes from the problem of labelling large visual datasets of the seafloor obtained by an Autonomous Underwater Vehicle (AUV) for ecological analysis. Labelling this data is expensive, as taxonomical experts for the specific region are required, whereas automatically generated summaries can be used to focus the experts' efforts and to inform decisions on additional sampling. The contributions of this thesis arise from modelling this visual data in entirely unsupervised ways to obtain comprehensive visual summaries. Firstly, popular unsupervised image feature learning approaches are adapted to work with large datasets and unsupervised clustering algorithms. Next, using Bayesian models, the performance of rudimentary scene clustering is boosted by sharing clusters between multiple related datasets, such as regular photo albums or AUV surveys. These Bayesian scene clustering models are extended to simultaneously cluster sub-image segments to form unsupervised notions of “objects” within scenes, and the frequency distribution of these objects within a scene is used as its descriptor for simultaneous scene clustering. Finally, this simultaneous clustering model is extended to describe scenes with whole-image descriptors, which encode rudimentary spatial information, as well as with object frequency distributions. This is achieved by unifying the previously presented Bayesian clustering models, rectifying some of their weaknesses and limitations in the process. Hence, the final contribution of this thesis is a practical unsupervised algorithm for modelling images from the super-pixel level to the album level that is applicable to large datasets.
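
    The thesis uses Bayesian models with cluster sharing across datasets; as a much simpler stand-in, the basic unsupervised step of grouping image feature vectors into scene clusters can be sketched with plain k-means (all names are illustrative and this is not the thesis's model):

```python
import numpy as np

def kmeans(features, k, iters=50, seed=0):
    """Cluster an (n x d) array of image feature vectors into k groups."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=k,
                                  replace=False)].astype(float)
    for _ in range(iters):
        # assign each image to its nearest cluster centre
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :],
                               axis=2)
        labels = dists.argmin(axis=1)
        # re-estimate centres; keep the old centre if a cluster empties
        for j in range(k):
            if np.any(labels == j):
                centers[j] = features[labels == j].mean(axis=0)
    return labels, centers
```

    A visual summary then follows by showing a few representative images nearest each centre, which is the kind of output an expert could use to focus labelling effort.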

    Design and Evaluation of Compression, Classification and Localization Schemes for Various IoT Applications

    Nowadays we are surrounded by a huge number of objects able to communicate, sense information such as temperature, light, or humidity, and infer new information by exchanging data. These objects are not limited to high-tech devices with substantial capabilities, such as desktop PCs, laptops, and new-generation mobile phones (smartphones); they also include commonly used objects, such as ID cards, driver licenses, and clocks, that can be made smart by allowing them to communicate. Thus, the analog world of just a few years ago is becoming the digital world of the Internet of Things (IoT), where information about a single object can be retrieved from the Internet. The IoT paradigm raises several architectural challenges, including self-organization, self-management, and self-deployment of smart objects, as well as the problem of minimizing the usage of each device's limited resources. The IoT concept covers many communication paradigms, such as WiFi, Radio Frequency Identification (RFID), and Wireless Sensor Networks (WSNs); each paradigm can be thought of as an IoT island in which devices communicate directly with one another. The thesis is divided into parts, one for each problem mentioned above. The first step is to understand how new knowledge can be inferred from the devices deployed in a scenario. For this reason, the research focuses on the Semantic Web (Web 3.0) to assign a semantic meaning to each thing in the architecture. Semantics alone is not enough to infer new information from the gathered data; the data must also be organized hierarchically, as defined by an ontology. By exploiting the ontology, semantic reasoning engines can infer new knowledge about the network. The second part of the dissertation deals with minimizing the resource usage of every node in a WSN. The main purpose of each node is to collect environmental data and to exchange it with other nodes; to minimize battery consumption, radio usage must be limited. We therefore implemented Razor, a new lightweight algorithm intended to improve data compression and classification by leveraging data mining methods to optimize communications and by enhancing information transmission to simplify data classification. Data compression builds on the well-known Vector Quantization (VQ) theory to create the codebooks needed for signal compression. At the same time, unknown signals must be given a semantic meaning; in this way, the codebook serves not only to compress signals but also to classify unknown ones. Razor is compared with state-of-the-art compression and signal classification techniques for WSNs. The third part of the thesis applies the smart object concept to robotics research. A critical issue is how a robot can localize and retrieve smart objects in a real scenario without any prior knowledge. To this end, the robot exploits the smart object concept and localizes objects through RSSI measurements; after the localization phase, it uses its own camera to retrieve the objects. Several filtering algorithms are developed to mitigate the multipath effects of the wireless channel and to achieve better distance estimation from the RSSI measurements.
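
    Razor's internals are specific to the thesis, but the dual use of a VQ codebook described above can be sketched: the same codebook that compresses a signal (by transmitting nearest-codeword indices rather than raw samples) can also classify it, by choosing the per-class codebook with the lowest reconstruction error. A minimal sketch under these assumptions, with illustrative names only:

```python
import numpy as np

def encode(windows, codebook):
    """Compression: map each signal window (a d-dim vector) to the index
    of its nearest codeword; only the indices need to be transmitted."""
    dists = np.linalg.norm(windows[:, None, :] - codebook[None, :, :], axis=2)
    return dists.argmin(axis=1)

def decode(indices, codebook):
    """Reconstruct an approximate signal from the received indices."""
    return codebook[indices]

def classify(windows, class_codebooks):
    """Label an unknown signal with the class whose codebook
    reconstructs it with the lowest total distortion."""
    best_label, best_err = None, np.inf
    for label, cb in class_codebooks.items():
        err = np.linalg.norm(decode(encode(windows, cb), cb) - windows)
        if err < best_err:
            best_label, best_err = label, err
    return best_label
```
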
    The last part of the dissertation deals with the design and development of a Cognitive Network (CN) testbed using off-the-shelf devices. The device type was chosen considering cost, usability, configurability, mobility, and the ability to modify the operating system (OS) source code; devices based on the Linux kernel, such as Android devices, are therefore the best choice. Access to the OS source is required to extract the TCP/IP protocol stack parameters used by the CN paradigm: the network status must be monitored in real time and critical parameters modified to improve performance metrics such as bandwidth consumption, the number of hops needed to exchange data, and throughput.
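
    The testbed's specifics are not given here, but on any Linux-based OS (including Android) kernel networking parameters are exposed through the /proc/sys interface, which is one plausible way to monitor and modify the stack as described above. A minimal sketch; the parameter name is only an example, and writing requires root privileges:

```python
from pathlib import Path

def read_param(name):
    """Read a kernel network parameter, e.g. 'net.ipv4.tcp_congestion_control'."""
    return (Path("/proc/sys") / name.replace(".", "/")).read_text().strip()

def write_param(name, value):
    """Modify a live TCP/IP stack parameter (requires root)."""
    (Path("/proc/sys") / name.replace(".", "/")).write_text(value)

# Example: inspect the congestion-control algorithm currently in use.
print(read_param("net.ipv4.tcp_congestion_control"))
```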

    Toward 3D Reconstruction of Static and Dynamic Objects

    The goal of image-based 3D reconstruction is to construct a spatial understanding of the world from a collection of images. For applications that seek to model generic real-world scenes, it is important that the reconstruction methods used are able to characterize both static scene elements (e.g. trees and buildings) and dynamic objects (e.g. cars and pedestrians). However, due to many inherent ambiguities in the reconstruction problem, recovering this 3D information with accuracy, robustness, and efficiency is a considerable challenge. To advance the research frontier for image-based 3D modeling, this dissertation focuses on three challenging problems in static scene and dynamic object reconstruction. We first target the problem of static scene depthmap estimation from crowd-sourced datasets (i.e. photos collected from the Internet). While achieving high-quality depthmaps from images taken in a controlled environment is already a difficult task, heterogeneous crowd-sourced data presents a unique set of challenges for multi-view depth estimation, including varying illumination and occasional occlusions. We propose a depthmap estimation method that demonstrates high accuracy, robustness, and scalability on a large number of photos collected from the Internet. Compared to static scene reconstruction, the problem of dynamic object reconstruction from monocular images is fundamentally ambiguous unless additional assumptions are imposed, because a single observation of an object is insufficient for valid 3D triangulation, which typically requires concurrent observations of the object from multiple viewpoints. Assuming that dynamic objects of the same class (e.g. all the pedestrians walking on a sidewalk) move along a common path in the real world, we develop a method that estimates the 3D positions of dynamic objects from unstructured monocular images. Experiments on both synthetic and real datasets illustrate the solvability of the problem and the effectiveness of our approach. Finally, we address the problem of dynamic object reconstruction from a set of unsynchronized videos capturing the same dynamic event. This problem is of great interest because, with the increased availability of portable capture devices, recording an event with multiple unsynchronized videos is common in the real world. To resolve the challenges arising from non-concurrent captures and unknown temporal overlap among video streams, we propose a self-expressive dictionary learning framework in which the dictionary entries are defined as the collection of temporally varying structures. Experiments demonstrate the effectiveness of this approach on the previously unsolved problem.
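
    The monocular ambiguity noted above comes from the geometry of triangulation: a single observation constrains a 3D point only to a ray, while two concurrent views pin it down. A minimal sketch of standard linear (DLT) triangulation from two calibrated views; this is textbook geometry, not the dissertation's method:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Recover a 3D point from two concurrent views.
    P1, P2: 3x4 camera projection matrices.
    x1, x2: (u, v) pixel observations of the same point."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],   # u1 * p3^T - p1^T
        x1[1] * P1[2] - P1[1],   # v1 * p3^T - p2^T
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)  # least-squares null vector of A
    X = Vt[-1]
    return X[:3] / X[3]          # dehomogenize
```

    With a single image (one P and one x) the system has only two equations for three unknowns, which is exactly the ambiguity the dissertation sidesteps by assuming objects of the same class share a common motion path.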