7,667 research outputs found
The Evolution of First Person Vision Methods: A Survey
The emergence of new wearable technologies such as action cameras and
smart-glasses has increased the interest of computer vision scientists in the
First Person perspective. Nowadays, this field is attracting attention and
investments of companies aiming to develop commercial devices with First Person
Vision recording capabilities. Due to this interest, an increasing demand for
methods to process these videos, possibly in real time, is expected. Current
approaches present particular combinations of different image features and
quantitative methods to accomplish specific objectives like object detection,
activity recognition, user-machine interaction and so on. This paper summarizes
the evolution of the state of the art in First Person Vision video analysis
between 1997 and 2014, highlighting, among others, the most commonly used features,
methods, challenges and opportunities within the field.
Comment: First Person Vision, Egocentric Vision, Wearable Devices, Smart
Glasses, Computer Vision, Video Analytics, Human-machine Interaction
Semi-Supervised Visual Tracking of Marine Animals using Autonomous Underwater Vehicles
In-situ visual observations of marine organisms are crucial to developing an
understanding of their behaviour and its relation to the surrounding ecosystem.
Typically, these observations are collected via divers, tags, and
remotely-operated or human-piloted vehicles. Recently, however, autonomous
underwater vehicles equipped with cameras and embedded computers with GPU
capabilities are being developed for a variety of applications, and in
particular, can be used to supplement these existing data collection mechanisms
where human operation or tags are more difficult. Existing approaches have
focused on using fully-supervised tracking methods, but labelled data for many
underwater species are severely lacking. Semi-supervised trackers may offer
alternative tracking solutions because they require less data than
fully-supervised counterparts. However, because there are no existing
realistic underwater tracking datasets, the performance of semi-supervised
tracking algorithms in the marine domain is not well understood. To better
evaluate their performance and utility, in this paper we provide (1) a novel
dataset specific to marine animals located at http://warp.whoi.edu/vmat/, (2)
an evaluation of state-of-the-art semi-supervised algorithms in the context of
underwater animal tracking, and (3) an evaluation of real-world performance
through demonstrations using a semi-supervised algorithm on-board an autonomous
underwater vehicle to track marine animals in the wild.
Comment: To appear in IJCV SI: Animal Tracking
Learning Multimodal Structures in Computer Vision
A phenomenon or event can be observed by various kinds of detectors or under different conditions. Each such acquisition framework is a modality of the phenomenon. Because the modalities of a multimodal phenomenon are related, a single modality cannot fully describe the event of interest. The fact that several modalities report on the same event introduces new challenges compared to the case of exploiting each modality separately.
We are interested in designing new algorithmic tools to apply sensor fusion techniques in the particular signal representation of sparse coding, which is a popular methodology in signal processing, machine learning and statistics for representing data. This coding scheme is based on a machine learning technique and has been demonstrated to be capable of representing many modalities, such as natural images. We will consider situations where we are interested not only in the support of the model being sparse, but also in having it reflect a priori knowledge about the application at hand.
Our goal is to extract a discriminative representation of the multimodal data that leads to easily finding its essential characteristics in the subsequent analysis step, e.g., regression and classification. To be more precise, sparse coding is about representing signals as linear combinations of a small number of bases from a dictionary. The idea is to learn a dictionary that encodes intrinsic properties of the multimodal data in a decomposition coefficient vector that is favorable towards the maximal discriminatory power.
We carefully design a multimodal representation framework to learn discriminative feature representations by fully exploiting both the modality-shared information, which is the information shared by the various modalities, and the modality-specific information, which is the information content of each modality individually. In addition, the framework automatically learns the weights for the various feature components in a data-driven scheme. In other words, the physical interpretation of our learning framework is to fully exploit the correlated characteristics of the available modalities while at the same time leveraging the modality-specific character of each modality and changing their corresponding weights for different parts of the feature in recognition.
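The sparse coding idea mentioned in the abstract above (representing a signal as a linear combination of a small number of dictionary atoms) can be illustrated with a minimal sketch. This is plain matching pursuit over a fixed, hand-written dictionary, not the multimodal dictionary-learning framework the abstract describes; the function names, the toy dictionary and the signal are all assumptions made for illustration.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matching_pursuit(signal, dictionary, n_steps):
    """Greedy sparse coding with unit-norm atoms (hypothetical helper).

    Returns (coeffs, residual) such that
    signal ~= sum(coeffs[k] * dictionary[k]) + residual,
    with at most `n_steps` non-zero coefficients.
    """
    residual = list(signal)
    coeffs = [0.0] * len(dictionary)
    for _ in range(n_steps):
        # Pick the atom most correlated with the current residual.
        scores = [dot(residual, atom) for atom in dictionary]
        k = max(range(len(dictionary)), key=lambda i: abs(scores[i]))
        if abs(scores[k]) < 1e-12:
            break  # nothing left to explain
        coeffs[k] += scores[k]
        residual = [r - scores[k] * a for r, a in zip(residual, dictionary[k])]
    return coeffs, residual


# Toy overcomplete dictionary in R^3 (assumed for illustration only).
s = 1.0 / math.sqrt(2.0)
toy_dictionary = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [s, s, 0.0],
]
toy_signal = [2.0, 0.0, 3.0]

toy_coeffs, toy_residual = matching_pursuit(toy_signal, toy_dictionary, n_steps=2)
```

In this toy case two atoms suffice to reconstruct the signal, so only two entries of `toy_coeffs` are non-zero. A learned, discriminative dictionary of the kind the abstract proposes would replace `toy_dictionary`.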
An Overview about Emerging Technologies of Autonomous Driving
Since DARPA started Grand Challenges in 2004 and Urban Challenges in 2007,
autonomous driving has been the most active field of AI applications. This
paper gives an overview about technical aspects of autonomous driving
technologies and open problems. We investigate the major fields of self-driving
systems, such as perception, mapping and localization, prediction, planning and
control, simulation, V2X and safety. In particular, we elaborate on all these
issues in the framework of a data closed loop, a popular platform for solving the
long-tailed autonomous driving problems.
Deformable Linear Objects 3D Shape Estimation and Tracking From Multiple 2D Views
This letter presents DLO3DS, an approach for the 3D shape estimation and tracking of Deformable Linear Objects (DLOs) such as cables, wires or plastic hoses, using a cheap and compact 2D vision sensor mounted on the robot end-effector. DLO3DS can be applied in all those scenarios in which the perception and manipulation of DLO-like structures are needed, such as in the case of switchgear cabling, wiring harness manufacturing and assembly in the automotive and aerospace industries, or production of hoses for medical applications. The developed procedure is based on a pipeline that first processes the images coming from the 2D camera, extracting key topological points along the DLOs. These points are then used to model each DLO with a B-spline curve. Finally, the set of splines obtained from all the images is matched by exploiting a multi-view stereo-based algorithm. DLO3DS is validated both on a real scenario and on simulated data obtained by exploiting a rendering engine for photo-realistic images. In this way, reliable ground-truth data are retrieved and utilized for assessing the estimation error achievable by DLO3DS, which on the employed test set is characterized by a mean reconstruction error of 0.82 mm.
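The step of modelling a DLO with a B-spline curve can be sketched as follows. The abstract does not say how the key points are turned into a spline, so this example simply assumes they are used directly as control points of a uniform cubic B-spline in image coordinates. The function names and the sample points are hypothetical, and this is not the DLO3DS implementation.

```python
def cubic_bspline_point(p0, p1, p2, p3, t):
    """Evaluate one uniform cubic B-spline segment at t in [0, 1]."""
    b0 = (1 - t) ** 3 / 6
    b1 = (3 * t ** 3 - 6 * t ** 2 + 4) / 6
    b2 = (-3 * t ** 3 + 3 * t ** 2 + 3 * t + 1) / 6
    b3 = t ** 3 / 6
    return tuple(
        b0 * a + b1 * b + b2 * c + b3 * d
        for a, b, c, d in zip(p0, p1, p2, p3)
    )


def sample_bspline(control_points, samples_per_segment=10):
    """Sample a uniform cubic B-spline defined by >= 4 control points."""
    if len(control_points) < 4:
        raise ValueError("a cubic B-spline needs at least 4 control points")
    curve = []
    for i in range(len(control_points) - 3):
        segment = control_points[i:i + 4]
        for step in range(samples_per_segment):
            curve.append(cubic_bspline_point(*segment, step / samples_per_segment))
    # Close the curve with the end point of the last segment.
    curve.append(cubic_bspline_point(*control_points[-4:], 1.0))
    return curve


# Hypothetical key points (pixel coordinates) detected along one cable.
key_points = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
cable_curve = sample_bspline(key_points, samples_per_segment=10)
```

A uniform cubic B-spline approximates rather than interpolates its control points, so the sampled curve starts and ends inside the span of the key points. A least-squares spline fit would be one alternative if the curve has to stay closer to the detected points.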