47 research outputs found
Computationally efficient deformable 3D object tracking with a monocular RGB camera
182 p. Monocular RGB cameras are present in most scopes and devices, including embedded environments like robots, cars and home automation. Most of these environments share a significant presence of human operators with whom the system has to interact. This context motivates using the captured monocular images to improve the understanding of the operator and the surrounding scene for more accurate results and applications. However, monocular images do not have depth information, which is a crucial element in understanding the 3D scene correctly. Estimating the three-dimensional information of an object in the scene using a single two-dimensional image is already a challenge. The challenge grows if the object is deformable (e.g., a human body or a human face) and there is a need to track its movements and interactions in the scene. Several methods attempt to solve this task, including modern regression methods based on Deep Neural Networks. However, despite their strong results, most are computationally demanding and therefore unsuitable for several environments. Computational efficiency is a critical feature for computationally constrained setups like embedded or onboard systems present in robotics and automotive applications, among others. This study proposes computationally efficient methodologies to reconstruct and track three-dimensional deformable objects, such as human faces and human bodies, using a single monocular RGB camera. To model the deformability of faces and bodies, it considers two types of deformations: non-rigid deformations for face tracking, and rigid multi-body deformations for body pose tracking. Furthermore, it studies their performance on computationally restricted devices like smartphones and onboard systems used in the automotive industry. The information extracted from such devices gives valuable insight into human behaviour, a crucial element in improving human-machine interaction. We tested the proposed approaches in different challenging application fields like onboard driver monitoring systems, human behaviour analysis from monocular videos, and human face tracking on embedded devices.
Inferring Human Pose and Motion from Images
As optical gesture recognition technology advances, touchless human-computer interfaces will soon become a reality. One particular technology, markerless motion capture, has gained a large amount of attention, with widespread application in diverse disciplines, including medical science, sports analysis, advanced user interfaces, and virtual arts. However, the complexity of human anatomy makes markerless motion capture a non-trivial problem: I) the parameterised pose configuration exhibits high dimensionality, and II) there is considerable ambiguity in the surjective inverse mapping from observation to pose configuration spaces when only a limited number of camera views is available. Together, these factors lead to multimodality in a high-dimensional space, making markerless motion capture an ill-posed problem. This study addresses these difficulties by introducing a new framework. It begins by automatically building subject-specific template models and calibrating posture at the initial stage. Subsequent tracking is accomplished by embedding nature-inspired global optimisation into the sequential Bayesian filtering framework. Tracking is enhanced by several robust evaluation improvements. Sparsity of images is managed by compressive evaluation, which further improves computational efficiency in high-dimensional space.
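As a rough illustration of the sequential Bayesian filtering mentioned above, the sketch below runs a bootstrap particle filter on a one-dimensional stand-in for the pose. It is not the framework from the study: the scalar state, the random-walk motion model, the Gaussian likelihood, and every name and constant (`particle_filter_step`, `process_std`, `obs_std`, 500 particles) are assumptions chosen for brevity, and the nature-inspired global optimisation step is left out.

```python
import math
import random


def particle_filter_step(particles, observation, rng,
                         process_std=0.15, obs_std=0.2):
    """One predict / weight / resample cycle of a bootstrap particle filter.

    All parameters are illustrative assumptions, not values from the study.
    """
    # Predict: diffuse each pose hypothesis with a random-walk motion model.
    predicted = [p + rng.gauss(0.0, process_std) for p in particles]
    # Weight: Gaussian likelihood of the observation under each hypothesis.
    weights = [math.exp(-0.5 * ((observation - p) / obs_std) ** 2)
               for p in predicted]
    if sum(weights) == 0.0:
        # Every hypothesis is implausible: fall back to uniform weights.
        weights = [1.0] * len(predicted)
    # Resample: draw a new particle set in proportion to the weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))


rng = random.Random(0)
particles = [rng.uniform(-1.0, 1.0) for _ in range(500)]
errors = []
for t in range(30):
    true_pose = 0.05 * t                       # slowly drifting "pose"
    observation = true_pose + rng.gauss(0.0, 0.2)
    particles = particle_filter_step(particles, observation, rng)
    estimate = sum(particles) / len(particles)  # posterior mean
    errors.append(abs(estimate - true_pose))

mean_late_error = sum(errors[-10:]) / 10
print(f"mean absolute error over the last 10 frames: {mean_late_error:.3f}")
```

In a real body tracker the state would be a high-dimensional joint-angle vector, which is where plain resampling tends to degrade and a global optimisation step of the kind the abstract describes becomes useful.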
Visual attention and swarm cognition for off-road robots
Doctoral thesis, Informatics (Computer Engineering), Universidade de Lisboa, Faculdade de Ciências, 2011. This thesis addresses the problem of modelling visual attention in the context of autonomous off-road robots. The purpose of using visual attention mechanisms is to focus perception on the aspects of the environment most relevant to the robot's task. This thesis shows that, in obstacle and trail detection, this capability promotes robustness and computational parsimony. These are key characteristics for the speed and efficiency of off-road robots. One of the biggest challenges in modelling visual attention stems from the need to manage the speed-accuracy trade-off in the presence of context or task variations. This thesis shows that this trade-off is resolved if the visual attention process is modelled as a self-organised process whose operation is modulated by the action selection module, which is responsible for controlling the robot. By closing the loop between the action selection process and the perception process, the latter is able to operate only where it is needed, anticipating the robot's actions. To provide visual attention with self-organised properties, this work draws inspiration from Nature. Specifically, the mechanisms that allow army ants to forage in a self-organised way are used as a metaphor for solving the task of searching, also in a self-organised way, for obstacles and trails in the robot's visual field. The solution proposed in this thesis is to have several covert foci of attention operate as a swarm, through pheromone-based interactions. This work represents the first embodied realisation of swarm cognition. This is a new research field that seeks to uncover the basic principles of cognition by inspecting the self-organised properties of the collective intelligence exhibited by social insects. Hence, this thesis contributes to robotics as an engineering discipline and to robotics as a modelling discipline, capable of supporting the study of adaptive behaviour. Funding: Fundação para a Ciência e a Tecnologia (FCT, SFRH/BD/27305/2006); Laboratory of Agent Modelling (LabMag)
Mobile Augmented Reality: User Interfaces, Frameworks, and Intelligence
Mobile Augmented Reality (MAR) integrates computer-generated virtual objects with physical environments for mobile devices. MAR systems enable users to interact with MAR devices, such as smartphones and head-worn wearables, and perform seamless transitions from the physical world to a mixed world with digital entities. These MAR systems support user experiences using MAR devices to provide universal access to digital content. Over the past 20 years, several MAR systems have been developed; however, the studies and design of MAR frameworks have not yet been systematically reviewed from the perspective of user-centric design. This article presents the first effort to survey existing MAR frameworks (count: 37) and further discusses the latest studies on MAR through a top-down approach: (1) MAR applications; (2) MAR visualisation techniques adaptive to user mobility and contexts; (3) systematic evaluation of MAR frameworks, including supported platforms and corresponding features such as tracking, feature extraction, and sensing capabilities; and (4) underlying machine learning approaches supporting intelligent operations within MAR systems. Finally, we summarise the development of emerging research fields and the current state of the art, and discuss the important open challenges and possible theoretical and technical directions. This survey aims to benefit both researchers and MAR system developers. Peer reviewed
Autonomous Navigation for Unmanned Aerial Systems - Visual Perception and Motion Planning
The abstract is in the attachment.
Contextualised learning‐free three‐dimensional body pose estimation from two‐dimensional body features in monocular images
In this study, the authors present a learning‐free method for inferring kinematically plausible three‐dimensional (3D) human body poses contextualised in a predefined 3D world, given a set of 2D body features extracted from monocular images. This contextualisation has the advantage of providing further semantic information about the observed scene. Their method consists of two main steps. First, the camera parameters are obtained by adjusting the reference floor of the predefined 3D world to four key‐points in the image. Then, the person's body part lengths and pose are estimated by fitting a parametrised multi‐body 3D kinematic model to 2D image body features, which can be located by state‐of‐the‐art body part detectors. The adjustment is carried out by a hierarchical optimisation procedure, where the model's scale variations are considered first and then the body part lengths are refined. At each iteration, tentative poses are inferred by a combination of efficient perspective‐n‐point camera pose estimation and constrained viewpoint‐dependent inverse kinematics. Experimental results show that their method achieves accuracy comparable to state‐of‐the‐art alternatives, but without the need to learn 2D/3D mapping models from training data. Their method works efficiently, allowing its integration in video soft sensing systems.
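The first level of the hierarchical fit described above (considering scale before refining body part lengths) can be pictured as a search for the model scale that minimises 2D reprojection error. The sketch below is only a toy version under strong assumptions: a pinhole camera with made-up intrinsics, a known root position, a fixed pose, and a plain grid search. The names `project`, `scaled`, `reprojection_error` and `fit_scale`, the template offsets, and all constants are hypothetical and do not come from the paper, whose optimiser also involves perspective-n-point estimation and inverse kinematics.

```python
def project(point, focal=800.0, cx=320.0, cy=240.0):
    """Pinhole projection of a 3D camera-frame point (assumed intrinsics)."""
    x, y, z = point
    return (focal * x / z + cx, focal * y / z + cy)


def scaled(template, scale, root):
    """Place a template of joint offsets at `root`, scaled uniformly."""
    return [tuple(root[i] + scale * offset[i] for i in range(3))
            for offset in template]


def reprojection_error(joints_3d, features_2d):
    """Sum of squared pixel distances between projected joints and features."""
    total = 0.0
    for joint, (fu, fv) in zip(joints_3d, features_2d):
        u, v = project(joint)
        total += (u - fu) ** 2 + (v - fv) ** 2
    return total


def fit_scale(template, root, features_2d, candidates):
    """Coarse search: keep the candidate scale with the lowest error."""
    return min(candidates,
               key=lambda s: reprojection_error(scaled(template, s, root),
                                                features_2d))


# Hypothetical joint offsets (metres) relative to the pelvis.
template = [(0.0, -0.7, 0.0), (0.0, 0.0, 0.0), (-0.4, -0.3, 0.1),
            (0.4, -0.3, -0.1), (-0.15, 0.9, 0.0), (0.15, 0.9, 0.0)]
root = (0.2, 0.1, 4.0)          # assumed known position in the camera frame
true_scale = 1.1
features = [project(p) for p in scaled(template, true_scale, root)]

candidates = [0.8 + 0.05 * i for i in range(9)]   # 0.80 ... 1.20
best_scale = fit_scale(template, root, features, candidates)
print(f"recovered scale: {best_scale:.2f}")
```

Fixing the root position sidesteps the scale/depth ambiguity of a single view; in the paper that ambiguity is handled through the contextualisation in the predefined 3D world.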
Multi-Object Tracking System based on LiDAR and RADAR for Intelligent Vehicles applications
This Final Degree Project aims to develop a 3D Multi-Object Tracking and Detection System
based on the sensor fusion of LiDAR and RADAR for autonomous driving applications, using
traditional Machine Learning algorithms. The implementation is based on Python and ROS and
meets real-time requirements. In the object detection stage, the RANSAC plane segmentation
algorithm is used, followed by the extraction of bounding boxes with DBSCAN.
A Late Sensor Fusion using 3D Intersection over Union and a BEV-SORT tracking system complete
the proposed architecture. Bachelor's Degree in Industrial Electronics and Automation Engineering (Grado en Ingeniería en Electrónica y Automática Industrial)
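To make the detection stage more concrete, here is a minimal sketch of RANSAC plane segmentation on a synthetic point cloud, written with the Python standard library only. It is an illustration under assumptions, not the project's implementation: the function name `ransac_plane`, the iteration count, the 0.05 distance threshold, and the synthetic "ground plus obstacle" data are all invented for the example. The points that remain after removing the plane are what a clustering step such as DBSCAN would then group into bounding boxes.

```python
import random


def ransac_plane(points, n_iters=200, threshold=0.05, seed=0):
    """Fit a plane n.p + d = 0 with RANSAC; return ((n, d), inlier_indices).

    Parameter names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    best_plane, best_inliers = None, []
    for _ in range(n_iters):
        # Hypothesise a plane from three randomly chosen points.
        p0, p1, p2 = rng.sample(points, 3)
        u = [p1[i] - p0[i] for i in range(3)]
        v = [p2[i] - p0[i] for i in range(3)]
        n = (u[1] * v[2] - u[2] * v[1],
             u[2] * v[0] - u[0] * v[2],
             u[0] * v[1] - u[1] * v[0])
        norm = sum(c * c for c in n) ** 0.5
        if norm < 1e-9:          # nearly collinear sample, skip it
            continue
        n = tuple(c / norm for c in n)
        d = -sum(n[i] * p0[i] for i in range(3))
        # Score the hypothesis by counting points close to the plane.
        inliers = [i for i, p in enumerate(points)
                   if abs(n[0] * p[0] + n[1] * p[1] + n[2] * p[2] + d)
                   < threshold]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = (n, d), inliers
    return best_plane, best_inliers


# Synthetic cloud: a noisy ground plane (z ~ 0) plus one raised obstacle.
rng = random.Random(1)
ground = [(rng.uniform(-10, 10), rng.uniform(-10, 10), rng.gauss(0.0, 0.01))
          for _ in range(500)]
obstacle = [(rng.uniform(2, 3), rng.uniform(2, 3), rng.uniform(0.5, 2.0))
            for _ in range(100)]
cloud = ground + obstacle

(normal, d), ground_idx = ransac_plane(cloud)
ground_set = set(ground_idx)
non_ground = [p for i, p in enumerate(cloud) if i not in ground_set]
print(f"ground points removed: {len(ground_idx)}, remaining: {len(non_ground)}")
```

A real-time pipeline would more likely rely on a vectorised or library implementation; the pure-Python loop is kept here only so the example runs without dependencies.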
Robust and Optimal Methods for Geometric Sensor Data Alignment
Geometric sensor data alignment - the problem of finding the
rigid transformation that correctly aligns two sets of sensor
data without prior knowledge of how the data correspond - is a
fundamental task in computer vision and robotics. It is
inconvenient then that outliers and non-convexity are inherent to
the problem and present significant challenges for alignment
algorithms. Outliers are highly prevalent in sets of sensor data,
particularly when the sets overlap incompletely. Despite this,
many alignment objective functions are not robust to outliers,
leading to erroneous alignments. In addition, alignment problems
are highly non-convex, a property arising from the objective
function and the transformation. While finding a local optimum
may not be difficult, finding the global optimum is a hard
optimisation problem. These key challenges have not been fully
and jointly resolved in the existing literature, and so there is
a need for robust and optimal solutions to alignment problems.
Hence the objective of this thesis is to develop tractable
algorithms for geometric sensor data alignment that are robust to
outliers and not susceptible to spurious local optima.
This thesis makes several significant contributions to the
geometric alignment literature, founded on new insights into
robust alignment and the geometry of transformations. Firstly, a
novel discriminative sensor data representation is proposed that
has better viewpoint invariance than generative models and is
time and memory efficient without sacrificing model fidelity.
Secondly, a novel local optimisation algorithm is developed for
nD-nD geometric alignment under a robust distance measure. It
manifests a wider region of convergence and a greater robustness
to outliers and sampling artefacts than other local optimisation
algorithms. Thirdly, the first optimal solution for 3D-3D
geometric alignment with an inherently robust objective function
is proposed. It outperforms other geometric alignment algorithms
on challenging datasets due to its guaranteed optimality and
outlier robustness, and has an efficient parallel implementation.
Fourthly, the first optimal solution for 2D-3D geometric
alignment with an inherently robust objective function is
proposed. It outperforms existing approaches on challenging
datasets, reliably finding the global optimum, and has an
efficient parallel implementation. Finally, another optimal
solution is developed for 2D-3D geometric alignment, using a
robust surface alignment measure.
Ultimately, robust and optimal methods, such as those in this
thesis, are necessary to reliably find accurate solutions to
geometric sensor data alignment problems.