Down-Sampling coupled to Elastic Kernel Machines for Efficient Recognition of Isolated Gestures
In the field of gestural action recognition, many studies have focused on
dimensionality reduction along the spatial axis, to reduce both the variability
of gestural sequences expressed in the reduced space, and the computational
complexity of their processing. It is noticeable that very few of these methods
have explicitly addressed the dimensionality reduction along the time axis.
This is however a major issue with regard to the use of elastic distances
characterized by a quadratic complexity. To partially fill this apparent gap,
we present in this paper an approach based on temporal down-sampling combined
with elastic kernel machine learning. We experimentally show, on two data sets
that are widely referenced in the domain of human gesture recognition, and very
different in terms of quality of motion capture, that it is possible to
significantly reduce the number of skeleton frames while maintaining a good
recognition rate. The method proves to give satisfactory results at a level
currently reached by state-of-the-art methods on these data sets. The
computational complexity reduction makes this approach eligible for real-time
applications.
Comment: ICPR 2014, International Conference on Pattern Recognition, Stockholm, Sweden (2014).
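The quadratic cost of elastic distances is what makes temporal down-sampling pay off. A minimal sketch (not the paper's implementation — the uniform frame sampling and the plain DTW variant are illustrative assumptions):

```python
import numpy as np

def downsample(seq, target_len):
    """Keep target_len evenly spaced frames from a gesture sequence."""
    idx = np.linspace(0, len(seq) - 1, target_len).astype(int)
    return seq[idx]

def dtw_distance(a, b):
    """Classic dynamic time warping between two frame sequences; O(len(a) * len(b))."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]
```

Reducing two 100-frame sequences to 20 frames each cuts the DTW work by a factor of 25, which is the kind of saving that makes elastic kernels usable in real time.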
Network streaming and compression for mixed reality tele-immersion
Bulterman, D.C.A. [Promotor]; Cesar, P.S. [Copromotor]
Activity monitoring and behaviour analysis using RGB-depth sensors and wearable devices for ambient assisted living applications
Nowadays, in developed countries, the percentage of elderly people is growing. This situation is a consequence of improvements in people's quality of life and developments in the medical field. Because of ageing, people have a higher probability of being affected by age-related diseases, classified into three main groups: physical, perceptual and mental. The direct consequence is a growth of healthcare system costs and a non-negligible financial sustainability issue which the EU will have to face in the coming years.
One possible solution to tackle this challenge is exploiting the advantages provided by the technology. This paradigm is called Ambient Assisted Living (AAL) and concerns different areas, such as mobility support, health and care, privacy and security, social environment and communication.
In this thesis, two different types of sensors will be used to show the potentialities of the technology in the AAL scenario: RGB-Depth cameras and wearable devices will be studied to design affordable solutions.
The first one is a fall detection system that uses the distance information between the target and the camera to monitor people inside the covered area; the application triggers an alarm when it recognizes a fall. An alternative implementation of the same solution synchronizes the information provided by a depth camera and a wearable device to classify the activities performed by the user into two groups: Activities of Daily Living (ADL) and falls.
In order to assess the fall risk in the elderly, the second proposed application uses the previous sensor configuration to measure kinematic parameters of the body during a specific assessment test called Timed Up and Go.
Finally, the third application monitors the user's movements during an intake activity. In particular, the drinking gesture can be recognized by the system using the depth information to track the hand movements, whereas the RGB stream is exploited to classify important objects placed on a table.
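The fusion of depth and accelerometer data in the first application can be sketched as a simple decision rule. The thresholds and features below are illustrative assumptions, not the thesis's actual classifier:

```python
import numpy as np

FALL_ACC_G = 2.5     # illustrative acceleration-peak threshold (g)
FLOOR_DIST_M = 0.4   # illustrative person-centroid height above the floor (m)

def classify_event(acc_magnitudes, centroid_heights):
    """Label a synchronized sensor window as 'fall' or 'ADL'.

    acc_magnitudes: accelerometer magnitude samples (g) from the wearable device.
    centroid_heights: person-centroid height (m) estimated from the depth camera.
    A fall is flagged when a high-acceleration impact is followed by the
    person's centroid lying near the floor.
    """
    impact = np.max(acc_magnitudes) > FALL_ACC_G
    on_floor = np.min(centroid_heights) < FLOOR_DIST_M
    return "fall" if impact and on_floor else "ADL"
```

Combining both modalities reduces false alarms: sitting down quickly trips the accelerometer test alone, but not the depth-based floor test.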
Development of an active vision system for robot inspection of complex objects
Integrated master's dissertation in Mechanical Engineering (specialisation in Mechatronic Systems).
The dissertation presented here is in the scope of the IntVis4Insp project between the University of Minho
and the company Neadvance. It focuses on the development of a 3D hand tracking system that must be
capable of extracting the hand position and orientation to prepare a manipulator for automatic inspection
of leather pieces.
This work starts with a literature review about the two main methods for collecting the necessary data to
perform 3D hand tracking. These divide into glove-based methods and vision-based methods. The former
work with some kind of support mounted on the hand that holds all the sensors needed to
measure the desired parameters, while the latter use one or more cameras to capture the
hands and track their position and configuration through computer vision algorithms. The
method selected for this work was the vision-based method OpenPose. For each recorded image, this
application can locate 21 keypoints on each hand that together form a skeleton of the hands.
This application is used in the tracking system developed throughout this dissertation. Its information is
used in a more complete pipeline where the location of those hand keypoints is crucial to track the hands
in videos of the demonstrated movements. These videos were recorded with an RGB-D camera, the
Microsoft Kinect, which provides a depth value for every RGB pixel recorded. With the depth information
and the 2D location of the hand keypoints in the images, it was possible to obtain the 3D world coordinates
of these points considering the pinhole camera model.
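Back-projecting a 2D keypoint with its depth value through the pinhole model reduces to two divisions. A minimal sketch, where fx, fy, cx, cy are the camera intrinsics (the values used in testing below are typical Kinect v1 intrinsics, chosen only for illustration):

```python
import numpy as np

def backproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth z into 3D camera coordinates
    using the pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])
```

A pixel at the principal point (cx, cy) maps onto the optical axis, so its 3D coordinates are (0, 0, depth).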
To define the hand position, a point is selected among the 21 for each hand; for the hand orientation,
it was necessary to develop an auxiliary method called “Iterative Pose Estimation Method” (ITP), which
estimates the complete 3D pose of the hands. This method uses only the 2D locations of every hand
keypoint, and the complete 3D world coordinates of the wrists, to estimate the 3D world coordinates
of all the remaining points on the hand. This solution solves the problems related to hand occlusions that
are prone to happen due to the use of only one camera to record the inspection videos. Once the world
location of all the points in the hands is accurately estimated, their orientation can be defined by selecting
three points forming a plane.
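The orientation step — three keypoints forming a plane — reduces to a cross product. A minimal sketch (any three non-collinear hand keypoints would do; the choice of points is an illustrative assumption):

```python
import numpy as np

def hand_orientation(p1, p2, p3):
    """Unit normal of the plane through three 3D hand keypoints,
    usable as the orientation vector of the hand."""
    n = np.cross(p2 - p1, p3 - p1)
    return n / np.linalg.norm(n)
```

Three points lying in the xy-plane, for example, yield the unit z-axis as the hand's orientation vector.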
Data analytics for image visual complexity and kinect-based videos of rehabilitation exercises
With the recent advances in computer vision and pattern recognition, methods from these fields are successfully applied to solve problems in various domains, including health care and social sciences. In this thesis, two such problems, from different domains, are discussed. First, an application of computer vision and broader pattern recognition in physical therapy is presented. Home-based physical therapy is an essential part of the recovery process, in which the patient is prescribed specific exercises in order to improve symptoms and daily functioning of the body. However, poor adherence to the prescribed exercises is a common problem. In our work, we explore methods for improving the home-based physical therapy experience. We begin by proposing DyAd, a dynamic difficulty adjustment system which captures the trajectory of the hand movement, evaluates the user's performance quantitatively, and adjusts the difficulty level for the next trial of the exercise based on the performance measurements. Next, we introduce ExerciseCheck, a remote monitoring and evaluation platform for home-based physical therapy. ExerciseCheck is capable of capturing exercise information, evaluating the performance, providing therapeutic feedback to the patient and the therapist, checking the progress of the user over the course of the physical therapy, and supporting the patient throughout this period. In our experiments, patients with Parkinson's disease tested our system at a clinic and in their homes during their physical therapy period. Our results suggest that ExerciseCheck is a user-friendly application that can assist patients by providing motivation and guidance to ensure correct execution of the required exercises.
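The difficulty-adjustment loop described for DyAd can be sketched as a threshold rule on the per-trial performance score. The thresholds and level count below are illustrative assumptions, not DyAd's actual parameters:

```python
def adjust_difficulty(level, score, low=0.6, high=0.85, n_levels=5):
    """Raise the difficulty after a high-scoring trial, lower it after a
    low-scoring one, and keep it unchanged otherwise.
    score is a normalized performance measure in [0, 1]."""
    if score >= high and level < n_levels:
        return level + 1
    if score <= low and level > 1:
        return level - 1
    return level
```

The dead band between the two thresholds keeps the difficulty stable across ordinary fluctuations in performance, so the patient is not bounced between levels on every trial.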
As the second application, within the computer vision paradigm, we focus on visual complexity, an image attribute that humans can subjectively evaluate based on the level of detail in the image. Visual complexity has been studied in psychophysics, cognitive science and, more recently, computer vision, for the purposes of product design, web design, advertising, etc. We first introduce a diverse visual complexity dataset which comprises seven image categories. We collect the ground-truth scores by comparing the pairwise relationships of images and then convert the pairwise scores to absolute scores using mathematical methods. Furthermore, we propose a method to measure visual complexity that uses unsupervised information extraction from intermediate convolutional layers of deep neural networks. We derive an activation energy metric that combines convolutional layer activations to quantify visual complexity. The high correlations between ground-truth labels and computed energy scores in our experiments show the superiority of our method over previous work. Finally, as an example of the relationship between visual complexity and other image attributes, we demonstrate that, within the context of a category, visually more complex images are more memorable to human observers.
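One way to read the activation-energy idea is as a weighted combination of per-layer activation statistics. The sketch below uses mean absolute activation with uniform weights; both choices are illustrative assumptions, not the metric actually derived in the thesis:

```python
import numpy as np

def activation_energy(layer_activations, weights=None):
    """Combine per-layer activation statistics into one complexity score.

    layer_activations: list of (C, H, W) arrays, one per intermediate
    convolutional layer of a CNN applied to the image.
    """
    energies = np.array([np.mean(np.abs(a)) for a in layer_activations])
    if weights is None:
        weights = np.ones_like(energies) / len(energies)  # uniform weighting
    return float(np.dot(weights, energies))
```

Images with more detail tend to produce stronger responses across intermediate layers, which is what such an energy score would pick up.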
3D Sensor Placement and Embedded Processing for People Detection in an Industrial Environment
Papers I, II and III are extracted from the dissertation and uploaded as separate documents to meet post-publication requirements for self-archiving of IEEE conference papers. At a time when autonomy is being introduced in more and more areas, computer vision plays a very important role. In an industrial environment, the ability to create a real-time virtual version of a volume of interest provides a broad range of possibilities, including safety-related systems such as vision-based anti-collision and personnel tracking. In an offshore environment, where such systems are not common, the task is challenging due to rough weather and environmental conditions, but the result of introducing such safety systems could potentially be lifesaving, as personnel work close to heavy, huge, and often poorly instrumented moving machinery and equipment. This thesis presents research on important topics related to enabling computer vision systems in industrial and offshore environments, including a review of the most important technologies and methods. A prototype 3D sensor package is developed, consisting of different sensors and a powerful embedded computer. This, together with a novel, highly scalable point cloud compression and sensor fusion scheme, makes it possible to create a real-time 3D map of an industrial area. The question of where to place the sensor packages in an environment where occlusions are present is also investigated. The result is algorithms for automatic sensor placement optimisation, where the goal is to place sensors in such a way as to maximise the covered volume of interest, with as few occluded zones as possible. The method also includes redundancy constraints, where important sub-volumes can be defined to be viewed by more than one sensor. Lastly, a people detection scheme using a merged point cloud from six different sensor packages as input is developed.
Using a combination of point cloud clustering, flattening, and convolutional neural networks, the system successfully detects multiple people in an outdoor industrial environment, providing real-time 3D positions. The sensor packages and methods are tested and verified at the Industrial Robotics Lab at the University of Agder, and the people detection method is also tested in a relevant outdoor industrial testing facility. The experiments and results are presented in the papers attached to this thesis.
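The flattening-plus-clustering stage can be sketched as projecting the merged cloud onto a ground-plane occupancy grid and labelling connected components as person candidates. The grid resolution and density threshold are illustrative assumptions, not the thesis's actual parameters:

```python
import numpy as np
from scipy import ndimage

def flatten_and_cluster(points, cell=0.25, min_pts=5):
    """Project an (N, 3) point cloud onto the ground plane, build an
    occupancy grid, and label connected dense regions as candidate clusters.

    Returns the labelled grid and the number of clusters found."""
    ij = np.floor(points[:, :2] / cell).astype(int)
    ij -= ij.min(axis=0)                       # shift to non-negative indices
    grid = np.zeros(ij.max(axis=0) + 1, dtype=int)
    np.add.at(grid, (ij[:, 0], ij[:, 1]), 1)   # count points per cell
    labels, n = ndimage.label(grid >= min_pts)
    return labels, n
```

Each labelled blob can then be cropped back out of the 3D cloud and passed to a CNN classifier, which is the division of labour the abstract describes.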