
Recognition of activities of daily living from egocentric videos using hands detected by a deep convolutional network

Authors: AA Chaaraoui, F Cardinaux, M Everingham, O Russakovsky, S Ren, THC Nguyen, X Peng
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/06/2018

Crossref

Kingston University Research Repository

Continuous human action recognition in ambient assisted living scenarios

Authors: Andre Chaaraoui Alexandros, Florez-Revuelta Francisco
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/09/2014

Ambient assisted living technologies and services make it possible to help elderly and impaired people and increase their personal autonomy. Specifically, vision-based approaches enable the recognition of human behaviour, which in turn allows valuable services to be built on top of it. However, a major constraint is that such approaches must work online and in real time. In this work, a human action recognition method based on a bag-of-key-poses model and sequence alignment is extended to support continuous human action recognition. The detection of action zones is proposed to locate the most discriminative segments of an action. For recognition, a method based on a sliding and growing window approach is presented. Furthermore, an evaluation scheme specifically designed for ambient assisted living scenarios is introduced. Experimental results on two publicly available datasets show that the proposed action zones lead to a significant improvement and allow real-time processing.
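The abstract names sequence alignment and a sliding and growing window without giving details, so the following is only a toy sketch under stated assumptions, not the authors' implementation. It assumes each frame has already been reduced to a single number standing in for a key-pose feature, uses classic dynamic time warping (DTW) as the alignment method, and grows the window one frame at a time from a hypothetical `min_len` to `max_len` until the nearest action template falls under a hypothetical `threshold`. All function and parameter names are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic DTW cost between two 1-D sequences with absolute-difference local cost."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def classify(window: Sequence[float],
             templates: Dict[str, Sequence[float]]) -> Tuple[str, float]:
    """Return (label, distance) of the DTW-nearest template (hypothetical helper)."""
    return min(((label, dtw_distance(window, tpl)) for label, tpl in templates.items()),
               key=lambda pair: pair[1])


def sliding_growing_window(stream: Sequence[float],
                           templates: Dict[str, Sequence[float]],
                           min_len: int, max_len: int,
                           threshold: float) -> List[Tuple[int, int, str]]:
    """Hypothetical continuous-recognition loop.

    Starting at `start`, the window grows from min_len up to max_len frames.
    When the nearest template is within `threshold`, a detection
    (start, end, label) is emitted and the next window starts where this one
    ended; if no match is found, the window slides forward by one frame.
    """
    detections: List[Tuple[int, int, str]] = []
    start = 0
    while start + min_len <= len(stream):
        matched = False
        end = start + min_len
        while end <= min(start + max_len, len(stream)):
            label, dist = classify(stream[start:end], templates)
            if dist <= threshold:
                detections.append((start, end, label))
                start = end
                matched = True
                break
            end += 1  # grow the window
        if not matched:
            start += 1  # slide the window
    return detections
```

For example, with templates `{"wave": [0, 1, 0, 1], "sit": [5, 5, 5]}` and the stream `[0, 1, 0, 1, 9, 9, 5, 5, 5]`, the loop reports `wave` over frames 0 to 4 and `sit` over frames 6 to 9, skipping the unmatched frames in between. Growing before sliding lets short and long action instances be matched with the same templates, at the cost of repeated alignments.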

Repositorio Institucional de la Universidad de Alicante

Crossref

Kingston University Research Repository

Human pose estimation with a depth camera (Ihmisten asennon tunnistus syvyyskameralla)

Author: Arasalo Ossi
Publication date: 16/12/2019

Human pose estimation has many applications, from activity analysis to autonomous cars. Recent advances in deep learning research have enabled real-time multi-person pose estimation in complex environments. In this thesis, a state-of-the-art deep learning architecture is adapted to work with depth sensors. Instead of annotating thousands of images by hand, the training dataset is generated using computer graphics. The results are promising: the trained neural network detects humans in multi-person environments even when occlusion is present. However, the gap between the real world and the synthetically generated data raises challenges that have to be addressed.
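The abstract does not state how depth frames are fed to the adapted architecture. One common approach, assumed here and not taken from the thesis, is to clip raw depth to a working range, normalise it, and replicate the single channel so that a network designed for 3-channel RGB input can consume it unchanged. The function name and the 500 mm to 4500 mm range are hypothetical choices for illustration.

```python
import numpy as np


def encode_depth_for_rgb_network(depth_mm: np.ndarray,
                                 near_mm: float = 500.0,
                                 far_mm: float = 4500.0) -> np.ndarray:
    """Map an (H, W) depth map in millimetres to an (H, W, 3) float32 image in [0, 1].

    Assumptions (hypothetical): 0 marks pixels with no depth reading, closer
    surfaces should appear brighter, and anything beyond far_mm is background.
    """
    depth = depth_mm.astype(np.float32)
    valid = depth > 0  # many depth sensors report 0 where no measurement exists
    clipped = np.clip(depth, near_mm, far_mm)
    # near_mm -> 1.0 (bright), far_mm -> 0.0 (dark)
    scaled = (far_mm - clipped) / (far_mm - near_mm)
    scaled[~valid] = 0.0  # treat missing readings like far background
    # replicate the single channel so an RGB-shaped input layer can be reused
    return np.repeat(scaled[:, :, np.newaxis], 3, axis=2).astype(np.float32)
```

Mapping invalid pixels to the same value as the far plane is one simple design choice; it also hints at the domain-gap issue the abstract mentions, since synthetic renderings usually lack the missing-depth holes that real sensors produce.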

Aaltodoc Publication Archive

Action recognition in videos based on the fusion of visual rhythm representations (Reconhecimento de ações em vídeos baseado na fusão de representações de ritmos visuais)

Author: Moreira Thierry Pinheiro, 1990-
Publication venue: [s.n.]
Publication date: 27/02/2019

Advisors: Hélio Pedrini, David Menotti Gomes. Thesis (doctorate), Universidade Estadual de Campinas, Instituto de Computação.

Abstract: Advances in video acquisition and storage technologies have created a great demand for automatic action recognition. Cameras used for security and surveillance have applications in many scenarios, such as airports, parks, banks, stations, roads, hospitals, supermarkets, industrial sites, stadiums, and schools. An inherent difficulty of the problem is scene complexity under typical recording conditions: scenes may contain complex, moving backgrounds, multiple people, interactions with other actors or objects, and camera motion. The most recent datasets are built mainly from videos shared on YouTube and from movie clips, settings in which none of these obstacles are controlled. Another difficulty is the temporal dimension, which inflates the size of the data and increases computational cost and storage requirements. In this work, we present a methodology for describing video volumes using the Visual Rhythm (VR) representation. This technique reshapes the original video volume into an image, on which two-dimensional descriptors are computed. We investigated different strategies for constructing the representation by combining configurations across several image domains and frame traversal directions.

From this, we propose two feature extraction methods, Naïve Visual Rhythm (Naïve VR) and Visual Rhythm Trajectory Descriptor (VRTD). The first is the direct application of the technique to the original video volume, forming a holistic descriptor that treats action events as patterns and shapes in the visual rhythm image. The second focuses on the analysis of small neighbourhoods obtained from the dense trajectories process, which allows the algorithm to capture details missed by the global description. We tested our methods on eight public datasets: one of hand gestures (SKIG), two first-person (DogCentric and JPL), and five third-person (Weizmann, KTH, MuHAVi, UCF11, and HMDB51). The results show that the proposed techniques extract motion elements along with shape and appearance information, achieving accuracy competitive with state-of-the-art action recognition approaches.

Doctorate in Computer Science (Doutor em Ciência da Computação). Funding: FAPES, grant 2015/03156-7.
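The core idea of reshaping a video volume into a single image can be sketched as follows. This is a simplified illustration, not the thesis's method: it assumes a grayscale video of shape (T, H, W) and takes one fixed 1-D slice per frame (the middle row, the middle column, or the main diagonal) as the traversal direction, while the thesis explores several directions and image domains. The function name and slice choices are hypothetical, and the descriptor stage (Naïve VR, VRTD) is not shown.

```python
import numpy as np


def visual_rhythm(video: np.ndarray, direction: str = "horizontal") -> np.ndarray:
    """Build a visual rhythm image: one 1-D slice per frame, stacked so row t is frame t."""
    if video.ndim != 3:
        raise ValueError("expected a grayscale video of shape (T, H, W)")
    _, h, w = video.shape
    if direction == "horizontal":
        return video[:, h // 2, :]      # middle row of every frame -> (T, W)
    if direction == "vertical":
        return video[:, :, w // 2]      # middle column of every frame -> (T, H)
    if direction == "diagonal":
        idx = np.arange(min(h, w))
        return video[:, idx, idx]       # main diagonal of every frame -> (T, min(H, W))
    raise ValueError(f"unknown direction: {direction!r}")
```

Because each frame contributes one row, a T-frame clip becomes a single 2-D image on which ordinary image descriptors can be computed, which is what lets the approach sidestep much of the cost of processing the full 3-D volume.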

Repositorio da Producao Cientifica e Intelectual da Unicamp

Anchor-free Pipeline Temporal Action Localisation

Author: Lee Jiyong
Publication date: 31/12/2023

The University of Manchester - Institutional Repository

Hierarchical Task Network planning with common-sense reasoning for multiple-people behaviour analysis

Authors: Artikis, Bae, Bechhofer, Bernardin, Bourdev, Candan, Chaaraoui, Chen, Cilla, Cocea, David Villa, Davidson, del Rincon, Dijkstra, Divers, Do, Edwards, Fahlman, Gomez, Gomez-Romero, Han, Hogg, Hong, Huiyu Zhou, Jesus Martinez-del-Rincon, Juan C. Lopez, Kuipers, Li, Maria J. Santofimia, McCarthy, McLaughlin, Minsky, Mueller, Nebel, Paul Miller, Rodriguez, SanMiguel, Santofimia, Sebbak, Stewart, Viola, Wilensky, Woodward, Xin Hong
Publication venue: 'Elsevier BV'

Crossref