Lightweight human activity recognition for ambient assisted living
Ambient assisted living (AAL) systems aim to improve the safety, comfort, and quality of life of their target populations, with specific attention given to prolonging personal independence during the later stages of life. Human activity recognition (HAR) plays a crucial role in enabling AAL systems to recognise and understand human actions. Multi-view human activity recognition (MV-HAR) techniques are particularly useful for AAL systems, as they can use information from multiple sensors to capture different perspectives of human activities, which helps to improve the robustness and accuracy of activity recognition. In this work, we propose a lightweight activity recognition pipeline that utilizes skeleton data from multiple perspectives, combining the advantages of both approaches to enhance an assistive robot's perception of human activity. The pipeline covers data sampling, input data type and representation, and classification methods. Our method modifies a classic LeNet classification model (M-LeNet) and uses a Vision Transformer (ViT) for the classification task. Experimental evaluation on a multi-perspective dataset of human activities in the home (RH-HAR-SK) compares the performance of these two models and indicates that combining camera views can improve recognition accuracy. Furthermore, our pipeline provides a more efficient and scalable solution in the AAL context, where bandwidth and computing resources are often limited.
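As a rough illustration of this kind of pipeline, the sketch below encodes a skeleton sequence as a joints-by-frames pseudo-image and classifies it with a compact LeNet-style network, fusing two camera views by averaging logits. The layer sizes, joint count, and fusion rule are illustrative assumptions, not the paper's M-LeNet or ViT.

```python
# Hypothetical sketch of a lightweight skeleton-based HAR classifier.
# The joints-x-frames "skeleton image" encoding and layer sizes are
# illustrative assumptions, not the paper's exact M-LeNet architecture.
import torch
import torch.nn as nn

class SkeletonLeNet(nn.Module):
    def __init__(self, num_joints=17, num_frames=32, num_classes=10):
        super().__init__()
        # Treat (x, y, z) joint coordinates as 3 input channels of a
        # num_joints x num_frames pseudo-image.
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.classifier = nn.Linear(32 * 4 * 4, num_classes)

    def forward(self, x):  # x: (batch, 3, num_joints, num_frames)
        return self.classifier(self.features(x).flatten(1))

# Fuse two camera views by averaging their logits (one simple option).
model = SkeletonLeNet()
view_a = torch.randn(1, 3, 17, 32)
view_b = torch.randn(1, 3, 17, 32)
logits = (model(view_a) + model(view_b)) / 2
print(logits.argmax(dim=1))
```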
cvpaper.challenge in 2016: Futuristic Computer Vision through 1,600 Papers Survey
This paper presents the futuristic challenges discussed in the cvpaper.challenge. In 2015 and 2016, we thoroughly studied 1,600+ papers from several conferences and journals, such as CVPR/ICCV/ECCV/NIPS/PAMI/IJCV.
RGB-D-based Action Recognition Datasets: A Survey
Human action recognition from RGB-D (Red, Green, Blue and Depth) data has
attracted increasing attention since the first work reported in 2010. Over this
period, many benchmark datasets have been created to facilitate the development
and evaluation of new algorithms. This raises the question of which dataset to
select and how to use it in providing a fair and objective comparative
evaluation against state-of-the-art methods. To address this issue, this paper
provides a comprehensive review of the most commonly used action recognition
related RGB-D video datasets, including 27 single-view datasets, 10 multi-view
datasets, and 7 multi-person datasets. The detailed information and analysis of these datasets are a useful resource for guiding the insightful selection of datasets for future research. In addition, the issues with current algorithm evaluation vis-à-vis the limitations of the available datasets and evaluation protocols are also highlighted, resulting in a number of recommendations for the collection of new datasets and the use of evaluation protocols.
Up in the Air: When Homes Meet the Web of Things
The emerging Internet of Things (IoT) will comprise billions of Web-enabled
objects (or "things") where such objects can sense, communicate, compute and
potentially actuate. The Web of Things (WoT) is essentially the embodiment of the evolution from
systems linking digital documents to systems relating digital information to
real-world physical items. It is widely understood that significant technical
challenges exist in developing applications in the WoT environment. In this
paper, we report our practical experience in the design and development of a
smart home system in a WoT environment. Our system provides a layered framework
for managing and sharing the information produced by physical things as well as
the residents. We particularly focus on a research prototype named WITS, that
helps the elderly live independently and safely in their own homes, with
minimal support from the decreasing number of individuals in the working-age
population. WITS enables unobtrusive monitoring of elderly people in a real-world, inhabited home environment by leveraging WoT technologies to build context-aware, personalized services.
Towards Storytelling from Visual Lifelogging: An Overview
Visual lifelogging consists of acquiring images that capture the daily
experiences of the user by wearing a camera over a long period of time. The
pictures taken offer considerable potential for knowledge mining concerning how
people live their lives, hence, they open up new opportunities for many
potential applications in fields including healthcare, security, leisure and
the quantified self. However, automatically building a story from a huge
collection of unstructured egocentric data presents major challenges. This
paper provides a thorough review of advances made so far in egocentric data
analysis, and in view of the current state of the art, indicates new lines of
research to move us towards storytelling from visual lifelogging.
Comment: 16 pages, 11 figures, submitted to IEEE Transactions on Human-Machine Systems.
Efficient Action Detection in Untrimmed Videos via Multi-Task Learning
This paper studies the joint learning of action recognition and temporal
localization in long, untrimmed videos. We employ a multi-task learning
framework that performs the three highly related steps of action proposal,
action recognition, and action localization refinement in parallel instead of
the standard sequential pipeline that performs the steps in order. We develop a
novel temporal actionness regression module that estimates what proportion of a
clip contains action. We use it for temporal localization but it could have
other applications like video retrieval, surveillance, summarization, etc. We
also introduce random shear augmentation during training to simulate viewpoint
change. We evaluate our framework on three popular video benchmarks. Results
demonstrate that our joint model is efficient in terms of storage and
computation in that we do not need to compute and cache dense trajectory
features, and that it is several times faster than its sequential ConvNets
counterpart. Yet, despite being more efficient, it outperforms state-of-the-art methods with respect to accuracy.
Comment: WACV 2017 camera-ready, minor updates about test-time efficiency.
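A minimal sketch of a temporal actionness regressor in the spirit described above: a small head maps clip-level features to a value in [0, 1], interpreted as the proportion of the clip that contains action. The feature dimension, head design, and MSE loss are assumptions, not the paper's module.

```python
# Hedged sketch of a temporal actionness regressor: given clip-level
# features, predict what fraction of the clip contains action.
# Dimensions and the training loss are illustrative assumptions.
import torch
import torch.nn as nn

class ActionnessHead(nn.Module):
    def __init__(self, feat_dim=512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, clip_feats):  # clip_feats: (batch, feat_dim)
        # Sigmoid keeps the prediction in [0, 1], i.e. a proportion.
        return torch.sigmoid(self.mlp(clip_feats)).squeeze(-1)

head = ActionnessHead()
feats = torch.randn(8, 512)   # toy clip-level features
target = torch.rand(8)        # fraction of action frames per clip
loss = nn.functional.mse_loss(head(feats), target)
loss.backward()
```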
A Dual-Source Approach for 3D Human Pose Estimation from a Single Image
In this work we address the challenging problem of 3D human pose estimation
from single images. Recent approaches learn deep neural networks to regress 3D
pose directly from images. One major challenge for such methods, however, is
the collection of training data. Specifically, collecting large amounts of
training data containing unconstrained images annotated with accurate 3D poses
is infeasible. We therefore propose to use two independent training sources.
The first source consists of accurate 3D motion capture data, and the second
source consists of unconstrained images with annotated 2D poses. To integrate
both sources, we propose a dual-source approach that combines 2D pose
estimation with efficient 3D pose retrieval. To this end, we first convert the
motion capture data into a normalized 2D pose space, and separately learn a 2D
pose estimation model from the image data. During inference, we estimate the 2D
pose and efficiently retrieve the nearest 3D poses. We then jointly estimate a
mapping from the 3D pose space to the image and reconstruct the 3D pose. We
provide a comprehensive evaluation of the proposed method and experimentally
demonstrate the effectiveness of our approach, even when the skeleton
structures of the two sources differ substantially.
Comment: under consideration at Computer Vision and Image Understanding. Extended version of CVPR 2016 paper, arXiv:1509.0672
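The retrieval step can be illustrated with a toy sketch: motion capture poses are projected into a normalized 2D pose space offline, and at inference the nearest 3D candidates are retrieved for an estimated 2D pose. The root-centred, scale-normalized encoding and the plain nearest-neighbour search below are assumptions for illustration, not the paper's exact method.

```python
# Illustrative sketch of dual-source 3D pose retrieval: normalize 2D poses
# into a common space, then find the k nearest mocap candidates.
import numpy as np

def normalize_2d(pose2d, root=0):
    # pose2d: (num_joints, 2); centre on the root joint, normalize scale.
    centred = pose2d - pose2d[root]
    scale = np.linalg.norm(centred, axis=1).max() + 1e-8
    return centred / scale

def retrieve_3d(query2d, mocap3d, mocap2d, k=5):
    # mocap3d: (N, J, 3) mocap poses; mocap2d: (N, J, 2) their projections.
    q = normalize_2d(query2d).ravel()
    db = np.stack([normalize_2d(p).ravel() for p in mocap2d])
    dists = np.linalg.norm(db - q, axis=1)
    return mocap3d[np.argsort(dists)[:k]]   # k nearest 3D pose candidates

rng = np.random.default_rng(0)
mocap3d = rng.normal(size=(1000, 17, 3))
mocap2d = mocap3d[..., :2]                  # toy orthographic projection
candidates = retrieve_3d(rng.normal(size=(17, 2)), mocap3d, mocap2d)
print(candidates.shape)                     # (5, 17, 3)
```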
A discussion on the validation tests employed to compare human action recognition methods using the MSR Action3D dataset
This paper aims to determine which is the best human action recognition
method based on features extracted from RGB-D devices, such as the Microsoft
Kinect. A review of all the papers that make reference to MSR Action3D, the
most used dataset that includes depth information acquired from an RGB-D device, has been performed. We found that the validation method used by each work differs from the others, so a direct comparison among works cannot be made. However, almost all the works compare their results without taking this issue into account. Therefore, we present different rankings according to the validation methodology used, in order to clarify the existing confusion.
Comment: 16 pages and 7 tables.
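To make the protocol sensitivity concrete, the toy sketch below builds the widely used fixed cross-subject split (odd-numbered subjects for training, even-numbered for testing), a convention often applied to MSR Action3D; the subject IDs and clips are hypothetical stand-ins, and two papers using different splits of the same data are not directly comparable.

```python
# Toy illustration of a fixed cross-subject validation split.
# (subject_id, clip) pairs stand in for real recordings.
samples = [(s, f"clip_{s}_{i}") for s in range(1, 11) for i in range(3)]

train = [x for s, x in samples if s % 2 == 1]   # subjects 1, 3, 5, 7, 9
test = [x for s, x in samples if s % 2 == 0]    # subjects 2, 4, 6, 8, 10
print(len(train), len(test))                    # 15 15
```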
EV-Action: Electromyography-Vision Multi-Modal Action Dataset
Multi-modal human action analysis is a critical and attractive research
topic. However, the majority of the existing datasets only provide visual
modalities (i.e., RGB, depth, and skeleton). To make up for this, we introduce a new, large-scale EV-Action dataset in this work, which consists of RGB, depth, electromyography (EMG), and two skeleton modalities. Compared with conventional datasets, the EV-Action dataset has two major improvements: (1) we deploy a motion capture system to obtain a high-quality skeleton modality, which provides more comprehensive motion information, including skeleton, trajectory, and acceleration, with higher accuracy, a higher sampling frequency, and more skeleton markers; (2) we introduce an EMG modality, which is commonly used as an effective indicator in biomechanics but has yet to be well explored in motion-related research. To the best of our knowledge, this is the first action dataset with an EMG modality. The details of the EV-Action dataset are described, and a simple yet effective framework for EMG-based action recognition is proposed. Moreover, state-of-the-art baselines are applied to evaluate the effectiveness of all the modalities. The obtained results clearly show the validity of the EMG modality in human action analysis tasks. We hope this dataset can make significant contributions to human motion analysis, computer vision, machine learning, biomechanics, and other interdisciplinary fields.
Comment: IEEE International Conference on Automatic Face & Gesture Recognition.
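As a hedged illustration of what an EMG-based recognizer might consume, the sketch below computes windowed root-mean-square (RMS) features, a standard EMG descriptor in biomechanics; it is not the framework proposed in the paper, and the window and hop sizes are arbitrary.

```python
# Standard windowed RMS features for a raw multi-channel EMG signal.
# This is a generic baseline, not the paper's proposed framework.
import numpy as np

def rms_features(emg, win=200, hop=100):
    # emg: (num_samples, num_channels) raw signal.
    windows = [emg[i:i + win] for i in range(0, len(emg) - win + 1, hop)]
    return np.array([np.sqrt(np.mean(w ** 2, axis=0)) for w in windows])

rng = np.random.default_rng(0)
emg = rng.normal(size=(2000, 8))   # toy 8-channel recording
feats = rms_features(emg)          # (num_windows, 8) RMS per channel
print(feats.shape)                 # these features feed any classifier
```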
Activity Recognition based on a Magnitude-Orientation Stream Network
The temporal component of videos provides an important clue for activity
recognition, as a number of activities can be reliably recognized based on the
motion information. In view of that, this work proposes a novel temporal stream
for two-stream convolutional networks based on images computed from the optical
flow magnitude and orientation, named Magnitude-Orientation Stream (MOS), to
learn the motion in a better and richer manner. Our method applies simple
nonlinear transformations on the vertical and horizontal components of the
optical flow to generate input images for the temporal stream. Experimental results, carried out on two well-known datasets (HMDB51 and UCF101), demonstrate
that using our proposed temporal stream as input to existing neural network
architectures can improve their performance for activity recognition. Results
demonstrate that our temporal stream provides complementary information able to
improve the classical two-stream methods, indicating the suitability of our
approach to be used as a temporal video representation.
Comment: 8 pages, SIBGRAPI 2017.
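A minimal sketch of the kind of transformation involved: the horizontal and vertical optical-flow components are mapped to magnitude and orientation images that a temporal stream could consume. The rescaling to [0, 255] is an assumption for illustration, not necessarily the paper's exact nonlinearity.

```python
# Turn optical flow (u, v) components into magnitude and orientation
# images, the kind of input a magnitude-orientation temporal stream uses.
import numpy as np

def magnitude_orientation(u, v):
    # u, v: (H, W) horizontal and vertical flow components.
    magnitude = np.sqrt(u ** 2 + v ** 2)
    orientation = np.arctan2(v, u)          # radians in [-pi, pi]
    # Rescale both to [0, 255] so they can be stored as images (assumed).
    mag_img = 255 * magnitude / (magnitude.max() + 1e-8)
    ori_img = 255 * (orientation + np.pi) / (2 * np.pi)
    return mag_img.astype(np.uint8), ori_img.astype(np.uint8)

rng = np.random.default_rng(0)
u, v = rng.normal(size=(2, 240, 320))       # toy flow field
mag, ori = magnitude_orientation(u, v)
print(mag.shape, ori.dtype)                 # (240, 320) uint8
```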