RGBD Datasets: Past, Present and Future
Since the launch of the Microsoft Kinect, scores of RGBD datasets have been
released. These have propelled advances in areas from reconstruction to gesture
recognition. In this paper we explore the field, reviewing datasets across
eight categories: semantics, object pose estimation, camera tracking, scene
reconstruction, object tracking, human actions, faces and identification. By
extracting relevant information in each category we help researchers to find
appropriate data for their needs, and we consider which datasets have succeeded
in driving computer vision forward and why.
Finally, we examine the future of RGBD datasets. We identify key areas which
are currently underexplored, and suggest that future directions may include
synthetic data and dense reconstructions of static and dynamic scenes.
Comment: 8 pages excluding references (CVPR style)
British Sign Language Recognition via Late Fusion of Computer Vision and Leap Motion with Transfer Learning to American Sign Language
In this work, we show that a late fusion approach to multimodality in sign language recognition improves the overall ability of the model in comparison to the singular approaches of image classification (88.14%) and Leap Motion data classification (72.73%). With a large synchronous dataset of 18 BSL gestures collected from multiple subjects, two deep neural networks are benchmarked and compared to derive the best topology for each. The vision model is implemented by a Convolutional Neural Network and an optimised Artificial Neural Network, and the Leap Motion model is implemented by an evolutionary search of Artificial Neural Network topology. Next, the two best networks are fused for synchronised processing, which results in a better overall result (94.44%), as complementary features are learnt in addition to the original task. The hypothesis is further supported by applying the three models to a set of completely unseen data, where the multimodality approach achieves the best results relative to the single-sensor methods. When transfer learning with the weights trained via British Sign Language, all three models outperform standard random weight initialisation when classifying American Sign Language (ASL), and the best model overall for ASL classification was the transfer-learning multimodality approach, which scored 82.55% accuracy.
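The late fusion idea described in this abstract can be illustrated with a score-level sketch: each modality produces its own per-class probabilities, and the two are combined after classification rather than at the feature level. The weights and class scores below are hypothetical, and the paper's actual fusion network (which learns complementary features) is more elaborate than this minimal averaging scheme.

```python
import numpy as np

def late_fusion(p_vision, p_leap, w_vision=0.5, w_leap=0.5):
    """Combine per-class probabilities from two modalities by a
    weighted average, then renormalise to a valid distribution."""
    fused = w_vision * np.asarray(p_vision) + w_leap * np.asarray(p_leap)
    return fused / fused.sum()

# Hypothetical softmax outputs over three gesture classes:
p_vision = [0.70, 0.20, 0.10]   # vision CNN
p_leap   = [0.40, 0.50, 0.10]   # Leap Motion ANN

fused = late_fusion(p_vision, p_leap)   # -> [0.55, 0.35, 0.10]
```

Here the vision model's confidence tips the fused decision to class 0 even though the Leap model alone would have picked class 1, which is the intuition behind fusing complementary sensors.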
Fusion of pose and head tracking data for immersive mixed-reality application development
This work addresses the creation of a development framework where application developers can create, in a natural way, immersive physical activities in which users experience a 3D first-person perception of full-body control. The proposed framework is based on commercial motion sensors and a Head-Mounted Display (HMD), and uses Unity 3D as a unifying environment where user pose, the virtual scene and immersive visualisation functions are coordinated. Our proposal is exemplified by the development of a toy application showing its practical use.
An advanced virtual dance performance evaluator
The ever-increasing availability of high-speed Internet access has led to a leap in technologies that support real-time, realistic interaction between humans in online virtual environments. In the context of this work, we wish to realise the vision of an online dance studio where a dance class is provided by an expert dance teacher and delivered to online students via the web. In this paper we study some of the technical issues that need to be addressed in this challenging scenario. In particular, we describe an automatic dance analysis tool that would be used to evaluate a student's performance and provide him/her with meaningful feedback to aid improvement.
Jester: A Device Abstraction and Data Fusion API for Skeletal Tracking
Humans naturally interact with the world in three dimensions. Traditionally, personal computers have relied on 2D mice for input because 3D user tracking systems were cumbersome and expensive. Recently, 3D input hardware has become accurate and affordable enough to be marketed to average consumers and integrated into niche applications. Presently, 3D application developers must learn a different API for each device their software will support, and there is no simple way to integrate sensor data if the system has multiple 3D input devices. This thesis presents Jester, a library designed to simplify the development and improve the accuracy of 3D input-supported applications by providing an easily-extensible set of sensor wrappers that abstract the hardware-specific details of capturing skeletal data and fusing sensor data in multiple 3D input device systems. Jester's capabilities are demonstrated by creating a toy application that uses a PrimeSense Carmine and Leap Motion Controller to provide full body and finger skeletal tracking. Jester was able to fuse the data in real time while using the Carmine's data to compensate for ambiguity in the Leap's tracking.
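The device-abstraction pattern the thesis describes, a common wrapper interface hiding each device's native API, with a fusion step that merges skeletons, can be sketched as follows. All class names, joint names and coordinates here are hypothetical illustrations, not Jester's actual (C++) API; the real library's fusion is more sophisticated than this simple "more specialised sensor wins" merge.

```python
from abc import ABC, abstractmethod

class SensorWrapper(ABC):
    """Hypothetical common interface hiding device-specific capture code."""
    @abstractmethod
    def poll(self):
        """Return a dict mapping joint names to (x, y, z) positions."""

class FullBodySensor(SensorWrapper):
    """Stands in for a depth camera giving coarse full-body joints."""
    def poll(self):
        return {"head": (0.00, 1.70, 2.00), "right_hand": (0.30, 1.00, 1.90)}

class HandSensor(SensorWrapper):
    """Stands in for a short-range hand tracker giving fine finger joints."""
    def poll(self):
        return {"right_hand": (0.31, 1.02, 1.88),
                "right_index_tip": (0.35, 1.05, 1.85)}

def fuse(sensors):
    """Merge skeletons from all sensors; later (more specialised)
    sensors override estimates for joints both devices report."""
    skeleton = {}
    for sensor in sensors:
        skeleton.update(sensor.poll())
    return skeleton

skeleton = fuse([FullBodySensor(), HandSensor()])
```

The application code only ever sees `SensorWrapper` and the fused skeleton, so adding a new device means writing one wrapper rather than rewriting the application.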
Towards the Design of a Natural User Interface for Performing and Learning Musical Gestures
A large variety of musical instruments, either acoustic or digital, are based on a keyboard scheme. Keyboard instruments can produce sounds through acoustic means, but they are increasingly used to control digital sound-synthesis processes in today's music. Interestingly, with all the different possibilities of sonic outcomes, the input remains a musical gesture. In this paper we present the conceptualization of a Natural User Interface (NUI), named the Intangible Musical Instrument (IMI), aiming to support both the learning of expert musical gestures and the performing of music as a unified user experience. The IMI is designed to recognize metaphors of pianistic gestures, focusing on subtle uses of the fingers and upper body. Based on a typology of musical gestures, a gesture vocabulary has been created, hierarchized from basic to complex. These piano-like gestures are finally recognized and transformed into sounds.
MIFTel: a multimodal interactive framework based on temporal logic rules
Human-computer and multimodal interaction are increasingly used in everyday life. Machines are able to get more from the surrounding world, assisting humans in different application areas. In this context, the correct processing and management of the signals provided by the environment is decisive for structuring the data. Different sources and acquisition times can be exploited to improve recognition results. On the basis of these assumptions, we propose a multimodal system that exploits Allen's temporal logic combined with a prediction method. The main objective is to correlate the user's events with the system's reactions. After post-processing the incoming data from different signal sources (RGB images, depth maps, sounds, proximity sensors, etc.), the system manages the correlations between recognition/detection results and events in real time to create an interactive environment for the user. To increase recognition reliability, a predictive model is also associated with the proposed method. The modularity of the system allows fully dynamic development and upgrades with custom modules. Finally, a comparison with other similar systems is shown, underlining the high flexibility and robustness of the proposed event-management method.
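Allen's temporal logic, which the abstract builds on, classifies how two time intervals relate (before, meets, overlaps, during, etc.), which is what lets a system correlate events from sensors with different acquisition times. A minimal sketch covering a subset of the 13 relations (the paper's rule engine is, of course, richer than this):

```python
def allen_relation(a, b):
    """Classify the temporal relation between intervals a = (start, end)
    and b = (start, end). Only a subset of Allen's 13 relations is shown."""
    a_s, a_e = a
    b_s, b_e = b
    if a_e < b_s:
        return "before"     # a ends strictly before b starts
    if a_e == b_s:
        return "meets"      # a ends exactly where b starts
    if a_s < b_s < a_e < b_e:
        return "overlaps"   # a starts first; the intervals overlap
    if a_s == b_s and a_e == b_e:
        return "equals"
    if b_s < a_s and a_e < b_e:
        return "during"     # a lies strictly inside b
    return "other"          # remaining relations not distinguished here

# e.g. a detected hand gesture ending just as a voice command begins:
relation = allen_relation((0.0, 1.2), (1.2, 2.0))   # "meets"
```

A rule such as "gesture *meets* voice command, therefore treat them as one multimodal event" is the kind of correlation between detection results and events that the system expresses over its signal sources.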