GEMINI: A Generic Multi-Modal Natural Interface Framework for Videogames
In recent years videogame companies have recognized the role of player
engagement as a major factor in user experience and enjoyment. This encouraged
a greater investment in new types of game controllers such as the WiiMote, Rock
Band instruments and the Kinect. However, the native software of these
controllers was not originally designed to be used in other game applications.
This work addresses this issue by building a middleware framework, which maps
body poses or voice commands to actions in any game. This not only ensures a
more natural and customized user experience but also defines an
interoperable virtual controller. In this version of the framework, body poses
and voice commands are respectively recognized through the Kinect's built-in
cameras and microphones. The acquired data is then translated into the native
interaction scheme in real time using a lightweight method based on spatial
restrictions. The system is also prepared to use Nintendo's Wiimote as an
auxiliary and unobtrusive gamepad for physically or verbally impractical
commands. System validation was performed by analyzing the performance of
certain tasks and examining user reports. Both confirmed this approach as a
practical and alluring alternative to the game's native interaction scheme. In
sum, this framework provides a game-controlling tool that is totally
customizable and very flexible, thus expanding the market of game consumers.
Comment: WorldCIST'13 Internacional Conferenc
Multi-Modal Trip Hazard Affordance Detection On Construction Sites
Trip hazards are a significant contributor to accidents on construction and
manufacturing sites, where over a third of Australian workplace injuries occur
[1]. Current safety inspections are labour intensive and limited by human
fallibility, making automation of trip hazard detection appealing from both a
safety and economic perspective. Trip hazards present an interesting challenge
to modern learning techniques because they are defined as much by affordance as
by object type; for example, wires on a table are not a trip hazard but can be
if lying on the ground. To address these challenges, we conduct a comprehensive
investigation into the performance characteristics of 11 different colour and
depth fusion approaches, including 4 fusion and one non-fusion approach, using
colour and two types of depth images. Trained and tested on over 600 labelled
trip hazards over 4 floors and 2000m in an active construction
site, this approach was able to differentiate between identical objects in
different physical configurations (see Figure 1). Outperforming a colour-only
detector, our multi-modal trip detector fuses colour and depth information to
achieve a 4% absolute improvement in F1-score. These investigative results and
the extensive publicly available dataset move us one step closer to assistive
or fully automated safety inspection systems on construction sites.
Comment: 9 Pages, 12 Figures, 2 Tables, Accepted to Robotics and Automation
Letters (RA-L
RGB-D datasets using Microsoft Kinect or similar sensors: a survey
RGB-D data has turned out to be a very useful representation of an indoor scene for solving fundamental computer vision problems. It combines the advantages of the color image, which provides appearance information about an object, with those of the depth image, which is immune to variations in color, illumination, rotation angle and scale. With the invention of the low-cost Microsoft Kinect sensor, which was initially used for gaming and later became a popular device for computer vision, high-quality RGB-D data can be acquired easily. In recent years, more and more RGB-D image/video datasets dedicated to various applications have become available, and these are of great importance for benchmarking the state of the art. In this paper, we systematically survey popular RGB-D datasets for different applications, including object recognition, scene classification, hand gesture recognition, 3D simultaneous localization and mapping, and pose estimation. We provide insights into the characteristics of each important dataset and compare the popularity and difficulty of those datasets. Overall, the main goal of this survey is to give a comprehensive description of the available RGB-D datasets and thus to guide researchers in selecting suitable datasets for evaluating their algorithms.