Multi-sensor fusion for human-robot interaction in crowded environments
For challenges associated with the ageing population, robot assistants are becoming a promising solution. Human-Robot Interaction (HRI) allows a robot to understand the intention of humans in an environment and react accordingly. This thesis proposes HRI techniques to facilitate the transition of robots from lab-based research to real-world environments. The HRI aspects addressed in this thesis are illustrated in the following scenario: an elderly person, engaged in conversation with friends, wishes to attract a robot's attention. This composite task consists of many problems. The robot must detect and track the subject in a crowded environment. To engage with the user, it must track their hand movement. Knowledge of the subject's gaze would ensure that the robot doesn't react to the wrong person. Understanding the subject's group participation would enable the robot to respect existing human-human interaction. Many existing solutions to these problems are too constrained for natural HRI in crowded environments. Some require initial calibration or static backgrounds. Others deal poorly with occlusions, illumination changes, or real-time operation requirements. This work proposes algorithms that fuse multiple sensors to remove these restrictions and increase the accuracy over the state-of-the-art. 
The main contributions of this thesis are: a hand and body detection method, with a probabilistic algorithm for their real-time association when multiple users and hands are detected in crowded environments; an RGB-D sensor-fusion hand tracker, which increases position and velocity accuracy by combining a depth-image based hand detector with Monte-Carlo updates using colour images; a sensor-fusion gaze estimation system, combining IR and depth cameras on a mobile robot to give better accuracy than traditional visual methods, without the constraints of traditional IR techniques; and a group detection method, based on sociological concepts of static and dynamic interactions, which incorporates real-time gaze estimates to enhance detection accuracy.
Pictures in Your Mind: Using Interactive Gesture-Controlled Reliefs to Explore Art
Tactile reliefs offer many benefits over the more classic raised line drawings or tactile diagrams, as depth, 3D shape, and surface textures are directly perceivable. Although often created for blind and visually impaired (BVI) people, a wider range of people may benefit from such multimodal material. However, some reliefs are still difficult to understand without proper guidance or accompanying verbal descriptions, hindering autonomous exploration.
In this work, we present a gesture-controlled interactive audio guide (IAG) based on recent low-cost depth cameras that can be operated directly with the hands on relief surfaces during tactile exploration. The interactively explorable, location-dependent verbal and captioned descriptions promise rapid tactile accessibility to 2.5D spatial information in a home or education setting, to online resources, or as a kiosk installation at public places.
We present a working prototype, discuss design decisions, and present the results of two evaluation studies: the first with 13 BVI test users and the second follow-up study with 14 test users across a wide range of people with differences and difficulties associated with perception, memory, cognition, and communication. The participant-led research method of this latter study prompted new, significant and innovative developments
3D head motion, point-of-regard and encoded gaze fixations in real scenes: next-generation portable video-based monocular eye tracking
Portable eye trackers allow us to see where a subject is looking when performing a natural task with free head and body movements. These eye trackers include headgear containing a camera directed at one of the subject's eyes (the eye camera) and another camera (the scene camera) positioned above the same eye directed along the subject's line-of-sight. The output video includes the scene video with a crosshair depicting where the subject is looking -- the point-of-regard (POR) -- that is updated for each frame. This video may be the desired final result or it may be further analyzed to obtain more specific information about the subject's visual strategies. A list of the calculated POR positions in the scene video can also be analyzed. The goals of this project are to expand the information that we can obtain from a portable video-based monocular eye tracker and to minimize the amount of user interaction required to obtain and analyze this information. This work includes offline processing of both the eye and scene videos to obtain robust 2D PORs in scene video frames, identify gaze fixations from these PORs, obtain 3D head motion and ray trace fixations through volumes-of-interest (VOIs) to determine what is being fixated, when and where (3D POR). To avoid the redundancy of ray tracing a 2D POR in every video frame and to group these POR data meaningfully, a fixation-identification algorithm is employed to simplify the long list of 2D POR data into gaze fixations. In order to ray trace these fixations, the 3D motion -- position and orientation over time -- of the scene camera is computed. This camera motion is determined via an iterative structure and motion recovery algorithm that requires a calibrated camera and knowledge of the 3D location of at least four points in the scene (that can be selected from premeasured VOI vertices). The subject's 3D head motion is obtained directly from this camera motion.
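The abstract does not say which fixation-identification algorithm is used. As an illustration only, the sketch below uses a common dispersion-threshold scheme (often called I-DT) to collapse a list of per-frame 2D PORs into fixations. The function name `identify_fixations`, the parameters `max_dispersion` and `min_samples`, and the sample data are all assumptions made for this example, not details from the work itself.

```python
def identify_fixations(points, max_dispersion, min_samples):
    """Group consecutive 2D POR samples into fixations (dispersion-threshold sketch).

    points: list of (x, y) image coordinates, one per video frame.
    Returns a list of (first_index, last_index, centroid_x, centroid_y).
    """
    def dispersion(window):
        xs = [p[0] for p in window]
        ys = [p[1] for p in window]
        return (max(xs) - min(xs)) + (max(ys) - min(ys))

    fixations = []
    start = 0
    while start + min_samples <= len(points):
        end = start + min_samples  # exclusive end of the candidate window
        if dispersion(points[start:end]) <= max_dispersion:
            # Grow the window while the samples stay tightly clustered.
            while end < len(points) and dispersion(points[start:end + 1]) <= max_dispersion:
                end += 1
            window = points[start:end]
            cx = sum(p[0] for p in window) / len(window)
            cy = sum(p[1] for p in window) / len(window)
            fixations.append((start, end - 1, cx, cy))
            start = end
        else:
            # No fixation starts here; slide the window forward by one sample.
            start += 1
    return fixations


# Hypothetical PORs: two tight clusters followed by a single stray sample.
sample_pors = [(100, 100), (101, 100), (100, 101), (101, 101),
               (300, 300), (301, 300), (300, 301), (500, 50)]
fixations = identify_fixations(sample_pors, max_dispersion=5, min_samples=3)
```

Each returned centroid can serve as the representative 2D POR of its fixation, so only one ray needs to be traced per fixation rather than one per frame.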
For the final stage of the algorithm, the 3D locations and dimensions of VOIs in the scene are required. This VOI information in world coordinates is converted to camera coordinates for ray tracing. A representative 2D POR position for each fixation is converted from image coordinates to the same camera coordinate system. Then, a ray is traced from the camera center through this position to determine which (if any) VOI is being fixated and where it is being fixated -- the 3D POR in the world. Results are presented for various real scenes. Novel visualizations of portable eye tracker data created using the results of our algorithm are also presented
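The ray-tracing stage described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes an ideal pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy`, a pose `R`, `t` that maps world to camera coordinates (X_c = R X_w + t), and VOIs modelled as axis-aligned boxes in world coordinates. For simplicity it moves the ray into world coordinates rather than moving the VOIs into camera coordinates, which is geometrically equivalent. All function and VOI names are hypothetical.

```python
import math


def pixel_to_ray(u, v, fx, fy, cx, cy):
    """Back-project pixel (u, v) to a unit direction in camera coordinates."""
    x, y = (u - cx) / fx, (v - cy) / fy
    n = math.sqrt(x * x + y * y + 1.0)
    return (x / n, y / n, 1.0 / n)


def rotate_transposed(R, vec):
    """Return R^T @ vec for a 3x3 rotation matrix given as nested lists."""
    return tuple(sum(R[r][c] * vec[r] for r in range(3)) for c in range(3))


def ray_box_hit(origin, direction, box_min, box_max, eps=1e-12):
    """Slab test: distance along the ray to an axis-aligned box, or None on a miss."""
    t_near, t_far = 0.0, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if abs(d) < eps:
            # Ray is parallel to this pair of faces: it must start between them.
            if o < lo or o > hi:
                return None
            continue
        t1, t2 = (lo - o) / d, (hi - o) / d
        t_near = max(t_near, min(t1, t2))
        t_far = min(t_far, max(t1, t2))
        if t_near > t_far:
            return None
    return t_near


def fixated_voi(u, v, intrinsics, R, t, vois):
    """Return (voi_name, 3D point in world coordinates) for the nearest VOI hit."""
    origin = tuple(-c for c in rotate_transposed(R, t))  # camera centre in world
    direction = rotate_transposed(R, pixel_to_ray(u, v, *intrinsics))
    best = None
    for name, (box_min, box_max) in vois.items():
        dist = ray_box_hit(origin, direction, box_min, box_max)
        if dist is not None and (best is None or dist < best[1]):
            best = (name, dist)
    if best is None:
        return None, None
    point = tuple(o + best[1] * d for o, d in zip(origin, direction))
    return best[0], point


# Hypothetical scene: camera at the world origin looking down +z, two box VOIs.
intrinsics = (500.0, 500.0, 320.0, 240.0)  # fx, fy, cx, cy (assumed values)
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = (0.0, 0.0, 0.0)
vois = {
    "poster": ((-1.0, -1.0, 4.0), (1.0, 1.0, 5.0)),
    "wall": ((-5.0, -5.0, 9.0), (5.0, 5.0, 10.0)),
}
hit_name, hit_point = fixated_voi(320.0, 240.0, intrinsics, R, t, vois)
miss_name, miss_point = fixated_voi(0.0, 240.0, intrinsics, R, t, vois)
```

When several VOIs lie along the same ray, the sketch reports the nearest one, on the assumption that closer volumes occlude those behind them.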
Haptic feedback to gaze events
Eyes are the window to the world, and most of the input from the surrounding environment is captured through the eyes. In Human-Computer Interaction too, gaze-based interactions are gaining prominence, where the user’s gaze acts as an input to the system. Of late, portable and inexpensive eye-tracking devices have made inroads in the market, opening up wider possibilities for interacting with gaze. However, research on feedback to gaze-based events is limited. This thesis proposes to study vibrotactile feedback to gaze-based interactions.
This thesis presents a study conducted to evaluate different types of vibrotactile feedback and their role in response to a gaze-based event. For this study, an experimental setup was designed in which vibrotactile feedback was provided either on the wrist or on the glasses whenever the user fixated their gaze on a functional object. The study seeks to answer questions such as the helpfulness of vibrotactile feedback in identifying functional objects, user preference for the type of vibrotactile feedback, and user preference for the location of the feedback. The results of this study indicate that vibrotactile feedback was an important factor in identifying the functional object. The preference for the type of vibrotactile feedback was somewhat inconclusive, as it varied widely among users. Personal preference largely influenced the choice of location for receiving the feedback.
Information processing on smartphones in public versus private
People increasingly turn to news on mobile devices, often while out and about, attending to daily tasks. Yet, we know little about whether attention to and learning from information on a mobile differs by the setting of use. This study builds on Multiple Resource Theory (Wickens, 1984) and the Resource Competition Framework (Oulasvirta et al., 2005) to compare visual attention to a dynamic newsfeed, varying only the setting: private or public. We use mobile eye-tracking to evaluate the effects of setting on attention and assess correspondent learning differences after exposure to the feed, which allows us to uncover a relationship between attention and learning. Findings indicate higher visual attention to mobile newsfeed posts in public, relative to a private setting. Moreover, scrolling through news on a smartphone in public attenuates some knowledge gain but is beneficial for other learning outcomes
Freeform 3D interactions in everyday environments
PhD Thesis. Personal computing is continuously moving away from traditional input using mouse and keyboard, as new input technologies emerge. Recently, natural user interfaces (NUI) have led to interactive systems that are inspired by our physical interactions in the real world, and focus on enabling dexterous freehand input in 2D or 3D. Another recent trend is Augmented Reality (AR), which follows a similar goal to further reduce the gap between the real and the virtual, but predominantly focuses on output, by overlaying virtual information onto a tracked real-world 3D scene.
Whilst AR and NUI technologies have been developed for both immersive 3D output as well as seamless 3D input, these have mostly been looked at separately. NUI focuses on sensing the user and enabling new forms of input; AR traditionally focuses on capturing the environment around us and enabling new forms of output that are registered to the real world. The output of NUI systems is mainly presented on a 2D display, while the input technologies for AR experiences, such as data gloves and body-worn motion trackers, are often uncomfortable and restricting when interacting in the real world.
NUI and AR can be seen as very complementary, and bringing these two fields together can lead to new user experiences that radically change the way we interact with our everyday environments. The aim of this thesis is to enable real-time, low latency, dexterous input and immersive output without heavily instrumenting the user. The main challenge is to retain and to meaningfully combine the positive qualities that are attributed to both NUI and AR systems.
I review work in the intersecting research fields of AR and NUI, and explore freehand 3D interactions with varying degrees of expressiveness, directness and mobility in various physical settings. There are a number of technical challenges that arise when designing a mixed NUI/AR system, which I will address in this work: What can we capture, and how? How do we represent the real in the virtual? And how do we physically couple input and output? This is achieved by designing new systems, algorithms, and user experiences that explore the combination of AR and NUI
- …