Computational Learning for Hand Pose Estimation
Rapid advances in human–computer interaction interfaces have promised a realistic environment for gaming and entertainment in the last few years. However, traditional input devices such as trackballs, keyboards, or joysticks have been a bottleneck for natural interaction between a human and a computer, as the two degrees of freedom of these devices cannot suitably emulate interactions in a three-dimensional space. Consequently, a comprehensive hand tracking technology is expected to be a smart and intuitive alternative to these input tools to enhance virtual and augmented reality experiences. In addition, the recent emergence of low-cost depth sensing cameras has led to the broad use of RGB-D data in computer vision, raising expectations of a full 3D interpretation of hand movements for human–computer interaction interfaces. Although the use of hand gestures or hand postures has become essential for a wide range of applications in computer games and augmented/virtual reality, 3D hand pose estimation is still an open and challenging problem for the following reasons: (i) the hand pose exists in a high-dimensional space because each finger and the palm are associated with several degrees of freedom, (ii) the fingers exhibit self-similarity and often occlude each other, (iii) global 3D rotations make pose estimation more difficult, and (iv) hands occupy only a few pixels in images, and the noise in acquired data coupled with fast finger movement confounds continuous hand tracking. The success of hand tracking would naturally depend on synthesizing our knowledge of the hand (i.e., geometric shape, constraints on pose configurations) and latent features about hand poses from the RGB-D data stream (i.e., region of interest, key feature points like fingertips and joints, and temporal continuity).
In this thesis, we propose novel methods to leverage the paradigm of analysis by synthesis and create a prediction model using a population of realistic 3D hand poses. The overall goal of this work is to design a concrete framework so that computers can learn and understand perceptual attributes of human hands (i.e., self-occlusions or self-similarities of the fingers) and to develop a pragmatic solution to the real-time hand pose estimation problem implementable on a standard computer.
This thesis can be broadly divided into four parts: learning hand poses (i) from recommendations of similar hand poses, (ii) from low-dimensional visual representations, (iii) by hallucinating geometric representations, and (iv) from a manipulated object. Each research work covers our algorithmic contributions to solving the 3D hand pose estimation problem. Additionally, the research work in the appendix proposes a pragmatic technique for applying our ideas to mobile devices with low computational power. Following this structure, we first review the most relevant works in the literature on depth sensor-based 3D hand pose estimation, both with and without a manipulated object. The two prevalent categories of hand pose estimation, model-based methods and appearance-based methods, are discussed in detail. In this chapter, we also introduce some works relevant to deep learning and attempts to achieve efficient compression of the network structure. Next, we describe a synthetic 3D hand model and its motion constraints for simulating realistic human hand movements. The primary research work starts in the following chapter. We discuss our attempts to produce a better estimation model for 3D hand pose estimation by learning hand articulations from recommendations of similar poses. Specifically, the unknown pose parameters for input depth data are estimated by collaboratively learning the known parameters of all neighborhood poses. Subsequently, we discuss deep-learned, discriminative, and low-dimensional features and a hierarchical solution to the stated problem based on the matrix completion framework. This work is further extended by incorporating a function of geometric properties on the surface of the hand described by heat diffusion, which robustly captures both the local geometry of the hand and global structural representations.
The problem of the hand's interactions with a physical object is also considered in the following chapter. The main insight is that the interacting object can be a source of constraints on hand poses. In this view, we employ pose dependency on the shape of the object to learn the discriminative features of the hand–object interaction, rather than losing hand information because of partial or full object occlusions. Subsequently, we present a compressive learning technique in the appendix. Our approach is flexible, enabling us to add more layers and go deeper in the deep learning architecture while keeping the number of parameters the same. Finally, we conclude this thesis by summarizing the presented approaches for hand pose estimation and then propose future directions to further improve performance through (i) realistically rendered synthetic hand images, (ii) incorporating RGB images as an input, (iii) hand personalization, (iv) use of unstructured point clouds, and (v) embedding sensing techniques.
Vision-based interface applied to assistive robots
This paper presents two vision-based interfaces for disabled people to command a mobile robot for personal assistance. The developed interfaces can be subdivided according to the algorithm of image processing implemented for the detection and tracking of two different body regions. The first interface detects and tracks movements of the user's head, and these movements are transformed into linear and angular velocities in order to command a mobile robot. The second interface detects and tracks movements of the user's hand, and these movements are similarly transformed. In addition, this paper also presents the control laws for the robot. The experimental results demonstrate good performance and balance between complexity and feasibility for real-time applications. Fil: Pérez Berenguer, María Elisa. Universidad Nacional de San Juan. Facultad de Ingeniería. Departamento de Electrónica y Automática. Gabinete de Tecnología Médica; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Fil: Soria, Carlos Miguel. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - San Juan. Instituto de Automática. Universidad Nacional de San Juan. Facultad de Ingeniería. Instituto de Automática; Argentina. Fil: López Celani, Natalia Martina. Universidad Nacional de San Juan. Facultad de Ingeniería. Departamento de Electrónica y Automática. Gabinete de Tecnología Médica; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Fil: Nasisi, Oscar Herminio. Universidad Nacional de San Juan. Facultad de Ingeniería. Instituto de Automática; Argentina. Fil: Mut, Vicente Antonio. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - San Juan. Instituto de Automática. Universidad Nacional de San Juan. Facultad de Ingeniería. Instituto de Automática; Argentina
Machine Understanding of Human Behavior
A widely accepted prediction is that computing will move to the background, weaving itself into the fabric of our everyday living spaces and projecting the human user into the foreground. If this prediction is to come true, then next generation computing, which we will call human computing, should be about anticipatory user interfaces that should be human-centered, built for humans based on human models. They should transcend the traditional keyboard and mouse to include natural, human-like interactive functions including understanding and emulating certain human behaviors such as affective and social signaling. This article discusses a number of components of human behavior, how they might be integrated into computers, and how far we are from realizing the front end of human computing, that is, how far we are from enabling computers to understand human behavior.
Human-Machine Interface for Remote Training of Robot Tasks
Regardless of their industrial or research application, the streamlining of
robot operations is limited by the proximity of experienced users to the actual
hardware. Be it massive open online robotics courses, crowd-sourcing of robot
task training, or remote research on massive robot farms for machine learning,
the need to create an apt remote Human-Machine Interface is quite prevalent.
The paper at hand proposes a novel solution to the programming/training of
remote robots employing an intuitive and accurate user-interface which offers
all the benefits of working with real robots without imposing delays and
inefficiency. The system includes: a vision-based 3D hand detection and gesture
recognition subsystem, a simulated digital twin of a robot as visual feedback,
and the "remote" robot learning/executing trajectories using dynamic motion
primitives. Our results indicate that the system is a promising solution to the
problem of remote training of robot tasks. Comment: Accepted in IEEE International Conference on Imaging Systems and
Techniques - IST201
Autonomy Infused Teleoperation with Application to BCI Manipulation
Robot teleoperation systems face a common set of challenges including
latency, low-dimensional user commands, and asymmetric control inputs. User
control with Brain-Computer Interfaces (BCIs) exacerbates these problems
through especially noisy and erratic low-dimensional motion commands due to the
difficulty in decoding neural activity. We introduce a general framework to
address these challenges through a combination of computer vision, user intent
inference, and arbitration between the human input and autonomous control
schemes. Adjustable levels of assistance allow the system to balance the
operator's capabilities and feelings of comfort and control while compensating
for a task's difficulty. We present experimental results demonstrating
significant performance improvement using the shared-control assistance
framework on adapted rehabilitation benchmarks with two subjects implanted with
intracortical brain-computer interfaces controlling a seven degree-of-freedom
robotic manipulator as a prosthetic. Our results further indicate that shared
assistance mitigates perceived user difficulty and even enables successful
performance on previously infeasible tasks. We showcase the extensibility of
our architecture with applications to quality-of-life tasks such as opening a
door, pouring liquids from containers, and manipulation with novel objects in
densely cluttered environments.
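The abstract above does not state how the arbitration between human input and autonomous control is computed. A minimal sketch of one common choice, linear blending of the two commands, is given below. The function name `blend_commands`, the `assistance` parameter, and the blending rule itself are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def blend_commands(user_cmd, auto_cmd, assistance):
    """Linearly blend a user command with an autonomous command.

    ASSUMPTION: linear blending is used here for illustration only; the
    abstract does not specify the arbitration scheme of the actual system.

    user_cmd, auto_cmd: velocity commands of equal length (hypothetical units).
    assistance: level in [0, 1]; 0 means the user has full control,
                1 means the autonomous controller has full control.
    """
    # Clamp the assistance level so out-of-range values cannot extrapolate.
    alpha = float(np.clip(assistance, 0.0, 1.0))
    user_cmd = np.asarray(user_cmd, dtype=float)
    auto_cmd = np.asarray(auto_cmd, dtype=float)
    if user_cmd.shape != auto_cmd.shape:
        raise ValueError("user_cmd and auto_cmd must have the same shape")
    # Weighted sum: more assistance shifts authority toward the autonomy.
    return (1.0 - alpha) * user_cmd + alpha * auto_cmd
```

Under this assumption, an adjustable level of assistance corresponds to changing `assistance`: low values leave a noisy user command mostly untouched, while higher values pull the executed motion toward the autonomous controller's suggestion.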
Pedestrian Detection with Wearable Cameras for the Blind: A Two-way Perspective
Blind people have limited access to information about their surroundings,
which is important for ensuring one's safety, managing social interactions, and
identifying approaching pedestrians. With advances in computer vision, wearable
cameras can provide equitable access to such information. However, the
always-on nature of these assistive technologies poses privacy concerns for
parties that may get recorded. We explore this tension from both perspectives,
those of sighted passersby and blind users, taking into account camera
visibility, in-person versus remote experience, and extracted visual
information. We conduct two studies: an online survey with MTurkers (N=206) and
an in-person experience study between pairs of blind (N=10) and sighted (N=40)
participants, where blind participants wear a working prototype for pedestrian
detection and pass by sighted participants. Our results suggest that both of
the perspectives of users and bystanders and the several factors mentioned
above need to be carefully considered to mitigate potential social tensions. Comment: The 2020 ACM CHI Conference on Human Factors in Computing Systems
(CHI 2020)