A real-time human-robot interaction system based on gestures for assistive scenarios
Natural and intuitive human interaction with robotic systems is key to developing robots that assist people easily and effectively. In this paper, a Human Robot Interaction (HRI) system able to recognize gestures commonly used in human non-verbal communication is introduced, and an in-depth study of its usability is performed. The system deals with dynamic gestures such as waving or nodding, which are recognized using a Dynamic Time Warping approach based on gesture-specific features computed from depth maps. A static gesture consisting of pointing at an object is also recognized. The pointed location is then estimated in order to detect candidate objects the user may be referring to. When the pointed object is unclear to the robot, a disambiguation procedure is performed by means of either a verbal or a gestural dialogue. This skill would allow the robot to pick up an object on behalf of a user who may have difficulty doing so themselves. The overall system, which is composed of NAO and Wifibot robots, a Kinect™ v2 sensor and two laptops, is first evaluated in a structured lab setup. Then, a broad set of user tests is completed, which allows performance to be assessed in terms of recognition rates, ease of use and response times.
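As a rough illustration of how Dynamic Time Warping can match a dynamic gesture against stored templates, the following minimal sketch may help. It is not the authors' implementation: the per-frame feature vectors, the template dictionary, the `classify` helper and the rejection threshold are all hypothetical, and the paper's actual depth-map features are not reproduced here.

```python
import math


def dtw_distance(a, b):
    """Classic DTW cost between two sequences of equal-dimension feature vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i frames of a with the first j of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean frame-to-frame distance
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def classify(sequence, templates, threshold):
    """Return the label of the nearest template, or None if none is within threshold.

    `templates` maps a gesture label to one example sequence (an assumption made
    for this sketch; a real system would likely keep several examples per gesture).
    """
    best_label, best_cost = None, float("inf")
    for label, template in templates.items():
        c = dtw_distance(sequence, template)
        if c < best_cost:
            best_label, best_cost = label, c
    return best_label if best_cost <= threshold else None
```

Because DTW allows frames to be repeated along the alignment path, a gesture performed more slowly than its template can still match with low cost, which is the usual reason for choosing it over a frame-by-frame comparison.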
Tracking and modeling focus of attention in meetings [online]
Abstract
This thesis addresses the problem of tracking the focus of
attention of people. In particular, a system to track the focus
of attention of participants in meetings is developed. Obtaining
knowledge about a person's focus of attention is an important
step towards a better understanding of what people do, how and
with what or whom they interact or to what they refer. In
meetings, focus of attention can be used to disambiguate the
addressees of speech acts, to analyze interaction and for
indexing of meeting transcripts. Tracking a user's focus of
attention also greatly contributes to the improvement of
human-computer interfaces since it can be used to build interfaces
and environments that become aware of what the user is paying
attention to or with what or whom he is interacting.
The direction in which people look, i.e., their gaze, is closely
related to their focus of attention. In this thesis, we estimate
a subject's focus of attention based on his or her head
orientation. While the direction in which someone looks is
determined by head orientation and eye gaze, relevant literature
suggests that head orientation alone is a sufficient cue for the
detection of someone's direction of attention during social
interaction. We present experimental results from a user study
and from several recorded meetings that support this hypothesis.
We have developed a Bayesian approach to model at whom or what
someone is looking based on his or her head orientation. To
estimate head orientations in meetings, the participants' faces
are automatically tracked in the view of a panoramic camera and
neural networks are used to estimate their head orientations
from pre-processed images of their faces. Using this approach,
the focus of attention target of subjects could be correctly
identified during 73% of the time in a number of evaluation
meetings with four participants.
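The Bayesian step described above can be sketched as follows. This is only an illustration under stated assumptions: each candidate target is given a hypothetical prior and a Gaussian class-conditional density over the horizontal head rotation (pan, in degrees). The abstract does not specify the thesis' actual model or parameters, so the function names and numbers here are invented.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def focus_posterior(pan_deg, targets):
    """Posterior P(target | pan) via Bayes' rule.

    `targets` maps a target name to (prior, mean_pan_deg, std_pan_deg);
    all three values are hypothetical parameters chosen for this sketch.
    """
    joint = {
        name: prior * gaussian_pdf(pan_deg, mean, std)
        for name, (prior, mean, std) in targets.items()
    }
    total = sum(joint.values())  # evidence p(pan); assumed non-zero for plausible angles
    return {name: p / total for name, p in joint.items()}


# Three other participants seated to the left, in front, and to the right.
example_targets = {
    "left": (1 / 3, -45.0, 15.0),
    "center": (1 / 3, 0.0, 15.0),
    "right": (1 / 3, 45.0, 15.0),
}
example_posterior = focus_posterior(-40.0, example_targets)
most_likely_target = max(example_posterior, key=example_posterior.get)
```

With an observed pan of -40 degrees, the posterior mass concentrates on the target whose mean pan angle is closest, here the participant on the left.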
In addition, we have investigated whether a person's focus of
attention can be predicted from other cues. Our results show
that focus of attention is correlated with who is speaking in a
meeting and that it is possible to predict a person's focus of
attention based on the information of who is talking or was
talking before a given moment.
We have trained neural networks to predict at whom a person is
looking, based on information about who was speaking. Using this
approach we were able to predict who is looking at whom with 63%
accuracy on the evaluation meetings using only information about
who was speaking. We show that by using both head orientation
and speaker information to estimate a person's focus, the
accuracy of focus detection can be improved compared to just
using one of the modalities for focus estimation.
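One simple way to combine two such estimates is a weighted mixture of the per-target posteriors. The sketch below shows that idea only. The abstract does not state how the thesis fuses the two modalities, so the mixing rule, the `fuse_posteriors` name and the weight `alpha` are assumptions made for illustration.

```python
def fuse_posteriors(p_head, p_speaker, alpha=0.5):
    """Mix two posteriors over focus targets.

    alpha = 0 uses only the head-orientation estimate, alpha = 1 only the
    speaker-based estimate. The linear mixing rule is an assumption of this sketch.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    names = set(p_head) | set(p_speaker)
    mixed = {
        name: (1.0 - alpha) * p_head.get(name, 0.0) + alpha * p_speaker.get(name, 0.0)
        for name in names
    }
    total = sum(mixed.values())
    return {name: value / total for name, value in mixed.items()}


# Hypothetical per-target posteriors from the two modalities.
head_estimate = {"A": 0.5, "B": 0.4, "C": 0.1}
speaker_estimate = {"A": 0.2, "B": 0.7, "C": 0.1}
fused = fuse_posteriors(head_estimate, speaker_estimate, alpha=0.5)
```

In this example the head-based estimate alone slightly favours target A, while the fused estimate favours B because the speaker-based cue points strongly at B.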
To demonstrate the generality of our approach, we have built a
prototype system to demonstrate focus-aware interaction with a
household robot and other smart appliances in a room using the
developed components for focus of attention tracking. In the
demonstration environment, a subject could interact with a
simulated household robot, a speech-enabled VCR or with other
people in the room, and the recipient of the subject's speech
was disambiguated based on the user's direction of attention.
Summary
This thesis deals with automatically determining and tracking
the focus of attention of people in meetings.
Determining people's focus of attention is very important for
understanding and automatically analyzing meeting records. It
can be used, for example, to find out who addressed whom at a
given time, or who was listening to whom. Automatic determination
of the focus of attention can also be used to improve
human-machine interfaces.
An important cue to the direction in which a person directs
their attention is the person's head orientation. A method for
estimating people's head orientations was therefore developed.
Artificial neural networks were used that take pre-processed
images of a person's head as input and compute an estimate of
the head orientation as output. With the trained networks, a
mean error of nine to ten degrees was achieved for estimating
horizontal and vertical head orientation on image data of new
persons, that is, persons whose images were not contained in the
training set.
Furthermore, a probabilistic approach for determining attention
targets is presented. A Bayesian approach is used to determine
the a posteriori probabilities of different attention targets,
given the observed head orientations of a person. The developed
approaches were evaluated on several meetings with four to five
participants.
A further contribution of this thesis is an investigation of how
well the gaze direction of meeting participants can be predicted
based on who is currently speaking. A method was developed that
uses neural networks to estimate a person's focus based on a
short history of speaker constellations.
We show that combining the image-based and the speaker-based
estimates of the focus of attention yields a clearly improved
estimate.
Overall, this thesis presents for the first time a system for
automatically tracking the attention of people in a meeting
room.
The approaches and methods developed can also be used to
determine people's attention in other domains, in particular for
controlling computerized, interactive environments. This is
shown with an example application.
Computational intelligence approaches to robotics, automation, and control [Volume guest editors]
No abstract available
Robust 3D IMU-LIDAR Calibration and Multi Sensor Probabilistic State Estimation
Autonomous robots are highly complex systems. In order to operate in dynamic environments, adaptability in their decision-making algorithms is a must. Thus, the internal and external information that robots obtain from sensors is critical to re-evaluate their decisions in real time. Accuracy is key in this endeavor, both from the hardware side and the modeling point of view. In order to guarantee the highest performance, sensors need to be correctly calibrated. To this end, some parameters are tuned so that the particular realization of a sensor best matches a generalized mathematical model. This step grows in complexity with the integration of multiple sensors, which is generally a requirement in order to cope with the dynamic nature of real world applications. This project aims to deal with the calibration of an inertial measurement unit, or IMU, and a Light Detection and Ranging device, or LiDAR. An offline batch optimization procedure is proposed to optimally estimate the intrinsic and extrinsic parameters of the model. Then, an online state estimation module that makes use of the aforementioned parameters and the fusion of LiDAR-inertial data for local navigation is proposed. Additionally, it incorporates real time corrections to account for the time-varying nature of the model, essential to deal with exposure to continued operation and wear and tear.
Keywords: sensor fusion, multi-sensor calibration, factor graphs, batch optimization, Gaussian Processes, state estimation, LiDAR-inertial odometry, Error State Kalman Filter, Normal Distributions Transform
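To make the role of the extrinsic parameters concrete, the sketch below applies a rigid-body transform (rotation plus translation) that maps a LiDAR point into the IMU frame. It illustrates only what such a calibration produces, not the batch optimization that estimates it. The yaw-only rotation, the function names and the numeric offsets are assumptions chosen for illustration.

```python
import math


def yaw_rotation(yaw_rad):
    """3x3 rotation matrix about the z axis (a deliberately simple extrinsic rotation)."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def lidar_to_imu(point, rotation, translation):
    """Map a LiDAR-frame point into the IMU frame: p_imu = R * p_lidar + t."""
    return tuple(
        sum(rotation[row][k] * point[k] for k in range(3)) + translation[row]
        for row in range(3)
    )


# Hypothetical extrinsics: LiDAR rotated 90 degrees about z and mounted
# 0.1 m forward of and 0.2 m above the IMU.
R = yaw_rotation(math.pi / 2)
t = (0.1, 0.0, 0.2)
p_imu = lidar_to_imu((1.0, 0.0, 0.0), R, t)
```

An error in `R` or `t` shifts every LiDAR point by the same systematic amount, which is why inaccurate extrinsics degrade any downstream fusion of the two sensors.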
Optical techniques for 3D surface reconstruction in computer-assisted laparoscopic surgery
One of the main challenges for computer-assisted surgery (CAS) is to determine the intra-operative morphology and motion of soft tissues. This information is a prerequisite for the registration of multi-modal patient-specific data, both for enhancing the surgeon's navigation capabilities by observing beyond exposed tissue surfaces and for providing intelligent control of robotic-assisted instruments. In minimally invasive surgery (MIS), optical techniques are an increasingly attractive approach for in vivo 3D reconstruction of the soft-tissue surface geometry. This paper reviews the state-of-the-art methods for optical intra-operative 3D reconstruction in laparoscopic surgery and discusses the technical challenges and future perspectives towards clinical translation. With the recent paradigm shift of surgical practice towards MIS and new developments in 3D optical imaging, this is a timely discussion about technologies that could facilitate complex CAS procedures in dynamic and deformable anatomical regions.
Grasping, Perching, And Visual Servoing For Micro Aerial Vehicles
Micro Aerial Vehicles (MAVs) have seen a dramatic growth in the consumer market because of their ability to provide new vantage points for aerial photography and videography. However, there is little consideration for physical interaction with the environment surrounding them. Onboard manipulators are absent, and onboard perception, if existent, is used to avoid obstacles and maintain a minimum distance from them. There are many applications, however, which would benefit greatly from aerial manipulation or flight in close proximity to structures. This work is focused on facilitating these types of close interactions between quadrotors and surrounding objects. We first explore high-speed grasping, enabling a quadrotor to quickly grasp an object while moving at a high relative velocity. Next, we discuss planning and control strategies, empowering a quadrotor to perch on vertical surfaces using a downward-facing gripper. Then, we demonstrate that such interactions can be achieved using only onboard sensors by incorporating vision-based control and vision-based planning. In particular, we show how a quadrotor can use a single camera and an Inertial Measurement Unit (IMU) to perch on a cylinder. Finally, we generalize our approach to consider objects in motion, and we present relative pose estimation and planning, enabling tracking of a moving sphere using only an onboard camera and IMU
Comparison of interaction modalities for mobile indoor robot guidance : direct physical interaction, person following, and pointing control
© 2015 IEEE. Three advanced natural interaction modalities for mobile robot guidance in an indoor environment were developed and compared using two tasks and quantitative metrics to measure performance and workload. The first interaction modality is based on direct physical interaction, requiring the human user to push the robot in order to displace it. The second and third interaction modalities exploit 3-D vision-based human-skeleton tracking, allowing the user to guide the robot either by walking in front of it or by pointing toward a desired location. In the first task, the participants were asked to guide the robot between different rooms in a simulated physical apartment, requiring rough movement of the robot through designated areas. The second task evaluated robot guidance in the same environment through a set of waypoints, which required accurate movements. The three interaction modalities were implemented on a generic differential drive mobile platform equipped with a pan-tilt system and a Kinect camera. Task completion time and accuracy were used as metrics to assess the users' performance, while the NASA-TLX questionnaire was used to evaluate the users' workload. A study with 24 participants indicated that the choice of interaction modality had a significant effect on completion time (F(2,61)=84.874, p<0.001), accuracy (F(2,29)=4.937, p=0.016), and workload (F(2,68)=11.948, p<0.001). Direct physical interaction required less time and provided more accuracy and less workload than the two contactless interaction modalities. Between the two contactless interaction modalities, the person-following interaction modality was systematically better than the pointing-control one: the participants completed the tasks faster and with less workload.
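For the pointing-control modality, a common way to turn a tracked arm into a goal location is to extend a ray along the forearm until it meets the floor. The sketch below shows only this geometric idea. The abstract does not say which skeleton joints or coordinate frame the authors used, so the elbow-to-hand ray, the z-up frame in metres and the function name are assumptions.

```python
def pointed_floor_location(elbow, hand, floor_z=0.0):
    """Intersect the elbow-to-hand ray with the horizontal plane z = floor_z.

    Points are (x, y, z) tuples in a z-up frame (an assumption of this sketch).
    Returns the (x, y) target, or None if the arm does not point down at the floor.
    """
    dx, dy, dz = (h - e for e, h in zip(elbow, hand))
    if dz >= 0.0:
        return None  # arm is level or pointing upward: the ray never reaches the floor
    scale = (floor_z - hand[2]) / dz  # how far to extend the ray beyond the hand
    if scale < 0.0:
        return None  # hand is already below the floor plane
    return (hand[0] + scale * dx, hand[1] + scale * dy)


# Elbow at 1.2 m and hand at 1.0 m height, arm pointing forward along x.
target = pointed_floor_location((0.0, 0.0, 1.2), (0.3, 0.0, 1.0))
```

Small errors in the tracked joint positions are amplified as the ray is extended, so shallow pointing angles yield less reliable targets than steep ones.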