
    A real-time human-robot interaction system based on gestures for assistive scenarios

    Natural and intuitive human interaction with robotic systems is key to developing robots that assist people easily and effectively. In this paper, a Human Robot Interaction (HRI) system able to recognize gestures commonly used in human non-verbal communication is introduced, and an in-depth study of its usability is performed. The system deals with dynamic gestures, such as waving or nodding, which are recognized using a Dynamic Time Warping approach based on gesture-specific features computed from depth maps. A static gesture consisting of pointing at an object is also recognized. The pointed location is then estimated in order to detect candidate objects the user may be referring to. When the pointed object is unclear to the robot, a disambiguation procedure is performed by means of either a verbal or a gestural dialogue. This skill would allow the robot to pick up an object on behalf of a user who might have difficulty doing so themselves. The overall system (composed of a NAO robot and a Wifibot robot, a Kinect™ v2 sensor and two laptops) is first evaluated in a structured lab setup. Then, a broad set of user tests is carried out to assess performance in terms of recognition rates, ease of use and response times. Postprint (author's final draft).
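The abstract above names Dynamic Time Warping (DTW) as the recognizer for dynamic gestures but gives no detail. As a rough illustration only, the sketch below shows textbook DTW with nearest-template classification. The function names, the Euclidean frame distance, the rejection threshold and the toy one-dimensional features are all assumptions made for this example, not the authors' implementation.

```python
from math import inf


def dtw_distance(a, b):
    """Textbook DTW cost between two sequences of equal-length feature vectors."""
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between the two frames (an assumption)
            d = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])) ** 0.5
            # extend the cheapest of: insertion, deletion, match
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def classify(sequence, templates, threshold):
    """Return the label of the closest template, or None if nothing is close enough."""
    best_label, best_cost = None, inf
    for label, template in templates.items():
        c = dtw_distance(sequence, template)
        if c < best_cost:
            best_label, best_cost = label, c
    return best_label if best_cost <= threshold else None


# Hypothetical 1-D features, e.g. a normalized hand offset per frame.
templates = {
    "wave": [[0.0], [1.0], [0.0], [1.0]],
    "nod": [[0.0], [0.0], [0.0]],
}
observed = [[0.0], [1.0], [1.0], [0.0], [1.0]]  # a slower "wave"
print(classify(observed, templates, threshold=0.5))
```

Because DTW lets one frame align with several frames of the other sequence, the slower repetition of the wave still matches its template at zero cost, which is the property that makes the method attractive for gestures performed at varying speeds.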

    Recognition and Estimation of Human Finger Pointing with an RGB Camera for Robot Directive

    In communication between humans, gestures are often preferred over, or used to complement, verbal expression because they offer better spatial reference. A finger-pointing gesture conveys vital information about a point of interest in the environment. In human-robot interaction, a user can easily direct a robot to a target location, for example, in search and rescue or factory assistance. State-of-the-art approaches for visual pointing estimation often rely on depth cameras, are limited to indoor environments and provide discrete predictions among a limited set of targets. In this paper, we explore the learning of models that let robots understand pointing directives in various indoor and outdoor environments based solely on a single RGB camera. A novel framework is proposed which includes a dedicated model termed PointingNet. PointingNet recognizes the occurrence of pointing and then approximates the position and direction of the index finger. The model relies on a novel segmentation model for masking any lifted arm. While state-of-the-art human pose estimation models provide a poor pointing-angle estimation error of 28°, PointingNet exhibits a mean error of less than 2°. With the pointing information, the target is computed, followed by planning and motion of the robot. The framework is evaluated on two robotic systems, yielding accurate target reaching.

    Shakespeare's Complete Works as a Benchmark for Evaluating Multiscale Document-Navigation Techniques

    In this paper, we describe an experimental platform dedicated to the comparative evaluation of multiscale electronic-document navigation techniques. First, one noteworthy characteristic of our platform is that it allows the user not only to translate the document (for example, to pan and zoom) but also to tilt the virtual camera to obtain freely chosen perspective views of the document. Second, the platform makes it possible to explore, with semantic zooming, the 150,000 verses that comprise the complete works of William Shakespeare. We argue that reaching and selecting one specific verse in this very large text corpus amounts to a perfectly well-defined Fitts task, leading to rigorous assessments of target acquisition performance. For lack of a standard, the various multiscale techniques that have been reported recently in the literature are difficult to compare. We recommend that Shakespeare's complete works, converted into a single document that can be zoomed both geometrically and semantically, be used as a benchmark to facilitate systematic experimental comparisons using Fitts' target acquisition paradigm.
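To make the "Fitts task" framing concrete, the sketch below computes the index of difficulty and a predicted movement time. It assumes the Shannon formulation, ID = log2(D/W + 1), which the abstract does not specify. The regression constants `a` and `b` and the one-verse-per-line geometry are hypothetical values chosen for illustration, not figures from the paper.

```python
import math


def index_of_difficulty(distance, width):
    """Shannon formulation of Fitts' index of difficulty, in bits."""
    return math.log2(distance / width + 1.0)


def movement_time(a, b, distance, width):
    """Fitts' law prediction MT = a + b * ID (a in seconds, b in seconds per bit)."""
    return a + b * index_of_difficulty(distance, width)


# Assumption: the corpus is laid out as one verse per line, so the farthest
# target lies roughly 150,000 line heights away and is one line height wide.
verses = 150_000
print(f"ID for the farthest verse: {index_of_difficulty(verses, 1):.2f} bits")
print(f"Predicted MT (hypothetical a=0.2, b=0.1): {movement_time(0.2, 0.1, verses, 1):.2f} s")
```

Under these assumptions the task difficulty grows only logarithmically with document length, which is why a single very long document can cover a wide range of difficulties within one benchmark.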

    Shape basis interpretation for monocular deformable 3D reconstruction

    In this paper, we propose a novel interpretable shape model to encode object non-rigidity. We first use the initial frames of a monocular video to recover a rest shape, which is later used to compute a dissimilarity measure based on a distance matrix. Spectral analysis is then applied to this matrix to obtain a reduced shape basis that, in contrast to existing approaches, can be physically interpreted. In turn, these pre-computed shape bases are used to linearly span the deformation of a wide variety of objects. We introduce the low-rank basis into a sequential approach to recover both camera motion and non-rigid shape from the monocular video by simply optimizing the weights of the linear combination using bundle adjustment. Since the number of parameters to optimize per frame is relatively small, especially when physical priors are considered, our approach is fast and can potentially run in real time. Validation is done on a wide variety of real-world objects undergoing both inextensible and extensible deformations. Our approach achieves remarkable robustness to artifacts such as noisy and missing measurements and shows improved performance over competing methods. © 2019 IEEE. Peer reviewed. Postprint (author's final draft).

    Point Anywhere: Directed Object Estimation from Omnidirectional Images

    A pointing gesture is one of the intuitive ways to give instructions in robot navigation. In this study, we propose a method that uses an omnidirectional camera to eliminate the user/object position constraint and the left/right constraint of the pointing arm. Although the accuracy of skeleton and object detection is low due to the high distortion of equirectangular images, the proposed method enables highly accurate estimation by repeatedly extracting regions of interest from the equirectangular image and projecting them onto perspective images. Furthermore, we found that learning the likelihood of the target object with machine learning further improves the estimation accuracy. Comment: Accepted to SIGGRAPH 2023 Poster. Project page: https://github.com/NKotani/PointAnywher
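The projection step mentioned in the abstract above (equirectangular region to perspective image) rests on standard spherical geometry. The sketch below shows one common convention for it; it is not taken from the authors' code. The axis convention (x right, y up, z forward at the image centre), the function names and the yaw-only virtual camera are assumptions for this example.

```python
import math


def equirect_pixel_to_direction(u, v, width, height):
    """Map an equirectangular pixel to a unit direction (x right, y up, z forward)."""
    lon = (u / width - 0.5) * 2.0 * math.pi  # -pi..pi, 0 at the image centre
    lat = (0.5 - v / height) * math.pi       # +pi/2 at the top row, -pi/2 at the bottom
    return (
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
        math.cos(lat) * math.cos(lon),
    )


def direction_to_perspective_pixel(direction, yaw, fov_deg, size):
    """Project a direction into a square pinhole view rotated by `yaw` about the y axis.

    Returns None when the direction lies behind the virtual camera.
    """
    x, y, z = direction
    c, s = math.cos(yaw), math.sin(yaw)
    # Rotate the world so that the virtual camera looks along +z.
    xc = c * x - s * z
    zc = s * x + c * z
    if zc <= 0.0:
        return None
    focal = (size / 2.0) / math.tan(math.radians(fov_deg) / 2.0)
    return (size / 2.0 + focal * xc / zc, size / 2.0 - focal * y / zc)


# A pixel three quarters of the way across the panorama lies at longitude +90 degrees;
# a virtual camera yawed by 90 degrees sees it at its image centre.
d = equirect_pixel_to_direction(1536, 512, width=2048, height=1024)
print(direction_to_perspective_pixel(d, yaw=math.pi / 2, fov_deg=90.0, size=512))
```

Rendering a full perspective crop would apply the inverse mapping per output pixel and then sample the panorama; the forward mapping is shown here only because it is shorter.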

    Change blindness: eradication of gestalt strategies

    Arrays of eight texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task in which there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al., 2003, Vision Research 43, 149–164). Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this, we changed the spatial position of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference in performance between this and the standard task [F(1,4)=2.565, p=0.185]. This may suggest two things: (i) Gestalt grouping is not used as a strategy in these tasks, and (ii) objects may be stored in and retrieved from a pre-attentional store during this task.

    Personalized Interaction with High-Resolution Wall Displays

    An increasing openness to more diverse interaction modalities, together with falling hardware prices, has made very large interactive vertical displays more feasible, and applications in settings such as visualization, education, and meeting support have been demonstrated successfully. Their size makes wall displays inherently suited to multi-user interaction. At the same time, we can assume that access to personal data and settings, and thus personalized interaction, will remain essential in most use cases. In most current desktop and mobile user interfaces, access is regulated via an initial login, and the complete user interface is then personalized to this user: access to personal data, configurations, and communications all assume a single user per screen. When multiple people use one screen, this is not a feasible solution and alternatives must be found. Therefore, this thesis addresses the research question: How can we provide personalized interfaces in the context of multi-user interaction with wall displays? The scope spans personalized interaction both close to the wall (using touch as the input modality) and further away (using mobile devices).
    Technical solutions that identify users at each interaction can replace logins and enable personalized interaction for multiple users at once. This thesis explores two alternative means of user identification: tracking users with RGB+depth cameras, and locating the users' mobile devices by ultrasound positioning. Building on this, techniques that support personalized interaction using personal mobile devices are proposed. In the first contribution on interaction, HyDAP, we examine pointing from the perspective of moving users, and in the second, SleeD, we propose using an arm-worn device to facilitate access to private data and personalized interface elements. Additionally, the work contributes insights into the practical implications of personalized interaction at wall displays: we present a qualitative study that analyses interaction using the multi-user cooperative game Miners as the application case, finding awareness and occlusion issues. The final contribution is a corresponding analysis toolkit, GIAnT, that visualizes users' movements, touch interactions and gaze points when interacting with wall displays and thus allows fine-grained investigation of the interactions.