2,949 research outputs found

    Integrating 2D Mouse Emulation with 3D Manipulation for Visualizations on a Multi-Touch Table

    Get PDF
    We present the Rizzo, a multi-touch virtual mouse that has been designed to provide the fine grained interaction for information visualization on a multi-touch table. Our solution enables touch interaction for existing mouse-based visualizations. Previously, this transition to a multi-touch environment was difficult because the mouse emulation of touch surfaces is often insufficient to provide full information visualization functionality. We present a unified design, combining many Rizzos that have been designed not only to provide mouse capabilities but also to act as zoomable lenses that make precise information access feasible. The Rizzos and the information visualizations all exist within a touch-enabled 3D window management system. Our approach permits touch interaction with both the 3D windowing environment as well as with the contents of the individual windows contained therein. We describe an implementation of our technique that augments the VisLink 3D visualization environment to demonstrate how to enable multi-touch capabilities on all visualizations written with the popular prefuse visualization toolkit.

    Analysis of Hand Segmentation in the Wild

    Full text link
    A large number of works in egocentric vision have concentrated on action and object recognition. Detection and segmentation of hands in first-person videos, however, has less been explored. For many applications in this domain, it is necessary to accurately segment not only hands of the camera wearer but also the hands of others with whom he is interacting. Here, we take an in-depth look at the hand segmentation problem. In the quest for robust hand segmentation methods, we evaluated the performance of the state of the art semantic segmentation methods, off the shelf and fine-tuned, on existing datasets. We fine-tune RefineNet, a leading semantic segmentation method, for hand segmentation and find that it does much better than the best contenders. Existing hand segmentation datasets are collected in the laboratory settings. To overcome this limitation, we contribute by collecting two new datasets: a) EgoYouTubeHands including egocentric videos containing hands in the wild, and b) HandOverFace to analyze the performance of our models in presence of similar appearance occlusions. We further explore whether conditional random fields can help refine generated hand segmentations. To demonstrate the benefit of accurate hand maps, we train a CNN for hand-based activity recognition and achieve higher accuracy when a CNN was trained using hand maps produced by the fine-tuned RefineNet. Finally, we annotate a subset of the EgoHands dataset for fine-grained action recognition and show that an accuracy of 58.6% can be achieved by just looking at a single hand pose which is much better than the chance level (12.5%).Comment: Accepted at CVPR 201

    Interactive Manipulation of 3D Scene Projections

    Get PDF
    Linear perspective is a good approximation to the format in which the human visual system conveys 3D scene information to the brain. Artists expressing 3D scenes, however, create nonlinear projections that balance their linear perspective view of a scene with elements of aesthetic style, layout and relative importance of scene objects. Manipulating the many parameters of a linear perspective camera to achieve a desired view is not easy. Controlling and combining mul-tiple such cameras to specify a nonlinear projection is an even more cumbersome task. This paper presents a direct interface, where an artist manipulates in 2D the desired projection of a few features of the 3D scene. The features represent a rich set of constraints which define the overall projection of the 3D scene. Desirable properties of local linear perspective and global scene coherence drive a heuristic algorithm that attempts to interactively satisfy the sketched constraints as a weight-averaged projection of a minimal set of linear perspective cameras. This paper shows that 2D fea-ture constraints are a direct and effective approach to control both the 2D layout of scene objects and the conceptually complex, high dimensional parameter space of nonlinear scene projection. The simplicity of our interface also makes it an appealing alternative to standard through-the-lens and widget based techniques to control a single linear perspective camera

    Exploring the Multi-touch Interaction Design Space for 3D Virtual Objects to Support Procedural Training Tasks

    Get PDF
    Multi-touch interaction has the potential to be an important input method for realistic training in 3D environments. However, multi-touch interaction has not been explored much in 3D tasks, especially when trying to leverage realistic, real-world interaction paradigms. A systematic inquiry into what realistic gestures look like for 3D environments is required to understand how users translate real-world motions to multi-touch motions. Once those gestures are defined, it is important to see how we can leverage those gestures to enhance training tasks. In order to explore the interaction design space for 3D virtual objects, we began by conducting our first study exploring user-defined gestures. From this work we identified a taxonomy and design guidelines for 3D multi-touch gestures and how perspective view plays a role in the chosen gesture. We also identified a desire to use pressure on capacitive touch screens. Since the best way to implement pressure still required some investigation, our second study evaluated two different pressure estimation techniques in two different scenarios. Once we had a taxonomy of gestures we wanted to examine whether implementing these realistic multi-touch interactions in a training environment provided training benefits. Our third study compared multi-touch interaction to standard 2D mouse interaction and to actual physical training and found that multi-touch interaction performed better than 2D mouse and as well as physical training. This study showed us that multi-touch training using a realistic gesture set can perform as well as training on the actual apparatus. One limitation of the first training study was that the user had constrained perspective to allow for us to focus on isolating the gestures. Since users can change their perspective in a real life training scenario and therefore gain spatial knowledge of components, we wanted to see if allowing users to alter their perspective helped or hindered training. Our final study compared training with Unconstrained multi-touch interaction, Constrained multi-touch interaction, or training on the actual physical apparatus. Results show that the Unconstrained multi-touch interaction and the Physical groups had significantly better performance scores than the Constrained multi-touch interaction group, with no significant difference between the Unconstrained multi-touch and Physical groups. Our results demonstrate that allowing users more freedom to manipulate objects as they would in the real world benefits training. In addition to the research already performed, we propose several avenues for future research into the interaction design space for 3D virtual objects that we believe will be of value to researchers and designers of 3D multi-touch training environments

    Where To Start? Transferring Simple Skills to Complex Environments

    Full text link
    Robot learning provides a number of ways to teach robots simple skills, such as grasping. However, these skills are usually trained in open, clutter-free environments, and therefore would likely cause undesirable collisions in more complex, cluttered environments. In this work, we introduce an affordance model based on a graph representation of an environment, which is optimised during deployment to find suitable robot configurations to start a skill from, such that the skill can be executed without any collisions. We demonstrate that our method can generalise a priori acquired skills to previously unseen cluttered and constrained environments, in simulation and in the real world, for both a grasping and a placing task.Comment: Accepted at CoRL 2022. Videos are available on our project webpage at https://www.robot-learning.uk/where-to-star

    Shapes and Context: In-the-Wild Image Synthesis & Manipulation

    Full text link
    We introduce a data-driven approach for interactively synthesizing in-the-wild images from semantic label maps. Our approach is dramatically different from recent work in this space, in that we make use of no learning. Instead, our approach uses simple but classic tools for matching scene context, shapes, and parts to a stored library of exemplars. Though simple, this approach has several notable advantages over recent work: (1) because nothing is learned, it is not limited to specific training data distributions (such as cityscapes, facades, or faces); (2) it can synthesize arbitrarily high-resolution images, limited only by the resolution of the exemplar library; (3) by appropriately composing shapes and parts, it can generate an exponentially large set of viable candidate output images (that can say, be interactively searched by a user). We present results on the diverse COCO dataset, significantly outperforming learning-based approaches on standard image synthesis metrics. Finally, we explore user-interaction and user-controllability, demonstrating that our system can be used as a platform for user-driven content creation.Comment: Project Page: http://www.cs.cmu.edu/~aayushb/OpenShapes
    corecore