Integrating 2D Mouse Emulation with 3D Manipulation for Visualizations on a Multi-Touch Table
We present the Rizzo, a multi-touch virtual mouse designed to provide fine-grained interaction for information visualization on a multi-touch table. Our solution enables touch interaction for existing mouse-based visualizations. Previously, this transition to a multi-touch environment was difficult because the mouse emulation provided by touch surfaces is often insufficient to support full information visualization functionality. We present a unified design, combining many Rizzos that have been designed not only to provide mouse capabilities but also to act as zoomable lenses that make precise information access feasible. The Rizzos and the information visualizations all exist within a touch-enabled 3D window management system. Our approach permits touch interaction both with the 3D windowing environment and with the contents of the individual windows contained therein. We describe an implementation of our technique that augments the VisLink 3D visualization environment to demonstrate how to enable multi-touch capabilities on all visualizations written with the popular prefuse visualization toolkit.
Analysis of Hand Segmentation in the Wild
A large number of works in egocentric vision have concentrated on action and object recognition. Detection and segmentation of hands in first-person videos, however, have been less explored. For many applications in this domain, it is necessary to accurately segment not only the hands of the camera wearer but also the hands of others with whom they are interacting. Here, we take an in-depth look at the hand segmentation problem. In the quest for robust hand segmentation methods, we evaluated the performance of state-of-the-art semantic segmentation methods, both off the shelf and fine-tuned, on existing datasets. We fine-tune RefineNet, a leading semantic segmentation method, for hand segmentation and find that it performs substantially better than the best contenders. Existing hand segmentation datasets were collected in laboratory settings. To overcome this limitation, we contribute two new datasets: a) EgoYouTubeHands, consisting of egocentric videos containing hands in the wild, and b) HandOverFace, for analyzing the performance of our models in the presence of similar-appearance occlusions. We further explore whether conditional random fields can help refine the generated hand segmentations. To demonstrate the benefit of accurate hand maps, we train a CNN for hand-based activity recognition and achieve higher accuracy when the CNN is trained using hand maps produced by the fine-tuned RefineNet. Finally, we annotate a subset of the EgoHands dataset for fine-grained action recognition and show that an accuracy of 58.6% can be achieved by looking at a single hand pose alone, which is much better than chance level (12.5%).
Comment: Accepted at CVPR 201
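Segmentation quality in work like this is typically scored with per-mask intersection-over-union (IoU). The abstract does not specify the paper's evaluation code, so the function below (`mask_iou`) is an illustrative sketch of the standard metric, not the authors' implementation:

```python
import numpy as np

def mask_iou(pred, gt):
    """Intersection-over-union between two binary hand masks."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return np.logical_and(pred, gt).sum() / union

# Toy example: two 4x4 masks whose "hand" pixels overlap in 2 of 4 union pixels
pred = np.zeros((4, 4), dtype=bool); pred[0, :3] = True   # 3 predicted pixels
gt   = np.zeros((4, 4), dtype=bool); gt[0, 1:4]  = True   # 3 ground-truth pixels
print(mask_iou(pred, gt))  # 2 overlapping / 4 in union -> 0.5
```

The same per-mask score averaged over a dataset gives mean IoU, the number usually reported when comparing off-the-shelf and fine-tuned segmentation models.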
Interactive Manipulation of 3D Scene Projections
Linear perspective is a good approximation to the format in which the human visual system conveys 3D scene information to the brain. Artists expressing 3D scenes, however, create nonlinear projections that balance their linear perspective view of a scene with elements of aesthetic style, layout and relative importance of scene objects. Manipulating the many parameters of a linear perspective camera to achieve a desired view is not easy. Controlling and combining multiple such cameras to specify a nonlinear projection is an even more cumbersome task. This paper presents a direct interface, where an artist manipulates in 2D the desired projection of a few features of the 3D scene. The features represent a rich set of constraints which define the overall projection of the 3D scene. Desirable properties of local linear perspective and global scene coherence drive a heuristic algorithm that attempts to interactively satisfy the sketched constraints as a weight-averaged projection of a minimal set of linear perspective cameras. This paper shows that 2D feature constraints are a direct and effective approach to control both the 2D layout of scene objects and the conceptually complex, high dimensional parameter space of nonlinear scene projection. The simplicity of our interface also makes it an appealing alternative to standard through-the-lens and widget based techniques to control a single linear perspective camera.
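The weight-averaged projection described above can be sketched numerically: each scene point is projected through every contributing linear perspective camera, and the resulting 2D positions are blended by normalised weights. The 4x4 matrices and point below are toy values for illustration, not the paper's camera model:

```python
import numpy as np

def project(P, x):
    """Project homogeneous 3D point x (shape (4,)) with a 4x4 camera matrix P."""
    y = P @ x
    return y[:2] / y[3]          # perspective divide -> 2D image point

def blended_projection(cameras, weights, x):
    """Weight-averaged 2D projection of x across several perspective cameras."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()              # normalise blend weights
    pts = np.stack([project(P, x) for P in cameras])
    return (w[:, None] * pts).sum(axis=0)

# Toy pinhole camera: image coords (x/z, y/z); second camera is shifted in x
P1 = np.array([[1., 0., 0., 0.],
               [0., 1., 0., 0.],
               [0., 0., 1., 0.],
               [0., 0., 1., 0.]])
P2 = P1.copy(); P2[0, 3] = 2.0
x = np.array([2.0, 4.0, 2.0, 1.0])
print(blended_projection([P1, P2], [0.5, 0.5], x))  # midway between (1,2) and (2,2)
```

A constraint solver like the one in the paper would adjust the per-camera weights (and the cameras themselves) so that the blended projections of the sketched features land where the artist placed them.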
Exploring the Multi-touch Interaction Design Space for 3D Virtual Objects to Support Procedural Training Tasks
Multi-touch interaction has the potential to be an important input method for realistic training in 3D environments. However, multi-touch interaction has not been explored much in 3D tasks, especially when trying to leverage realistic, real-world interaction paradigms. A systematic inquiry into what realistic gestures look like for 3D environments is required to understand how users translate real-world motions to multi-touch motions. Once those gestures are defined, it is important to see how we can leverage them to enhance training tasks. To explore the interaction design space for 3D virtual objects, we began by conducting a first study exploring user-defined gestures. From this work we identified a taxonomy and design guidelines for 3D multi-touch gestures and how perspective view plays a role in the chosen gesture. We also identified a desire to use pressure on capacitive touch screens. Since the best way to implement pressure still required some investigation, our second study evaluated two different pressure estimation techniques in two different scenarios. Once we had a taxonomy of gestures, we wanted to examine whether implementing these realistic multi-touch interactions in a training environment provided training benefits. Our third study compared multi-touch interaction to standard 2D mouse interaction and to actual physical training, and found that multi-touch interaction performed better than the 2D mouse and as well as physical training. This study showed us that multi-touch training using a realistic gesture set can perform as well as training on the actual apparatus. One limitation of the first training study was that the user's perspective was constrained, allowing us to focus on isolating the gestures. Since users can change their perspective in a real-life training scenario, and thereby gain spatial knowledge of components, we wanted to see if allowing users to alter their perspective helped or hindered training.
Our final study compared training with Unconstrained multi-touch interaction, Constrained multi-touch interaction, or training on the actual physical apparatus. Results show that the Unconstrained multi-touch interaction and Physical groups had significantly better performance scores than the Constrained multi-touch interaction group, with no significant difference between the Unconstrained multi-touch and Physical groups. Our results demonstrate that allowing users more freedom to manipulate objects as they would in the real world benefits training. In addition to the research already performed, we propose several avenues for future research into the interaction design space for 3D virtual objects that we believe will be of value to researchers and designers of 3D multi-touch training environments.
Where To Start? Transferring Simple Skills to Complex Environments
Robot learning provides a number of ways to teach robots simple skills, such as grasping. However, these skills are usually trained in open, clutter-free environments, and therefore would likely cause undesirable collisions in more complex, cluttered environments. In this work, we introduce an affordance model based on a graph representation of an environment, which is optimised during deployment to find suitable robot configurations to start a skill from, such that the skill can be executed without any collisions. We demonstrate that our method can generalise a priori acquired skills to previously unseen cluttered and constrained environments, in simulation and in the real world, for both a grasping and a placing task.
Comment: Accepted at CoRL 2022. Videos are available on our project webpage at https://www.robot-learning.uk/where-to-star
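At its core, choosing a start configuration amounts to scoring candidates and discarding any that would collide. The sketch below is only a loose illustration of that selection step: the names (`choose_start_configuration`), the point-obstacle clearance check, and the scalar score are hypothetical stand-ins for the paper's graph-based affordance model:

```python
import numpy as np

def choose_start_configuration(candidates, obstacles, score, radius=0.1):
    """Pick the highest-scoring candidate position that keeps at least
    `radius` clearance from every obstacle point.
    candidates, obstacles: (N, 3) arrays of 3D positions.
    score: maps a candidate to a scalar affordance estimate."""
    best, best_score = None, -np.inf
    for q in candidates:
        if np.linalg.norm(obstacles - q, axis=1).min() < radius:
            continue                      # would collide: discard candidate
        s = score(q)
        if s > best_score:
            best, best_score = q, s
    return best

obstacles = np.array([[0.0, 0.0, 0.0]])                       # one clutter point
candidates = np.array([[0.05, 0.0, 0.0],                      # too close: rejected
                       [1.0, 0.0, 0.0],
                       [2.0, 0.0, 0.0]])
goal = np.array([1.0, 0.0, 0.0])
best = choose_start_configuration(candidates, obstacles,
                                  score=lambda q: -np.linalg.norm(q - goal))
```

Here `best` is the collision-free candidate nearest the goal; the actual method optimises over full robot configurations and a learned affordance, not raw positions.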
Shapes and Context: In-the-Wild Image Synthesis & Manipulation
We introduce a data-driven approach for interactively synthesizing in-the-wild images from semantic label maps. Our approach is dramatically different from recent work in this space, in that it involves no learning. Instead, our approach uses simple but classic tools for matching scene context, shapes, and parts to a stored library of exemplars. Though simple, this approach has several notable advantages over recent work: (1) because nothing is learned, it is not limited to specific training data distributions (such as cityscapes, facades, or faces); (2) it can synthesize arbitrarily high-resolution images, limited only by the resolution of the exemplar library; (3) by appropriately composing shapes and parts, it can generate an exponentially large set of viable candidate output images (that can, say, be interactively searched by a user). We present results on the diverse COCO dataset, significantly outperforming learning-based approaches on standard image synthesis metrics. Finally, we explore user interaction and user controllability, demonstrating that our system can be used as a platform for user-driven content creation.
Comment: Project Page: http://www.cs.cmu.edu/~aayushb/OpenShapes
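The exemplar-matching idea can be illustrated with a much-simplified retrieval step: score each library label map by per-pixel agreement with the query and return the best match. The scoring below is a crude placeholder for the paper's richer context, shape, and part matching, and the names are illustrative:

```python
import numpy as np

def retrieve_exemplar(query_labels, library):
    """Index of the library label map that agrees best with the query,
    scored by fraction of pixels with matching semantic labels."""
    scores = [float(np.mean(query_labels == ex)) for ex in library]
    return int(np.argmax(scores))

query = np.array([[1, 1],
                  [0, 2]])
library = [np.array([[1, 1], [0, 0]]),   # agrees on 3 of 4 pixels
           np.array([[2, 2], [2, 2]])]   # agrees on 1 of 4 pixels
print(retrieve_exemplar(query, library))  # index of the better-matching exemplar
```

In the full system, retrieved exemplar regions are composited into the output image, which is what lets the candidate set grow exponentially with the number of composable parts.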