Search CORE

5,967 research outputs found

Learning to Recognize Touch Gestures: Recurrent vs. Convolutional Features and Dynamic Sampling

Author: Arné Julien
Canu Stephane
Debard Quentin
Wolf Christian
Publication venue: HAL CCSD
Publication date: 15/05/2018
Field of study

International audienceWe propose a fully automatic method for learning gestures on big touch devices in a potentially multiuser context. The goal is to learn general models capable of adapting to different gestures, user styles and hardware variations (e.g. device sizes, sampling frequencies and regularities). Based on deep neural networks, our method features a novel dynamic sampling and temporal normalization component, transforming variable length gestures into fixed length representations while preserving finger/surface contact transitions, that is, the topology of the signal. This sequential representation is then processed with a convolutional model capable, unlike recurrent networks, of learning hierarchical representations with different levels of abstraction. To demonstrate the interest of the proposed method, we introduce a new touch gestures dataset with 6591 gestures performed by 27 people, which is, up to our knowledge, the first of its kind: a publicly available multi-touch gesture dataset for interaction. We also tested our method on a standard dataset of symbolic touch gesture recognition, the MMG dataset, outperforming the state of the art and reporting close to perfect performance

Learning to recognize touch gestures: recurrent vs. convolutional features and dynamic sampling

Author: Arné Julien
Canu Stéphane
Debard Quentin
Wolf Christian
Publication venue
Publication date: 19/02/2018
Field of study

We propose a fully automatic method for learning gestures on big touch devices in a potentially multi-user context. The goal is to learn general models capable of adapting to different gestures, user styles and hardware variations (e.g. device sizes, sampling frequencies and regularities). Based on deep neural networks, our method features a novel dynamic sampling and temporal normalization component, transforming variable length gestures into fixed length representations while preserving finger/surface contact transitions, that is, the topology of the signal. This sequential representation is then processed with a convolutional model capable, unlike recurrent networks, of learning hierarchical representations with different levels of abstraction. To demonstrate the interest of the proposed method, we introduce a new touch gestures dataset with 6591 gestures performed by 27 people, which is, up to our knowledge, the first of its kind: a publicly available multi-touch gesture dataset for interaction. We also tested our method on a standard dataset of symbolic touch gesture recognition, the MMG dataset, outperforming the state of the art and reporting close to perfect performance.Comment: 9 pages, 4 figures, accepted at the 13th IEEE Conference on Automatic Face and Gesture Recognition (FG2018). Dataset available at http://itekube7.itekube.co

arXiv.org e-Print Archive

HAL - Normandie Université

INRIA a CCSD electronic archive server

HAL

Hal-Diderot

ImageSpirit: Verbal Guided Image Parsing

Author: Cheng Ming-Ming
Crook Nigel
Lin Wen-Yan
Mitra Niloy
Sturgess Paul
Torr Philip
Vineet Vibhav
Warrell Jonathan
Zheng Shuai
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2014
Field of study

Humans describe images in terms of nouns and adjectives while algorithms operate on images represented as sets of pixels. Bridging this gap between how humans would like to access images versus their typical representation is the goal of image parsing, which involves assigning object and attribute labels to pixel. In this paper we propose treating nouns as object labels and adjectives as visual attribute labels. This allows us to formulate the image parsing problem as one of jointly estimating per-pixel object and attribute labels from a set of training images. We propose an efficient (interactive time) solution. Using the extracted labels as handles, our system empowers a user to verbally refine the results. This enables hands-free parsing of an image into pixel-wise object/attribute labels that correspond to human semantics. Verbally selecting objects of interests enables a novel and natural interaction modality that can possibly be used to interact with new generation devices (e.g. smart phones, Google Glass, living room devices). We demonstrate our system on a large number of real-world images with varying complexity. To help understand the tradeoffs compared to traditional mouse based interactions, results are reported for both a large scale quantitative evaluation and a user study.Comment: http://mmcheng.net/imagespirit

arXiv.org e-Print Archive

CiteSeerX

Institutional Knowledge at Singapore Management University

UCL Discovery

Oxford Brookes University: RADAR

Learning 3D Navigation Protocols on Touch Interfaces with Cooperative Multi-Agent Reinforcement Learning

Author: D Bowman
DA Bowman
FR Ortega
J Cashion
K Hinckley
Publication venue
Publication date: 27/08/2019
Field of study

Using touch devices to navigate in virtual 3D environments such as computer assisted design (CAD) models or geographical information systems (GIS) is inherently difficult for humans, as the 3D operations have to be performed by the user on a 2D touch surface. This ill-posed problem is classically solved with a fixed and handcrafted interaction protocol, which must be learned by the user. We propose to automatically learn a new interaction protocol allowing to map a 2D user input to 3D actions in virtual environments using reinforcement learning (RL). A fundamental problem of RL methods is the vast amount of interactions often required, which are difficult to come by when humans are involved. To overcome this limitation, we make use of two collaborative agents. The first agent models the human by learning to perform the 2D finger trajectories. The second agent acts as the interaction protocol, interpreting and translating to 3D operations the 2D finger trajectories from the first agent. We restrict the learned 2D trajectories to be similar to a training set of collected human gestures by first performing state representation learning, prior to reinforcement learning. This state representation learning is addressed by projecting the gestures into a latent space learned by a variational auto encoder (VAE).Comment: 17 pages, 8 figures. Accepted at The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases 2019 (ECMLPKDD 2019

arXiv.org e-Print Archive

HAL - Normandie Université

Crossref

INRIA a CCSD electronic archive server

HAL

Hal-Diderot

Freeform User Interfaces for Graphical Computing

Author: 五十嵐健夫
Publication venue: 東京大学大学院工学系研究科情報工学専攻
Publication date: 29/03/2000
Field of study

報告番号: 甲15222 ; 学位授与年月日: 2000-03-29 ; 学位の種別: 課程博士 ; 学位の種類: 博士(工学) ; 学位記番号: 博工第4717号 ; 研究科・専攻: 工学系研究科情報工学専

UT Repository