A CNN Based Framework for Unistroke Numeral Recognition in Air-Writing
Air-writing refers to virtually writing linguistic characters through hand
gestures in three-dimensional space with six degrees of freedom. This paper
proposes a generic video camera-aided convolutional neural network (CNN) based
air-writing framework. Gestures are performed using a marker of fixed color in
front of a generic video camera, followed by color-based segmentation to
identify the marker and track the trajectory of the marker tip. A pre-trained
CNN is then used to classify the gesture. The recognition accuracy is further
improved using transfer learning with the newly acquired data. The performance
of the system varies significantly on the illumination condition due to
color-based segmentation. In a less fluctuating illumination condition, the
system is able to recognize isolated unistroke numerals of multiple languages.
The proposed framework has achieved 97.7%, 95.4% and 93.7% recognition rates in
person independent evaluations on English, Bengali and Devanagari numerals,
respectively.
Comment: Accepted in The International Conference on Frontiers of Handwriting Recognition (ICFHR) 201
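The color-based segmentation and tip-tracking step described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's actual pipeline: the HSV ranges, frame contents, and function names are invented for the example, and a real system would use a camera feed and a full vision library.

```python
import numpy as np

def segment_marker(frame_hsv, lower, upper):
    """Binary mask of pixels whose HSV values fall inside [lower, upper]."""
    return np.all((frame_hsv >= lower) & (frame_hsv <= upper), axis=-1)

def marker_centroid(mask):
    """Centroid (row, col) of the segmented marker, or None if absent."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return ys.mean(), xs.mean()

# Synthetic 8x8 "frame": a red-ish marker blob centred near (2, 5).
frame = np.zeros((8, 8, 3))
frame[1:4, 4:7] = [5, 200, 200]          # hue ~5, saturated, bright
lower = np.array([0, 100, 100])
upper = np.array([10, 255, 255])

mask = segment_marker(frame, lower, upper)
print(marker_centroid(mask))             # appended to the trajectory each frame
```

Collecting the centroid frame by frame yields the marker-tip trajectory that is then rasterized and passed to the CNN classifier.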
Collaborative robot control with hand gestures
Dual-degree master's programme with the Université Libre de Tunis.
This thesis focuses on hand gesture recognition, proposing an architecture to control a collaborative robot in real time using vision-based hand detection, tracking, and gesture recognition, enabling interaction with an application via hand gestures. The first stage of our system detects and tracks a bare hand against a cluttered background using skin detection and contour comparison. The second stage recognizes hand gestures using a machine learning algorithm. Finally, an interface has been developed to control the robot.
Our hand gesture recognition system consists of two parts. In the first part, for every frame captured from a camera we extract the keypoints from every training image using a machine learning algorithm and assemble the keypoints from every image into a keypoint map. This map is treated as input for our processing algorithm, which uses several methods to recognize the fingers of each hand.
In the second part, we use a 3D camera with infrared capabilities to obtain a 3D model of the hand for use in our system. We then track and recognize the fingers of each hand, which makes it possible to count the extended fingers and to distinguish each finger pattern.
An interface to control the robot has been built on the previous steps, providing real-time processing and a dynamic 3D representation.
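The abstract does not detail how the extended fingers are counted. One common geometric approach with a 3D camera is to threshold each fingertip's distance from the palm centre; the joint positions, threshold, and function name below are illustrative assumptions, not the thesis's implementation:

```python
import numpy as np

def count_extended_fingers(palm, tips, threshold=0.09):
    """Count fingertips farther than `threshold` metres from the palm centre."""
    palm = np.asarray(palm, dtype=float)
    return sum(np.linalg.norm(np.asarray(t, dtype=float) - palm) > threshold
               for t in tips)

palm = [0.0, 0.0, 0.0]
tips = [
    [0.10, 0.02, 0.00],   # extended index  (distance ~0.102 m)
    [0.11, 0.00, 0.01],   # extended middle (distance ~0.110 m)
    [0.04, 0.01, 0.00],   # curled ring
    [0.03, 0.00, 0.01],   # curled little
    [0.05, 0.05, 0.00],   # curled thumb    (distance ~0.071 m)
]
print(count_extended_fingers(palm, tips))  # 2
```

A production system would normalize the threshold by hand size rather than hard-coding a metric distance.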
To Draw or Not to Draw: Recognizing Stroke-Hover Intent in Gesture-Free Bare-Hand Mid-Air Drawing Tasks
Over the past several decades, technological advancements have introduced new modes of communication
with computers, marking a shift away from traditional mouse-and-keyboard interfaces.
While touch-based interactions are in widespread use today, recent developments in computer
vision, body-tracking stereo cameras, and augmented and virtual reality now enable communicating
with computers through spatial input in physical 3D space. These techniques are
being integrated into design-critical tasks such as sketching and modeling through sophisticated
methodologies and the use of specialized instrumented devices. One of the prime challenges in
design research is to make this spatial interaction with the computer as intuitive as possible for the
users.
Drawing curves in mid-air with fingers is a fundamental task with applications to 3D sketching,
geometric modeling, handwriting recognition, and authentication. Sketching, in general, is a
crucial mode for effective idea communication between designers. Mid-air curve input is typically
accomplished through instrumented controllers, specific hand postures, or pre-defined hand gestures,
in the presence of depth and motion sensing cameras. The user may use any of these modalities
to express the intention to start or stop sketching. However, apart from suffering from issues such as
lack of robustness, the use of such gestures, specific postures, or instrumented
controllers for design-specific tasks places an additional cognitive load on the user.
To address the problems associated with different mid-air curve input modalities, the presented
research discusses the design, development, and evaluation of data driven models for intent recognition
in non-instrumented, gesture-free, bare-hand mid-air drawing tasks.
The research is motivated by a behavioral study that demonstrates the need for such an approach
due to the lack of robustness and intuitiveness while using hand postures and instrumented
devices. The main objective is to study how users move during mid-air sketching, develop qualitative
insights regarding such movements, and consequently implement a computational approach to
determine when the user intends to draw in mid-air without the use of an explicit mechanism (such
as an instrumented controller or a specified hand posture). By recording the user's hand trajectory,
the idea is to classify each recorded point as either hover or stroke. The resulting model allows for
the classification of points along the user's spatial trajectory.
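As a toy illustration of the stroke-hover decision, the sketch below thresholds instantaneous speed, on the intuition that deliberate drawing is slower than transit motion. The threshold, data, and function names are invented; the thesis learns this decision from geometric and temporal features rather than applying a fixed rule.

```python
import numpy as np

def classify_points(points, timestamps, speed_threshold=0.25):
    """Label each trajectory point: slow, deliberate motion -> 'stroke',
    fast transit -> 'hover'. A hand-tuned stand-in for a learned model."""
    pts = np.asarray(points, dtype=float)
    t = np.asarray(timestamps, dtype=float)
    vel = np.gradient(pts, t, axis=0)          # finite-difference velocity
    speed = np.linalg.norm(vel, axis=1)
    return ["stroke" if s < speed_threshold else "hover" for s in speed]

# Fast transit toward the drawing area, then slow, careful stroking.
points = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0],
          [2.1, 0.0], [2.2, 0.0], [2.3, 0.0]]
labels = classify_points(points, timestamps=[0, 1, 2, 3, 4, 5])
print(labels)  # ['hover', 'hover', 'hover', 'stroke', 'stroke', 'stroke']
```

Speed alone is a weak signal in practice, which is precisely why the research pursues richer geometric and temporal features and learned classifiers.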
Drawing inspiration from the way users sketch in mid-air, this research first establishes the necessity
of an alternative approach for processing bare-hand mid-air curves in a continuous fashion.
Further, this research presents a novel drawing-intent recognition workflow for every recorded
drawing point, using three different approaches. We begin with recording mid-air drawing data
and developing a classification model based on the extracted geometric properties of the recorded
data. The main goal behind developing this model is to identify drawing intent from critical geometric
and temporal features. In the second approach, we explore the variations in prediction
quality of the model by improving the dimensionality of data used as mid-air curve input. Finally,
in the third approach, we seek to understand the drawing intention from mid-air curves using
sophisticated dimensionality-reduction neural networks such as autoencoders. Lastly, the broader
implications of this research are discussed, along with potential development areas in the design
and research of mid-air interactions.
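On the autoencoder approach: the abstract specifies no architecture, but an autoencoder with linear activations is known to recover the PCA subspace at its optimum, so the dimensionality-reduction idea can be illustrated in closed form. The data, dimensions, and variable names below are synthetic assumptions, not the thesis's curves or models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy mid-air "curve windows": 200 samples of 10-D points lying near a
# 2-D subspace, plus a little sensor noise.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 10))
Xc = X - X.mean(axis=0)

# The optimal linear autoencoder projects onto the top-k principal components.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2
encode = Vt[:k].T        # 10 -> 2 bottleneck
decode = Vt[:k]          # 2 -> 10 reconstruction
recon = Xc @ encode @ decode

err = np.mean((recon - Xc) ** 2) / np.mean(Xc ** 2)
print(err)  # near zero: 10-D curve data compresses to 2-D with little loss
```

A nonlinear autoencoder generalizes this by replacing the two matrices with trained neural networks, capturing curved low-dimensional structure that PCA cannot.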
Deep Belief Networks for Recognizing Handwriting Captured by Leap Motion Controller
The Leap Motion controller is an input device that can track hand and finger positions quickly and precisely. In some gaming environments, a need may arise to capture letters written in the air with the Leap Motion, which cannot currently be done directly. In this paper, we propose an approach to capture and recognize which letter has been drawn by the user with the Leap Motion. This approach is based on Deep Belief Networks (DBN) with Resilient Backpropagation (Rprop) fine-tuning. To assess the performance of our proposed approach, we conduct experiments involving 30,000 samples of handwritten capital letters, 8,000 of which are to be recognized. Our experiments indicate that DBN with Rprop achieves an accuracy of 99.71%, which is better than DBN with Backpropagation or a Multi-Layer Perceptron (MLP), whether with Backpropagation or with Rprop. Our experiments also show that Rprop makes the fine-tuning process significantly faster and results in much more accurate recognition compared to ordinary Backpropagation. The time needed to recognize a letter is on the order of 5,000 microseconds, which is excellent even for an online gaming experience.
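Rprop's speed advantage comes from adapting a separate step size per weight using only the sign of successive gradients: the step grows when the sign agrees between iterations and shrinks when it flips. A minimal sketch on a toy quadratic follows; the DBN itself is omitted, and all constants and names are illustrative rather than the paper's settings.

```python
import numpy as np

def rprop_minimize(grad_fn, w, steps=60, eta_plus=1.2, eta_minus=0.5,
                   step_init=0.1, step_max=1.0, step_min=1e-6):
    """Minimal Rprop: per-weight step sizes adapt from gradient-sign agreement."""
    w = w.astype(float).copy()
    step = np.full_like(w, step_init)
    prev_g = np.zeros_like(w)
    for _ in range(steps):
        g = grad_fn(w)
        agree = prev_g * g
        step = np.where(agree > 0, np.minimum(step * eta_plus, step_max), step)
        step = np.where(agree < 0, np.maximum(step * eta_minus, step_min), step)
        w -= np.sign(g) * step       # only the gradient's sign is used
        prev_g = g
    return w

# Toy objective f(w) = sum((w - 3)^2), gradient 2*(w - 3); minimum at w = 3.
w = rprop_minimize(lambda w: 2.0 * (w - 3.0), np.array([0.0, 10.0]))
print(w)  # approaches [3., 3.]
```

Because the magnitude of the gradient never enters the update, Rprop is insensitive to the vanishing or exploding gradient scales that slow ordinary Backpropagation in deep networks.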
Accessible options for deaf people in e-Learning platforms: technology solutions for sign language translation
This paper presents a study of potential technology solutions for enhancing the communication process for deaf people on e-learning platforms through translation of Sign Language (SL). Considering SL in its global scope as a spatial-visual language that is not limited to gestures or hand/forearm movement but also includes non-manual markers such as facial expressions, it is necessary to ascertain whether existing technology solutions can be effective options for SL integration on e-learning platforms. Thus, we aim to present a list of potential technology options for the recognition, translation and presentation of SL (and their potential problems) through the analysis of assistive technologies, methods and techniques, and ultimately to contribute to the development of the state of the art and ensure the digital inclusion of deaf people in e-learning platforms. The analysis shows that some interesting technology solutions are under research and development for digital platforms in general, but some critical challenges must still be solved, and an effective integration of these technologies in e-learning platforms in particular is still missing.
A Pervasive Middleware for Activity Recognition with Smartphones
Title from PDF of title page, viewed on August 28, 2015. Thesis advisor: Yugyung Lee. Includes vita and bibliographic references (pages 61-67). Thesis (M.S.)--School of Computing and Engineering, University of Missouri--Kansas City, 2015.
Activity Recognition (AR) is an important research topic in pervasive computing. With the rapid increase in the use of pervasive devices, huge volumes of sensor data are generated from diverse devices on a daily basis. Analysis of this sensor data is a significant area of research for AR. There are several devices and techniques available for AR, but the increasing number of sensor devices and the volume of data demand new approaches for adaptive, lightweight and accurate AR. We propose a new middleware called the Pervasive Middleware for Activity Recognition (PEMAR) to address these problems. We implemented PEMAR on a Big Data platform incorporating machine-learning techniques to make it adaptive and accurate for the AR of sensor data. The middleware is composed of the following: (1) filtering and segmentation to detect different activities; (2) a human-centered adaptive approach to create accurate personal models, leveraging existing impersonal models; (3) an activity library to serve different mobile applications; and (4) Activity Recognition services to accurately perform AR. We evaluated the recognition accuracy of PEMAR using a generated dataset (15 activities, 50 subjects) and the USC Human Activity Dataset (12 activities, 14 subjects) and observed better accuracy for personally trained AR compared to impersonally trained AR. We tested the applicability and adaptivity of PEMAR using several motion-based applications.
Contents: Introduction -- Related work -- Middleware for gesture recognition -- Implementation and applications -- Results and evaluation -- Conclusion and future work
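The filtering-and-segmentation stage of activity-recognition middleware is commonly built on overlapping sliding windows over the raw sensor stream, with simple per-window features feeding the classifier. The sketch below illustrates that pattern; the window size, step, and feature choice are invented for the example, since PEMAR's actual parameters are not given in the abstract.

```python
import numpy as np

def sliding_windows(signal, size, step):
    """Split a 1-D sensor stream into fixed-size, overlapping windows."""
    return np.array([signal[i:i + size]
                     for i in range(0, len(signal) - size + 1, step)])

stream = np.arange(10.0)            # stand-in for accelerometer magnitudes
windows = sliding_windows(stream, size=4, step=2)
print(windows.shape)                # (4, 4): four half-overlapping windows
features = windows.mean(axis=1)     # one simple feature per window
print(features)                     # [1.5 3.5 5.5 7.5]
```

Each window (or its feature vector) is then labeled by the personal or impersonal model, so segmentation quality directly bounds recognition accuracy.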