4 research outputs found
Free-hand gesture recognition with 3D-CNNs for in-car infotainment control in real-time
In this contribution we present a novel approach to transforming data from time-of-flight (ToF) sensors so that it can be interpreted by Convolutional Neural Networks (CNNs). As ToF data tends to be noisy depending on factors such as illumination, reflection coefficient, and distance, a robust algorithmic approach is needed. By spanning a three-dimensional grid of fixed size around each point cloud, we transform the three-dimensional input into a form that CNNs can process. This simple and effective neighborhood-preserving methodology demonstrates that CNNs are able to extract the relevant information and learn a set of filters, enabling them to differentiate a complex set of ten gestures obtained from 20 individuals and comprising 600,000 samples overall. Our 20-fold cross-validation shows the generalization performance of the network, achieving an accuracy of up to 98.5% on validation sets of 20,000 samples each. The real-time applicability of our system is demonstrated via an interactive validation on an infotainment system running at up to 40 fps on an iPad in the vehicle interior.
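The grid-spanning transform the abstract describes can be sketched roughly as follows. This is a minimal illustration under assumptions not stated in the paper: a binary occupancy encoding and a 32-cell grid resolution are choices made here for clarity, not details from the original work.

```python
import numpy as np

def voxelize(points: np.ndarray, grid_size: int = 32) -> np.ndarray:
    """Span a fixed-size 3D grid around a point cloud and mark occupied cells.

    points: (N, 3) array of ToF points (x, y, z).
    Returns a (grid_size, grid_size, grid_size) binary occupancy volume
    that a 3D CNN can consume directly.
    """
    # Fit a cube around the cloud so neighborhood structure is preserved.
    mins = points.min(axis=0)
    extent = (points.max(axis=0) - mins).max() + 1e-9  # cube side length
    # Map each point into [0, grid_size) index space.
    idx = ((points - mins) / extent * (grid_size - 1)).astype(int)
    volume = np.zeros((grid_size, grid_size, grid_size), dtype=np.float32)
    volume[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0  # mark occupied cells
    return volume

# Example: a random hand-sized point cloud of 500 points (metres).
cloud = np.random.rand(500, 3) * [0.2, 0.2, 0.3]
vol = voxelize(cloud)
print(vol.shape)  # (32, 32, 32)
```

A binary occupancy grid is the simplest encoding; variants could store point counts or mean depth per cell instead.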
Prototipo de traductor de lenguaje de señas peruanas básicas usando machine learning
The thesis focuses on the development of a prototype that recognizes a basic static sign (any static letter of the Peruvian sign language alphabet) captured by webcam within a short estimated time.
The prototype uses Machine Learning as its main technology: the first phase uses a three-stage Convolutional Neural Network (CNN) and the second phase uses a Support Vector Machine (SVM). The first stage of the CNN, "HandSegNet", locates and isolates the hand to be detected in the webcam image; the second stage, "PoseNet", applies keypoint-detection algorithms to a 2D RGB image; and the third stage lifts these keypoints to a 3D pose relative to the camera's viewpoint, accounting for lighting, perspective, and rotation. This network comes from the paper "Learning to Estimate 3D Hand Pose from Single RGB Images" by Christian Zimmermann and Thomas Brox, researchers at the University of Freiburg in Germany, and is freely available for use.
The Zimmermann network is limited to delivering 3D coordinates of the fingers of the hand, so we developed a stage that defines the poses of the Peruvian alphabet using confidence thresholds on finger angles and curls. The relative coordinates it outputs are classified with the SVM, which we also compare against a standard neural network with a linear activation; finally, the recognized letter is written over the corresponding sign defined in the algorithm.
ACAT, the system developed by Intel for S. Hawking and released as open-source software, was our inspiration. We sought to develop a prototype to similarly help people with hearing disabilities at CEBE Don José de San Martin, so that children with hearing impairment (HI) can transmit some key thoughts through the static letters of the alphabet using only an ordinary webcam, in contrast to existing systems that use complex, high-cost cameras such as the Kinect.
As future work, we expect a considerable improvement of the prototype, with the system running over an AWS service, so that children can communicate with their friends and family freely and happily, as a current language translator does.