3,795 research outputs found

    A comparative review of dynamic neural networks and hidden Markov model methods for mobile on-device speech recognition

    Adopting high-accuracy speech recognition algorithms without effectively evaluating their impact on the target computational resource is impractical for mobile and embedded systems. In this paper, techniques are adopted to minimise the computational resource required for an effective mobile-based speech recognition system. A Dynamic Multi-Layer Perceptron speech recognition technique, capable of running in real time on a state-of-the-art mobile device, is introduced. Although a conventional hidden Markov model applied to the same dataset slightly outperformed our approach, its processing time is much higher. The Dynamic Multi-Layer Perceptron presented here has an accuracy of 96.94% and runs significantly faster than similar techniques.

    A Novel Approach for Speech to Text Recognition System Using Hidden Markov Model

    Speech recognition is the application of sophisticated algorithms that transform the human voice into text. Speech identification is essential because it is used by several biometric identification systems and voice-controlled automation systems. Variations in recording equipment, speakers, situations, and environments make speech recognition a difficult task. Speech recognition comprises three major phases: speech pre-processing, feature extraction, and speech classification. This work presents a comprehensive study with the objectives of understanding, analyzing, and enhancing the models and approaches, such as Hidden Markov Models and Artificial Neural Networks, employed in voice recognition systems for feature extraction and classification.

    Vision-based gesture recognition system for human-computer interaction

    Hand gesture recognition, being a natural way of human-computer interaction, is an area of active research in computer vision and machine learning. It has many possible applications, giving users a simpler and more natural way to communicate with robot and system interfaces without the need for extra devices. The primary goal of gesture recognition research is therefore to create systems that can identify specific human gestures and use them to convey information or to control devices. This work studies and implements a solution that is generic enough to interpret user commands composed of a set of dynamic and static gestures, and uses it to build an application able to work in a real-time human-computer interaction system. The proposed solution is composed of two modules controlled by an FSM (Finite State Machine): a real-time hand tracking and feature extraction system, supported by an SVM (Support Vector Machine) model for static hand posture classification, and a set of HMMs (Hidden Markov Models) for dynamic single-stroke hand gesture recognition. The experimental results showed that the system works very reliably and is able to recognize the set of defined commands in real time. The SVM model for hand posture classification, trained with the selected hand features, achieved an accuracy of 99.2%. The proposed solution has the advantage of being computationally simple to train and use while remaining generic enough to be applied in any robot or system command interface.
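As an illustration of how a set of HMMs can recognize a dynamic gesture, the sketch below scores an observation sequence against each model with the scaled forward algorithm and picks the most likely one. It is a minimal sketch, not the authors' implementation: the two toy models, their probabilities, the gesture names, and the quantized direction symbols (0 = rightward motion, 1 = upward motion) are all assumptions made for the example.

```python
import math

def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | HMM) using the scaled forward algorithm."""
    if not obs:
        return 0.0  # an empty sequence has probability 1
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transitions, then weight by the emission.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        log_lik += math.log(scale)
    return log_lik

def classify(obs, models):
    """Pick the gesture whose HMM gives the sequence the highest likelihood."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))

# Hypothetical two-state left-to-right models: (start, transition, emission).
LEFT_TO_RIGHT = [[0.7, 0.3], [0.0, 1.0]]
MODELS = {
    "swipe_right": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.9, 0.1], [0.8, 0.2]]),
    "swipe_up": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.1, 0.9], [0.2, 0.8]]),
}

if __name__ == "__main__":
    print(classify([0, 0, 1, 0], MODELS))
```

In a real system the observation symbols would come from the hand tracking and feature extraction stage, and each model's parameters would be learned from recorded gesture examples rather than written by hand.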

    End-to-End Multiview Gesture Recognition for Autonomous Car Parking System

    Hand gestures can be the most intuitive medium for human-machine interaction. Early approaches to hand gesture recognition used device-based methods. These methods rely on mechanical or optical sensors attached to a glove or on markers, which hinders natural human-machine communication. Vision-based methods, by contrast, are not restrictive and allow more spontaneous communication without an intermediary between human and machine. Vision-based gesture recognition has therefore been a popular area of research for the past thirty years. Hand gesture recognition finds application in many areas, particularly the automotive industry, where designers of advanced automotive human-machine interfaces (HMI) are using gesture recognition to improve driver and vehicle safety. However, technological advances go beyond active and passive safety and into convenience and comfort. In this context, one of America’s big three automakers has partnered with the Centre of Pattern Analysis and Machine Intelligence (CPAMI) at the University of Waterloo to investigate expanding its product segment through machine learning, providing increased driver convenience and comfort through hand gesture recognition for autonomous car parking. In this thesis, we leverage state-of-the-art deep learning and optimization techniques to develop a vision-based multiview dynamic hand gesture recognizer for a self-parking system. We propose a 3DCNN gesture model architecture that we train on a publicly available hand gesture database. We apply transfer learning methods to fine-tune the pre-trained gesture model on custom-made data, which significantly improved the performance of the proposed system in real-world environments. We adapt the architecture of the end-to-end solution to expand the state-of-the-art video classifier from a single image as input (fed by a monocular camera) to a multiview 360 feed provided by a six-camera module. Finally, we optimize the proposed solution to work on a resource-limited embedded platform (Nvidia Jetson TX2) that is used by automakers for vehicle-based features, without sacrificing the accuracy, robustness, and real-time functionality of the system.

    Deep convolutional and LSTM recurrent neural networks for multimodal wearable activity recognition

    Human activity recognition (HAR) tasks have traditionally been solved using engineered features obtained by heuristic processes. Current research suggests that deep convolutional neural networks are suited to automate feature extraction from raw sensor inputs. However, human activities are made of complex sequences of motor movements, and capturing these temporal dynamics is fundamental for successful HAR. Based on the recent success of recurrent neural networks for time series domains, we propose a generic deep framework for activity recognition based on convolutional and LSTM recurrent units, which: (i) is suitable for multimodal wearable sensors; (ii) can perform sensor fusion naturally; (iii) does not require expert knowledge in designing features; and (iv) explicitly models the temporal dynamics of feature activations. We evaluate our framework on two datasets, one of which has been used in a public activity recognition challenge. Our results show that our framework outperforms competing deep non-recurrent networks on the challenge dataset by 4% on average, and outperforms some previously reported results by up to 9%. Our results also show that the framework can be applied to homogeneous sensor modalities and can fuse multimodal sensors to improve performance. We characterise the influence of key architectural hyperparameters on performance to provide insights about their optimisation.
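The combination of convolutional and LSTM units described above can be sketched in plain Python as a forward pass: a temporal convolution extracts features from a window of raw sensor readings, an LSTM consumes those features in time order, and a softmax layer turns the final hidden state into class probabilities. This is a minimal sketch under stated assumptions, not the authors' architecture: the layer sizes, the single convolutional layer, the random untrained weights, the synthetic 6-channel window, and every function name here are hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def conv1d_relu(window, kernels):
    """Slide each kernel (K x C weights) along time; returns (T-K+1) x F features."""
    k, channels = len(kernels[0]), len(window[0])
    return [
        [max(0.0, sum(kern[dt][c] * window[t + dt][c]
                      for dt in range(k) for c in range(channels)))
         for kern in kernels]
        for t in range(len(window) - k + 1)
    ]

def lstm_last_hidden(seq, weights, hidden):
    """Run a single-layer LSTM over seq and return the final hidden state."""
    h, c = [0.0] * hidden, [0.0] * hidden
    for x in seq:
        z = x + h + [1.0]  # input, previous hidden state, bias term

        def gate(name, act):
            return [act(sum(w * v for w, v in zip(row, z))) for row in weights[name]]

        i, f, o = gate("i", sigmoid), gate("f", sigmoid), gate("o", sigmoid)
        g = gate("g", math.tanh)
        c = [f[j] * c[j] + i[j] * g[j] for j in range(hidden)]
        h = [o[j] * math.tanh(c[j]) for j in range(hidden)]
    return h

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def init_params(channels, kernel_size, filters, hidden, classes, seed=0):
    rng = random.Random(seed)
    return {
        "kernels": [rand_matrix(kernel_size, channels, rng) for _ in range(filters)],
        "lstm": {name: rand_matrix(hidden, filters + hidden + 1, rng) for name in "ifog"},
        "hidden": hidden,
        "out": rand_matrix(classes, hidden + 1, rng),
    }

def predict(window, params):
    feats = conv1d_relu(window, params["kernels"])
    h = lstm_last_hidden(feats, params["lstm"], params["hidden"])
    scores = [sum(w * v for w, v in zip(row, h + [1.0])) for row in params["out"]]
    return softmax(scores)

# Synthetic window: 24 time steps of 6 sensor channels (assumed shape).
window = [[math.sin(0.3 * t + c) for c in range(6)] for t in range(24)]
params = init_params(channels=6, kernel_size=5, filters=4, hidden=8, classes=3)
probs = predict(window, params)
```

Because the weights are untrained, the probabilities carry no meaning; the sketch only shows how data flows through the layers. Here each kernel spans all channels at once, which is one simple way to fuse sensors and may differ from how the paper applies its convolutions.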