Robust Reinforcement Learning-based Autonomous Driving Agent for Simulation and Real World
Deep Reinforcement Learning (DRL) has recently been used successfully to solve
a variety of challenges, such as complex board and computer games. Solving
real-world robotics tasks with DRL, however, remains considerably harder. The
desired approach is to train the agent in a simulator and transfer it to the
real world, yet models trained in simulation tend to perform poorly in real
environments because of the differences between the two domains. In this
paper, we present a DRL-based algorithm, built on Deep Q-Networks (DQN), that
is capable of autonomous robot control. In our approach, the agent is trained
in a simulated environment and can navigate in both simulated and real-world
environments. The method is evaluated in the Duckietown environment, where the
agent has to follow the lane based on monocular camera input. The trained
agent runs on limited hardware
resources, and its performance is comparable to that of state-of-the-art approaches.
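The abstract does not spell out the network or the training loop, so the following minimal PyTorch sketch only illustrates the general DQN approach it describes: a convolutional Q-network over camera frames, epsilon-greedy action selection, and a single temporal-difference update. The action set, image size, layer widths, and hyperparameters are assumptions, not the paper's configuration.

```python
# Minimal DQN sketch for discrete lane-following control from monocular
# camera frames. All sizes and hyperparameters below are illustrative
# assumptions, not the configuration used in the paper.
import random

import torch
import torch.nn as nn
import torch.nn.functional as F

class QNetwork(nn.Module):
    def __init__(self, n_actions: int = 3):  # assumed: steer left / straight / right
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(), nn.Linear(32 * 9 * 9, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, x):  # x: (B, 3, 84, 84) normalized camera frames
        return self.head(self.conv(x))

def select_action(q_net, state, epsilon: float, n_actions: int = 3) -> int:
    """Epsilon-greedy action selection over the Q-network's output."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    with torch.no_grad():
        return int(q_net(state.unsqueeze(0)).argmax(dim=1))

def train_step(q_net, target_net, optimizer, batch, gamma: float = 0.99):
    """One temporal-difference update on (state, action, reward, next_state, done)."""
    s, a, r, s2, d = batch
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)             # Q(s, a)
    with torch.no_grad():                                         # frozen target network
        target = r + gamma * (1.0 - d) * target_net(s2).max(1).values
    loss = F.smooth_l1_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In practice the batches would come from a replay buffer and target_net would be a periodically synchronized copy of q_net; both are standard DQN components rather than details taken from the abstract.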
Grapheme-to-Phoneme Conversion with Convolutional Neural Networks
Grapheme-to-phoneme (G2P) conversion is the process of generating the pronunciation of words from their written form. It plays an essential role in natural language processing, text-to-speech synthesis, and automatic speech recognition systems. In this paper, we investigate convolutional neural networks (CNNs) for G2P conversion and propose a novel CNN-based sequence-to-sequence (seq2seq) architecture. Our approach includes an end-to-end CNN G2P model with residual connections, as well as a model that uses a convolutional neural network (with and without residual connections) as the encoder and a Bi-LSTM as the decoder. We compare our approach with state-of-the-art methods, including Encoder-Decoder LSTM and Encoder-Decoder Bi-LSTM. Training and inference times, as well as phoneme and word error rates, were evaluated on the public CMUDict dataset for US English, and the best-performing CNN-based architecture was also evaluated on the NetTalk dataset. Our method approaches the accuracy of previous state-of-the-art results in terms of phoneme error rate.
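As a rough sketch of the CNN-encoder / Bi-LSTM variant the abstract mentions, the PyTorch model below embeds grapheme ids, applies 1-D convolutions with residual connections, and lets a Bi-LSTM produce per-position phoneme logits. The vocabulary sizes, layer widths, and the assumption that output phonemes are predicted position-by-position over the padded grapheme sequence are illustrative choices, not the paper's exact design.

```python
# Sketch of a CNN encoder with residual connections feeding a Bi-LSTM for
# G2P conversion. Sizes and the aligned per-position output are assumptions.
import torch
import torch.nn as nn

class CnnBiLstmG2P(nn.Module):
    def __init__(self, n_graphemes=30, n_phonemes=45, dim=128, n_layers=3):
        super().__init__()
        self.embed = nn.Embedding(n_graphemes, dim)
        self.convs = nn.ModuleList(
            [nn.Conv1d(dim, dim, kernel_size=3, padding=1) for _ in range(n_layers)]
        )
        self.bilstm = nn.LSTM(dim, dim, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * dim, n_phonemes)

    def forward(self, graphemes):                    # graphemes: (B, T) int ids
        x = self.embed(graphemes).transpose(1, 2)    # (B, dim, T) for Conv1d
        for conv in self.convs:                      # residual connection around
            x = torch.relu(x + conv(x))              # each convolutional layer
        x = x.transpose(1, 2)                        # back to (B, T, dim)
        x, _ = self.bilstm(x)                        # (B, T, 2*dim)
        return self.out(x)                           # per-position phoneme logits

# Example: logits for a batch of two 8-grapheme words (ids are placeholders).
logits = CnnBiLstmG2P()(torch.randint(0, 30, (2, 8)))  # -> shape (2, 8, 45)
```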
End-to-end Convolutional Neural Networks for Intent Detection
Convolutional Neural Networks (CNNs) have been applied to various machine learning tasks, such as computer vision, speech technologies, and machine translation. One of the main advantages of CNNs is their ability to learn representations from high-dimensional data. End-to-end CNN models have been explored extensively in the computer vision domain, and the approach has been attempted in other domains as well. In this paper, a novel end-to-end CNN architecture with residual connections is presented for intent detection, one of the main tasks in building a spoken language understanding (SLU) system. Experiments were carried out on two datasets (ATIS and Snips). The results demonstrate that the proposed model outperforms previous solutions.
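A minimal sketch in the spirit of this abstract: token embeddings, stacked residual 1-D convolutions, global average pooling over time, and a linear intent head. The vocabulary size, depth, widths, and the 21-class output (roughly ATIS-like) are assumptions for illustration, not the paper's configuration.

```python
# Sketch of an end-to-end CNN with residual connections for intent detection.
# All sizes and the number of intent classes are illustrative assumptions.
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """1-D residual convolution block applied over the token dimension."""
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        self.body = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size, padding=pad), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size, padding=pad),
        )
        self.relu = nn.ReLU()

    def forward(self, x):                        # x: (B, C, T)
        return self.relu(x + self.body(x))

class ResidualCnnIntent(nn.Module):
    def __init__(self, vocab_size=10000, n_intents=21, dim=128, n_blocks=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.blocks = nn.Sequential(*[ResBlock(dim) for _ in range(n_blocks)])
        self.classifier = nn.Linear(dim, n_intents)

    def forward(self, tokens):                   # tokens: (B, T) word ids
        x = self.embed(tokens).transpose(1, 2)   # (B, dim, T) for Conv1d
        x = self.blocks(x)
        x = x.mean(dim=2)                        # global average pool over time
        return self.classifier(x)                # intent logits

# Example: intent logits for a batch of four 12-token utterances.
logits = ResidualCnnIntent()(torch.randint(0, 10000, (4, 12)))  # -> shape (4, 21)
```

Global average pooling is one common way to collapse the variable-length token axis into a fixed-size utterance representation; max pooling would be an equally plausible choice here.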