FPGA-Based Low-Power Speech Recognition with Recurrent Neural Networks

Authors: Choi Sungwook, Hwang Kyuyeon, Lee Minjae, Park Jinhwan, Shin Sungho, Sung Wonyong
Publication date: 30/09/2016

In this paper, a neural network based real-time speech recognition (SR) system is developed using an FPGA for very low-power operation. The implemented system employs two recurrent neural networks (RNNs); one is a speech-to-character RNN for acoustic modeling (AM) and the other is for character-level language modeling (LM). The system also employs a statistical word-level LM to improve the recognition accuracy. The results of the AM, the character-level LM, and the word-level LM are combined using a fairly simple N-best search algorithm instead of the hidden Markov model (HMM) based network. The RNNs are implemented using massively parallel processing elements (PEs) for low latency and high throughput. The weights are quantized to 6 bits to store all of them in the on-chip memory of an FPGA. The proposed algorithm is implemented on a Xilinx XC7Z045, and the system can operate much faster than real-time.

Comment: Accepted to SiPS 201

arXiv.org e-Print Archive

Crossref

DEXTER: Deep Encoding of External Knowledge for Named Entity Recognition in Virtual Assistants

Authors: Acero Alex, Barnes Megan, Li Lin, Moniz Joel Ruben Antony, Muralidharan Deepak, Pan Jingjing, Pulman Stephen, Williams Jason, Zhang Weicheng
Publication date: 14/08/2021

Named entity recognition (NER) is usually developed and tested on text from well-written sources. However, in intelligent voice assistants, where NER is an important component, input to NER may be noisy because of user or speech recognition error. In applications, entity labels may change frequently, and non-textual properties like topicality or popularity may be needed to choose among alternatives. We describe a NER system intended to address these problems. We test and train this system on a proprietary user-derived dataset. We compare with a baseline text-only NER system; the baseline enhanced with external gazetteers; and the baseline enhanced with the search and indirect labelling techniques we describe below. The final configuration gives around 6% reduction in NER error rate. We also show that this technique improves related tasks, such as semantic parsing, with an improvement of up to 5% in error rate.

Comment: Interspeech 202

arXiv.org e-Print Archive

Text Recognition in Multimedia Documents: A Study of two Neural-based OCRs Using and Avoiding Character Segmentation

Authors: Christophe Garcia, Franck Mamalet, Khaoula Elagouni, Pascale Sébillot
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/03/2014

Text embedded in multimedia documents represents important semantic information that helps to automatically access the content. This paper proposes two neural-based OCRs that handle the text recognition problem in different ways. The first approach segments a text image into individual characters before recognizing them, while the second one avoids the segmentation step by integrating a multi-scale scanning scheme that makes it possible to jointly localize and recognize characters at each position and scale. Some linguistic knowledge is also incorporated into the proposed schemes to remove errors due to recognition confusions. Both OCR systems are applied to caption texts embedded in videos and in natural scene images and provide outstanding results, showing that the proposed approaches outperform the state-of-the-art methods.

HAL-CentraleSupelec

Crossref

INRIA a CCSD electronic archive server

HAL

Hal-Diderot

HAL-Rennes 1