5,297 research outputs found
FPGA-Based Low-Power Speech Recognition with Recurrent Neural Networks
In this paper, a neural network based real-time speech recognition (SR)
system is developed using an FPGA for very low-power operation. The implemented
system employs two recurrent neural networks (RNNs); one is a
speech-to-character RNN for acoustic modeling (AM) and the other is for
character-level language modeling (LM). The system also employs a statistical
word-level LM to improve the recognition accuracy. The results of the AM, the
character-level LM, and the word-level LM are combined using a fairly simple
N-best search algorithm instead of the hidden Markov model (HMM) based network.
The RNNs are implemented using massively parallel processing elements (PEs) for
low latency and high throughput. The weights are quantized to 6 bits to store
all of them in the on-chip memory of an FPGA. The proposed algorithm is
implemented on a Xilinx XC7Z045, and the system can operate much faster than
real-time.Comment: Accepted to SiPS 201
Language Modeling with Power Low Rank Ensembles
We present power low rank ensembles (PLRE), a flexible framework for n-gram
language modeling where ensembles of low rank matrices and tensors are used to
obtain smoothed probability estimates of words in context. Our method can be
understood as a generalization of n-gram modeling to non-integer n, and
includes standard techniques such as absolute discounting and Kneser-Ney
smoothing as special cases. PLRE training is efficient and our approach
outperforms state-of-the-art modified Kneser Ney baselines in terms of
perplexity on large corpora as well as on BLEU score in a downstream machine
translation task
Compositional Morphology for Word Representations and Language Modelling
This paper presents a scalable method for integrating compositional
morphological representations into a vector-based probabilistic language model.
Our approach is evaluated in the context of log-bilinear language models,
rendered suitably efficient for implementation inside a machine translation
decoder by factoring the vocabulary. We perform both intrinsic and extrinsic
evaluations, presenting results on a range of languages which demonstrate that
our model learns morphological representations that both perform well on word
similarity tasks and lead to substantial reductions in perplexity. When used
for translation into morphologically rich languages with large vocabularies,
our models obtain improvements of up to 1.2 BLEU points relative to a baseline
system using back-off n-gram models.Comment: Proceedings of the 31st International Conference on Machine Learning
(ICML
Syntactic phrase-based statistical machine translation
Phrase-based statistical machine translation (PBSMT) systems represent the dominant approach in MT today. However, unlike systems in other paradigms, it has proven difficult to date to incorporate syntactic knowledge in order to improve translation quality. This paper improves on recent research which uses 'syntactified' target language phrases, by incorporating supertags as constraints to better resolve parse tree fragments. In addition, we do not impose any sentence-length limit, and using a log-linear decoder, we outperform a state-of-the-art PBSMT system by over 1.3 BLEU points (or 3.51% relative) on the NIST 2003 Arabic-English test corpus
Modeling musicological information as trigrams in a system for simultaneous chord and local key extraction
In this paper, we discuss the introduction of a trigram musicological model in a simultaneous chord and local key extraction system. By enlarging the context of the musicological model, we hoped to achieve a higher accuracy that could justify the associated higher complexity and computational load of the search for the optimal solution. Experiments on multiple data sets have demonstrated that the trigram model has indeed a larger predictive power (a lower perplexity). This raised predictive power resulted in an improvement in the key extraction capabilities, but no improvement in chord extraction when compared to a system with a bigram musicological model
ELICA: An Automated Tool for Dynamic Extraction of Requirements Relevant Information
Requirements elicitation requires extensive knowledge and deep understanding
of the problem domain where the final system will be situated. However, in many
software development projects, analysts are required to elicit the requirements
from an unfamiliar domain, which often causes communication barriers between
analysts and stakeholders. In this paper, we propose a requirements ELICitation
Aid tool (ELICA) to help analysts better understand the target application
domain by dynamic extraction and labeling of requirements-relevant knowledge.
To extract the relevant terms, we leverage the flexibility and power of
Weighted Finite State Transducers (WFSTs) in dynamic modeling of natural
language processing tasks. In addition to the information conveyed through
text, ELICA captures and processes non-linguistic information about the
intention of speakers such as their confidence level, analytical tone, and
emotions. The extracted information is made available to the analysts as a set
of labeled snippets with highlighted relevant terms which can also be exported
as an artifact of the Requirements Engineering (RE) process. The application
and usefulness of ELICA are demonstrated through a case study. This study shows
how pre-existing relevant information about the application domain and the
information captured during an elicitation meeting, such as the conversation
and stakeholders' intentions, can be captured and used to support analysts
achieving their tasks.Comment: 2018 IEEE 26th International Requirements Engineering Conference
Workshop
Context-Dependent Acoustic Modeling without Explicit Phone Clustering
Phoneme-based acoustic modeling of large vocabulary automatic speech
recognition takes advantage of phoneme context. The large number of
context-dependent (CD) phonemes and their highly varying statistics require
tying or smoothing to enable robust training. Usually, Classification and
Regression Trees are used for phonetic clustering, which is standard in Hidden
Markov Model (HMM)-based systems. However, this solution introduces a secondary
training objective and does not allow for end-to-end training. In this work, we
address a direct phonetic context modeling for the hybrid Deep Neural Network
(DNN)/HMM, that does not build on any phone clustering algorithm for the
determination of the HMM state inventory. By performing different
decompositions of the joint probability of the center phoneme state and its
left and right contexts, we obtain a factorized network consisting of different
components, trained jointly. Moreover, the representation of the phonetic
context for the network relies on phoneme embeddings. The recognition accuracy
of our proposed models on the Switchboard task is comparable and outperforms
slightly the hybrid model using the standard state-tying decision trees.Comment: Submitted to Interspeech 202
- âŠ