External Lexical Information for Multilingual Part-of-Speech Tagging
Morphosyntactic lexicons and word vector representations have both proven
useful for improving the accuracy of statistical part-of-speech taggers. Here
we compare the performance of four systems on datasets covering 16 languages,
two of these systems being feature-based (MEMMs and CRFs) and two being
neural-based (bi-LSTMs). We show that, on average, all four approaches perform
similarly and reach state-of-the-art results. Yet better performance is
obtained with our feature-based models on lexically richer datasets (e.g. for
morphologically rich languages), whereas neural-based results are higher on
datasets with less lexical variability (e.g. for English). These conclusions
hold in particular for the MEMM models relying on our system MElt, which
benefited from newly designed features. This shows that, under certain
conditions, feature-based approaches enriched with morphosyntactic lexicons are
competitive with neural methods.
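As a minimal sketch of the idea behind lexicon-enriched feature-based tagging (not the authors' MElt implementation), external lexical information can be injected as extra binary features: each entry of a morphosyntactic lexicon contributes one feature per POS tag the word form may carry, for the current word and its neighbours. The lexicon and feature names below are hypothetical.

```python
# Hedged illustration: lexicon tags as extra features for a CRF/MEMM tagger.
# The lexicon maps a lowercased word form to the set of POS tags it may carry.

def word_features(sentence, i, lexicon):
    """Build a feature dict for the word at position i in the sentence."""
    word = sentence[i]
    feats = {
        "word.lower": word.lower(),
        "word.suffix3": word[-3:],
        "word.is_capitalized": word[0].isupper(),
    }
    # External lexical information: one binary feature per lexicon tag.
    for tag in sorted(lexicon.get(word.lower(), ())):
        feats[f"lex.tag={tag}"] = True
    # Context: lexicon tags of the preceding word as well.
    if i > 0:
        for tag in sorted(lexicon.get(sentence[i - 1].lower(), ())):
            feats[f"lex.prev_tag={tag}"] = True
    return feats

# Hypothetical toy lexicon, for illustration only.
lexicon = {"walks": {"VERB", "NOUN"}, "the": {"DET"}}
feats = word_features(["The", "dog", "walks"], 2, lexicon)
```

Feature dicts in this shape can be fed directly to standard CRF toolkits; the ambiguity class exposed by the lexicon (here, walks may be VERB or NOUN) is exactly the kind of signal the abstract credits for gains on lexically richer datasets.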
Deep learning for speech to text transcription for the Portuguese language
Automatic speech recognition (ASR) is the process of transcribing audio recordings into text, i.e. of
transforming speech into the corresponding sequence of words. This process is also commonly known as
speech-to-text. Machine learning (ML), the ability of machines to learn from examples, is one of the most
relevant areas of artificial intelligence in today's world. Deep learning is a subset of ML that makes use of
Deep Neural Networks: Artificial Neural Networks (ANNs), intended to mimic human neurons, with a large
number of layers.
This dissertation reviews the state-of-the-art on automatic speech recognition throughout time, from early
systems which used Hidden Markov Models (HMMs) and Gaussian Mixture Models (GMMs) to the most
up-to-date end-to-end (E2E) deep neural models. Considering the context of the present work, some deep
learning algorithms used in state-of-the-art approaches are explained in additional detail.
The current work aims to develop an ASR system for the European Portuguese language using deep
learning. This is achieved by implementing a pipeline composed of stages responsible for data acquisition,
data analysis, data pre-processing, model creation and evaluation of results.
With the NVIDIA NeMo framework it was possible to implement the QuartzNet15x5 architecture, based on 1D
time-channel separable convolutions. Following a data-centric methodology, the model developed yielded a
state-of-the-art Word Error Rate of WER = 0.0503.
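The WER figure reported above is the standard word-level edit distance (substitutions, deletions and insertions) between hypothesis and reference, divided by the reference length. A minimal self-contained sketch of the metric (an illustration, not the NeMo implementation) is:

```python
# Word Error Rate via dynamic-programming edit distance over word sequences.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j].
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)
```

So WER = 0.0503 means roughly one word error per twenty reference words. In NeMo itself the metric is computed by the framework during evaluation rather than by hand.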
Polyglot: Distributed Word Representations for Multilingual NLP
Distributed word representations (word embeddings) have recently contributed
to competitive performance in language modeling and several NLP tasks. In this
work, we train word embeddings for more than 100 languages using their
corresponding Wikipedias. We quantitatively demonstrate the utility of our word
embeddings by using them as the sole features for training a part of speech
tagger for a subset of these languages. We find their performance to be
competitive with near state-of-the-art methods in English, Danish and Swedish.
Moreover, we investigate the semantic features captured by these embeddings
through the proximity of word groupings. We will release these embeddings
publicly to help researchers in the development and enhancement of multilingual
applications. Comment: 10 pages, 2 figures, Proceedings of the Conference on Computational
Natural Language Learning CoNLL'201
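The "proximity of word groupings" investigated above can be probed with cosine similarity over the embedding vectors. A hedged sketch follows; the 3-dimensional vectors are toy stand-ins for real Polyglot embeddings, and the vocabulary is hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest(word, embeddings):
    """Return the vocabulary word closest to `word` by cosine similarity."""
    target = embeddings[word]
    others = (w for w in embeddings if w != word)
    return max(others, key=lambda w: cosine(target, embeddings[w]))

# Toy embedding table for illustration only.
embeddings = {
    "king":  [0.9, 0.1, 0.3],
    "queen": [0.8, 0.2, 0.35],
    "apple": [0.1, 0.9, 0.2],
}
```

With real embeddings, nearest-neighbour queries of this form surface the semantic groupings the abstract refers to; with the toy table, "queen" is closer to "king" than "apple" is.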
A Robust Transformation-Based Learning Approach Using Ripple Down Rules for Part-of-Speech Tagging
In this paper, we propose a new approach to construct a system of
transformation rules for the Part-of-Speech (POS) tagging task. Our approach is
based on an incremental knowledge acquisition method where rules are stored in
an exception structure and new rules are only added to correct the errors of
existing rules; thus allowing systematic control of the interaction between the
rules. Experimental results on 13 languages show that our approach is fast in
terms of training time and tagging speed. Furthermore, our approach obtains
very competitive accuracy in comparison to state-of-the-art POS and
morphological taggers. Comment: Version 1: 13 pages. Version 2: Submitted to AI Communications - the
European Journal on Artificial Intelligence. Version 3: Resubmitted after
major revisions. Version 4: Resubmitted after minor revisions. Version 5: to
appear in AI Communications (accepted for publication on 3/12/2015).
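The exception structure described above can be sketched as a tree of single-classification ripple-down rules: each node has a condition, a conclusion (a tag), and child exception rules that are only added to correct errors of the parent and are consulted first. This is a minimal illustration of the idea, not the authors' system; the rules and words are hypothetical.

```python
class RDRNode:
    """One node in a ripple-down rule tree for POS tagging."""

    def __init__(self, condition, tag):
        self.condition = condition  # predicate over the word
        self.tag = tag              # conclusion if no exception fires
        self.exceptions = []        # more specific rules, tried first

    def classify(self, word):
        if not self.condition(word):
            return None
        # An exception overrides the parent only when its own condition fires.
        for exc in self.exceptions:
            result = exc.classify(word)
            if result is not None:
                return result
        return self.tag

# Default rule: tag everything NOUN...
root = RDRNode(lambda w: True, "NOUN")
# ...except words ending in "-ly", tagged ADV...
ly_rule = RDRNode(lambda w: w.endswith("ly"), "ADV")
root.exceptions.append(ly_rule)
# ...except "fly", an exception to the exception, tagged back to NOUN.
ly_rule.exceptions.append(RDRNode(lambda w: w == "fly", "NOUN"))
```

Because new rules are attached only under the rule whose error they correct, the interaction between rules stays local and controllable, which is the systematic-control property the abstract emphasizes.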