Development of a Modern Greek Broadcast-News Corpus and Speech Recognition System
Proceedings of the 16th Nordic Conference of Computational Linguistics NODALIDA-2007.
Editors: Joakim Nivre, Heiki-Jaan Kaalep, Kadri Muischnek and Mare Koit.
University of Tartu, Tartu, 2007.
ISBN 978-9985-4-0513-0 (online)
ISBN 978-9985-4-0514-7 (CD-ROM)
pp. 380-383
The interaction of knowledge sources in word sense disambiguation
Word sense disambiguation (WSD) is a computational linguistics task likely to benefit from the tradition of combining different knowledge sources in artificial intelligence research. An important step in the exploration of this hypothesis is to determine which linguistic knowledge sources are most useful and whether their combination leads to improved results.
We present a sense tagger which uses several knowledge sources. Tested accuracy exceeds 94% on our evaluation corpus. Our system attempts to disambiguate all content words in running text rather than limiting itself to a restricted vocabulary of words. It is argued that this approach is more likely to assist the creation of practical systems.
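One simple way to combine evidence from several knowledge sources is a weighted vote over candidate senses. The sketch below is purely illustrative and is not the tagger described in the abstract: the source names, sense labels, and the interface (each source maps a word and its context to a dict of sense scores) are all hypothetical assumptions.

```python
from collections import defaultdict

def disambiguate(word, context, sources, weights=None):
    """Combine several knowledge sources by weighted voting.

    Illustrative sketch only: each source is a callable mapping
    (word, context) to {sense: score}; votes are summed per sense."""
    weights = weights or {name: 1.0 for name in sources}
    totals = defaultdict(float)
    for name, score_fn in sources.items():
        for sense, score in score_fn(word, context).items():
            totals[sense] += weights[name] * score
    return max(totals, key=totals.get) if totals else None

# Toy knowledge sources with a hypothetical sense inventory for "bank".
sources = {
    "part_of_speech": lambda w, c: {"river_bank": 0.4, "money_bank": 0.6},
    "collocations":   lambda w, c: {"river_bank": 0.9, "money_bank": 0.1},
}
print(disambiguate("bank", "the muddy bank of the stream", sources))
# -> river_bank (0.4 + 0.9 outvotes 0.6 + 0.1)
```

With equal weights the collocation evidence dominates here; in practice the per-source weights would be tuned on held-out data.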
Feature decay algorithms for fast deployment of accurate statistical machine translation systems
We use feature decay algorithms (FDA) for fast deployment of accurate statistical machine translation systems, taking only about half a day for each translation direction. We develop parallel FDA to solve the computational scalability problems caused by the abundance of training data for SMT models and LM models, and still achieve SMT performance that is on par with using all of the training data or better. Parallel FDA runs separate FDA models on randomized subsets of the training data and combines the instance selections later. Parallel FDA can also be used for selecting the LM corpus based on the training set selected by parallel FDA. The high quality of the selected training data allows us to obtain very accurate translation outputs close to the top performing SMT systems. The relevance of the selected LM corpus yields up to an 86% reduction in the number of OOV tokens and up to a 74% reduction in perplexity. We perform SMT experiments on all language pairs in the WMT13 translation task and obtain SMT performance close to the top systems using significantly fewer resources for training and development.
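The core idea of greedy feature-decay instance selection can be sketched briefly: score each training sentence by the test-set features it covers, and after each selection decay the weights of the features just covered so that later picks favour sentences with still-uncovered features. The following is a simplified illustration of that idea, not the paper's implementation; the multiplicative decay schedule and length normalisation are arbitrary choices made here.

```python
from collections import Counter

def ngrams(tokens, n_max=2):
    """Word 1-grams and 2-grams of a token list."""
    feats = []
    for n in range(1, n_max + 1):
        feats += [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return feats

def fda_select(test_sents, train_sents, k, decay=0.5):
    """Greedily pick k training sentences relevant to the test set."""
    # Initial feature weights: 1.0 for every n-gram seen in the test set.
    weights = Counter()
    for s in test_sents:
        for f in ngrams(s.split()):
            weights[f] = 1.0
    scored = [(s, ngrams(s.split())) for s in train_sents]
    selected = []
    for _ in range(min(k, len(scored))):
        # Highest total feature weight, normalised by feature count.
        best_i = max(range(len(scored)),
                     key=lambda i: sum(weights[f] for f in scored[i][1])
                     / max(len(scored[i][1]), 1))
        sent, feats = scored.pop(best_i)
        selected.append(sent)
        for f in feats:            # decay the features just covered
            if weights[f] > 0:
                weights[f] *= decay
    return selected
```

Running separate instances of such a selector on randomized shards of the training data and merging their selections is the essence of the parallel variant described above.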
Keyword Transformer: A Self-Attention Model for Keyword Spotting
The Transformer architecture has been successful across many domains, including natural language processing, computer vision and speech recognition. In keyword spotting, self-attention has primarily been used on top of convolutional or recurrent encoders. We investigate a range of ways to adapt the Transformer architecture to keyword spotting and introduce the Keyword Transformer (KWT), a fully self-attentional architecture that exceeds state-of-the-art performance across multiple tasks without any pre-training or additional data. Surprisingly, this simple architecture outperforms more complex models that mix convolutional, recurrent and attentive layers. KWT can be used as a drop-in replacement for these models, setting two new benchmark records on the Google Speech Commands dataset with 98.6% and 97.7% accuracy on the 12- and 35-command tasks respectively. Comment: Proceedings of INTERSPEECH
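The core operation of a fully self-attentional model like KWT is scaled dot-product attention applied to a sequence of embedded spectrogram patches plus a class token. Below is a minimal single-head NumPy sketch; the dimensions and the frame-per-patch layout are illustrative assumptions, and the real KWT additionally uses multi-head attention, MLP blocks, layer normalisation and learned positional embeddings.

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a sequence x."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Numerically stable row-wise softmax over attention scores.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
d = 64                                     # assumed embedding dimension
frames = rng.normal(size=(98, d))          # 98 embedded time-frame patches
cls = rng.normal(size=(1, d))              # class token, prepended
seq = np.vstack([cls, frames])             # sequence of length 99
Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
out = self_attention(seq, Wq, Wk, Wv)
print(out.shape)                           # (99, 64)
```

After the final Transformer block, the output at the class-token position is what feeds the keyword classifier head.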
The Lateralization of the Word Frequency Effect
Two lexical decision experiments were performed to investigate the lateralization of the word frequency effect and the interaction with word class. Positive results would support proposals of multiple lexicons in lexical retrieval models. The first was a small-n experiment varying visual angle of presentation, signal to noise ratios, visual field and word class (noun/verb) while using high frequency word and nonword items. The purpose of this experiment was to document the sensitivity of observers to material presented varying distances from a central fixation point while replicating previous research results of a word class by visual field interaction. The second experiment was a large-n design varying visual field, word class (noun/verb/mixed), and word frequency, looking for greater differences in lateralization of word class by frequency. Results of the first experiment show no visual field by word class interaction, failing to replicate previous research. Speed of response slowed linearly with increasing visual angle of presentation for both visual fields, despite moving away from a region of macular overlap to parafoveal presentation. In experiment two, a strong frequency effect was found, along with an interaction with word class and visual field. This interaction, however, was due to aberrant performance for medium frequency nouns. Considered along with the inconclusive literature on lateral effects of word class and word frequency, and recent failures to replicate Bradley's (1978) model of open- and closed-class lexicons, the research focus must shift toward a closer examination of the relationship between two areas: (1) the method of generation of a lexical access code (which may or may not involve the RH) and, (2) the relationship between a rapidly calculated direct access code and an ordered lexicon.
Hierarchy-based Partition Models: Using Classification Hierarchies to
We propose a novel machine learning technique that can be used to estimate probability distributions for categorical random variables that are equipped with a natural set of classification hierarchies, such as words equipped with word class hierarchies, wordnet hierarchies, and suffix and affix hierarchies. We evaluate the estimator on bigram language modelling with a hierarchy based on word suffixes, using English, Danish, and Finnish data from the Europarl corpus with training sets of up to 1–1.5 million words. The results show that the proposed estimator outperforms modified Kneser-Ney smoothing in terms of perplexity on unseen data. This suggests that important information is hidden in the classification hierarchies that we routinely use in computational linguistics, but that we are unable to utilize this information fully because our current statistical techniques are either based on simple counting models or designed for sample spaces with a distance metric, rather than sample spaces with a non-metric topology given by a classification hierarchy.
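The general idea of exploiting a suffix hierarchy in a bigram model can be illustrated with a simple estimator that collects counts at every node of the hierarchy (the full word, then progressively shorter suffixes, then a root class) and interpolates from specific to general. This is a hand-rolled interpolation sketch under assumed data structures, not the partition estimator proposed above, and its scores are not normalised over the vocabulary.

```python
from collections import Counter

def suffix_chain(word, max_len=3):
    """Hierarchy nodes from specific to general: the word itself,
    its suffixes of up to max_len characters, then a root class."""
    chain = [word] + [word[-n:] for n in range(min(max_len, len(word) - 1), 0, -1)]
    return chain + ["<root>"]

def train_counts(bigrams):
    """Collect bigram counts at every hierarchy node of the history word."""
    counts, totals = Counter(), Counter()
    for w1, w2 in bigrams:
        for node in suffix_chain(w1):
            counts[(node, w2)] += 1
            totals[node] += 1
    return counts, totals

def prob(counts, totals, w1, w2, lam=0.5):
    """Interpolate relative frequencies along the hierarchy: specific
    nodes dominate when they have counts; otherwise the remaining
    weight shifts to coarser suffix classes."""
    p, weight = 0.0, 1.0
    for node in suffix_chain(w1):
        if totals[node]:
            p += weight * lam * counts[(node, w2)] / totals[node]
            weight *= (1 - lam)
    return p
```

Even for an unseen history word, the suffix nodes it shares with seen words still contribute mass, which is the kind of information a flat counting model throws away.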
Keywords: machine learning; categorical variables; classification hierarchies; language modelling; statistical estimation