
    Development of a Modern Greek Broadcast-News Corpus and Speech Recognition System

    Proceedings of the 16th Nordic Conference of Computational Linguistics NODALIDA-2007. Editors: Joakim Nivre, Heiki-Jaan Kaalep, Kadri Muischnek and Mare Koit. University of Tartu, Tartu, 2007. ISBN 978-9985-4-0513-0 (online) ISBN 978-9985-4-0514-7 (CD-ROM) pp. 380-383

    The interaction of knowledge sources in word sense disambiguation

    Word sense disambiguation (WSD) is a computational linguistics task likely to benefit from the tradition of combining different knowledge sources in artificial intelligence research. An important step in exploring this hypothesis is to determine which linguistic knowledge sources are most useful and whether their combination leads to improved results. We present a sense tagger which uses several knowledge sources. Tested accuracy exceeds 94% on our evaluation corpus. Our system attempts to disambiguate all content words in running text rather than limiting itself to a restricted vocabulary of words. It is argued that this approach is more likely to assist the creation of practical systems.
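    The combination of knowledge sources can be illustrated with a minimal weighted-voting sketch. The taggers, weights, and sense labels below are hypothetical stand-ins for illustration, not the paper's actual knowledge sources or combination method:

    ```python
    from collections import Counter

    def combine_sources(word, context, sources):
        # Each source is a (tagger, weight) pair; a tagger maps (word, context)
        # to a sense label, or None when it has no evidence to offer.
        votes = Counter()
        for tagger, weight in sources:
            sense = tagger(word, context)
            if sense is not None:
                votes[sense] += weight
        return votes.most_common(1)[0][0] if votes else None

    # Toy knowledge sources for the ambiguous word "bank".
    def collocations(word, context):
        if "river" in context:
            return "bank/SHORE"
        if "money" in context or "account" in context:
            return "bank/FINANCE"
        return None

    def frequency_prior(word, context):
        return "bank/FINANCE"  # most-frequent-sense fallback

    sources = [(collocations, 2.0), (frequency_prior, 1.0)]
    print(combine_sources("bank", ["the", "river", "bank"], sources))  # bank/SHORE
    ```

    A real tagger would draw on richer sources (part of speech, selectional preferences, topical context), but the principle is the same: independent, partially reliable cues are combined so that their agreement outweighs any single source's errors.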

    Feature decay algorithms for fast deployment of accurate statistical machine translation systems

    We use feature decay algorithms (FDA) for fast deployment of accurate statistical machine translation systems, taking only about half a day per translation direction. We develop parallel FDA to address the computational scalability problems caused by the abundance of training data for SMT and LM models, while still achieving SMT performance on par with, or better than, using all of the training data. Parallel FDA runs separate FDA models on randomized subsets of the training data and combines the instance selections afterwards. Parallel FDA can also be used to select the LM corpus based on the training set selected by parallel FDA. The high quality of the selected training data allows us to obtain translation outputs close to those of the top-performing SMT systems. The relevance of the selected LM corpus shows in up to an 86% reduction in the number of OOV tokens and up to a 74% reduction in perplexity. We perform SMT experiments on all language pairs in the WMT13 translation task and obtain SMT performance close to the top systems while using significantly fewer resources for training and development.
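    The selection-with-decay idea, and its parallelization over random shards, can be sketched as follows. The halving decay, unigram/bigram features, and shard count are simplifying assumptions for illustration, not the authors' exact formulation:

    ```python
    import random

    def ngrams(tokens):
        # Unigram and bigram features of a tokenized sentence.
        feats = set(tokens)
        feats.update(zip(tokens, tokens[1:]))
        return feats

    def fda_select(pool, test_features, k):
        """Greedy feature decay: repeatedly pick the sentence with the highest
        sum of feature scores, then halve the score of every feature it
        covered, so later picks favour still-uncovered test-set features."""
        scores = {f: 1.0 for f in test_features}
        pool = list(pool)
        selected = []
        for _ in range(min(k, len(pool))):
            best = max(pool, key=lambda s: sum(scores.get(f, 0.0) for f in ngrams(s)))
            selected.append(best)
            pool.remove(best)
            for f in ngrams(best):
                if f in scores:
                    scores[f] *= 0.5  # decay covered features
        return selected

    def parallel_fda(pool, test_features, k, shards=4, seed=0):
        # Randomize, split into shards, run FDA per shard, merge selections.
        rng = random.Random(seed)
        pool = list(pool)
        rng.shuffle(pool)
        merged = []
        for i in range(shards):
            merged.extend(fda_select(pool[i::shards], test_features, k // shards))
        return merged
    ```

    Because each shard's selection depends only on its own subset, the shards can run as independent processes, which is what makes the half-day deployment time plausible even with very large training pools.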

    Keyword Transformer: A Self-Attention Model for Keyword Spotting

    The Transformer architecture has been successful across many domains, including natural language processing, computer vision and speech recognition. In keyword spotting, self-attention has primarily been used on top of convolutional or recurrent encoders. We investigate a range of ways to adapt the Transformer architecture to keyword spotting and introduce the Keyword Transformer (KWT), a fully self-attentional architecture that exceeds state-of-the-art performance across multiple tasks without any pre-training or additional data. Surprisingly, this simple architecture outperforms more complex models that mix convolutional, recurrent and attentive layers. KWT can be used as a drop-in replacement for these models, setting two new benchmark records on the Google Speech Commands dataset with 98.6% and 97.7% accuracy on the 12- and 35-command tasks respectively. Comment: Proceedings of INTERSPEECH
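    The self-attention operation at the core of such an architecture can be illustrated with a toy single-head attention over a handful of patch embeddings, using identity Q/K/V projections. This is a pedagogical sketch of scaled dot-product attention, not the KWT implementation:

    ```python
    import math

    def softmax(xs):
        m = max(xs)
        es = [math.exp(x - m) for x in xs]
        s = sum(es)
        return [e / s for e in es]

    def self_attention(patches):
        """Single-head scaled dot-product self-attention with identity Q/K/V:
        each patch embedding becomes a softmax-weighted mix of all patches,
        so every position attends to every other in one step."""
        d = len(patches[0])
        out = []
        for q in patches:
            scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                      for k in patches]
            weights = softmax(scores)
            out.append([sum(w * v[j] for w, v in zip(weights, patches))
                        for j in range(d)])
        return out

    # Two 2-d "patch embeddings": each output row is a convex mix of the inputs,
    # weighted toward the patch most similar to the query.
    out = self_attention([[1.0, 0.0], [0.0, 1.0]])
    ```

    In a full model, learned linear projections produce Q, K and V, multiple heads run in parallel, and the attention block is stacked with feed-forward layers and residual connections; the sketch shows only the mixing step that distinguishes attention from convolutional or recurrent encoders.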

    THE LATERALIZATION OF THE WORD FREQUENCY EFFECT

    Two lexical decision experiments were performed to investigate the lateralization of the word frequency effect and its interaction with word class. Positive results would support proposals of multiple lexicons in lexical retrieval models. The first was a small-n experiment varying visual angle of presentation, signal-to-noise ratios, visual field and word class (noun/verb) while using high-frequency word and nonword items. The purpose of this experiment was to document the sensitivity of observers to material presented at varying distances from a central fixation point while replicating previous research results of a word class by visual field interaction. The second experiment was a large-n design varying visual field, word class (noun/verb/mixed), and word frequency, looking for greater differences in lateralization of word class by frequency. Results of the first experiment show no visual field by word class interaction, failing to replicate previous research. Speed of response slowed linearly with increasing visual angle of presentation for both visual fields, despite moving away from a region of macular overlap to parafoveal presentation. In experiment two, a strong frequency effect was found, along with an interaction with word class and visual field. This interaction, however, was due to aberrant performance for medium-frequency nouns. Considered along with the inconclusive literature on lateral effects of word class and word frequency, and recent failures to replicate Bradley's (1978) model of open- and closed-class lexicons, the research focus must shift toward a closer examination of the relationship between two areas: (1) the method of generation of a lexical access code (which may or may not involve the RH) and (2) the relationship between a rapidly calculated direct access code and an ordered lexicon.

    Hierarchy-based Partition Models: Using Classification Hierarchies to

    We propose a novel machine learning technique that can be used to estimate probability distributions for categorical random variables that are equipped with a natural set of classification hierarchies, such as words equipped with word class hierarchies, wordnet hierarchies, and suffix and affix hierarchies. We evaluate the estimator on bigram language modelling with a hierarchy based on word suffixes, using English, Danish, and Finnish data from the Europarl corpus with training sets of up to 1–1.5 million words. The results show that the proposed estimator outperforms modified Kneser-Ney smoothing in terms of perplexity on unseen data. This suggests that important information is hidden in the classification hierarchies that we routinely use in computational linguistics, but that we are unable to utilize this information fully because our current statistical techniques are either based on simple counting models or designed for sample spaces with a distance metric, rather than sample spaces with a non-metric topology given by a classification hierarchy. Keywords: machine learning; categorical variables; classification hierarchies; language modelling; statistical estimation
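    One simple way to exploit a suffix hierarchy in a bigram model is to back off from the exact history word to progressively coarser suffix classes. The sketch below illustrates that idea with plain relative frequencies; it is a baseline back-off scheme for intuition, not the paper's partition estimator or modified Kneser-Ney:

    ```python
    from collections import defaultdict

    def suffix_classes(word, depth=3):
        """Most-specific-first hierarchy for a history word, e.g.
        "walked" -> ["walked", "-ked", "-ed", "-d", "<any>"]."""
        classes = [word]
        for k in range(depth, 0, -1):
            if len(word) > k:
                classes.append("-" + word[-k:])
        classes.append("<any>")
        return classes

    class SuffixBackoffBigram:
        """Toy bigram estimator: relative frequencies conditioned on the most
        specific suffix class of the history that has been observed."""
        def __init__(self, corpus):
            self.cont = defaultdict(lambda: defaultdict(int))
            self.tot = defaultdict(int)
            for sent in corpus:
                for w1, w2 in zip(sent, sent[1:]):
                    # Credit the bigram to every class of its history word.
                    for c in suffix_classes(w1):
                        self.cont[c][w2] += 1
                        self.tot[c] += 1

        def prob(self, w1, w2):
            # Back off along the hierarchy to the first class with any counts.
            for c in suffix_classes(w1):
                if self.tot[c]:
                    return self.cont[c][w2] / self.tot[c]
            return 0.0
    ```

    An unseen history such as "talked" then borrows statistics from seen words sharing its "-ked" class, which is the kind of hierarchy-encoded information the abstract argues that flat counting models leave unused.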