13,656 research outputs found
Robust Grammatical Analysis for Spoken Dialogue Systems
We argue that grammatical analysis is a viable alternative to concept
spotting for processing spoken input in a practical spoken dialogue system. We
discuss the structure of the grammar, and a model for robust parsing which
combines linguistic sources of information and statistical sources of
information. We discuss test results suggesting that grammatical processing
allows fast and accurate processing of spoken input. Comment: Accepted for JNL
Word frequency affects naming latency in Dutch when age of acquisition is controlled.
Morrison and Ellis (1995) claim that most evidence of frequency effects in word recognition is not genuine but an artefact of the age at which the words were acquired. The finding that age of acquisition (AOA) has a reliable independent effect on word naming is replicated for the Dutch language. However, it is also shown that the effect of word frequency remains reliable when AOA is controlled. A possible interpretation is that the English studies have been based on retrospective student ratings, whereas in the present study a more on-line measure of AOA was used.
On the Utility of Conjoint and Compositional Frames and Utterance
This paper reports the results of a series of connectionist simulations aimed at establishing the value of different types of contexts as predictors of the grammatical categories of words. A comparison is made between "compositional" frames (Monaghan & Christiansen, 2004) and non-compositional or "conjoint" frames (Mintz, 2003). Attention is given to the role of utterance boundaries both as a category to be predicted and as a predictor. The role of developmental constraints is investigated by examining the effect of restricting the analysis to utterance-final frames. In line with results reported by Monaghan and Christiansen, compositional frames are better predictors than conjoint frames, though the latter provide a small performance improvement when combined with compositional frames. Utterance boundaries are shown to be detrimental to performance when included as an item to be predicted, while improving performance when included as a predictor. The utility of utterance boundaries is further supported by the finding that when the analysis is restricted to utterance-final frames (which are likely to be a particularly important source of information early in development), frames including utterance boundaries are far better predictors than lexical frames.
Lexical stress constrains English-learning infants' segmentation in a non-native language.
Infants' ability to segment words in fluent speech is affected by their language experience. In this study we investigated the conditions under which infants can segment words in a non-native language. Using the Head-turn Preference Procedure, we found that monolingual English-learning 8-month-olds can segment bisyllabic words in Spanish (trochees and iambs) but not French (iambs). Our results are incompatible with accounts that rely on distributional learning, language rhythm similarity, or target word prosodic shape alone. Instead, we show that monolingual English-learning infants are able to segment words in a non-native language as long as words have stress, as is the case in English. More specifically, we show that even in a rhythmically different non-native language, English-learning infants can find words by detecting stressed syllables and treating them as word onsets or offsets.
Word contexts enhance the neural representation of individual letters in early visual cortex
Visual context facilitates perception, but how this is neurally implemented remains unclear. One example of contextual facilitation is found in reading, where letters are more easily identified when embedded in a word. Bottom-up models explain this word advantage as a post-perceptual decision bias, while top-down models propose that word contexts enhance perception itself. Here, we arbitrate between these accounts by presenting words and nonwords and probing the representational fidelity of individual letters using functional magnetic resonance imaging. In line with top-down models, we find that word contexts enhance letter representations in early visual cortex. Moreover, we observe increased coupling between letter information in visual cortex and brain activity in key areas of the reading network, suggesting these areas may be the source of the enhancement. Our results provide evidence for top-down representational enhancement in word recognition, demonstrating that word contexts can modulate perceptual processing already at the earliest visual regions.
Reducing speech recognition time and memory use by means of compound (de-)composition
This paper tackles the problem of out-of-vocabulary (OOV) words in Automatic Speech Transcription (AST) applications for a compound language (Dutch). A seemingly attractive way to reduce the number of OOV words in compound languages is to extend the AST system with a compound (de-)composition module. However, successful implementations of this approach have so far been scarce.
We developed a novel data-driven compound (de-)composition module and tested it in two different AST experiments. For equal lexicon sizes, our compound processor lowers the OOV rate. Moreover, we are able to turn that gain in OOV rate into a reduction of the Word Error Rate of the transcription system. Using our approach, we built a system with an 84K lexicon that performs as accurately as a baseline system with a 168K lexicon, but our system is 5-6% faster and requires about 50% less storage for the lexical component, even though this component is encoded in an optimal way (prefix-suffix tree compression).
Dutch hypernym detection: does decompounding help?
This research presents experiments carried out to improve the precision and recall of Dutch hypernym detection. To do so, we applied a data-driven semantic relation finder that starts from a list of automatically extracted domain-specific terms from technical corpora, and generates a list of hypernym relations between these terms. As Dutch technical terms often consist of compounds written in one orthographic unit, we investigated the impact of a decompounding module on the performance of the hypernym detection system.
In addition, we improved the precision of the system by designing filters that take into account statistical and linguistic information.
The experimental results show that both the precision and recall of the hypernym detection system improved, and that the decompounding module is especially effective for hypernym detection in Dutch.
A plea for more interactions between psycholinguistics and natural language processing research
A new development in psycholinguistics is the use of regression analyses on tens of thousands of words, known as the megastudy approach. This development has led to the collection of processing times and subjective ratings (of age of acquisition, concreteness, valence, and arousal) for most of the existing words in English and Dutch. In addition, a crowdsourcing study in the Dutch language has resulted in information about how well 52,000 lemmas are known. This information is likely to be of interest to NLP researchers and computational linguists. At the same time, large-scale measures of word characteristics developed in the latter traditions are likely to be pivotal in bringing the megastudy approach to the next level.
Polysemy and brevity versus frequency in language
The pioneering research of G. K. Zipf on the relationship between word
frequency and other word features led to the formulation of various linguistic
laws. The most popular is Zipf's law for word frequencies. Here we focus on two
laws that have been studied less intensively: the meaning-frequency law, i.e.
the tendency of more frequent words to be more polysemous, and the law of
abbreviation, i.e. the tendency of more frequent words to be shorter. In a
previous work, we tested the robustness of these Zipfian laws for English,
roughly measuring word length in number of characters and distinguishing adult
from child speech. In the present article, we extend our study to other
languages (Dutch and Spanish) and introduce two additional measures of length:
syllabic length and phonemic length. Our correlation analysis indicates that
both the meaning-frequency law and the law of abbreviation hold overall in all
the analyzed languages.