Decorrelation and shallow semantic patterns for distributional clustering of nouns and verbs
Distributional approximations to lexical semantics are very useful not only in helping to create lexical semantic resources (Kilgarriff et al., 2004; Snow et al., 2006), but also when applied directly in tasks that can benefit from large-coverage semantic knowledge, such as coreference resolution (Poesio et al., 1998; Gasperin and Vieira, 2004; Versley, 2007), word sense disambiguation (McCarthy et al., 2004), or semantic role labeling (Gordon and Swanson, 2007). We present a model built from Web-based corpora that uses both shallow patterns for grammatical and semantic relations and a window-based approach, and that applies singular value decomposition to decorrelate the feature space, which is otherwise too heavily influenced by the skewed topic distribution of Web corpora.
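A minimal sketch of SVD-based decorrelation of a word-by-feature count matrix, assuming NumPy. The log weighting, the column centering, the choice of k, and the toy Poisson counts are all illustrative assumptions, not the paper's settings. After centering, the reduced dimensions returned by `svd_embed` (a hypothetical helper) are mutually uncorrelated, so a few dominant, topic-driven features no longer swamp the similarity computation.

```python
import numpy as np


def svd_embed(counts: np.ndarray, k: int) -> np.ndarray:
    """Project rows of a word-by-feature count matrix onto the top-k singular directions."""
    # Dampen heavily skewed raw counts (log weighting is an assumed choice here).
    x = np.log1p(counts.astype(float))
    # Center each feature column so that the reduced dimensions are uncorrelated.
    x -= x.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    # Scaled left singular vectors: columns are orthogonal and zero-mean.
    return u[:, :k] * s[:k]


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two word vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


rng = np.random.default_rng(0)
# Toy data: 6 nouns x 10 pattern/window features (hypothetical counts).
counts = rng.poisson(2.0, size=(6, 10))
emb = svd_embed(counts, k=3)
print(emb.shape)               # (6, 3)
print(cosine(emb[0], emb[1]))  # similarity of the first two nouns in the reduced space
```

Because `emb.T @ emb` equals the diagonal matrix of squared singular values, the reduced features carry no linear correlation with each other.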
Distinction Between Inflection and Derivation of Learning Reduplication in Mandarin
Reduplication, a word-formation process in Mandarin, is one of the most difficult topics for scholars and students to understand. Theoretically, this research offers an approach that differs from those of previous researchers. Using M.D.S. Simatupang's context-free approach, this research contrasts the reduplicated forms of all word classes and shows the relationships between them (AA, AABB, ABAB, ABB) and their basic forms (A, AB). Then, based on the word-category test and the lexical decomposition test proposed by J.W.M. Verhaar, this study analyzes and explains derivational and inflectional reduplication in Mandarin so that students can understand the meanings of these forms as vocabulary. By examining derivational and inflectional reduplication in Mandarin together, this research can also help disseminate the use of morphological theory. In addition, this study discusses Mandarin reduplication according to the word classes that serve as the bases of each reduplicated form. These preliminary results are presented to stimulate more complete studies; ideally, this research will be disseminated to provide learning and reading material for future research.
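The four reduplication templates named above can be illustrated as simple slot-filling over the syllables of a base form. This is a minimal sketch: `reduplicate` is a hypothetical helper, and treating ABB as built on a two-syllable AB string is a simplifying assumption. The function only produces the surface pattern; it does not decide whether a given reduplication is inflectional or derivational, which is what the Verhaar tests are for.

```python
def reduplicate(base: str, pattern: str) -> str:
    """Fill a reduplication template (AA, AABB, ABAB, ABB) from a base of one or two syllables.

    Each Chinese character is treated as one syllable slot (A or B).
    """
    if pattern == "AA":
        if len(base) != 1:
            raise ValueError("AA needs a one-syllable base A")
        return base * 2
    if len(base) != 2:
        raise ValueError(f"{pattern} needs a two-syllable base AB")
    a, b = base
    templates = {
        "AABB": a + a + b + b,  # each syllable doubled in place
        "ABAB": a + b + a + b,  # the whole base repeated
        "ABB": a + b + b,       # only the second syllable doubled
    }
    if pattern not in templates:
        raise ValueError(f"unknown pattern {pattern!r}")
    return templates[pattern]


print(reduplicate("看", "AA"))      # 看看
print(reduplicate("干净", "AABB"))  # 干干净净
print(reduplicate("研究", "ABAB"))  # 研究研究
print(reduplicate("空荡", "ABB"))   # 空荡荡
```

Keeping the templates as data makes it easy to contrast, for one base, which patterns a word class admits.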
A visual M170 effect of morphological complexity
Recent masked priming studies on visual word recognition have suggested that morphological decomposition is performed prelexically, purely on the basis of the orthographic properties of the word form. Given this, one might expect morphological complexity to modulate early visual evoked activity in electromagnetic measures. We investigated the neural bases of morphological decomposition with magnetoencephalography (MEG). In two experiments, we manipulated morphological complexity in single-word lexical decision without priming, once using suffixed words and once using prefixed words. We found that morphologically complex forms elicit larger amplitudes in the M170, the same component that has been implicated in letter-string and face effects in previous MEG studies. Although letter-string effects have been reported to be left-lateralized, we found a right-lateralized effect of morphological complexity, suggesting that both hemispheres may be involved in early analysis of word forms.
Joint morphological-lexical language modeling for processing morphologically rich languages with application to dialectal Arabic
Language modeling for an inflected language such as Arabic poses new challenges for speech recognition and machine translation because of its rich morphology. Rich morphology causes large increases in the out-of-vocabulary (OOV) rate and poor estimation of language model parameters in the absence of large quantities of data. In this study, we present a joint morphological-lexical language model (JMLLM) that takes advantage of Arabic morphology. JMLLM combines morphological segments with the underlying lexical items, along with additional available information sources regarding morphological segments and lexical items, in a single joint model. Joint representation and modeling of morphological and lexical items reduces the OOV rate and provides smooth probability estimates while keeping the predictive power of whole words. Speech recognition and machine translation experiments on dialectal Arabic show improvements over word-based and morpheme-based trigram language models. We also show that as the integration between different information sources becomes tighter, both speech recognition and machine translation performance improve.
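The OOV argument can be made concrete with a small calculation. The sketch below is not JMLLM; it only shows why modeling morphological segments lowers the OOV rate: a test word never seen whole may still be composed of segments seen in training. The `+`-segmented, Buckwalter-style tokens and the helpers `words`, `morphs`, and `oov_rate` are hypothetical toy examples.

```python
from typing import Iterable


def oov_rate(train: Iterable[str], test: list[str]) -> float:
    """Fraction of test tokens that do not occur in the training vocabulary."""
    vocab = set(train)
    if not test:
        return 0.0
    return sum(tok not in vocab for tok in test) / len(test)


def words(segmented: list[str]) -> list[str]:
    """Rejoin '+'-separated segments into surface words."""
    return [w.replace("+", "") for w in segmented]


def morphs(segmented: list[str]) -> list[str]:
    """Split every word into its morphological segments."""
    return [m for w in segmented for m in w.split("+")]


# Hypothetical pre-segmented toy data; '+' marks segment boundaries.
train = ["w+ktAb", "ktAb+hm", "Al+ktAb", "w+qlm"]
test = ["w+ktAb+hm", "Al+qlm"]

print(oov_rate(words(train), words(test)))    # 1.0: both test words unseen as wholes
print(oov_rate(morphs(train), morphs(test)))  # 0.0: every segment was seen in training
```

A segment-only model pays for this lower OOV rate with shorter context per n-gram, which is the trade-off the joint model is described as addressing by keeping whole words alongside segments.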
A Unified Multilingual Handwriting Recognition System using multigrams sub-lexical units
We address the design of a unified multilingual system for handwriting recognition. Most multilingual systems rely on specialized models that are trained on a single language, one of which is selected at test time. While some recognition systems are based on a unified optical model, dealing with a unified language model remains a major issue, as traditional language models are generally trained on corpora composed of large word lexicons per language. Here, we offer a solution by considering language models based on sub-lexical units called multigrams. Using multigrams strongly reduces the lexicon size and thus decreases the language model complexity. This makes possible the design of an end-to-end unified multilingual recognition system in which both a single optical model and a single language model are trained on all the languages. We discuss the impact of language unification on each model and show that our system matches the performance of state-of-the-art methods with a strong reduction in complexity. Comment: preprint
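To show how a word decomposes into multigram-like sub-lexical units, here is a minimal sketch of maximum-likelihood (Viterbi-style) segmentation by dynamic programming. It assumes a unit inventory with log-probabilities is already given; the toy inventory, the `max_len` cap, and the `segment` helper are hypothetical, and how the paper actually builds and trains its multigram inventory is not described in this abstract.

```python
import math


def segment(word: str, units: dict[str, float], max_len: int = 4) -> list[str]:
    """Split `word` into units from `units` that maximize the summed log-probability."""
    n = len(word)
    best = [-math.inf] * (n + 1)  # best[i]: best score for the prefix word[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last unit in that prefix
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = word[start:end]
            if piece in units and best[start] + units[piece] > best[end]:
                best[end] = best[start] + units[piece]
                back[end] = start
    if best[n] == -math.inf:
        raise ValueError(f"cannot segment {word!r} with this inventory")
    pieces = []
    end = n
    while end > 0:  # follow back-pointers from the end of the word
        start = back[end]
        pieces.append(word[start:end])
        end = start
    return pieces[::-1]


# Hypothetical toy inventory of units with log-probabilities.
units = {"t": -3.0, "h": -3.0, "e": -2.0, "r": -3.0,
         "th": -1.5, "the": -1.0, "re": -2.2, "ere": -2.0}
print(segment("there", units))  # ['the', 're']
```

Since every word in every language maps onto a shared, much smaller inventory of such units, one language model over units can replace per-language word lexicons.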
Processing of regular and irregular past tense morphology in highly proficient second language learners of English: a self-paced reading study
Dual-system models suggest that English past-tense morphology involves two processing routes: rule application for regular verbs and memory retrieval for irregular verbs (Pinker, 1999). In second language (L2) processing research, Ullman (2001a) suggested that both verb types are retrieved from memory, but more recently Clahsen and Felser (2006) and Ullman (2004) argued that past-tense rule application can become automatised with experience in L2 learners. To address this controversy, we tested highly proficient Greek-English learners with naturalistic or classroom L2 exposure, compared with native English speakers, in a self-paced reading task involving past-tense forms embedded in plausible sentences. Our results suggest that, irrespective of the type of exposure, proficient L2 learners with extended L2 exposure apply rule-based processing.
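The dual-route idea can be sketched as a lookup that falls back to a rule: irregular forms are retrieved from a stored list, and everything else goes through the regular "-ed" route. This is an illustrative toy, not a processing model from the study. The tiny `IRREGULAR` lexicon and the `past_tense` helper are hypothetical, and the spelling rule deliberately ignores consonant doubling (e.g. "stop" to "stopped").

```python
# Hypothetical stand-in for the memorized irregular lexicon.
IRREGULAR = {"go": "went", "sing": "sang", "take": "took", "think": "thought"}


def past_tense(verb: str) -> tuple[str, str]:
    """Return (past-tense form, route used): 'memory' for irregulars, 'rule' otherwise."""
    # Route 1: memory retrieval of a stored irregular form.
    if verb in IRREGULAR:
        return IRREGULAR[verb], "memory"
    # Route 2: regular rule, with simplified English spelling adjustments.
    if verb.endswith("e"):
        return verb + "d", "rule"
    if len(verb) > 1 and verb.endswith("y") and verb[-2] not in "aeiou":
        return verb[:-1] + "ied", "rule"
    return verb + "ed", "rule"


for v in ["go", "walk", "bake", "carry", "play"]:
    print(v, past_tense(v))
```

On this view, the question the study addresses is whether L2 learners actually use the second route for regular verbs or store regular forms as wholes.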
A broad-coverage distributed connectionist model of visual word recognition
In this study, we describe a distributed connectionist model of morphological processing that covers a realistically sized sample of the English language. The purpose of this model is to explore how effects of discrete, hierarchically structured morphological paradigms can arise as a result of the statistical sub-regularities in the mapping between word forms and word meanings. We present a model that learns to produce at its output a realistic semantic representation of a word when presented with a distributed representation of its orthography. After training, in three experiments, we compare the outputs of the model with the lexical decision latencies for large sets of English nouns and verbs. We show that the model has developed detailed representations of morphological structure, giving rise to effects analogous to those observed in visual lexical decision experiments. In addition, we show how the association between word form and word meaning also gives rise to recently reported differences between regular and irregular verbs, even in their completely regular present-tense forms. We interpret these results as underlining the key importance, for lexical processing, of the statistical regularities in the mappings between form and meaning.
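As a rough illustration of what "learning to map a distributed orthographic representation onto a semantic representation" involves, here is a minimal one-hidden-layer network trained by backpropagation with NumPy. It is a sketch under loudly stated assumptions: the architecture, hidden size, learning rate, epoch count, and random binary toy vectors are all hypothetical and do not describe the authors' model or their English data.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def train_form_to_meaning(x, y, hidden=16, lr=0.5, epochs=500, seed=0):
    """Train orthography (x) -> semantics (y) with binary cross-entropy and gradient descent."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0.0, 0.1, (x.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.1, (hidden, y.shape[1]))
    b2 = np.zeros(y.shape[1])
    n = x.shape[0]
    eps = 1e-9
    losses = []
    for _ in range(epochs):
        # Forward pass: distributed form -> hidden layer -> semantic features.
        h = sigmoid(x @ w1 + b1)
        p = sigmoid(h @ w2 + b2)
        # Cross-entropy summed over semantic units, averaged over words.
        loss = -np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)) / n
        losses.append(float(loss))
        # Backward pass (sigmoid + cross-entropy gives p - y at the output).
        d2 = (p - y) / n
        d1 = (d2 @ w2.T) * h * (1 - h)
        w2 -= lr * (h.T @ d2)
        b2 -= lr * d2.sum(axis=0)
        w1 -= lr * (x.T @ d1)
        b1 -= lr * d1.sum(axis=0)
    return (w1, b1, w2, b2), losses


def predict(params, x):
    """Semantic feature activations produced for orthographic input x."""
    w1, b1, w2, b2 = params
    return sigmoid(sigmoid(x @ w1 + b1) @ w2 + b2)


rng = np.random.default_rng(1)
x = rng.integers(0, 2, size=(20, 30)).astype(float)  # 20 toy "words", 30 letter features
y = rng.integers(0, 2, size=(20, 12)).astype(float)  # 12 toy semantic features
params, losses = train_form_to_meaning(x, y)
print(losses[0], losses[-1])  # training loss should fall
print(predict(params, x[:1]).round(2))
```

In a model of this kind, morphological effects are not hard-coded; they can only emerge if words that share form also share meaning features, which is the statistical sub-regularity the abstract emphasizes.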
