
    Using NLP technology in CALL

    This paper outlines the research and guiding research principles of the (I)CALL group at Dublin City University, Ireland. Our research activities include the development of (I)CALL systems for a variety of user groups, including advanced Romance language learners, intermediate to advanced German learners, primary and secondary school students, and students with L1 learning disabilities. These groups require a variety of system types that cater to individual user needs and abilities. Suitable CL/NLP technology is incorporated where appropriate for the learner.

    Lemmatization and lexicalized statistical parsing of morphologically rich languages: the case of French

    This paper shows that training a lexicalized parser on a lemmatized morphologically rich treebank such as the French Treebank slightly improves parsing results. We also show that lemmatizing a similarly sized subset of the English Penn Treebank has almost no effect on parsing performance with gold lemmas and leads to a small drop in performance when automatically assigned lemmas and POS tags are used. This highlights two facts: (i) lemmatization helps to reduce lexicon data-sparseness issues for French, and (ii) it also makes the parsing process sensitive to the correct assignment of POS tags to unknown words.

    Penalizing unknown words’ emissions in HMM POS tagger based on Malay affix morphemes

    The challenge in unsupervised Hidden Markov Model (HMM) training for a POS tagger is that the training depends on an untagged corpus; the only supervised data limiting the possible tagging of words is a dictionary. Therefore, training cannot properly map possible tags. The exact morphemes of prefixes, suffixes and circumfixes in the agglutinative Malay language are examined to assign probable tags to unknown words based on linguistically meaningful affixes, using a morpheme-based POS guessing algorithm. The algorithm has been integrated into the Viterbi algorithm, which uses HMM-trained parameters for tagging new sentences. In the experiment, the tagger first uses character-based prediction to handle unknown words; next, it uses the morpheme-based POS guessing algorithm; lastly, it uses a combination of the first and second. Keywords: Malay POS tagger; morpheme-based; HMM
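
The idea of penalizing unknown-word emissions inside Viterbi decoding can be illustrated with a minimal sketch. This is not the authors' implementation: the tag set, the toy probabilities, the affix tables (`PREFIX_TAGS`, `SUFFIX_TAGS`) and the constants `UNKNOWN_PROB` and `PENALTY` are all hypothetical values chosen for illustration. The sketch assumes that an unknown word receives a small emission probability under tags its affixes support and a further-penalized probability under all other tags.

```python
import math

# Hypothetical toy HMM: every tag, word, affix and probability below is an
# illustrative assumption, not a value taken from the paper.
TAGS = ["NOUN", "VERB"]
START = {"NOUN": 0.6, "VERB": 0.4}
TRANS = {
    "NOUN": {"NOUN": 0.3, "VERB": 0.7},
    "VERB": {"NOUN": 0.8, "VERB": 0.2},
}
EMIT = {
    "NOUN": {"guru": 0.5, "buku": 0.5},
    "VERB": {"baca": 1.0},
}
# Affix -> tags it suggests (assumed for illustration only).
PREFIX_TAGS = {"mem": {"VERB"}, "ber": {"VERB"}, "pe": {"NOUN"}}
SUFFIX_TAGS = {"an": {"NOUN"}, "kan": {"VERB"}}
UNKNOWN_PROB = 1e-3  # emission given to an unknown word under a supported tag
PENALTY = 1e-3       # extra multiplier for tags the affixes do not support


def guess_tags(word):
    """Return the set of tags suggested by the word's prefixes and suffixes."""
    tags = set()
    for prefix, suggested in PREFIX_TAGS.items():
        if word.startswith(prefix):
            tags |= suggested
    for suffix, suggested in SUFFIX_TAGS.items():
        if word.endswith(suffix):
            tags |= suggested
    return tags


def emission(tag, word):
    """Emission probability, penalizing unsupported tags for unknown words."""
    if any(word in EMIT[t] for t in TAGS):
        return EMIT[tag].get(word, 0.0)
    guessed = guess_tags(word)
    if not guessed or tag in guessed:
        return UNKNOWN_PROB
    return UNKNOWN_PROB * PENALTY


def _log(p):
    """Logarithm that maps zero probability to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(words):
    """Return the most probable tag sequence for a list of words."""
    if not words:
        return []
    scores = [{t: _log(START[t]) + _log(emission(t, words[0])) for t in TAGS}]
    back = []
    for word in words[1:]:
        prev = scores[-1]
        cur, ptr = {}, {}
        for t in TAGS:
            # Best predecessor tag for reaching tag t at this position.
            best = max(TAGS, key=lambda p: prev[p] + _log(TRANS[p][t]))
            cur[t] = prev[best] + _log(TRANS[best][t]) + _log(emission(t, word))
            ptr[t] = best
        scores.append(cur)
        back.append(ptr)
    # Follow the back-pointers from the best final tag.
    tag = max(TAGS, key=lambda t: scores[-1][t])
    path = [tag]
    for ptr in reversed(back):
        tag = ptr[tag]
        path.append(tag)
    return path[::-1]


print(viterbi(["guru", "membaca", "buku"]))
print(viterbi(["guru", "pembaca"]))
```

In this toy model the transition table alone would favor a verb after a noun, so the second call shows the effect of the penalty: the assumed prefix table marks `pembaca` as noun-like, and the penalized verb emission is outweighed.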

    A New Form of Humor? Mapping Constraint-Based Computational Morphologies to a Finite-State Representation

    MorphoLogic’s Humor morphological analyzer engine has been used for the development of several high-quality computational morphologies, among them ones for complex agglutinative languages. However, Humor’s closed-source licensing scheme has been an obstacle to making these resources widely available. Moreover, the rule-based Humor engine has other limitations: it lacks support for morphological guessing and for the integration of frequency information or other weighting of the models. These problems were solved by converting the databases to a finite-state representation that allows for morphological guessing and the addition of weights. Moreover, the finite-state representation has open-source implementations.

    Classification of semantic relations in different syntactic structures in medical text using the MeSH hierarchy

    Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005. Includes bibliographical references (leaf 38). Two different classification algorithms are evaluated in recognizing semantic relationships of different syntactic compounds. The compounds, which include noun-noun, adjective-noun, noun-adjective, noun-verb, and verb-noun, were extracted from a set of doctors' notes using a part of speech tagger and a parser. Each compound was labeled with a semantic relationship, and each word in the compound was mapped to its corresponding entry in the MeSH hierarchy. MeSH includes only medical terminology so it was extended to include everyday, non-medical terms. The two classification algorithms, neural networks and a classification tree, were trained and tested on the data set for each type of syntactic compound. Models representing different levels of MeSH were generated and fed into the neural networks. Both algorithms performed better than random guessing, and the classification tree performed better than the neural networks in predicting the semantic relationship between phrases from their syntactic structure. by Neha Bhooshan. M.Eng.

    A big data approach towards sarcasm detection in Russian

    We present a set of deterministic algorithms for Russian inflection and automated text synthesis. These algorithms are implemented in a publicly available web service, www.passare.ru. This service provides functions for inflection of single words, word matching and synthesis of grammatically correct Russian text. Selected code and datasets are available at https://github.com/passare-ru/PassareFunctions/. Performance of the inflectional functions has been tested against the annotated Russian-language corpus OpenCorpora, compared with that of other solutions, and used for estimating the morphological variability and complexity of different parts of speech in Russian. Comment: arXiv admin note: substantial text overlap with arXiv:1706.0255

    The more similar the better? : factors in learning cognates, false cognates and non-cognate words

    In this study we explored factors that determine the knowledge of L2 words with orthographic neighbours in L1 (cognates and false cognates). We asked 150 Polish learners of English to translate 105 English non-cognate words, cognates, and false cognates into Polish, and to rate their confidence in each translation. Confidence ratings allowed us to employ a novel analytic procedure which disentangles knowing cognates and false cognates from strategic guessing. Mixed-effects logistic regression models revealed that cognates were known better, whereas false cognates were known worse, relative to non-cognate controls. The advantage of knowing cognates, but not false cognates, was modulated by the degree of similarity to their L1 equivalents. The knowledge of cognates and false cognates was not affected by the frequency of their formal equivalent in L1. Based on these findings we draw conclusions about how cross-linguistic formal similarity affects L2 word learnability, proposing a mechanism by which cognates and false cognates are acquired.