Using NLP technology in CALL
This paper outlines the research and guiding research principles of the (I)CALL group at Dublin City University, Ireland. Our research activities include the development of (I)CALL systems targeted at a variety of user groups, including advanced Romance language learners, intermediate to advanced German learners, primary and secondary school students, and students with L1 learning disabilities. These groups require a variety of system types that cater to individual user needs and abilities. Suitable CL/NLP technology is incorporated where appropriate for the learner.
Lemmatization and lexicalized statistical parsing of morphologically rich languages: the case of French
This paper shows that training a lexicalized parser on a lemmatized morphologically rich treebank such as the French Treebank slightly improves parsing results. We also show that lemmatizing a similarly sized subset of the English Penn Treebank has almost no effect on parsing performance with gold lemmas and leads to a small drop in performance when automatically assigned lemmas and POS tags are used. This highlights two facts: (i) lemmatization helps to reduce lexicon data-sparseness issues for French, and (ii) it also makes the parsing process sensitive to correct assignment of POS tags to unknown words.
Penalizing unknown words' emissions in HMM POS tagger based on Malay affix morphemes
The challenge in unsupervised Hidden Markov Model (HMM) training for a POS tagger is that the training depends on an untagged corpus; the only supervised data limiting the possible tagging of words is a dictionary. Therefore, training cannot properly map possible tags. The exact morphemes of prefixes, suffixes and circumfixes in the agglutinative Malay language are examined to assign unknown words' probable tags based on linguistically meaningful affixes, using a morpheme-based POS guessing algorithm for tagging. The algorithm has been integrated into the Viterbi algorithm, which uses HMM-trained parameters for tagging new sentences. In the experiment, the tagger handles unknown words in three ways: first, with character-based prediction; next, with the morpheme-based POS guessing algorithm; and lastly, with a combination of the two. Keywords: Malay POS tagger; morpheme-based; HMM
A New Form of Humor? Mapping Constraint-Based Computational Morphologies to a Finite-State Representation
MorphoLogic's Humor morphological analyzer engine has been used for the development of several high-quality computational
morphologies, among them ones for complex agglutinative languages. However, Humor's closed source licensing scheme has been
an obstacle to making these resources widely available. Moreover, there are other limitations of the rule-based Humor engine: lack of
support for morphological guessing and for the integration of frequency information or other weighting of the models. These problems
were solved by converting the databases to a finite-state representation that allows for morphological guessing and the addition of
weights. Moreover, it has open-source implementations.
Classification of semantic relations in different syntactic structures in medical text using the MeSH hierarchy
Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005. Includes bibliographical references (leaf 38). Two different classification algorithms are evaluated in recognizing semantic relationships of different syntactic compounds. The compounds, which include noun-noun, adjective-noun, noun-adjective, noun-verb, and verb-noun, were extracted from a set of doctors' notes using a part of speech tagger and a parser. Each compound was labeled with a semantic relationship, and each word in the compound was mapped to its corresponding entry in the MeSH hierarchy. MeSH includes only medical terminology so it was extended to include everyday, non-medical terms. The two classification algorithms, neural networks and a classification tree, were trained and tested on the data set for each type of syntactic compound. Models representing different levels of MeSH were generated and fed into the neural networks. Both algorithms performed better than random guessing, and the classification tree performed better than the neural networks in predicting the semantic relationship between phrases from their syntactic structure. By Neha Bhooshan. M.Eng.
Review of doctoral research in second-language teaching and learning in England (2006)
A big data approach towards sarcasm detection in Russian
We present a set of deterministic algorithms for Russian inflection and
automated text synthesis. These algorithms are implemented in a publicly
available web-service www.passare.ru. This service provides functions for
inflection of single words, word matching and synthesis of grammatically
correct Russian text. Selected code and datasets are available at
https://github.com/passare-ru/PassareFunctions/. Performance of the inflectional
functions has been tested against the annotated corpus of Russian language
OpenCorpora, compared with that of other solutions, and used for estimating the
morphological variability and complexity of different parts of speech in
Russian. Comment: arXiv admin note: substantial text overlap with arXiv:1706.0255
The more similar the better? : factors in learning cognates, false cognates and non-cognate words
In this study we explored factors that determine the knowledge of L2 words with orthographic neighbours in L1 (cognates and false cognates). We asked 150 Polish learners of English to translate 105 English non-cognate words, cognates, and false cognates into Polish, and to assess their confidence in each translation. Confidence ratings allowed us to employ a novel analytic procedure which disentangles knowing cognates and false cognates from strategic guessing. Mixed-effects logistic regression models revealed that cognates were known better, whereas false cognates were known worse, relative to non-cognate controls. The advantage of knowing cognates, but not false cognates, was modulated by the degree of similarity to their L1 equivalents. The knowledge of cognates and false cognates was not affected by the frequency of their formal equivalent in L1. Based on these findings, we draw conclusions about how cross-linguistic formal similarity affects L2 word learnability and propose a mechanism by which cognates and false cognates are acquired.