Using NLP technology in CALL

Author: Greene Cara N.
Keogh Katrina A.
Koller Thomas
van Genabith Josef
Wagner Joachim
Ward Monica
Publication venue: 'International Speech Communication Association'
Publication date: 01/01/2004

This paper outlines the research and guiding research principles of the (I)CALL group at Dublin City University, Ireland. Our research activities include the development of (I)CALL systems targeted at a variety of user groups, including advanced Romance language learners, intermediate to advanced German learners, primary and secondary school students, and students with L1 learning disabilities. These groups require a variety of system types that cater to individual user needs and abilities. Suitable CL/NLP technology is incorporated where appropriate for the learner.

CiteSeerX

DCU Online Research Access Service

Lemmatization and lexicalized statistical parsing of morphologically rich languages: the case of French

Author: Candito Marie
Cetinoglu Ozlem
Chrupała Grzegorz
Seddah Djamé
van Genabith Josef
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2010

This paper shows that training a lexicalized parser on a lemmatized, morphologically rich treebank such as the French Treebank slightly improves parsing results. We also show that lemmatizing a subset of the English Penn Treebank of similar size has almost no effect on parsing performance with gold lemmas, and leads to a small drop in performance when automatically assigned lemmas and POS tags are used. This highlights two facts: (i) lemmatization helps to reduce lexicon data-sparseness issues for French; (ii) it also makes the parsing process sensitive to the correct assignment of POS tags to unknown words.

Irish Universities

DCU Online Research Access Service
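The data-sparseness point can be illustrated with a toy count of word types. This is a minimal sketch that assumes a hypothetical, hand-written lemma table (`LEMMAS`) in place of the lemmatizer used in the paper; it only shows how mapping inflected French forms to lemmas shrinks the set of distinct lexical items a parser has to learn.

```python
# Hypothetical lemma table for a few inflected French forms (illustration only,
# not the lemmatizer used in the paper).
LEMMAS = {
    "mange": "manger", "mangeons": "manger", "mangent": "manger", "mangé": "manger",
    "petite": "petit", "petits": "petit", "petites": "petit",
}


def lemmatize(tokens):
    """Replace each token by its lemma when the table knows it, else keep it."""
    return [LEMMAS.get(tok, tok) for tok in tokens]


corpus = "ils mangent les petites pommes nous mangeons le petit pain il a mangé".split()

# Number of distinct word types before and after lemmatization.
print(len(set(corpus)), len(set(lemmatize(corpus))))  # → 13 10
```

The flip side noted in the abstract is also visible here: a form missing from the table (or given the wrong POS, in a real system) is left unmerged, so the benefit depends on correct lemma and tag assignment.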

Penalizing unknown words’ emissions in HMM POS tagger based on Malay affix morphemes

Author: Aziz M.J.A.
Mohamed H.
Omar N.
Publication venue: 'African Journals Online (AJOL)'
Publication date: 22/01/2018

The challenge in unsupervised Hidden Markov Model (HMM) training for a POS tagger is that the training depends on an untagged corpus; the only supervised data limiting the possible tags of a word is a dictionary. Therefore, training cannot properly map possible tags. The exact morphemes of prefixes, suffixes and circumfixes in the agglutinative Malay language are examined to assign unknown words’ probable tags based on linguistically meaningful affixes, using a morpheme-based POS guessing algorithm for tagging. The algorithm has been integrated into the Viterbi algorithm, which uses HMM-trained parameters to tag new sentences. In the experiments, the tagger first uses character-based prediction to handle unknown words, then the morpheme-based POS guessing algorithm, and lastly a combination of the two.
Keywords: Malay POS tagger; morpheme-based; HMM

AJOL - African Journals Online
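The general idea of penalizing unknown-word emissions inside Viterbi can be sketched as follows. This is an illustration under stated assumptions, not the paper's algorithm: the two-tag HMM, its probabilities, the affix table `AFFIX_RULES`, the helper `guess_tags` and the `penalty` value are all hypothetical. Known words use the trained emission probabilities; an unknown word spreads its emission mass over the tags licensed by a matching prefix, suffix or circumfix, and every other tag receives a small penalty probability.

```python
import math

# Hypothetical affix rules (prefix, suffix, licensed tags), for illustration only.
# They are NOT the rule set from the paper.
AFFIX_RULES = [
    ("ke", "an", {"NN"}),   # circumfix ke-...-an
    ("me", "kan", {"VB"}),  # circumfix me-...-kan
    ("ber", "", {"VB"}),    # prefix ber-
    ("", "an", {"NN"}),     # suffix -an
]
FLOOR = 1e-12  # avoids log(0) for known words never seen with a given tag


def guess_tags(word, tagset):
    """Tags licensed by the first matching affix rule; all tags if none match."""
    for prefix, suffix, tags in AFFIX_RULES:
        stem_len = len(word) - len(prefix) - len(suffix)
        if stem_len > 0 and word.startswith(prefix) and word.endswith(suffix):
            licensed = tags & set(tagset)
            if licensed:
                return licensed
    return set(tagset)


def log_emission(tag, word, emit, vocab, tagset, penalty):
    if word in vocab:
        return math.log(emit[tag].get(word, 0.0) or FLOOR)
    guessed = guess_tags(word, tagset)
    # Uniform mass over affix-licensed tags; hypothetical penalty for the rest.
    return math.log(1.0 / len(guessed)) if tag in guessed else math.log(penalty)


def viterbi(words, tagset, start, trans, emit, penalty=1e-6):
    """Standard Viterbi; start and transition probabilities must be non-zero."""
    vocab = {w for probs in emit.values() for w in probs}
    # best[i][t] = (log score of best path ending in tag t at i, previous tag)
    best = [{t: (math.log(start[t])
                 + log_emission(t, words[0], emit, vocab, tagset, penalty), None)
             for t in tagset}]
    for i in range(1, len(words)):
        column = {}
        for t in tagset:
            prev, score = max(
                ((p, best[i - 1][p][0] + math.log(trans[p][t])) for p in tagset),
                key=lambda pair: pair[1],
            )
            column[t] = (score + log_emission(t, words[i], emit, vocab, tagset, penalty), prev)
        best.append(column)
    # Backtrace from the highest-scoring final tag.
    tag = max(best[-1], key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


# Toy parameters (hypothetical).
TAGS = ["NN", "VB"]
START = {"NN": 0.6, "VB": 0.4}
TRANS = {"NN": {"NN": 0.3, "VB": 0.7}, "VB": {"NN": 0.8, "VB": 0.2}}
EMIT = {"NN": {"saya": 0.5, "buku": 0.5}, "VB": {"baca": 1.0}}

print(viterbi(["saya", "membacakan", "buku"], TAGS, START, TRANS, EMIT))  # → ['NN', 'VB', 'NN']
```

Here the unseen word *membacakan* matches the hypothetical me-...-kan rule, so only `VB` escapes the penalty and the tagger labels it as a verb. Keeping the penalty small but non-zero lets strong transition evidence still override an affix guess.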

A New Form of Humor? Mapping Constraint-Based Computational Morphologies to a Finite-State Representation

Author: Novák Attila
Publication venue: ELRA
Publication date: 01/01/2014

MorphoLogic’s Humor morphological analyzer engine has been used to develop several high-quality computational morphologies, including some for complex agglutinative languages. However, Humor’s closed-source licensing scheme has been an obstacle to making these resources widely available. The rule-based Humor engine also has other limitations: it lacks support for morphological guessing and for integrating frequency information or other weighting into the models. These problems were solved by converting the databases to a finite-state representation that allows morphological guessing and the addition of weights, and for which open-source implementations exist.

Repository of the Academy's Library

Classification of semantic relations in different syntactic structures in medical text using the MeSH hierarchy

Author: Bhooshan Neha
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2005

Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005. Includes bibliographical references (leaf 38). Two different classification algorithms are evaluated on recognizing the semantic relationships of different syntactic compounds. The compounds, which include noun-noun, adjective-noun, noun-adjective, noun-verb, and verb-noun, were extracted from a set of doctors' notes using a part-of-speech tagger and a parser. Each compound was labeled with a semantic relationship, and each word in the compound was mapped to its corresponding entry in the MeSH hierarchy. MeSH includes only medical terminology, so it was extended to include everyday, non-medical terms. The two classification algorithms, neural networks and a classification tree, were trained and tested on the data set for each type of syntactic compound. Models representing different levels of MeSH were generated and fed into the neural networks. Both algorithms performed better than random guessing, and the classification tree outperformed the neural networks in predicting the semantic relationship between phrases from their syntactic structure.
by Neha Bhooshan.
M.Eng.

DSpace@MIT

Review of doctoral research in second-language teaching and learning in England (2006)

Author: Doughty
Emma Marsden
Ginnis
Grabe
Granger
Hoey
Kolb
Naiman
Oxford
Pring
Suzanne Graham
Thomas
Thomas
Thorndike
van Dijk
Vygotsky
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 01/01/2009

Central Archive at the University of Reading

Crossref

A big data approach towards sarcasm detection in Russian

Author: Gurin A. A.
Sadykov T. M.
Zhukov T. A.
Publication date: 01/06/2023

We present a set of deterministic algorithms for Russian inflection and automated text synthesis. These algorithms are implemented in a publicly available web service, www.passare.ru. This service provides functions for the inflection of single words, word matching, and the synthesis of grammatically correct Russian text. Selected code and datasets are available at https://github.com/passare-ru/PassareFunctions/. The performance of the inflectional functions has been tested against OpenCorpora, an annotated corpus of the Russian language, compared with that of other solutions, and used to estimate the morphological variability and complexity of different parts of speech in Russian.
Comment: arXiv admin note: substantial text overlap with arXiv:1706.0255

arXiv.org e-Print Archive

The more similar the better? : factors in learning cognates, false cognates and non-cognate words

Author: Otwinowska Agnieszka
Szewczyk Jakub
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2019

In this study, we explored factors that determine the knowledge of L2 words with orthographic neighbours in L1 (cognates and false cognates). We asked 150 Polish learners of English to translate 105 English non-cognate words, cognates, and false cognates into Polish, and to rate their confidence in each translation. Confidence ratings allowed us to employ a novel analytic procedure that disentangles knowing cognates and false cognates from strategic guessing. Mixed-effects logistic regression models revealed that cognates were known better, whereas false cognates were known worse, relative to non-cognate controls. The advantage of knowing cognates, but not false cognates, was modulated by the degree of similarity to their L1 equivalents. The knowledge of cognates and false cognates was not affected by the frequency of their formal equivalents in L1. Based on these findings, we draw conclusions about how cross-linguistic formal similarity affects L2 word learnability, and we propose a mechanism by which cognates and false cognates are acquired.

Jagiellonian University Repository