Morphological strategies training: The effectiveness and feasibility of morphological strategies training for students of English as a foreign language with and without spelling difficulties
The aim of this study was primarily to investigate the effects of morphological strategies training on students with and without spelling difficulties in English as a foreign language (EFL), and also to assess the feasibility of such training in a classroom context. The intervention was piloted in the sixth grade of a Greek primary school: 23 Greek-speaking students, aged 11-12, were assigned to the treatment group, which received explicit teaching on inflectional and derivational morphemic patterns of English words. The control group, composed of 25 Greek-speaking students of the same age attending a different classroom of the same school, was taught English spelling in a conventional (visual-memory based) way. Both quantitative and qualitative methods were employed: a pre- and post-test, an observation schedule, a student questionnaire and a teacher interview. The pre- and post-test results indicated that the metamorphological training yielded specific effects on the targeted morpheme patterns. The same results were obtained for a sub-group of nine poor spellers in the treatment group, compared with a sub-group of six poor spellers in the control group. The observation data revealed that the metamorphological training promoted students' active participation, and the questionnaire data indicated that students found the training satisfying. Finally, the interview data highlighted that teachers considered the intervention a feasible way of improving students' morphological processing skills in spelling.
Augmenting Naive Bayes Classifiers with Statistical Language Models
We augment naive Bayes models with statistical n-gram language models to address shortcomings of the standard naive Bayes text classifier. The result is a generalized naive Bayes classifier which allows for a local Markov dependence among observations; a model we refer to as the Chain Augmented Naive Bayes (CAN) Bayes classifier. CAN models have two advantages over standard naive Bayes classifiers. First, they relax some of the independence assumptions of naive Bayes, allowing a local Markov chain dependence in the observed variables, while still permitting efficient inference and learning. Second, they permit straightforward application of sophisticated smoothing techniques from statistical language modeling, which allows one to obtain better parameter estimates than the standard Laplace smoothing used in naive Bayes classification. In this paper, we introduce CAN models and apply them to various text classification problems. To demonstrate the language-independent and task-independent nature of these classifiers, we present experimental results on several text classification problems (authorship attribution, text genre classification, and topic detection) in several languages (Greek, English, Japanese and Chinese). We then systematically study the key factors in the CAN model that can influence the classification performance, and analyze the strengths and weaknesses of the model.
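The chain augmentation described in this abstract (class priors combined with a class-conditional Markov chain over adjacent tokens) can be sketched as follows. This is a minimal illustration assuming word-level bigrams and simple add-one smoothing, whereas the paper advocates more sophisticated smoothing from statistical language modeling; the class and method names are our own, not the authors'.

```python
# Minimal sketch of a Chain Augmented Naive Bayes (CAN) classifier:
# naive Bayes whose per-class likelihood is a bigram language model,
# i.e. a local Markov chain over adjacent words. Add-one smoothing is
# used here for simplicity (the paper uses stronger LM smoothing).
import math
from collections import Counter

class ChainAugmentedNB:
    def __init__(self):
        self.priors = {}    # class -> log P(class)
        self.bigrams = {}   # class -> Counter over (prev, word) pairs
        self.unigrams = {}  # class -> Counter over context words
        self.vocab = set()

    def fit(self, docs, labels):
        counts = Counter(labels)
        total = len(labels)
        for c in counts:
            self.priors[c] = math.log(counts[c] / total)
            self.bigrams[c] = Counter()
            self.unigrams[c] = Counter()
        for doc, c in zip(docs, labels):
            tokens = ["<s>"] + doc.split()
            for prev, word in zip(tokens, tokens[1:]):
                self.bigrams[c][(prev, word)] += 1
                self.unigrams[c][prev] += 1
                self.vocab.add(word)

    def _log_score(self, doc, c):
        V = len(self.vocab) + 1  # +1 reserves mass for unseen words
        tokens = ["<s>"] + doc.split()
        score = self.priors[c]
        for prev, word in zip(tokens, tokens[1:]):
            # P(word | prev, class) with add-one smoothing: conditioning
            # on the previous word is what "augments" naive Bayes.
            num = self.bigrams[c][(prev, word)] + 1
            den = self.unigrams[c][prev] + V
            score += math.log(num / den)
        return score

    def predict(self, doc):
        return max(self.priors, key=lambda c: self._log_score(doc, c))
```

Setting the chain order to zero (ignoring `prev`) would recover the standard naive Bayes classifier, which is the sense in which CAN is a generalization.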
Survey of Arabic Checker Techniques
The importance of spell checking grows with the spread of technology, the use of the Internet and local dialects, and limited linguistic awareness among users. It is even greater for Arabic, which has many complexities and specificities that differ from those of other languages. This paper explains these specificities, surveys existing work organized by the categories of techniques used, and explores those techniques. It also gives directions for future work.
A Survey of Paraphrasing and Textual Entailment Methods
Paraphrasing methods recognize, generate, or extract phrases, sentences, or
longer natural language expressions that convey almost the same information.
Textual entailment methods, on the other hand, recognize, generate, or extract
pairs of natural language expressions, such that a human who reads (and trusts)
the first element of a pair would most likely infer that the other element is
also true. Paraphrasing can be seen as bidirectional textual entailment and
methods from the two areas are often similar. Both kinds of methods are useful,
at least in principle, in a wide range of natural language processing
applications, including question answering, summarization, text generation, and
machine translation. We summarize key ideas from the two areas by considering
in turn recognition, generation, and extraction methods, also pointing to
prominent articles and resources.
Comment: Technical Report, Natural Language Processing Group, Department of Informatics, Athens University of Economics and Business, Greece, 201
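The survey's observation that paraphrasing can be seen as bidirectional textual entailment can be illustrated with a minimal sketch. The `entails()` predicate below is a deliberately naive word-containment placeholder (our own assumption), standing in for a real trained entailment recognizer; only the bidirectional composition reflects the idea from the abstract.

```python
# Toy illustration of "paraphrasing = bidirectional textual entailment".
# entails() is a placeholder heuristic, NOT a real entailment method:
# it holds when the hypothesis adds no words beyond the premise.
def entails(premise: str, hypothesis: str) -> bool:
    return set(hypothesis.lower().split()) <= set(premise.lower().split())

def is_paraphrase(a: str, b: str) -> bool:
    # A pair is a paraphrase when entailment holds in both directions.
    return entails(a, b) and entails(b, a)
```

With this composition, "the big cat sat" entails "the cat sat" but not vice versa, so the pair is an entailment but not a paraphrase, mirroring the asymmetry described above.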
Cross-lingual Argumentation Mining: Machine Translation (and a bit of Projection) is All You Need!
Argumentation mining (AM) requires the identification of complex discourse
structures and has lately been applied with success monolingually. In this
work, we show that the existing resources are, however, not adequate for
assessing cross-lingual AM, due to their heterogeneity or lack of complexity.
We therefore create suitable parallel corpora by (human and machine)
translating a popular AM dataset consisting of persuasive student essays into
German, French, Spanish, and Chinese. We then compare (i) annotation projection
and (ii) bilingual word embeddings based direct transfer strategies for
cross-lingual AM, finding that the former performs considerably better and
almost eliminates the loss from cross-lingual transfer. Moreover, we find that
annotation projection works equally well when using either costly human or
cheap machine translations. Our code and data are available at
\url{http://github.com/UKPLab/coling2018-xling_argument_mining}.
Comment: Accepted at Coling 201
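Annotation projection, the better-performing strategy above, can be sketched at the token level: BIO argument-component labels on the source side are carried to the (machine-)translated target side through a word alignment. The function below is a minimal illustration under these assumptions, with an alignment given as index pairs; it is not the authors' actual implementation, which is available at the repository linked above.

```python
# Minimal sketch of token-level annotation projection: copy BIO labels
# from source tokens to aligned target tokens, then repair BIO
# consistency. Real pipelines obtain the alignment from an automatic
# word aligner run over the (human or machine) translation.
def project_labels(src_labels, alignment, tgt_len):
    tgt_labels = ["O"] * tgt_len
    for s, t in alignment:  # alignment: (source_index, target_index) pairs
        if src_labels[s] != "O":
            tgt_labels[t] = src_labels[s]
    # Repair pass: an "I-" tag whose predecessor is not a tag of the
    # same component type must open a new span with "B-".
    for i, lab in enumerate(tgt_labels):
        if lab.startswith("I-"):
            prev = tgt_labels[i - 1] if i > 0 else "O"
            if prev == "O" or prev[2:] != lab[2:]:
                tgt_labels[i] = "B-" + lab[2:]
    return tgt_labels
```

Because word order changes under translation, the repair pass matters: a projected "I-Claim" that lands without a preceding claim token is promoted to "B-Claim" so the target sequence remains a valid BIO tagging.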
Linguistic errors in the biomedical domain: Towards a typology of errors for Spanish (Errors lingüístics en el domini biomèdic: Cap a una tipologia d'errors per a l'espanyol)
The objective of this work is the analysis of errors contained in a corpus of medical reports in
natural language and the design of a typology of errors, as there was no systematic review on
verification and correction of errors in clinical documentation in Spanish. In the development
of automatic detection and correction systems, it is of great interest to delve into the nature of
the linguistic errors that occur in clinical reports, in order to detect and treat them properly.
The results show that omission errors are the most frequent ones in the analyzed sample, and
that word length certainly influences error frequency. The typification of error patterns provided
is enabling the development of a module based on linguistic knowledge, which is currently in
progress. This will help to improve the performance of error detection and correction systems for
the biomedical domain.
This work was supported by the Spanish National Research Agency (AEI) through project LaTe4PSP (PID2019-107652RB-I00/AEI/10.13039/501100011033). Furthermore, the main author is supported by the Ministerio de Universidades of Spain through the national program Ayudas para la formación de profesorado
universitario (FPU), with reference FPU16/0332
Analysis of errors in the automatic translation of questions for translingual QA systems
Purpose – This study aims to focus on the evaluation of systems for the automatic translation of questions destined for translingual question-answering (QA) systems. The efficacy of online translators when performing as tools in QA systems is analysed using a collection of documents in the Spanish language.
Design/methodology/approach – Automatic translation is evaluated in terms of the functionality of actual translations produced by three online translators (Google Translator, Promt Translator, and Worldlingo) by means of objective and subjective evaluation measures, and the typology of errors produced was identified. For this purpose, a comparative study of the quality of the translation of factual questions of the CLEF collection of queries was carried out, from German and French to Spanish.
Findings – It was observed that the rates of error for the three systems evaluated here are greater in the translations pertaining to the language pair German-Spanish. Promt was identified as the most reliable translator of the three (on average) for the two linguistic combinations evaluated. However, for the Spanish-German pair, a good assessment of the Google online translator was obtained as well. Most errors (46.38 percent) tended to be of a lexical nature, followed by those due to a poor translation of the interrogative particle of the query (31.16 percent).
Originality/value – The evaluation methodology applied focuses above all on the purpose of the translation: does the resulting question serve as effective input into a translingual QA system? Thus, instead of searching for "perfection", the functionality of the question and its capacity to lead one to an adequate response are appraised. The results obtained contribute to the development of improved translingual QA systems.
Introduction to the special issue on annotated corpora
Annotated corpora are increasingly important for linguistic scholarship, science and technology. This special issue briefly surveys the development of the field and points to challenges within the current framework of annotation using analytical categories, as well as challenges to the framework itself. It presents three articles, one concerning the evaluation of the quality of annotation, and two concerning French treebanks: one dealing with the oldest treebank project for French, the French Treebank, the second concerning the conversion of French corpora into the cross-lingual framework of Universal Dependencies, thus offering an illustration of the history of treebank development worldwide.