484 research outputs found

AN INVESTIGATION INTO THE CROSS-LINGUISTIC ROBUSTNESS OF TEXTUAL EQUIVALENCE TECHNIQUES

Author: Alshahrani Amal
Publication date: 31/12/2018

The University of Manchester - Institutional Repository

Investigating the Relationship between the Morphological Processing of Regular and Irregular Words and L2 Vocabulary Acquisition

Author: Ahmed Masrai, James Milton
Publication venue: Australian International Academic Centre
Publication date: 01/01/2015

Word formation in Arabic differs from that in English, relying more heavily on derivation than on word creation. This study tests whether this difference affects the learning of words in English. The results suggest that words that are irregularly derived in English are subject to a frequency effect in learning, while regularly derived words are not. They also suggest that the predisposition of English for these irregular constructions may be a barrier to learning for learners with an Arabic-speaking L1 background.

Australian International Academic Centre: AIAC Journals

Directory of Open Access Journals

Cronfa at Swansea University

Proceedings of the 17th Annual Conference of the European Association for Machine Translation

Publication venue: Hrvatsko društvo za jezične tehnologije
Publication date: 01/01/2014

Proceedings of the 17th Annual Conference of the European Association for Machine Translation (EAMT)

Repozitorij Filozofskog fakulteta u Zagrebu at University of Zagreb

An Empirical Work on Stable and Changing Elements in Historical Text Reuse

Author: Berger Maria
Publication date: 02/05/2019

Georg-August-University Göttingen

Recommended from our members

Paraphrase identification using knowledge-lean techniques

Author: Eyecioglu Ozmutlu Asli
Publication date: 14/11/2016

This research addresses the problem of identifying sentential paraphrases; that is, the ability of an estimator to predict well whether two sentential text fragments are paraphrases. The paraphrase identification task has practical importance in the Natural Language Processing (NLP) community because of the need to deal with the pervasive problem of linguistic variation. Accurate methods for identifying paraphrases should help to improve the performance of NLP systems that require language understanding. This includes key applications such as machine translation, information retrieval and question answering, amongst others. Over the course of the last decade, a growing body of research has been conducted on paraphrase identification, and it has become a distinct working area of NLP. Our objective is to investigate whether techniques for automated understanding of text that require fewer resources may achieve results comparable to methods employing more sophisticated NLP processing tools and other resources. These techniques, which we call “knowledge-lean”, range from simple, shallow overlap methods based on lexical items or n-grams through to more sophisticated methods that employ automatically generated distributional thesauri. The work begins by focusing on techniques that exploit lexical overlap and text-based statistical techniques that are much less in need of NLP tools. We investigate the question “To what extent can these methods be used for the purpose of a paraphrase identification task?” On two gold-standard datasets, we obtained competitive results on the Microsoft Research Paraphrase Corpus (MSRPC) and reached state-of-the-art results on the Twitter Paraphrase Corpus, using only n-gram overlap features in conjunction with support vector machines (SVMs).
These techniques do not require any language-specific tools or external resources and appear to perform well without the need to normalise colloquial language such as that found on Twitter. It was natural to extend the scope of the research and to experiment on another language that is poor in resources. The scarcity of available paraphrase data led us to construct our own paraphrase corpus in Turkish. This corpus is relatively small but provides a representative collection, including a variety of texts. While there is still debate as to whether a binary or fine-grained judgement better suits a paraphrase corpus, we chose to provide data for a sentential textual similarity task by agreeing on fine-grained scoring, knowing that this could be converted to binary scoring, but not the other way around. The correlation between the results from different corpora is promising. Therefore, it can be surmised that languages poor in resources can benefit from knowledge-lean techniques. The investigation of knowledge-lean techniques was then extended, with a new perspective, to techniques that use distributional statistical features of text by representing each word as a vector (word2vec). While recent research focuses on larger fragments of text with word2vec, such as phrases, sentences and even paragraphs, a new approach is presented by introducing vectors of character n-grams that carry the same attributes as word vectors. The proposed method has the ability to capture syntactic relations as well as semantic relations without semantic knowledge. This is proven to be competitive on Twitter compared to more sophisticated methods.

Sussex Research Online

Improving Learning and Teaching through Automated Short-Answer Marking

Author: Siddiqi Raheel
Publication date: 31/12/2010

The University of Manchester - Institutional Repository

Tackling Sequence to Sequence Mapping Problems with Neural Networks

Author: Yu Lei
Publication date: 01/01/2017

In Natural Language Processing (NLP), it is important to detect the relationship between two sequences or to generate a sequence of tokens given another observed sequence. We refer to problems of modelling sequence pairs as sequence-to-sequence (seq2seq) mapping problems. A lot of research has been devoted to finding ways of tackling these problems, with traditional approaches relying on a combination of hand-crafted features, alignment models, segmentation heuristics, and external linguistic resources. Although great progress has been made, these traditional approaches suffer from various drawbacks, such as complicated pipelines, laborious feature engineering, and the difficulty of domain adaptation. Recently, neural networks have emerged as a promising solution to many problems in NLP, speech recognition, and computer vision. Neural models are powerful because they can be trained end to end, generalise well to unseen examples, and can be easily adapted to a new domain within the same framework. The aim of this thesis is to advance the state of the art in seq2seq mapping problems with neural networks. We explore solutions from three major aspects: investigating neural models for representing sequences, modelling interactions between sequences, and using unpaired data to boost the performance of neural models. For each aspect, we propose novel models and evaluate their efficacy on various tasks of seq2seq mapping. Comment: PhD thesis

arXiv.org e-Print Archive

Oxford University Research Archive

How many words do you need to speak Arabic? An Arabic vocabulary size test

Author: Ahmed Masrai, James Milton
Publication venue: Informa UK Limited
Publication date: 03/01/2017

This study describes a vocabulary size test in Arabic used with 339 native-speaking learners at school and university in Saudi Arabia. Native speaker vocabulary size scores should provide targets for attainment for learners of Arabic, should inform the writers of course books and teaching materials, and the test itself should allow learners to monitor their progress towards the goal of fluency. Educated native speakers of Arabic possess a recognition vocabulary of about 25,000 words, a total which is large compared with equivalent test scores of native speakers of English. The results also suggest that acquisition increases in speed with age, and this is tentatively explained by the highly regular system of morphological derivation which Arabic uses and which, it is thought, is acquired in adolescence. This again appears different from English, where the rate of acquisition appears to decline with age. While the test appears reliable and valid, there are issues surrounding the definition of a word in Arabic, and further research into how words are stored, retrieved and processed in Arabic is needed to inform the construction of further tests which might, it is thought, profitably use a more encompassing definition of the lemma as the basis for testing.

Crossref

Cronfa at Swansea University