
116 research outputs found

Conquering Language: Using NLP on a Massive Scale to Build High Dimensional Language Models from the Web

Author: A. Heydon
B. New
D. Hiemstra
D. Hindle
K.W. Church
P. Cimiano
P.D. Turney
P.G. Ipeirotis
R. Besançon
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2007

Dictionaries only contain some of the information we need to know about a language. The growth of the Web, the maturation of linguistic processing tools, and the decline in price of memory storage allow us to envision descriptions of languages that are much larger than before. We can conceive of building a complete language model for a language using all the text that is found on the Web for this language. This article describes our current project to do just that.

Crossref

INRIA a CCSD electronic archive server

A Comparison of Different Machine Transliteration Models

Author: Choi K.
Isahara H.
Oh J.
Publication venue: 'AI Access Foundation'
Publication date: 06/10/2011

Machine transliteration is a method for automatically converting words in one language into phonetically equivalent ones in another language. Machine transliteration plays an important role in natural language applications such as information retrieval and machine translation, especially for handling proper nouns and technical terms. Four machine transliteration models -- grapheme-based transliteration model, phoneme-based transliteration model, hybrid transliteration model, and correspondence-based transliteration model -- have been proposed by several researchers. To date, however, there has been little research on a framework in which multiple transliteration models can operate simultaneously. Furthermore, there has been no comparison of the four models within the same framework and using the same data. We addressed these problems by 1) modeling the four models within the same framework, 2) comparing them under the same conditions, and 3) developing a way to improve machine transliteration through this comparison. Our comparison showed that the hybrid and correspondence-based models were the most effective and that the four models can be used in a complementary manner to improve machine transliteration performance.

arXiv.org e-Print Archive

Crossref

The extraction, introduction, transfer, diffusion and integration of loanwords in Japan: loanwords in a literate society.

Author: Forth Simon William
Publication venue: 'University of Sheffield Conference Proceedings'
Publication date: 01/01/2006

This doctoral thesis seeks primarily to establish a model which shows how loanwords in Japanese evolve through a stepwise process. The process starts well before the actual borrowing itself, when Japanese school children acquire a stratum of English morphemes to which conventional pronunciations have been ascribed. This stratum could be said to be composed of a large set of orthography-pronunciation analogies. Foreign words are then extracted from foreign word stocks by agents of introduction, typically advertising copywriters or magazine journalists. However, since these words are unsuitable for use in Japanese as is, the agents then proceed to domesticate them according to Japanese rules of phonology, orthography, morphology, syntax and semantics. The next step involves transference into the public zone, crucially via the written word, before being disseminated and finally integrated. A few researchers have hinted that such a process exists but have taken it no further. Here, proof is evinced by interviews with the agents themselves and together with documentary and quantitative corpus analyses it is shown that lexical borrowing of western words in Japanese proceeds in accordance with such a model. It is furthermore shown that these agents adhere to one of three broad cultural environments and borrow/domesticate words within this genre. They then pass along channels of transference, dissemination and integration in accordance with genre-specific patterns. Investigation of these genre-specific channels of evolution constitutes the second research objective. Three other research objectives are addressed within the framework of this model, namely genre-specific patterns of transference and dissemination, when a word changes from being a foreign word to being an integrated loanword, and factors governing the displacement of native words by loanwords.

White Rose E-theses Online

OpenGrey Repository

Script Effects as the Hidden Drive of the Mind, Cognition, and Culture

Author: Pae Hye K.
Publication venue: 'Springer Science and Business Media LLC'

This open access volume reveals the hidden power of the script we read in and how it shapes and drives our minds, ways of thinking, and cultures. Expanding on the Linguistic Relativity Hypothesis (i.e., the idea that language affects the way we think), this volume proposes the “Script Relativity Hypothesis” (i.e., the idea that the script in which we read affects the way we think) by offering a unique perspective on the effect of script (alphabets, morphosyllabaries, or multi-scripts) on our attention, perception, and problem-solving. Once we become literate, fundamental changes occur in our brain circuitry to accommodate the new demand for resources. The powerful effects of literacy have been demonstrated by research on literate versus illiterate individuals, as well as cross-scriptal transfer, indicating that literate brain networks function differently, depending on the script being read. This book identifies the locus of differences between the Chinese, Japanese, and Koreans, and between the East and the West, as the neural underpinnings of literacy. To support the “Script Relativity Hypothesis”, it reviews a vast corpus of empirical studies, including anthropological accounts of human civilization, social psychology, cognitive psychology, neuropsychology, applied linguistics, second language studies, and cross-cultural communication. It also discusses the impact of reading from screens in the digital age, as well as the impact of bi-script or multi-script use, which is a growing trend around the globe. As a result, our minds, ways of thinking, and cultures are now growing closer together, not farther apart. 
Examines the origin, emergence, and co-evolution of written language, the human mind, and culture within the purview of script effects.
Investigates how the scripts we read over time shape our cognition, mind, and thought patterns.
Provides a new outlook on the four representative writing systems of the world.
Discusses the consequences of literacy for the functioning of the min

OAPEN Library

Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018)

Publication venue: The Association for Computational Linguistics
Publication date: 01/01/2018

Peer reviewed

Helsingin yliopiston digitaalinen arkisto

English speakers' common orthographic errors in Arabic as L2 writing system: an analytical case study

Author: Hisham Saleh A
Publication venue: Newcastle University
Publication date: 01/01/2015

PhD Thesis. Research involving the Arabic Writing System (WS) is quite limited, and research on writing errors in L2WS Arabic against a specific L1WS seems to be relatively neglected. This study attempts to identify, describe, and explain common orthographic errors in Arabic writing amongst English-speaking learners. First, it outlines the Arabic Writing System’s (AWS) characteristics and available empirical studies of L2WS Arabic. This study embraced the Error Analysis approach, utilising a mixed-method design that deployed quantitative and qualitative tools (writing tests, questionnaire, and interview). The data were collected from several institutions around the UK, which collectively accounted for 82 questionnaire responses, 120 different writing samples from 44 intermediate learners, and six teacher interviews. The hypotheses for this research were: a) English-speaking learners of Arabic make common orthographic errors similar to those of Arabic native speakers; b) English-speaking learners share several common orthographic errors with other learners of Arabic as a second/foreign language (AFL); and c) English-speaking learners of Arabic produce their own common orthographic errors which are specifically related to the differences between the two WSs. The results confirmed all three hypotheses. Specifically, English-speaking learners of L2WS Arabic commonly made six error types: letter ductus (letter shape), orthography (spelling), phonology, letter dots, allographemes (i.e. letterform), and direction. Gemination and L1WS transfer error rates were not found to be major. Another important result showed that five letter groups in addition to two letters are particularly challenging to English-speaking learners. Study results indicated that error causes were likely to be from one of four factors: script confusion, orthographic difficulties, phonological realisation, and teaching/learning strategies.
These results are generalizable as the data were collected from several institutions in different parts of the UK. Suggestions and implications as well as recommendations for further research are outlined accordingly in the conclusion chapter.

Newcastle University eTheses

Talking Bits: An investigation into the nature of digital communication technology and its impact on society

Author: Thomsen Michael
Publication venue: Department of Architecture, Design & Media Technology, Aalborg University
Publication date: 18/06/2010

VBN

V International Colloquium Proceedings

Author: Colloquium International
Publication venue: Digital Commons at Loyola Marymount University and Loyola Law School
Publication date: 22/08/2019

Loyola Marymount University

Tartu Ülikooli toimetised. Tööd semiootika alalt. 1964-1992. 0259-4668

Publication venue: Tartu : Tartu University Press
Publication date: 01/01/2001

http://www.ester.ee/record=b1331700*es

DSpace at Tartu University Library

Alliteration and assonance as mnemonic devices in second language word-pair learning

Author: Green Michael

The central question addressed in this thesis is to what extent phonological patterns, in particular alliteration and assonance, aid the recall and retention of word pairs for Japanese L1 learners of English. The research builds on previous findings from a series of classroom-based quasi-experimental work, principally from the team of Boers, Lindstromberg and Eyckmans, which shows a mnemonic advantage for collocations and compounds that have phonological patterns, compared to equivalent word strings with no phonological overlap. This advantage appears in both free- and cued-recall tests, and across a variety of temporal intervals (up to two weeks). Much of the prior research has drawn participant samples from a Dutch L1 speaking population. Furthermore, these studies have mainly used target items deemed to be familiar to the participants. This thesis is motivated by the need to question if the previous empirical findings generalise to a population whose L1 phonological constructs are different from those of Dutch L1 speakers. The purpose is to test if Japanese L1 speakers have a different perception of alliteration and assonance, and if so, whether this impacts on their learning behaviour. A further aim is to investigate whether the mnemonic effect applies to unfamiliar target items. In addition, the thesis considers the extent to which the cognitive process of form-based priming underpins the mnemonic effect. A series of four experiments are conducted which progressively examine the processing advantage conferred by alliterating and assonating patterns. Different sets of experimental stimuli are used, including high-frequency, low-frequency, and pseudoword items. Treatment phases often incorporate a dictation activity when using familiar word stimuli, or a study phase when using unfamiliar stimuli. 
A variety of testing instruments are adopted to measure recall of the written forms of the stimuli, or the forms plus meanings of novel stimuli, over differing periods of time. One study uses a Lexical Decision Task to ascertain if phonological patterns aid lexical processing. Overall, the findings indicate that phonological patterns do confer a small mnemonic advantage for known stimuli, though the effect dissipates with time. However, the extent to which orthographic similarity plays a facilitatory role remains unclear. When participants are asked to learn novel word-pairs the results are more ambiguous; alliteration seems to have a greater mnemonic effect than assonance, but the cognitive challenge of learning new material appears to mitigate any robust mnemonic effects. The data from the Lexical Decision Task do not support any strong claims that perceptual priming is the determining factor for the processing advantage. In answer to the central question, it can be inferred from the findings that both phonological and orthographic patterns are a useful pedagogical tool for helping language learners recall and retain multi-word strings.

Online Research @ Cardiff