On the Similarities Between Native, Non-native and Translated Texts
We present a computational analysis of three language varieties: native,
advanced non-native, and translation. Our goal is to investigate the
similarities and differences between non-native language productions and
translations, contrasting both with native language. Using a collection of
computational methods we establish three main results: (1) the three types of
texts are easily distinguishable; (2) non-native language and translations are
closer to each other than each of them is to native language; and (3) some of
these characteristics depend on the source or native language, while others do
not, reflecting, perhaps, unified principles that similarly affect translations
and non-native language. Comment: ACL2016, 12 pages
The Mechanics of Embodiment: A Dialogue on Embodiment and Computational Modeling
Embodied theories are increasingly challenging traditional views of cognition by arguing that conceptual representations that constitute our knowledge are grounded in sensory and motor experiences, and processed at this sensorimotor level, rather than being represented and processed abstractly in an amodal conceptual system. Given the established empirical foundation, and the relatively underspecified theories to date, many researchers are extremely interested in embodied cognition but are clamouring for more mechanistic implementations. What is needed at this stage is a push toward explicit computational models that implement sensory-motor grounding as intrinsic to cognitive processes. In this article, six authors from varying backgrounds and approaches address issues concerning the construction of embodied computational models, and illustrate what they view as the critical current and next steps toward mechanistic theories of embodiment. The first part has the form of a dialogue between two fictional characters: Ernest, the "experimenter", and Mary, the "computational modeller". The dialogue consists of an interactive sequence of questions, requests for clarification, challenges, and (tentative) answers, and touches the most important aspects of grounded theories that should inform computational modeling and, conversely, the impact that computational modeling could have on embodied theories. The second part of the article discusses the most important open challenges for embodied computational modelling.
On Monitoring Language Change with the Support of Corpus Processing
One of the fundamental characteristics of language is that it can change over time. One
method of monitoring this change is to observe its corpora: structured language
documentation. Recent developments in technology, especially in the field of Natural
Language Processing, allow robust linguistic processing, which supports the description of
diverse historical changes in the corpora. The intervention of human linguists is inevitable, as
it determines the gold standard, but computer assistance provides considerable support by
incorporating computational approaches in exploring the corpora, especially historical
corpora. This paper proposes a model for corpus development, where corpora are annotated
to support further computational operations such as lexicogrammatical pattern matching,
automatic retrieval and extraction. The corpus processing operations are performed by local
grammar based corpus processing software on a contemporary Indonesian corpus. This
paper concludes that data collection and data processing in a corpus are of equally crucial
importance for monitoring language change, and neither can be set aside.
Bilingual newsgroups in Catalonia: a challenge for machine translation
This paper presents a linguistic analysis of a corpus of messages written in Catalan and Spanish, which come from several informal newsgroups on the Universitat Oberta de Catalunya (Open University of Catalonia; henceforth, UOC) Virtual Campus. The surrounding environment is one of extensive bilingualism and contact between Spanish and Catalan. The study was carried out as part of the INTERLINGUA project conducted by the UOC's Internet Interdisciplinary Institute (IN3). Its main goal is to ascertain the linguistic characteristics of the e-mail register in the newsgroups in order to assess their implications for the creation of an online machine translation environment. The results shed empirical light on the relevance of characteristics of the e-mail register, the impact of language contact and interference, and their implications for the use of machine translation for CMC data in order to facilitate cross-linguistic communication on the Internet
Distributional analyses in the picture-word interference paradigm: Exploring the semantic interference and the distractor frequency effects.
The present study explores the distributional features of two important effects within the picture-word interference paradigm: the semantic interference and the distractor frequency effects. These two effects display different and specific distributional profiles. Semantic interference appears greatly reduced in faster response times, while it reaches its full magnitude only in slower responses. This can be interpreted as a sign of fluctuating attentional efficiency in resolving response conflict. In contrast, the distractor frequency effect is mediated mainly by a distributional shift, with low frequency distractors uniformly shifting the reaction time distribution towards a slower range of latencies. This finding fits with the idea that distractor frequency exerts its effect by modulating the point in time at which operations required to discard the distractor can start. Taken together, these results are congruent with current theoretical accounts of both the semantic interference and distractor frequency effects. Critically, distributional analyses highlight and further describe the different cognitive dynamics underlying these two effects, suggesting that this analytical tool is able to offer important insights about lexical access during speech production.
Cognitive constraints and island effects
Competence-based theories of island effects play a central role in generative grammar, yet the graded nature of many syntactic islands has never been properly accounted for. Categorical syntactic accounts of island effects have persisted in spite of a wealth of data suggesting that island effects are not categorical in nature and that nonstructural manipulations that leave island structures intact can radically alter judgments of island violations. We argue here, building on work by Paul Deane, Robert Kluender, and others, that processing factors have the potential to account for this otherwise unexplained variation in acceptability judgments.
We report the results of self-paced reading experiments and controlled acceptability studies that explore the relationship between processing costs and judgments of acceptability. In each of the three self-paced reading studies, the data indicate that the processing cost of different types of island violations can be significantly reduced to a degree comparable to that of nonisland filler-gap constructions by manipulating a single nonstructural factor. Moreover, this reduction in processing cost is accompanied by significant improvements in acceptability. This evidence favors the hypothesis that island-violating constructions involve numerous processing pressures that aggregate to drive processing difficulty above a threshold, resulting in unacceptability. We examine the implications of these findings for the grammar of filler-gap dependencies
Language classification from bilingual word embedding graphs
We study the role of the second language in bilingual word embeddings in
monolingual semantic evaluation tasks. We find strongly and weakly positive
correlations between down-stream task performance and second language
similarity to the target language. Additionally, we show how bilingual word
embeddings can be employed for the task of semantic language classification and
that joint semantic spaces vary in meaningful ways across second languages. Our
results support the hypothesis that semantic language similarity is influenced
by both structural similarity and geography/contact. Comment: To be published at Coling 201
Presenting GECO: an eyetracking corpus of monolingual and bilingual sentence reading
This paper introduces GECO, the Ghent Eye-tracking Corpus, a monolingual and bilingual corpus of eye-tracking data of participants reading a complete novel. English monolinguals and Dutch-English bilinguals read an entire novel, which was presented in paragraphs on the screen. The bilinguals read half of the novel in their first language, and the other half in their second language. In this paper we describe the distributions and descriptive statistics of the most important reading time measures for the two groups of participants. This large eye-tracking corpus is perfectly suited for both exploratory purposes as well as more directed hypothesis testing, and it can guide the formulation of ideas and theories about naturalistic reading processes in a meaningful context. Most importantly, this corpus has the potential to evaluate the generalizability of monolingual and bilingual language theories and models to reading of long texts and narratives