Effects of Pragmatic Inference on Phoneme Identification

Author: Ettlinger Marc
Rohde Hannah
Publication date: 01/01/2010

Edinburgh Research Explorer

eScholarship - University of California

On the Effect of Semantically Enriched Context Models on Software Modularization

Author: Hage Jurriaan
Jansen Slinger
Khadka Ravi
Saeidi Amir
Publication venue: Aspect-Oriented Software Association (AOSA)
Publication date: 04/08/2017

Many existing approaches to program comprehension rely on the linguistic information found in source code, such as identifier names and comments. Semantic clustering is one such technique for system modularization; it relies on the informal semantics of the program, encoded in the vocabulary used in the source code. Treating the source code as a collection of tokens loses the semantic information embedded within the identifiers. We try to overcome this problem by introducing context models for source code identifiers to obtain a semantic kernel, which can be used both for deriving the topics that run through the system and for their clustering. In the first model, we abstract an identifier to its type representation and build on this notion of context to construct a contextual vector representation of the source code. The second notion of context is defined based on the flow of data between identifiers to represent a module as a dependency graph where the nodes correspond to identifiers and the edges represent the data dependencies between pairs of identifiers. We have applied our approach to 10 medium-sized open source Java projects and show that introducing contexts for identifiers improves the quality of the modularization of the software systems. Both context models give results superior to the plain vector representation of documents. In some cases, the authoritativeness of decompositions is improved by 67%. Furthermore, a more detailed evaluation of our approach on JEdit, an open source editor, demonstrates that topics inferred by performing topic analysis on the contextual representations are more meaningful than those inferred from the plain representation of the documents.
The proposed context model for source code identifiers paves the way for building tools that support developers in program comprehension tasks such as application and domain concept location, software modularization, and topic analysis.

arXiv.org e-Print Archive

Heriot Watt Pure

Crossref

ZENODO

Utrecht University Repository

FigShare

Identifying cognates in English-Dutch and French-Dutch by means of orthographic information and cross-lingual word embeddings

Author: Labat Sofie
Lefever Els
Singh Pranaydeep
Publication venue: European Language Resources Association (ELRA)
Publication date: 01/01/2020

Ghent University Academic Bibliography

Dependency relations as source context in phrase-based SMT

Author: Haque Rejwanul
Naskar Sudip Kumar
van den Bosch Antal
Way Andy
Publication date: 01/01/2009

The Phrase-Based Statistical Machine Translation (PB-SMT) model has recently begun to include source context modeling, under the assumption that the proper lexical choice of an ambiguous word can be determined from the context in which it appears. Various types of lexical and syntactic features such as words, parts-of-speech, and supertags have been explored as effective source context in SMT. In this paper, we show that position-independent syntactic dependency relations of the head of a source phrase can be modeled as useful source context to improve target phrase selection and thereby improve overall performance of PB-SMT. On a Dutch-English translation task, by combining dependency relations and syntactic contextual features (part-of-speech), we achieved a 1.0 BLEU (Papineni et al., 2002) point improvement (3.1% relative) over the baseline.

Waseda University Repository

DCU Online Research Access Service

Pitch ability as an aptitude for tone learning

Author: Bowles Anita R.
Chang Charles B.
Karuzis Valerie P.
Publication venue: Wiley
Publication date: 01/12/2016

Tone languages such as Mandarin use voice pitch to signal lexical contrasts, presenting a challenge for second/foreign language (L2) learners whose native languages do not use pitch in this manner. The present study examined components of an aptitude for mastering L2 lexical tone. Native English speakers with no previous tone language experience completed a Mandarin word learning task, as well as tests of pitch ability, musicality, L2 aptitude, and general cognitive ability. Pitch ability measures improved predictions of learning performance beyond musicality, L2 aptitude, and general cognitive ability and also predicted transfer of learning to new talkers. In sum, although certain nontonal measures help predict successful tone learning, the central components of tonal aptitude are pitch-specific perceptual measures.

Boston University Institutional Repository (OpenBU)

N400 evidence for musical facilitation of word boundary identification in second language exposure

Author: Moya Sepulveda Dayna Andrea
Publication venue: Columbia University Libraries/Information Services
Publication date: 01/01/2018

Lexical acquisition requires the ability to identify word boundaries in a continuous auditory speech stream. This complex task is even more challenging when learning a new language in adulthood. Previous studies have shown that word boundary identification can be enhanced by pairing musical tones with native language phonemes. The objective of this dissertation study was to investigate whether musical tones also have this effect in a novel pseudo-language that uses non-native speech sounds. The N400, a brain event-related potential that has been linked with familiarity responses and detection of statistical regularities during exposure to pseudowords, provides an index of brain activation associated with semantico-lexical processing. In this study, language-like stimuli incorporating a French phoneme (a high, front, rounded vowel that is not part of the English phonetic inventory) were presented to typically developing English monolingual adults. Participants were exposed to one of two conditions for 7 minutes: monotone presentation of the concatenated language-like stimuli, or the same speech stream with a musical tone associated with each syllable. The exposure protocol was based on Schön, Boyer, Moreno et al. (2008). Exposure was followed by a lexical decision task, requiring participants to distinguish “words” (heard during the exposure in a concatenated speech stream) from “partwords” (the end of one word and the beginning of another, crossing word boundaries). High-density EEG was recorded during the lexical decision task and analyzed offline to determine N400 event-related responses to the stimuli in each condition. Although behavioral measures did not reveal any significant differences between groups or conditions, we found a significantly different N400 response to “partwords” in the tone-exposed group compared with the monotone group.
This difference occurred only in a frontal region with a right-hemisphere bias and was not significant over the left hemisphere. It suggests that participants in the tone group were aided in differentiating “words” from “partwords”, supporting the view that the inclusion of tonal information is beneficial in the early stages of L2 lexical learning.

Columbia University Academic Commons

Towards an Indexical Model of Situated Language Comprehension for Cognitive Agents in Physical Worlds

Author: Laird John
Mininger Aaron
Mohan Shiwali
Publication date: 08/04/2016

We propose a computational model of situated language comprehension based on the Indexical Hypothesis that generates meaning representations by translating amodal linguistic symbols to modal representations of beliefs, knowledge, and experience external to the linguistic system. This Indexical Model incorporates multiple information sources, including perceptions, domain knowledge, and short-term and long-term experiences during comprehension. We show that exploiting diverse information sources can alleviate ambiguities that arise from contextual use of underspecific referring expressions and unexpressed argument alternations of verbs. The model is being used to support linguistic interactions in Rosie, an agent implemented in Soar that learns from instruction. Comment: Advances in Cognitive Systems 3 (2014)

arXiv.org e-Print Archive

Computational Sociolinguistics: A Survey

Author: de Jong Franciska
Doğruöz A. Seza
Nguyen Dong
Rosé Carolyn P.
Publication date: 01/01/2016

Language is a social phenomenon and variation is inherent to its social nature. Recently, there has been a surge of interest within the computational linguistics (CL) community in the social dimension of language. In this article we present a survey of the emerging field of "Computational Sociolinguistics" that reflects this increased interest. We aim to provide a comprehensive overview of CL research on sociolinguistic themes, featuring topics such as the relation between language and social identity, language use in social interaction and multilingual communication. Moreover, we demonstrate the potential for synergy between the research communities involved, by showing how the large-scale data-driven methods that are widely used in CL can complement existing sociolinguistic studies, and how sociolinguistics can inform and challenge the methods and assumptions employed in CL studies. We hope to convey the possible benefits of a closer collaboration between the two communities and conclude with a discussion of open challenges. Comment: To appear in Computational Linguistics. Accepted for publication: 18th February, 201

arXiv.org e-Print Archive

Crossref

Ghent University Academic Bibliography

EUR Research Repository

University of Twente Research Information