27,646 research outputs found
On the Effect of Semantically Enriched Context Models on Software Modularization
Many of the existing approaches for program comprehension rely on the
linguistic information found in source code, such as identifier names and
comments. Semantic clustering is one such technique for modularization of the
system that relies on the informal semantics of the program, encoded in the
vocabulary used in the source code. Treating the source code as a collection of
tokens loses the semantic information embedded within the identifiers. We try
to overcome this problem by introducing context models for source code
identifiers to obtain a semantic kernel, which can be used for both deriving
the topics that run through the system as well as their clustering. In the
first model, we abstract an identifier to its type representation and build on
this notion of context to construct contextual vector representation of the
source code. The second notion of context is defined based on the flow of data
between identifiers to represent a module as a dependency graph where the nodes
correspond to identifiers and the edges represent the data dependencies between
pairs of identifiers. We have applied our approach to 10 medium-sized open
source Java projects, and show that by introducing contexts for identifiers,
the quality of the modularization of the software systems is improved. Both of
the context models give results that are superior to the plain vector
representation of documents. In some cases, the authoritativeness of
decompositions is improved by 67%. Furthermore, a more detailed evaluation of
our approach on JEdit, an open source editor, demonstrates that inferred topics
through performing topic analysis on the contextual representations are more
meaningful compared to the plain representation of the documents. The proposed
approach in introducing a context model for source code identifiers paves the
way for building tools that support developers in program comprehension tasks
such as application and domain concept location, software modularization and
topic analysis
Dependency relations as source context in phrase-based SMT
The Phrase-Based Statistical Machine Translation (PB-SMT) model has recently begun to include source context modeling, under the assumption that the proper lexical
choice of an ambiguous word can be determined from the context in which it appears. Various types of lexical and syntactic features such as words, parts-of-speech, and
supertags have been explored as effective source context in SMT. In this paper, we show that position-independent syntactic dependency relations of the head of a source phrase can be modeled as useful source context to improve target phrase selection and thereby improve overall performance of PB-SMT. On a DutchâEnglish translation task, by combining dependency relations and syntactic contextual features (part-of-speech), we achieved a 1.0 BLEU (Papineni et al., 2002) point improvement (3.1% relative) over the baseline
Pitch ability as an aptitude for tone learning
Tone languages such as Mandarin use voice pitch to signal lexical contrasts, presenting a challenge for second/foreign language (L2) learners whose native languages do not use pitch in this manner. The present study examined components of an aptitude for mastering L2 lexical tone. Native English speakers with no previous tone language experience completed a Mandarin word learning task, as well as tests of pitch ability, musicality, L2 aptitude, and general cognitive ability. Pitch ability measures improved predictions of learning performance beyond musicality, L2 aptitude, and general cognitive ability and also predicted transfer of learning to new talkers. In sum, although certain nontonal measures help predict successful tone learning, the central components of tonal aptitude are pitch-specific perceptual measures
Recommended from our members
N400 evidence for musical facilitation of word boundary identification in second language exposure
Lexical acquisition requires the ability to identify word boundaries in a continuous auditory speech stream. This complex task is even more challenging when learning a new language in adulthood. Previous studies have shown that word boundary identification can be enhanced by pairing musical tones with native language phonemes. The objective of this dissertation study was to investigate whether musical tones also have this effect in a novel pseudo-language that uses non-native speech sounds. The N400, a brain event-related potential that has been linked with familiarity responses and detection of statistical regularities during exposure to pseudowords, provides an index of brain activation associated with semantico-lexical processing. In this study, language-like stimuli incorporating a French phoneme (a high, front, rounded vowel that is not part of the English phonetic inventory) were presented to typically developing English monolingual adults. Participants were presented to one of two types of exposure conditions for 7 minutes: monotone presentation of the concatenated language-like stimuli; or the same speech stream with a musical tone associated with each syllable. The exposure protocol was based on Schön, Boyer, Moreno et. al. (2008). Exposure was followed by a lexical decision task, requiring participants to distinguish âwordsâ (heard during the exposure in a concatenated speech stream) from âpart wordsâ (end of one word and the beginning of another, crossing word boundaries). High-density EEG was recorded during the lexical decision and analyzed offline to determine N400 event-related responses to the stimuli in each condition. Although behavioral measures did not reveal any significant differences between groups or conditions, we found a N4 significantly different response to âpartwordâ in the tone-exposed group, compared to the monotone. This difference only occurred in a frontal region with a right-hemisphere bias, and was not found to be significant over the left hemisphere. This difference suggests that participants in the tone group were supported in differentiating âwordsâ from âpartwordsâ, supporting the view that the inclusion of tonal information is beneficial in the early stages of L2 lexical learning
Towards an Indexical Model of Situated Language Comprehension for Cognitive Agents in Physical Worlds
We propose a computational model of situated language comprehension based on
the Indexical Hypothesis that generates meaning representations by translating
amodal linguistic symbols to modal representations of beliefs, knowledge, and
experience external to the linguistic system. This Indexical Model incorporates
multiple information sources, including perceptions, domain knowledge, and
short-term and long-term experiences during comprehension. We show that
exploiting diverse information sources can alleviate ambiguities that arise
from contextual use of underspecific referring expressions and unexpressed
argument alternations of verbs. The model is being used to support linguistic
interactions in Rosie, an agent implemented in Soar that learns from
instruction.Comment: Advances in Cognitive Systems 3 (2014
Computational Sociolinguistics: A Survey
Language is a social phenomenon and variation is inherent to its social
nature. Recently, there has been a surge of interest within the computational
linguistics (CL) community in the social dimension of language. In this article
we present a survey of the emerging field of "Computational Sociolinguistics"
that reflects this increased interest. We aim to provide a comprehensive
overview of CL research on sociolinguistic themes, featuring topics such as the
relation between language and social identity, language use in social
interaction and multilingual communication. Moreover, we demonstrate the
potential for synergy between the research communities involved, by showing how
the large-scale data-driven methods that are widely used in CL can complement
existing sociolinguistic studies, and how sociolinguistics can inform and
challenge the methods and assumptions employed in CL studies. We hope to convey
the possible benefits of a closer collaboration between the two communities and
conclude with a discussion of open challenges.Comment: To appear in Computational Linguistics. Accepted for publication:
18th February, 201
- âŠ