On the Effect of Semantically Enriched Context Models on Software Modularization
Many of the existing approaches for program comprehension rely on the
linguistic information found in source code, such as identifier names and
comments. Semantic clustering is one such technique for modularization of the
system that relies on the informal semantics of the program, encoded in the
vocabulary used in the source code. Treating the source code as a collection of
tokens loses the semantic information embedded within the identifiers. We try
to overcome this problem by introducing context models for source code
identifiers to obtain a semantic kernel, which can be used both for deriving
the topics that run through the system and for clustering them. In the
first model, we abstract an identifier to its type representation and build on
this notion of context to construct a contextual vector representation of the
source code. The second notion of context is defined based on the flow of data
between identifiers to represent a module as a dependency graph where the nodes
correspond to identifiers and the edges represent the data dependencies between
pairs of identifiers. We have applied our approach to 10 medium-sized open
source Java projects, and show that by introducing contexts for identifiers,
the quality of the modularization of the software systems is improved. Both of
the context models give results that are superior to the plain vector
representation of documents. In some cases, the authoritativeness of
decompositions is improved by 67%. Furthermore, a more detailed evaluation of
our approach on JEdit, an open source editor, demonstrates that the topics
inferred by performing topic analysis on the contextual representations are more
meaningful than those inferred from the plain representation of the documents. The
proposed approach of introducing a context model for source code identifiers paves
the way for building tools that support developers in program comprehension tasks
such as application and domain concept location, software modularization, and
topic analysis.
Deep Active Learning for Named Entity Recognition
Deep learning has yielded state-of-the-art performance on many natural
language processing tasks including named entity recognition (NER). However,
this typically requires large amounts of labeled data. In this work, we
demonstrate that the amount of labeled training data can be drastically reduced
when deep learning is combined with active learning. While active learning is
sample-efficient, it can be computationally expensive since it requires
iterative retraining. To speed this up, we introduce a lightweight architecture
for NER, viz., the CNN-CNN-LSTM model consisting of convolutional character and
word encoders and a long short-term memory (LSTM) tag decoder. The model
achieves nearly state-of-the-art performance on standard datasets for the task
while being computationally much more efficient than the best-performing models. We
carry out incremental active learning during the training process and are
able to nearly match state-of-the-art performance with just 25% of the
original training data.
Spatial evolution of human dialects
The geographical pattern of human dialects is a result of history. Here, we
formulate a simple spatial model of language change which shows that the final
result of this historical evolution may, to some extent, be predictable. The
model shows that the boundaries of language dialect regions are controlled by a
length minimizing effect analogous to surface tension, mediated by variations
in population density which can induce curvature, and by the shape of coastline
or similar borders. The predictability of dialect regions arises because these
effects will drive many complex, randomized early states toward one of a
smaller number of stable final configurations. The model is able to reproduce
observations and predictions of dialectologists. These include dialect
continua, isogloss bundling, fanning, the wave-like spread of dialect features
from cities, and the impact of human movement on the number of dialects that an
area can support. The model also provides an analytical form for Séguy's
Curve giving the relationship between geographical and linguistic distance, and
a generalisation of the curve to account for the presence of a population
centre. A simple modification allows us to analytically characterize the
variation of language use by age in an area undergoing linguistic change.
Mind the Orthography: Revisiting the Contribution of Prereading Phonological Awareness to Reading Acquisition
Published Online First March 21, 2022. Reading acquisition is based on a set of preliteracy skills that lay the foundation for future reading abilities.
Phonological awareness—the ability to identify and manipulate the sound units of oral language—
has been reported to play a central role in reading acquisition. However, current evidence is mixed with
respect to its universal contribution to reading acquisition across orthographies. This longitudinal study
examines the development and contribution of phonological awareness to early reading skills in
Spanish, a transparent orthography. The results of a comprehensive battery of phonological awareness
skills in a large sample of children (Time 1 n = 616, 296 females, mean age 5.6, from middle to high
socioeconomic backgrounds; Time 2 n = 397) with no reading experience at study onset suggest that the
development of phonological awareness is delayed in Spanish. Furthermore, our results show that phonological
awareness does not contribute to the prediction of reading acquisition above and beyond other
preliteracy skills. Letter knowledge indexes children’s ability to identify phonemes and thus takes a
more central role in the prediction of early reading skills. Therefore, we underscore the need to thoughtfully
address the distinctive features of the reading acquisition process across orthographies, which
should be taken into account in models of reading and learning to read. This project was funded by ANII FSED_2_2015_1_120741 and ANII
FSED_2_2016_1_131230 Grants. Camila Zugarramurdi received a PhD
Scholarship from Fundación Carolina.
The source ambiguity problem: Distinguishing the effects of grammar and processing on acceptability judgments
Judgments of linguistic unacceptability may theoretically arise from either grammatical deviance or significant processing difficulty. Acceptability data are thus naturally ambiguous in theories that explicitly distinguish formal and functional constraints. Here, we consider this source ambiguity problem in the context of Superiority effects: the dispreference for ordering a wh-phrase in front of a syntactically “superior” wh-phrase in multiple wh-questions, e.g., What did who buy? More specifically, we consider the acceptability contrast between such examples and so-called D-linked examples, e.g., Which toys did which parents buy? Evidence from acceptability and self-paced reading experiments demonstrates that (i) judgments and processing times for Superiority violations vary in parallel, as determined by the kind of wh-phrases they contain, (ii) judgments increase with exposure, while processing times decrease, (iii) reading times are highly predictive of acceptability judgments for the same items, and (iv) the effects of the complexity of the wh-phrases combine in both acceptability judgments and reading times. This evidence supports the conclusion that D-linking effects are likely reducible to independently motivated cognitive mechanisms whose effects emerge in a wide range of sentence contexts. This in turn suggests that Superiority effects, in general, may owe their character to differential processing difficulty.
A Deflationary Account of Mental Representation
Among the cognitive capacities of evolved creatures is the capacity to represent. Theories in cognitive neuroscience typically explain our manifest representational capacities by positing internal representations, but there is little agreement about how these representations function, especially with the relatively recent proliferation of connectionist, dynamical, embodied, and enactive approaches to cognition. In this talk I sketch an account of the nature and function of representation in cognitive neuroscience that couples a realist construal of representational vehicles with a pragmatic account of mental content. I call the resulting package a deflationary account of mental representation and I argue that it avoids the problems that afflict competing accounts.