On the Effect of Semantically Enriched Context Models on Software Modularization
Many of the existing approaches for program comprehension rely on the
linguistic information found in source code, such as identifier names and
comments. Semantic clustering is one such technique for modularization of the
system that relies on the informal semantics of the program, encoded in the
vocabulary used in the source code. Treating the source code as a collection of
tokens loses the semantic information embedded within the identifiers. We try
to overcome this problem by introducing context models for source code
identifiers to obtain a semantic kernel, which can be used both for deriving
the topics that run through the system and for clustering them. In the
first model, we abstract an identifier to its type representation and build on
this notion of context to construct a contextual vector representation of the
source code. The second notion of context is defined based on the flow of data
between identifiers to represent a module as a dependency graph where the nodes
correspond to identifiers and the edges represent the data dependencies between
pairs of identifiers. We have applied our approach to 10 medium-sized open
source Java projects, and show that by introducing contexts for identifiers,
the quality of the modularization of the software systems is improved. Both of
the context models give results that are superior to the plain vector
representation of documents. In some cases, the authoritativeness of
decompositions is improved by 67%. Furthermore, a more detailed evaluation of
our approach on JEdit, an open source editor, demonstrates that topics
inferred by performing topic analysis on the contextual representations are more
meaningful than those inferred from the plain representation of the documents. The proposed
approach of introducing a context model for source code identifiers paves the
way for building tools that support developers in program comprehension tasks
such as application and domain concept location, software modularization and
topic analysis.
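The two context models described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the module names, identifiers, and types below are invented sample data, and the helper functions (`plain_vector`, `type_context_vector`, `cosine`, `data_flow_graph`) are hypothetical names chosen for illustration. The sketch assumes that each module has already been parsed into (identifier, declared type) pairs and into assignment statements.

```python
from collections import Counter
from math import sqrt

# Hypothetical sample data: each module is a list of
# (identifier, declared_type) pairs, assumed to come from a parser.
modules = {
    "Buffer.java": [("lineCount", "int"), ("text", "String"), ("undo", "UndoManager")],
    "View.java": [("caretLine", "int"), ("title", "String"), ("undo", "UndoManager")],
    "Parser.java": [("tokens", "List"), ("rule", "ParserRule")],
}


def plain_vector(pairs):
    """Plain representation: count the identifier names themselves."""
    return Counter(name for name, _ in pairs)


def type_context_vector(pairs):
    """First context model (sketch): abstract each identifier to its type."""
    return Counter(type_name for _, type_name in pairs)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def data_flow_graph(assignments):
    """Second context model (sketch): nodes are identifiers, and an edge
    source -> target means data flows from source into target.

    `assignments` is a list of (target, [sources]) tuples, an assumed
    simplification of real data-flow analysis.
    """
    graph = {}
    for target, sources in assignments:
        for source in sources:
            graph.setdefault(source, set()).add(target)
        graph.setdefault(target, set())
    return graph


if __name__ == "__main__":
    buffer_ids, view_ids = modules["Buffer.java"], modules["View.java"]
    print("plain:", cosine(plain_vector(buffer_ids), plain_vector(view_ids)))
    print("typed:", cosine(type_context_vector(buffer_ids), type_context_vector(view_ids)))
    print(data_flow_graph([("total", ["price", "tax"]), ("label", ["total"])]))
```

In this toy data, the two editor modules share only one identifier name but the same set of types, so the type-abstracted vectors make them look more alike than the plain token vectors do. A clustering step would then operate on these similarities; the abstract does not specify which clustering algorithm is used, so none is shown here.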
A Unified multilingual semantic representation of concepts
Semantic representation lies at the core of several applications in Natural Language Processing. However, most existing semantic representation techniques cannot be used effectively for the representation of individual word senses. We put forward a novel multilingual concept representation, called MUFFIN, which not only enables accurate representation of word senses in different languages, but also provides multiple advantages over existing approaches. MUFFIN represents a given concept in a unified semantic space irrespective of the language of interest, enabling cross-lingual comparison of different concepts. We evaluate our approach on two different evaluation benchmarks, semantic similarity and Word Sense Disambiguation, reporting state-of-the-art performance on several standard datasets.
Quantitative Perspectives on Fifty Years of the Journal of the History of Biology
The Journal of the History of Biology provides a fifty-year-long record for
examining the evolution of the history of biology as a scholarly discipline. In
this paper, we present a new dataset and preliminary quantitative analysis of
the thematic content of JHB from the perspectives of geography, organisms, and
thematic fields. The geographic diversity of authors whose work appears in JHB
has increased steadily since 1968, but the geographic coverage of the content
of JHB articles remains strongly lopsided toward the United States, United
Kingdom, and western Europe and has diversified much less dramatically over
time. The taxonomic diversity of organisms discussed in JHB increased steadily
between 1968 and the late 1990s but declined in later years, mirroring broader
patterns of diversification previously reported in the biomedical research
literature. Finally, we used a combination of topic modeling and nonlinear
dimensionality reduction techniques to develop a model of multi-article fields
within JHB. We found evidence for directional changes in the representation of
fields on multiple scales. The diversity of JHB with regard to the
representation of thematic fields has increased overall, with most of that
diversification occurring in recent years. Drawing on the dataset generated in
the course of this analysis, as well as web services in the emerging digital
history and philosophy of science ecosystem, we have developed an interactive
web platform for exploring the content of JHB, and we provide a brief overview
of the platform in this article. As a whole, the data and analyses presented
here provide a starting-place for further critical reflection on the evolution
of the history of biology over the past half-century.
Comment: 45 pages, 14 figures, 4 tables