46,177 research outputs found
On the Effect of Semantically Enriched Context Models on Software Modularization
Many of the existing approaches for program comprehension rely on the
linguistic information found in source code, such as identifier names and
comments. Semantic clustering is one such technique for modularization of the
system that relies on the informal semantics of the program, encoded in the
vocabulary used in the source code. Treating the source code as a collection of
tokens loses the semantic information embedded within the identifiers. We try
to overcome this problem by introducing context models for source code
identifiers to obtain a semantic kernel, which can be used for both deriving
the topics that run through the system as well as their clustering. In the
first model, we abstract an identifier to its type representation and build on
this notion of context to construct contextual vector representation of the
source code. The second notion of context is defined based on the flow of data
between identifiers to represent a module as a dependency graph where the nodes
correspond to identifiers and the edges represent the data dependencies between
pairs of identifiers. We have applied our approach to 10 medium-sized open
source Java projects, and show that by introducing contexts for identifiers,
the quality of the modularization of the software systems is improved. Both of
the context models give results that are superior to the plain vector
representation of documents. In some cases, the authoritativeness of
decompositions is improved by 67%. Furthermore, a more detailed evaluation of
our approach on JEdit, an open source editor, demonstrates that inferred topics
through performing topic analysis on the contextual representations are more
meaningful compared to the plain representation of the documents. The proposed
approach in introducing a context model for source code identifiers paves the
way for building tools that support developers in program comprehension tasks
such as application and domain concept location, software modularization and
topic analysis
Improving the Representation and Conversion of Mathematical Formulae by Considering their Textual Context
Mathematical formulae represent complex semantic information in a concise
form. Especially in Science, Technology, Engineering, and Mathematics,
mathematical formulae are crucial to communicate information, e.g., in
scientific papers, and to perform computations using computer algebra systems.
Enabling computers to access the information encoded in mathematical formulae
requires machine-readable formats that can represent both the presentation and
content, i.e., the semantics, of formulae. Exchanging such information between
systems additionally requires conversion methods for mathematical
representation formats. We analyze how the semantic enrichment of formulae
improves the format conversion process and show that considering the textual
context of formulae reduces the error rate of such conversions. Our main
contributions are: (1) providing an openly available benchmark dataset for the
mathematical format conversion task consisting of a newly created test
collection, an extensive, manually curated gold standard and task-specific
evaluation metrics; (2) performing a quantitative evaluation of
state-of-the-art tools for mathematical format conversions; (3) presenting a
new approach that considers the textual context of formulae to reduce the error
rate for mathematical format conversions. Our benchmark dataset facilitates
future research on mathematical format conversions as well as research on many
problems in mathematical information retrieval. Because we annotated and linked
all components of formulae, e.g., identifiers, operators and other entities, to
Wikidata entries, the gold standard can, for instance, be used to train methods
for formula concept discovery and recognition. Such methods can then be applied
to improve mathematical information retrieval systems, e.g., for semantic
formula search, recommendation of mathematical content, or detection of
mathematical plagiarism.Comment: 10 pages, 4 figure
Factors shaping the evolution of electronic documentation systems
The main goal is to prepare the space station technical and managerial structure for likely changes in the creation, capture, transfer, and utilization of knowledge. By anticipating advances, the design of Space Station Project (SSP) information systems can be tailored to facilitate a progression of increasingly sophisticated strategies as the space station evolves. Future generations of advanced information systems will use increases in power to deliver environmentally meaningful, contextually targeted, interconnected data (knowledge). The concept of a Knowledge Base Management System is emerging when the problem is focused on how information systems can perform such a conversion of raw data. Such a system would include traditional management functions for large space databases. Added artificial intelligence features might encompass co-existing knowledge representation schemes; effective control structures for deductive, plausible, and inductive reasoning; means for knowledge acquisition, refinement, and validation; explanation facilities; and dynamic human intervention. The major areas covered include: alternative knowledge representation approaches; advanced user interface capabilities; computer-supported cooperative work; the evolution of information system hardware; standardization, compatibility, and connectivity; and organizational impacts of information intensive environments
CoaCor: Code Annotation for Code Retrieval with Reinforcement Learning
To accelerate software development, much research has been performed to help
people understand and reuse the huge amount of available code resources. Two
important tasks have been widely studied: code retrieval, which aims to
retrieve code snippets relevant to a given natural language query from a code
base, and code annotation, where the goal is to annotate a code snippet with a
natural language description. Despite their advancement in recent years, the
two tasks are mostly explored separately. In this work, we investigate a novel
perspective of Code annotation for Code retrieval (hence called `CoaCor'),
where a code annotation model is trained to generate a natural language
annotation that can represent the semantic meaning of a given code snippet and
can be leveraged by a code retrieval model to better distinguish relevant code
snippets from others. To this end, we propose an effective framework based on
reinforcement learning, which explicitly encourages the code annotation model
to generate annotations that can be used for the retrieval task. Through
extensive experiments, we show that code annotations generated by our framework
are much more detailed and more useful for code retrieval, and they can further
improve the performance of existing code retrieval models significantly.Comment: 10 pages, 2 figures. Accepted by The Web Conference (WWW) 201
Conceptual graph-based knowledge representation for supporting reasoning in African traditional medicine
Although African patients use both conventional or modern and traditional healthcare simultaneously, it has been proven that 80% of people rely on African traditional medicine (ATM). ATM includes medical activities stemming from practices, customs and traditions which were integral to the distinctive African cultures. It is based mainly on the oral transfer of knowledge, with the risk of losing critical knowledge. Moreover, practices differ according to the regions and the availability of medicinal plants. Therefore, it is necessary to compile tacit, disseminated and complex knowledge from various Tradi-Practitioners (TP) in order to determine interesting patterns for treating a given disease. Knowledge engineering methods for traditional medicine are useful to model suitably complex information needs, formalize knowledge of domain experts and highlight the effective practices for their integration to conventional medicine. The work described in this paper presents an approach which addresses two issues. First it aims at proposing a formal representation model of ATM knowledge and practices to facilitate their sharing and reusing. Then, it aims at providing a visual reasoning mechanism for selecting best available procedures and medicinal plants to treat diseases. The approach is based on the use of the Delphi method for capturing knowledge from various experts which necessitate reaching a consensus. Conceptual graph formalism is used to model ATM knowledge with visual reasoning capabilities and processes. The nested conceptual graphs are used to visually express the semantic meaning of Computational Tree Logic (CTL) constructs that are useful for formal specification of temporal properties of ATM domain knowledge. Our approach presents the advantage of mitigating knowledge loss with conceptual development assistance to improve the quality of ATM care (medical diagnosis and therapeutics), but also patient safety (drug monitoring)
Locating bugs without looking back
Bug localisation is a core program comprehension task in software maintenance: given the observation of a bug, e.g. via a bug report, where is it located in the source code? Information retrieval (IR) approaches see the bug report as the query, and the source code files as the documents to be retrieved, ranked by relevance. Such approaches have the advantage of not requiring expensive static or dynamic analysis of the code. However, current state-of-the-art IR approaches rely on project history, in particular previously fixed bugs or previous versions of the source code. We present a novel approach that directly scores each current file against the given report, thus not requiring past code and reports. The scoring method is based on heuristics identified through manual inspection of a small sample of bug reports. We compare our approach to eight others, using their own five metrics on their own six open source projects. Out of 30 performance indicators, we improve 27 and equal 2. Over the projects analysed, on average we find one or more affected files in the top 10 ranked files for 76% of the bug reports. These results show the applicability of our approach to software projects without history
The 'what' and 'how' of learning in design, invited paper
Previous experiences hold a wealth of knowledge which we often take for granted and use unknowingly through our every day working lives. In design, those experiences can play a crucial role in the success or failure of a design project, having a great deal of influence on the quality, cost and development time of a product. But how can we empower computer based design systems to acquire this knowledge? How would we use such systems to support design? This paper outlines some of the work which has been carried out in applying and developing Machine Learning techniques to support the design activity; particularly in utilising previous designs and learning the design process
- …