
    On the Effect of Semantically Enriched Context Models on Software Modularization

    Many of the existing approaches for program comprehension rely on the linguistic information found in source code, such as identifier names and comments. Semantic clustering is one such technique for modularization of the system that relies on the informal semantics of the program, encoded in the vocabulary used in the source code. Treating the source code as a collection of tokens loses the semantic information embedded within the identifiers. We try to overcome this problem by introducing context models for source code identifiers to obtain a semantic kernel, which can be used both for deriving the topics that run through the system and for their clustering. In the first model, we abstract an identifier to its type representation and build on this notion of context to construct a contextual vector representation of the source code. The second notion of context is defined based on the flow of data between identifiers to represent a module as a dependency graph where the nodes correspond to identifiers and the edges represent the data dependencies between pairs of identifiers. We have applied our approach to 10 medium-sized open source Java projects, and show that by introducing contexts for identifiers, the quality of the modularization of the software systems is improved. Both of the context models give results that are superior to the plain vector representation of documents. In some cases, the authoritativeness of decompositions is improved by 67%. Furthermore, a more detailed evaluation of our approach on JEdit, an open source editor, demonstrates that topics inferred by performing topic analysis on the contextual representations are more meaningful than those inferred from the plain representation of the documents. The proposed approach of introducing a context model for source code identifiers paves the way for building tools that support developers in program comprehension tasks such as application and domain concept location, software modularization and topic analysis.
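The first context model in the abstract above (abstracting each identifier to its type before building vectors) can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy identifier lists and the `declared_types` mapping are all hypothetical, and plain term counts with cosine similarity stand in for whatever weighting and kernel the paper actually uses.

```python
import math
from collections import Counter


def context_vector(identifiers, declared_types=None):
    """Build a bag-of-terms vector for one module.

    If `declared_types` is given, each identifier is replaced by its
    declared type (hypothetical mapping), which is the "type context".
    """
    if declared_types is None:
        terms = list(identifiers)
    else:
        terms = [declared_types.get(name, name) for name in identifiers]
    return Counter(terms)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_sq = sum(c * c for c in u.values()) * sum(c * c for c in v.values())
    return dot / math.sqrt(norm_sq) if norm_sq else 0.0


# Toy modules that use different names for values of the same types.
module_a = ["buf", "count"]
module_b = ["doc", "n"]
declared_types = {"buf": "Buffer", "doc": "Buffer", "count": "int", "n": "int"}

plain_similarity = cosine(context_vector(module_a), context_vector(module_b))
typed_similarity = cosine(
    context_vector(module_a, declared_types),
    context_vector(module_b, declared_types),
)
```

In this toy case the plain vectors share no terms, while the type-abstracted vectors coincide, which is the kind of signal a clustering step could then exploit.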

    Improving the Representation and Conversion of Mathematical Formulae by Considering their Textual Context

    Mathematical formulae represent complex semantic information in a concise form. Especially in Science, Technology, Engineering, and Mathematics, mathematical formulae are crucial to communicate information, e.g., in scientific papers, and to perform computations using computer algebra systems. Enabling computers to access the information encoded in mathematical formulae requires machine-readable formats that can represent both the presentation and content, i.e., the semantics, of formulae. Exchanging such information between systems additionally requires conversion methods for mathematical representation formats. We analyze how the semantic enrichment of formulae improves the format conversion process and show that considering the textual context of formulae reduces the error rate of such conversions. Our main contributions are: (1) providing an openly available benchmark dataset for the mathematical format conversion task consisting of a newly created test collection, an extensive, manually curated gold standard and task-specific evaluation metrics; (2) performing a quantitative evaluation of state-of-the-art tools for mathematical format conversions; (3) presenting a new approach that considers the textual context of formulae to reduce the error rate for mathematical format conversions. Our benchmark dataset facilitates future research on mathematical format conversions as well as research on many problems in mathematical information retrieval. Because we annotated and linked all components of formulae, e.g., identifiers, operators and other entities, to Wikidata entries, the gold standard can, for instance, be used to train methods for formula concept discovery and recognition. Such methods can then be applied to improve mathematical information retrieval systems, e.g., for semantic formula search, recommendation of mathematical content, or detection of mathematical plagiarism. Comment: 10 pages, 4 figures

    Factors shaping the evolution of electronic documentation systems

    The main goal is to prepare the space station technical and managerial structure for likely changes in the creation, capture, transfer, and utilization of knowledge. By anticipating advances, the design of Space Station Project (SSP) information systems can be tailored to facilitate a progression of increasingly sophisticated strategies as the space station evolves. Future generations of advanced information systems will use increases in power to deliver environmentally meaningful, contextually targeted, interconnected data (knowledge). The concept of a Knowledge Base Management System emerges when the problem is framed as how information systems can perform such a conversion of raw data. Such a system would include traditional management functions for large space databases. Added artificial intelligence features might encompass co-existing knowledge representation schemes; effective control structures for deductive, plausible, and inductive reasoning; means for knowledge acquisition, refinement, and validation; explanation facilities; and dynamic human intervention. The major areas covered include: alternative knowledge representation approaches; advanced user interface capabilities; computer-supported cooperative work; the evolution of information system hardware; standardization, compatibility, and connectivity; and organizational impacts of information-intensive environments.

    CoaCor: Code Annotation for Code Retrieval with Reinforcement Learning

    To accelerate software development, much research has been performed to help people understand and reuse the huge amount of available code resources. Two important tasks have been widely studied: code retrieval, which aims to retrieve code snippets relevant to a given natural language query from a code base, and code annotation, where the goal is to annotate a code snippet with a natural language description. Despite their advancement in recent years, the two tasks are mostly explored separately. In this work, we investigate a novel perspective of Code annotation for Code retrieval (hence called 'CoaCor'), where a code annotation model is trained to generate a natural language annotation that can represent the semantic meaning of a given code snippet and can be leveraged by a code retrieval model to better distinguish relevant code snippets from others. To this end, we propose an effective framework based on reinforcement learning, which explicitly encourages the code annotation model to generate annotations that can be used for the retrieval task. Through extensive experiments, we show that code annotations generated by our framework are much more detailed and more useful for code retrieval, and they can further improve the performance of existing code retrieval models significantly. Comment: 10 pages, 2 figures. Accepted by The Web Conference (WWW) 201

    Conceptual graph-based knowledge representation for supporting reasoning in African traditional medicine

    Although African patients use both conventional (modern) and traditional healthcare simultaneously, it has been proven that 80% of people rely on African traditional medicine (ATM). ATM includes medical activities stemming from practices, customs and traditions which were integral to the distinctive African cultures. It is based mainly on the oral transfer of knowledge, with the risk of losing critical knowledge. Moreover, practices differ according to the regions and the availability of medicinal plants. Therefore, it is necessary to compile tacit, disseminated and complex knowledge from various Tradi-Practitioners (TP) in order to determine interesting patterns for treating a given disease. Knowledge engineering methods for traditional medicine are useful to model suitably complex information needs, formalize knowledge of domain experts and highlight the effective practices for their integration into conventional medicine. The work described in this paper presents an approach which addresses two issues. First, it aims to propose a formal representation model of ATM knowledge and practices to facilitate their sharing and reuse. Second, it aims to provide a visual reasoning mechanism for selecting the best available procedures and medicinal plants to treat diseases. The approach is based on the use of the Delphi method for capturing knowledge from various experts, which necessitates reaching a consensus. Conceptual graph formalism is used to model ATM knowledge with visual reasoning capabilities and processes. The nested conceptual graphs are used to visually express the semantic meaning of Computational Tree Logic (CTL) constructs that are useful for formal specification of temporal properties of ATM domain knowledge. Our approach presents the advantage of mitigating knowledge loss with conceptual development assistance to improve not only the quality of ATM care (medical diagnosis and therapeutics), but also patient safety (drug monitoring).

    Locating bugs without looking back

    Bug localisation is a core program comprehension task in software maintenance: given the observation of a bug, e.g. via a bug report, where is it located in the source code? Information retrieval (IR) approaches see the bug report as the query, and the source code files as the documents to be retrieved, ranked by relevance. Such approaches have the advantage of not requiring expensive static or dynamic analysis of the code. However, current state-of-the-art IR approaches rely on project history, in particular previously fixed bugs or previous versions of the source code. We present a novel approach that directly scores each current file against the given report, thus not requiring past code and reports. The scoring method is based on heuristics identified through manual inspection of a small sample of bug reports. We compare our approach to eight others, using their own five metrics on their own six open source projects. Out of 30 performance indicators, we improve 27 and equal 2. Over the projects analysed, on average we find one or more affected files in the top 10 ranked files for 76% of the bug reports. These results show the applicability of our approach to software projects without history.
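The IR framing in the abstract above (the bug report as the query, each current file scored against it, and success measured by whether an affected file appears in the top k) can be sketched as follows. The paper's actual heuristics are not given in the abstract, so a simple shared-token count is used here as a stand-in; the file paths, file contents and report text are hypothetical.

```python
import re


def tokens(text):
    """Lowercased alphanumeric tokens, with camelCase identifiers split."""
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", text)
    return set(re.findall(r"[a-z0-9]+", spaced.lower()))


def rank_files(report, files):
    """Rank file paths by how many report tokens occur in the path or source.

    Stand-in scoring only: no project history is consulted, but the
    heuristics of the paper itself are not reproduced here.
    """
    query = tokens(report)
    scores = {
        path: len(query & tokens(path + " " + source))
        for path, source in files.items()
    }
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda path: (-scores[path], path))


def hit_at_k(ranking, affected, k=10):
    """True if at least one affected file is among the top k ranked files."""
    return any(path in affected for path in ranking[:k])


# Hypothetical project snapshot and bug report.
files = {
    "src/TextArea.java": "class TextArea { void scrollToCaret() {} }",
    "src/Buffer.java": "class Buffer { void insert(String text) {} }",
}
report = "Caret disappears when the text area scrolls"

ranking = rank_files(report, files)
found = hit_at_k(ranking, {"src/TextArea.java"}, k=1)
```

Averaging `hit_at_k` over all bug reports of a project gives the kind of top-10 percentage quoted in the abstract.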

    The 'what' and 'how' of learning in design, invited paper

    Previous experiences hold a wealth of knowledge which we often take for granted and use unknowingly throughout our everyday working lives. In design, those experiences can play a crucial role in the success or failure of a design project, having a great deal of influence on the quality, cost and development time of a product. But how can we empower computer-based design systems to acquire this knowledge? How would we use such systems to support design? This paper outlines some of the work which has been carried out in applying and developing Machine Learning techniques to support the design activity, particularly in utilising previous designs and learning the design process.