
    Using contexts to extract models from code

    Behaviour models facilitate the understanding and analysis of software systems by providing an abstract view of their behaviours and also by enabling the use of validation and verification techniques to detect errors. However, depending on the size and complexity of these systems, constructing models may not be a trivial task, even for experienced developers. Model extraction techniques can automatically obtain models from existing code, thus reducing the effort and expertise required of engineers and helping avoid errors often present in manually constructed models. Existing approaches for model extraction often fail to produce faithful models, either because they only consider static information, which may include infeasible behaviours, or because they are based only on dynamic information, thus relying on observed executions, which usually results in incomplete models. This paper describes a model extraction approach based on the concept of contexts, which are abstractions of concrete states of a program, combining static and dynamic information. Contexts merge some of the advantages of using either type of information and, by their combination, can overcome some of their problems. The approach is partially implemented by a tool called LTS Extractor, which translates information collected from execution traces produced by instrumented Java code to labelled transition systems (LTS), which can be analysed in an existing verification tool. Results from case studies are presented and discussed, showing that, considering a certain level of abstraction and a set of execution traces, the produced models are correct descriptions of the programs from which they were extracted. Thus, they can be used for a variety of analyses, such as program understanding, validation, verification, and evolution.
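
    A rough sketch of the core idea, not the LTS Extractor implementation: transitions observed across several execution traces are merged into one labelled transition system, with abstract contexts as states and actions as edge labels. The (context, action, context) trace format below is a hypothetical stand-in for the tool's instrumented output.

```python
from collections import defaultdict

# A trace is a list of observed (source context, action, target context)
# triples; contexts abstract the program's concrete states.
Trace = list[tuple[str, str, str]]

def extract_lts(traces: list[Trace]) -> dict[str, set[tuple[str, str]]]:
    """Merge transitions from all traces: contexts become states,
    actions become labelled edges between them."""
    lts: dict[str, set[tuple[str, str]]] = defaultdict(set)
    for trace in traces:
        for src, action, dst in trace:
            lts[src].add((action, dst))
    return dict(lts)

# Two traces of a toy file-handling program.
traces = [
    [("closed", "open", "opened"), ("opened", "read", "opened"),
     ("opened", "close", "closed")],
    [("closed", "open", "opened"), ("opened", "close", "closed")],
]
for state, edges in extract_lts(traces).items():
    print(state, "->", sorted(edges))
```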

    Context2Name: A Deep Learning-Based Approach to Infer Natural Variable Names from Usage Contexts

    Most of the JavaScript code deployed in the wild has been minified, a process in which identifier names are replaced with short, arbitrary and meaningless names. Minified code occupies less space, but also makes the code extremely difficult to manually inspect and understand. This paper presents Context2Name, a deep learning-based technique that partially reverses the effect of minification by predicting natural identifier names for minified names. The core idea is to predict from the usage context of a variable a name that captures the meaning of the variable. The approach combines a lightweight, token-based static analysis with an auto-encoder neural network that summarizes usage contexts and a recurrent neural network that predicts natural names for a given usage context. We evaluate Context2Name with a large corpus of real-world JavaScript code and show that it successfully predicts 47.5% of all minified identifiers while taking only 2.9 milliseconds on average to predict a name. A comparison with the state-of-the-art tools JSNice and JSNaughty shows that our approach performs comparably in terms of accuracy while improving in terms of efficiency. Moreover, Context2Name complements the state-of-the-art by predicting 5.3% additional identifiers that are missed by both existing tools.
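
    As a hedged illustration of the first stage only, the sketch below collects token-window usage contexts for an identifier; the window size, token pattern, and function name are assumptions, and the paper's auto-encoder and recurrent network, which map such contexts to natural names, are not reproduced here.

```python
import re

def usage_contexts(tokens: list[str], ident: str, window: int = 2) -> list[list[str]]:
    """Collect the window of tokens around every occurrence of `ident`,
    masking the identifier itself (a minified name carries no meaning)."""
    contexts = []
    for i, tok in enumerate(tokens):
        if tok == ident:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            contexts.append(left + ["<ID>"] + right)
    return contexts

# A minified-style snippet: what should `a` have been called?
code = "var a = b.length; for (var i = 0; i < a; i++) { f(a); }"
tokens = re.findall(r"\w+|[^\w\s]", code)
print(usage_contexts(tokens, "a"))
```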

    On the Effect of Semantically Enriched Context Models on Software Modularization

    Many of the existing approaches for program comprehension rely on the linguistic information found in source code, such as identifier names and comments. Semantic clustering is one such technique for modularization of the system that relies on the informal semantics of the program, encoded in the vocabulary used in the source code. Treating the source code as a collection of tokens loses the semantic information embedded within the identifiers. We try to overcome this problem by introducing context models for source code identifiers to obtain a semantic kernel, which can be used both for deriving the topics that run through the system and for clustering them. In the first model, we abstract an identifier to its type representation and build on this notion of context to construct a contextual vector representation of the source code. The second notion of context is defined based on the flow of data between identifiers to represent a module as a dependency graph where the nodes correspond to identifiers and the edges represent the data dependencies between pairs of identifiers. We have applied our approach to 10 medium-sized open source Java projects, and show that by introducing contexts for identifiers, the quality of the modularization of the software systems is improved. Both of the context models give results that are superior to the plain vector representation of documents. In some cases, the authoritativeness of decompositions is improved by 67%. Furthermore, a more detailed evaluation of our approach on JEdit, an open source editor, demonstrates that the topics inferred from the contextual representations are more meaningful than those from the plain representation of the documents. By introducing a context model for source code identifiers, the proposed approach paves the way for building tools that support developers in program comprehension tasks such as application and domain concept location, software modularization and topic analysis.
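
    To make the second context model concrete, here is a minimal sketch that treats a module as a dependency graph over identifiers. It uses Python's ast module and an assignment-based notion of data flow as a stand-in; the paper targets Java and extracts dependencies more thoroughly.

```python
import ast
from collections import defaultdict

def dependency_graph(source: str) -> dict[str, set[str]]:
    """Map each assigned identifier to the identifiers its value reads."""
    graph: dict[str, set[str]] = defaultdict(set)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign):
            reads = {n.id for n in ast.walk(node.value) if isinstance(n, ast.Name)}
            for target in node.targets:
                if isinstance(target, ast.Name):
                    graph[target.id] |= reads
    return dict(graph)

src = "total = price * quantity\ndiscounted = total - rebate"
print(dependency_graph(src))
# e.g. {'total': {'price', 'quantity'}, 'discounted': {'total', 'rebate'}}
```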

    Discovering transcriptional modules by Bayesian data integration

    Motivation: We present a method for directly inferring transcriptional modules (TMs) by integrating gene expression and transcription factor binding (ChIP-chip) data. Our model extends a hierarchical Dirichlet process mixture model to allow data fusion on a gene-by-gene basis. This encodes the intuition that co-expression and co-regulation are not necessarily equivalent and hence we do not expect all genes to group similarly in both datasets. In particular, it allows us to identify the subset of genes that share the same structure of transcriptional modules in both datasets. Results: We find that by working on a gene-by-gene basis, our model is able to extract clusters with greater functional coherence than existing methods. By combining gene expression and transcription factor binding (ChIP-chip) data in this way, we are better able to determine the groups of genes that are most likely to represent underlying TMs.
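
    A toy illustration of the gene-by-gene fusion idea, not the paper's inference procedure: each gene carries an indicator for whether it shares one clustering across both datasets or clusters independently in each, and a simple Chinese restaurant process draw stands in for the hierarchical Dirichlet process machinery. All names and parameters below are illustrative assumptions.

```python
import random

def crp_assign(n: int, alpha: float) -> list[int]:
    """Draw cluster labels from a Chinese restaurant process prior."""
    counts: list[int] = []
    labels = []
    for _ in range(n):
        weights = counts + [alpha]  # join an existing table, or open a new one
        k = random.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(0)
        counts[k] += 1
        labels.append(k)
    return labels

random.seed(0)
n_genes = 10
fused = [random.random() < 0.6 for _ in range(n_genes)]  # per-gene fusion indicators
shared = crp_assign(n_genes, alpha=1.0)     # clustering shared by both datasets
expr_only = crp_assign(n_genes, alpha=1.0)  # expression-specific clustering
expression_labels = [shared[g] if fused[g] else expr_only[g] for g in range(n_genes)]
print(list(zip(fused, expression_labels)))
```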

    Using FCA to Suggest Refactorings to Correct Design Defects

    Design defects are poor design choices resulting in hard-to-maintain software, hence their detection and correction are key steps of a disciplined software process aimed at yielding high-quality software artifacts. While modern structure- and metric-based techniques enable precise detection of design defects, the correction of the discovered defects, e.g., by means of refactorings, remains a manual, hence error-prone, activity. As many of the refactorings amount to re-distributing class members over a (possibly extended) set of classes, formal concept analysis (FCA) has been successfully applied in the past as a formal framework for refactoring exploration. Here we propose a novel approach for defect removal in object-oriented programs that combines the effectiveness of metrics with the theoretical strength of FCA. A case study of a specific defect, the Blob, drawn from the Azureus project illustrates our approach.
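
    To show the flavour of FCA-based refactoring exploration, the sketch below enumerates the formal concepts of a small, hypothetical method/field access table; each concept is a maximal block of methods sharing fields, a natural candidate grouping for splitting a Blob class. The table and the brute-force enumeration are illustrative assumptions, not the paper's algorithm.

```python
from itertools import combinations

# Hypothetical "uses" table: which fields does each method of a Blob access?
context = {
    "draw": {"x", "y", "color"},
    "move": {"x", "y"},
    "save": {"path", "color"},
    "load": {"path"},
}
all_attrs = {a for attrs in context.values() for a in attrs}

def common_attrs(objs):
    """Attributes shared by every object in `objs` (all attributes if empty)."""
    return set.intersection(*(context[o] for o in objs)) if objs else set(all_attrs)

def objects_with(attrs):
    """Objects possessing every attribute in `attrs`."""
    return {o for o, a in context.items() if attrs <= a}

# Close every object subset under the two derivation operators; the
# resulting (extent, intent) pairs are exactly the formal concepts.
concepts = set()
for r in range(len(context) + 1):
    for objs in combinations(context, r):
        intent = common_attrs(set(objs))
        extent = objects_with(intent)
        concepts.add((frozenset(extent), frozenset(intent)))

for extent, intent in sorted(concepts, key=lambda c: -len(c[0])):
    print(sorted(extent), "share", sorted(intent))
```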

    Context-adaptive learning designs by using semantic web services

    IMS Learning Design (IMS-LD) is a promising technology aimed at supporting learning processes. IMS-LD packages contain the learning process metadata as well as the learning resources. However, the allocation of resources - whether data or services - within the learning design is done manually at design-time on the basis of the subjective appraisals of a learning designer. Since the actual learning context is known at runtime only, IMS-LD applications cannot adapt to a specific context or learner. Therefore, the reusability is limited and high development costs have to be taken into account to support a variety of contexts. To overcome these issues, we propose a highly dynamic approach based on Semantic Web Services (SWS) technology. Our aim is to move from the current data- and metadata-based paradigm to a context-adaptive, service-orientated one. We introduce semantic descriptions of a learning process in terms of user objectives (learning goals) to abstract from specific metadata standards and from the learning resources used. At runtime, learning goals are accomplished by automatically selecting and invoking the services that fit the actual user needs and process contexts. As a result, we obtain a dynamic adaptation to different contexts at runtime. Semantic mappings from our standard-independent process models will enable the automatic development of versatile, reusable IMS-LD applications as well as reusability across multiple metadata standards. To illustrate our approach, we describe a prototype application based on our principles.
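
    As an illustrative sketch of runtime goal-based selection, not of any particular SWS framework: services advertise the learning goals they achieve and the contexts they support, and the first match for the learner's goal and context is invoked. The registry, goal names, and matching rule below are hypothetical; real semantic matchmaking is considerably richer.

```python
from dataclasses import dataclass

@dataclass
class Service:
    name: str
    achieves: set[str]   # learning goals the service can satisfy
    contexts: set[str]   # runtime contexts it supports

# Hypothetical service registry.
REGISTRY = [
    Service("quiz-generator", {"assess-knowledge"}, {"mobile", "desktop"}),
    Service("video-lecture", {"introduce-topic"}, {"desktop"}),
    Service("flashcards", {"assess-knowledge", "introduce-topic"}, {"mobile"}),
]

def select_service(goal: str, context: str) -> Service | None:
    """Return the first registered service achieving `goal` in `context`."""
    for svc in REGISTRY:
        if goal in svc.achieves and context in svc.contexts:
            return svc
    return None

print(select_service("assess-knowledge", "mobile").name)  # quiz-generator
```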