85 research outputs found

    Improving the tokenisation of identifier names

    Get PDF
    Identifier names are the main vehicle for semantic information during program comprehension. For tool-supported program comprehension tasks, including concept location and requirements traceability, identifier names need to be tokenised into their semantic constituents. In this paper we present an approach to the automated tokenisation of identifier names that improves on existing techniques in two ways. First, it improves the tokenisation accuracy for single-case identifier names and for identifier names containing digits, which existing techniques largely ignore. Second, performance gains over existing techniques are achieved using smaller oracles, making the approach easier to deploy. Accuracy was evaluated by comparing our algorithm to manual tokenizations of 28,000 identifier names drawn from 60 well-known open source Java projects totalling 16.5 MSLOC. Moreover, the projects were used to perform a study of identifier tokenisation features (single case, camel case, use of digits, etc.) per object-oriented construct (class names, method names, local variable names, etc.), thus providing an insight into naming conventions in industrial-scale object-oriented code. Our tokenisation tool and datasets are publicly available

    XTraQue: traceability for product line systems

    Get PDF
    Product line engineering has been increasingly used to support the development and deployment of software systems that share a common set of features and are developed based on the reuse of core assets. The large number and heterogeneity of documents generated during the development of product line systems may cause difficulties to identify common and variable aspects among applications, and to reuse core assets that are available under the product line. In this paper, we present a traceability approach for product line systems. Traceability has been recognised as an important task in in software system development. Traceability relations can improve the quality of the product being developed and reduce development time and cost. We present a rule-based approach to support automatic generation of traceability relations between feature-based object-oriented documents. The traceability rules used in our work are classified into two groups namely (a) direct rules, which support the creation of traceability relations that do not depend on the existence of other relations, and (b) indirect rules, which require the existence of previously generated relations. The documents are represented in XML and the rules are represented in an extension of XQuery. A prototype tool called XTraQue has been implemented. This tool, together with a mobile phone product line case study, has been used to demonstrate and evaluate our work in various experiments. The results of these experiments are encouraging and comparable with other approaches that support automatic generation of traceability relations

    Automated migration of build scripts using dynamic analysis and search-based refactoring

    Get PDF
    The efficiency of a build system is an important factor for developer productivity. As a result, developer teams have been increasingly adopting new build systems that allow higher build parallelization. However, migrating the existing legacy build scripts to new build systems is a tedious and error-prone process. Unfortunately, there is insufficient support for automated migration of build scripts, making the migration more problematic. We propose the first dynamic approach for automated migration of build scripts to new build systems. Our approach works in two phases. First, from a set of execution traces, we synthesize build scripts that accurately capture the intent of the original build. The synthesized build scripts are typically long and hard to maintain. Second, we apply refactorings that raise the abstraction level of the synthesized scripts (e.g., introduce functions for similar fragments). As different refactoring sequences may lead to different build scripts, we use a search-based approach that explores various sequences to identify the best (e.g., shortest) build script. We optimize search-based refactoring with partial-order reduction to faster explore refactoring sequences. We implemented the proposed two phase migration approach in a tool called METAMORPHOSIS that has been recently used at Microsoft

    Conclave: ontology-driven measurement of semantic relatedness between source code elements and problem domain concepts

    Get PDF
    Software maintainers are often challenged with source code changes to improve software systems, or eliminate defects, in unfamiliar programs. To undertake these tasks a sufficient understanding of the system (or at least a small part of it) is required. One of the most time consuming tasks of this process is locating which parts of the code are responsible for some key functionality or feature. Feature (or concept) location techniques address this problem. This paper introduces Conclave, an environment for software analysis, and in particular the Conclave-Mapper tool that provides a feature location facility. This tool explores natural language terms used in programs (e.g. function and variable names), and using textual analysis and a collection of Natural Language Processing techniques, computes synonymous sets of terms. These sets are used to score relatedness between program elements, and search queries or problem domain concepts, producing sorted ranks of program elements that address the search criteria, or concepts. An empirical study is also discussed to evaluate the underlying feature location technique.info:eu-repo/semantics/publishedVersio

    Gitana: a SQL-based Git Repository Inspector

    Get PDF
    International audienceSoftware development projects are notoriously complex and difficult to deal with. Several support tools such as issue tracking, code review and Source Control Management (SCM) systems have been introduced in the past decades to ease development activities. While such tools efficiently track the evolution of a given aspect of the project (e.g., bug reports), they provide just a partial view of the project and often lack of advanced querying mechanisms limiting themselves to command line or simple GUI support. This is particularly true for projects that rely on Git, the most popular SCM system today. In this paper, we propose a conceptual schema for Git and an approach that, given a Git repository, exports its data to a relational database in order to (1) promote data integration with other existing SCM tools and (2) enable writing queries on Git data using standard SQL syntax. To ensure efficiency, our approach comes with an incremental propagation mechanism that refreshes the database content with the latest modifications. We have implemented our approach in Gitana, an open-source tool available on GitHub

    Search based software engineering: Trends, techniques and applications

    Get PDF
    © ACM, 2012. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version is available from the link below.In the past five years there has been a dramatic increase in work on Search-Based Software Engineering (SBSE), an approach to Software Engineering (SE) in which Search-Based Optimization (SBO) algorithms are used to address problems in SE. SBSE has been applied to problems throughout the SE lifecycle, from requirements and project planning to maintenance and reengineering. The approach is attractive because it offers a suite of adaptive automated and semiautomated solutions in situations typified by large complex problem spaces with multiple competing and conflicting objectives. This article provides a review and classification of literature on SBSE. The work identifies research trends and relationships between the techniques applied and the applications to which they have been applied and highlights gaps in the literature and avenues for further research.EPSRC and E

    A Hybrid Model for Dynamic Simulation of Custom Software Projects in a Multiproject Environment

    Get PDF
    This paper describes SimHiProS, a hybrid simulation model of software production. The goal is to gain insight on the dynamics induced by resource sharing in multiproject management. In order to achieve it the hierarchy of decisions in a multiproject organization is modeled and some resource allocation methods based on algorithms from the OR/AI domain are used. Other critical issues such as the hybrid nature of software production and the effects of measurement and control are also incorporated in the model. Some first results are presented.Ministerio de Ciencia e Innovación TIN2004-06689-C03-03Ministerio de Ciencia e Innovación TIN2007-67843-C06-0

    A fully automatic gridding method for cDNA microarray images

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Processing cDNA microarray images is a crucial step in gene expression analysis, since any errors in early stages affect subsequent steps, leading to possibly erroneous biological conclusions. When processing the underlying images, accurately separating the sub-grids and spots is extremely important for subsequent steps that include segmentation, quantification, normalization and clustering.</p> <p>Results</p> <p>We propose a parameterless and fully automatic approach that first detects the sub-grids given the entire microarray image, and then detects the locations of the spots in each sub-grid. The approach, first, detects and corrects rotations in the images by applying an affine transformation, followed by a polynomial-time optimal multi-level thresholding algorithm used to find the positions of the sub-grids in the image and the positions of the spots in each sub-grid. Additionally, a new validity index is proposed in order to find the correct number of sub-grids in the image, and the correct number of spots in each sub-grid. Moreover, a refinement procedure is used to correct possible misalignments and increase the accuracy of the method.</p> <p>Conclusions</p> <p>Extensive experiments on real-life microarray images and a comparison to other methods show that the proposed method performs these tasks fully automatically and with a very high degree of accuracy. Moreover, unlike previous methods, the proposed approach can be used in various type of microarray images with different resolutions and spot sizes and does not need any parameter to be adjusted.</p
    corecore