
    On the Effect of Semantically Enriched Context Models on Software Modularization

    Many existing approaches to program comprehension rely on the linguistic information found in source code, such as identifier names and comments. Semantic clustering is one such technique for system modularization that relies on the informal semantics of the program, encoded in the vocabulary used in the source code. However, treating the source code as a flat collection of tokens loses the semantic information embedded within the identifiers. We address this problem by introducing context models for source code identifiers to obtain a semantic kernel, which can be used both for deriving the topics that run through the system and for clustering. In the first model, we abstract an identifier to its type representation and build on this notion of context to construct a contextual vector representation of the source code. The second notion of context is based on the flow of data between identifiers: a module is represented as a dependency graph whose nodes correspond to identifiers and whose edges represent the data dependencies between pairs of identifiers. We applied our approach to 10 medium-sized open source Java projects and show that introducing contexts for identifiers improves the quality of the modularization of the software systems. Both context models give results superior to the plain vector representation of documents; in some cases, the authoritativeness of decompositions improves by 67%. Furthermore, a more detailed evaluation on JEdit, an open source editor, demonstrates that topics inferred from the contextual representations are more meaningful than those from the plain representation of the documents. By introducing a context model for source code identifiers, the proposed approach paves the way for building tools that support developers in program comprehension tasks such as application and domain concept location, software modularization, and topic analysis.
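
    As a rough illustration of the second context model, the following sketch (in Python rather than the Java setting of the study, with invented helper names) builds a data-dependency graph over identifiers and derives a contextual vector for one identifier; the extraction of dependencies from real source code and the kernel construction are assumed to happen elsewhere.

    # A minimal sketch of the data-flow context model: a module is a graph
    # whose nodes are identifiers and whose edges are data dependencies.
    from collections import defaultdict

    def build_dependency_graph(assignments):
        """assignments: (source, target) identifier pairs extracted from
        statements such as `target = f(source)` (extraction not shown)."""
        graph = defaultdict(set)
        for source, target in assignments:
            graph[source].add(target)
        return graph

    def context_vector(graph, identifier, vocabulary):
        """Represent an identifier by the identifiers it flows into,
        giving one row of a contextual document-term matrix."""
        neighbours = graph.get(identifier, set())
        return [1 if term in neighbours else 0 for term in vocabulary]

    # Example: `total = price + tax` makes total part of the context
    # of both price and tax.
    g = build_dependency_graph([("price", "total"), ("tax", "total")])
    vocab = sorted(set(g) | {t for ts in g.values() for t in ts})
    print(context_vector(g, "price", vocab))  # [0, 0, 1] -> total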

    Sharing Social Research Data in Ireland: A Practical Tool

    Your data is valuable and has an importance beyond your own original project. Allowing other researchers to reuse your data maximises the impact of your work and benefits both the scholarly community and society in general. Sharing your data allows other researchers to use your material in ways you may not have thought of, or may not have been able to pursue within your own project. It allows them to replicate your findings, verify your results, test your instruments, and compare with other studies, and to use your work to expand knowledge in important areas. It provides value for money by reducing duplication and advancing knowledge, and it also has significant educational value: it allows both graduate and undergraduate students to develop their skills in qualitative and quantitative research by using high-quality data in their studies, without having to conduct their own surveys. Archiving your data also guarantees its long-term preservation and accessibility. As many research teams are assembled only for individual projects, long-term preservation of and access to research data collections can only be guaranteed if they are deposited in an archive that will manage them, ensure access, and provide user support. In addition, archives will ensure that datasets do not become obsolete or corrupted. Finally, funders increasingly require that you make your research data available as a condition of funding, so that other researchers can test your findings and use your data to extend research in your area. Equally, publishers are also specifying access to research data as a condition of publication.

    A matching approach to business services and software services

    Recent studies have shown that service-oriented architecture (SOA) has the potential to revive enterprise legacy systems (Cai et al., 2011; Gaševic and Hatala, 2010; De Castro et al., 2011; Chengjun, 2008; Elgedawy, 2009; Tian et al., 2007; Chen et al., 2009; Zhang et al., 2006; Sindhgatta and Ponnalagu, 2008; Khadka, 2011), making their continued service in the corporate world viable. When reengineering legacy systems to a service-oriented architecture, some software services extracted from the legacy system can be reused to implement business services in the target systems. To achieve efficient reuse of software services, a matching approach is proposed that extracts the software services related to specified business services, integrating semantic and structural similarity measures to evaluate the degree of similarity between a business service and the software services. Experiments indicate that the approach can efficiently map business services to relevant software services, so that legacy systems can be reused as much as possible.
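
    As a hedged sketch of how semantic and structural similarity might be integrated, the Python fragment below blends a token-overlap measure over service names with an overlap measure over operation sets; the concrete measures, the data shapes, and the weighting parameter alpha are illustrative assumptions, not the paper's actual formulation.

    def name_similarity(a, b):
        """Jaccard overlap between the word tokens of two service names."""
        ta, tb = set(a.lower().split("_")), set(b.lower().split("_"))
        return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

    def structure_similarity(ops_a, ops_b):
        """Jaccard overlap between the operation sets of two interfaces."""
        sa, sb = set(ops_a), set(ops_b)
        return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

    def service_similarity(business, software, alpha=0.6):
        """Weighted blend of semantic and structural similarity; alpha is
        an assumed tuning parameter."""
        return (alpha * name_similarity(business["name"], software["name"])
                + (1 - alpha) * structure_similarity(business["operations"],
                                                     software["operations"]))

    business = {"name": "create_order", "operations": ["create", "validate"]}
    software = {"name": "order_create_service", "operations": ["create", "persist"]}
    print(round(service_similarity(business, software), 3))  # 0.533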

    Design and Implementation of the UniProt Website

    The UniProt consortium is the main provider of protein sequence and annotation data for much of the life sciences community. The http://www.uniprot.org website is the primary access point to this data, as well as to documentation and basic tools for working with it. This paper discusses the design and implementation of the new website, which was released in July 2008, and shows how it improves data access for users with different levels of experience, as well as for machines through programmatic access.
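
    For readers unfamiliar with the programmatic access mentioned above, the sketch below fetches one protein record by accession over HTTP. The URL pattern (accession plus a format suffix such as .fasta) follows the classic uniprot.org convention and should be treated as an assumption; the site has since moved to a dedicated REST API.

    # Fetch a UniProt record in FASTA format; the .fasta suffix selects
    # the response format under the classic URL scheme (an assumption here).
    from urllib.request import urlopen

    def fetch_fasta(accession):
        """Return the FASTA record for a UniProt accession as a string."""
        url = f"https://www.uniprot.org/uniprot/{accession}.fasta"
        with urlopen(url) as response:
            return response.read().decode("utf-8")

    print(fetch_fasta("P12345").splitlines()[0])  # FASTA header line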

    Software Evolution Understanding: Automatic Extraction of Software Identifiers Map for Object-Oriented Software Systems

    Software companies usually develop a set of product variants within the same family that share certain functions and differ in others. Variations across software variants occur to meet different customer requirements; thus, software product variants evolve over time to cope with new requirements. A software engineer who deals with such a family may find it difficult to understand the evolution scenarios that have taken place over time. Software identifier names are important resources for understanding the evolution scenarios in this family. This paper introduces an automatic approach, called Juana's approach, to detect the evolution scenario across two product variants at the source code level and to identify the common and unique software identifier names across the variants' source code. Juana's approach refers to the common and unique identifier names as a software identifiers map and computes it by comparing software variants to each other. Juana considers all software identifier names, such as package, class, attribute, and method names. The novelty of this approach is that it exploits common and unique identifier names across the source code of software variants to understand the evolution scenarios across a software family in an efficient way. For validation, Juana was applied to ArgoUML and Mobile Media software variants. The results of this evaluation validate the relevance and the performance of the approach, as all evolution scenarios were correctly detected via the software identifiers map.
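
    At its core, the identifiers map reduces to set operations over the identifier names extracted from each variant. The sketch below is a simplification under that assumption; parsing the identifiers out of Java source is taken to have happened already.

    def identifiers_map(variant_a, variant_b):
        """Return the common and per-variant-unique identifier names."""
        a, b = set(variant_a), set(variant_b)
        return {
            "common": sorted(a & b),
            "unique_to_a": sorted(a - b),
            "unique_to_b": sorted(b - a),
        }

    # Hypothetical identifiers from two Mobile Media-like variants:
    v1 = {"MediaController", "PhotoAlbum", "savePhoto"}
    v2 = {"MediaController", "PhotoAlbum", "saveVideo"}
    print(identifiers_map(v1, v2))
    # Common names suggest shared features; unique names hint at evolution steps.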

    Toward an Effective Automated Tracing Process

    Traceability is defined as the ability to establish, record, and maintain dependency relations among various software artifacts in a software system, in both the forward and backward directions, throughout the multiple phases of the project's life cycle. The availability of traceability information has proven vital to several software engineering activities such as program comprehension, impact analysis, feature location, software reuse, and verification and validation (V&V). Research on automated software traceability has advanced noticeably in the past few years. Various methodologies and tools have been proposed in the literature to provide automatic support for establishing and maintaining traceability information in software systems. This movement is motivated by the increasing attention traceability has been receiving as a critical element of any rigorous software development process. Despite these major advances, however, traceability implementation and use is still not pervasive in industry. In particular, traceability tools are still far from achieving performance levels adequate for practical applications. Such low levels of accuracy require software engineers working with traceability tools to spend a considerable amount of their time verifying the generated traceability information, a process often described as tedious, exhausting, and error-prone. Motivated by these observations, and building upon a growing body of work in this area, this dissertation explores several research directions related to enhancing the performance of automated tracing tools and techniques. In particular, our work addresses issues related to the various aspects of the IR-based automated tracing process, including trace link retrieval, performance enhancement, and the role of the human in the process. Our main objective is to achieve performance levels, in terms of accuracy, efficiency, and usability, that are adequate for practical applications, and ultimately to accomplish a successful technology transfer from research to industry.
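
    As background on the IR-based retrieval step the dissertation builds on, the sketch below ranks source artifacts against a requirement by TF-IDF cosine similarity; the artifact texts are invented, and real tracing tools add preprocessing (identifier splitting, stemming) and human vetting of the candidate links.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    requirement = "the system shall encrypt user passwords before storage"
    artifacts = {
        "PasswordEncryptor.java": "encrypt password hash store user credential",
        "ReportPrinter.java": "print format monthly report page",
    }

    # Vectorize the requirement together with the artifacts, then score
    # each artifact against the requirement.
    matrix = TfidfVectorizer().fit_transform([requirement] + list(artifacts.values()))
    scores = cosine_similarity(matrix[0:1], matrix[1:]).ravel()

    # Candidate trace links, strongest first; an engineer would verify these.
    for name, score in sorted(zip(artifacts, scores), key=lambda p: -p[1]):
        print(f"{name}: {score:.2f}")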

    Test Naming Failures. An Exploratory Study of Bad Naming Practices in Test Code

    Unit tests are a key component of the software development process, helping ensure that a developer's code is functioning as expected. Developers interact with unit tests when trying to understand, maintain, and update code. Good test names are essential for making these processes easier, which matters given the substantial cost and effort of software maintenance. Despite this, the quality of test code has been found to be often lacking, particularly when it comes to test names. When a test fails, its name is often the first thing developers see when trying to fix the failure, so it is important that names are of high quality in order to aid the debugging process. The objective of this work was to find anti-patterns in test method names that may have a negative impact on developer comprehension. To this end, a grounded theory study was conducted on 12 open-source Java and C# GitHub projects. From this dataset, many patterns were discovered to be common throughout the test code, some of which fit the necessary criteria of anti-patterns likely to hinder developer comprehension. By avoiding these anti-patterns, it is believed that developers will be able to write better test names, which can help speed up debugging since the names will be more comprehensible.
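
    As one concrete (and invented) example of the kind of anti-pattern such a study can surface, the sketch below flags non-descriptive numbered test names such as test1 or test_2; the paper derives its anti-patterns from a grounded theory study, not from a fixed rule like this.

    import re

    # Names consisting of "test" plus an optional number carry no
    # information about the behaviour under test.
    NON_DESCRIPTIVE = re.compile(r"^test_?\d*$", re.IGNORECASE)

    def flag_bad_names(test_names):
        """Return the test names that say nothing about expected behaviour."""
        return [name for name in test_names if NON_DESCRIPTIVE.match(name)]

    names = ["test1", "testTransferFailsWhenBalanceTooLow", "test_2"]
    print(flag_bad_names(names))  # ['test1', 'test_2']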