
    Classification of changes in API evolution

    Applications typically communicate with each other, accessing and exposing data and features through Application Programming Interfaces (APIs). Even though API consumers expect APIs to be stable and well established, APIs are prone to continuous change and pass through different evolutionary phases during their lifecycle. These changes are of different types, are caused by different needs, and affect consumers in different ways. In this paper, we identify and classify the changes that often happen to APIs, and investigate how these changes are reflected in the documentation, release notes, issue tracker, and API usage logs. Analyzing each step of a change, from its implementation to its impact on API consumers, helps build a fuller picture of API evolution. We therefore review the current state of the art in API evolution and, as a result, define a classification framework that considers both the changes that may occur to APIs and the reasons behind them. In addition, we exemplify the framework using a software platform offering a Web API, the District Health Information System (DHIS2), used collaboratively by several departments of the World Health Organization (WHO). Peer reviewed. Postprint (author's final draft).

    Software Development Analytics in Practice: A Systematic Literature Review

    Context: Software Development Analytics is a research area concerned with providing insights to improve product deliveries and processes. Many types of studies, data sources, and mining methods have been used for that purpose. Objective: This systematic literature review aims to provide an aggregate view of the relevant studies on Software Development Analytics in the past decade (2010-2019), with an emphasis on its application in practical settings. Method: We defined and executed a search string on several digital libraries, then applied quality assessment criteria to identify the most relevant papers. From those, we extracted a set of characteristics (study type, data source, study perspective, development life-cycle activities covered, stakeholders, mining methods, and analytics scope) and classified their impact against a taxonomy. Results: Source code repositories, experimental case studies, and developers are the most common data sources, study types, and stakeholders, respectively. Product and project managers are also often present, but less than expected. Mining methods are evolving rapidly, which is reflected in the long list identified. Descriptive statistics are the most common method, followed by correlation analysis. Given that software development is an important process in every organization, it was unexpected to find process mining in only one study. Most contributions to the software development life cycle were made in the quality dimension. Time management and cost control were only lightly discussed. The analysis of security aspects suggests it is a topic of growing concern for practitioners. Risk management contributions are scarce. Conclusions: There is wide room for improvement for software development analytics in practice. For instance, mining and analyzing the activities performed by software developers in their actual workbench, the IDE

    Exception-aware Lifecycle Model Construction for Framework APIs

    The implementation of complex software systems usually depends on low-level frameworks or third-party libraries. As these evolve, APIs that add or remove behaviors may cause unexpected compatibility problems, so precisely analyzing and constructing the API lifecycle model of a framework or library is of great importance. Existing work has proposed an API existence-change model for defect detection, but it does not consider the influence of semantic changes in APIs. In some cases, developers do not remove or deprecate APIs but modify their semantics by adding, removing, or modifying their exception-throwing code, which may introduce potential defects in upper-level code. Therefore, besides the API existence model, developers also need to track the evolution of exception-related code in APIs, which requires constructing exception-aware API lifecycle models for framework and library projects. To construct such models automatically, this paper uses static analysis to extract exception summary information from the framework API code and a multi-step matching strategy to obtain the change history of exceptions. It then generates exception-aware API lifecycle models for the given framework or library project. This approach is implemented in the API lifecycle extraction tool JavaExP, which is based on Java bytecode analysis. Compared to the state-of-the-art tool, JavaExP achieves both a higher F1 score (+60%) and better efficiency (+7x), with 98% precision for its exception matching and change results. Compared to exception-unaware API lifecycle modeling on 60 versions, JavaExP identifies 18% more API changes. Among the 75,433 APIs under analysis, 20% changed their exception-throwing behavior at least once after their introduction, which may cause many hidden compatibility issues. Comment: in Chinese language
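The idea of an exception-aware lifecycle can be illustrated with a minimal sketch. This is not JavaExP and does not reproduce the paper's static analysis or multi-step matching strategy. It assumes the per-version exception summaries have already been extracted, and the function name `exception_lifecycle` and the snapshot format are hypothetical choices made for illustration only.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical input format (an assumption, not taken from the paper):
# each release maps an API signature to the set of exception types it may
# throw. An API missing from a snapshot does not exist in that release.
Snapshot = Dict[str, Set[str]]


def exception_lifecycle(
    versions: List[Tuple[str, Snapshot]],
) -> Dict[str, List[str]]:
    """Return, per API, the ordered lifecycle events across releases."""
    events: Dict[str, List[str]] = {}
    previous: Snapshot = {}
    for version, current in versions:
        # Visit every API known in either the previous or current release.
        for api in sorted(set(previous) | set(current)):
            log = events.setdefault(api, [])
            if api not in previous:
                log.append(f"{version}: API introduced")
            elif api not in current:
                log.append(f"{version}: API removed")
            else:
                # The API exists in both releases: compare exception sets.
                for exc in sorted(current[api] - previous[api]):
                    log.append(f"{version}: exception added {exc}")
                for exc in sorted(previous[api] - current[api]):
                    log.append(f"{version}: exception removed {exc}")
        previous = current
    return events
```

An existence-only model would record just the "introduced" and "removed" events. The extra "exception added" and "exception removed" events are the kind of semantic change the abstract argues such a model misses.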

    The Impact of Operating Systems and Environments on Build Results

    Continuous Integration (CI) is a software engineering practice for identifying and correcting defects as soon as possible after a code change has been integrated into the version control system. The main purpose of CI is to give developers quick feedback on code changes. These changes are built on different OSes and runtime environments to check backward compatibility and to verify that the product still works with the new changes. As a result, many builds are performed, while only a few of them identify new failures. In other words, a phenomenon of build inflation can be observed, where the increasing number of builds has diminishing returns in terms of identified failures versus the cost of running the builds. This inflation makes interpreting build results challenging, as it increases the importance of some failures while hiding the importance of others. This thesis advances our understanding of the impact of OSes and runtime environments on build failures and build inflation through a large-scale study of 30 million builds of the CPAN ecosystem. We choose CPAN because it provides a rich data set for the analysis of automated builds on dozens of environments (Perl versions) and operating systems. This thesis performs quantitative and qualitative analysis of build failures to classify them and find the reasons for their occurrence. We observe that (1) build failures evolve over time: while more builds are being performed, the percentage of them identifying a failure drops, (2) different OSes and environments are not equally reliable, (3) the build results of CI must be filtered to identify reliable failing data, and (4) most build failures are due to API dependency. Researchers and practitioners should consider the impact of build inflation when they are analyzing and/or performing builds.

    Dependency Management 2.0 – A Semantic Web Enabled Approach

    Software development and evolution are highly distributed processes that involve a multitude of supporting tools and resources. Application programming interfaces are commonly used by software developers to reduce development cost and complexity by reusing code developed by third parties or published by the open source community. However, these application programming interfaces have also introduced new challenges to the Software Engineering community (e.g., software vulnerabilities, API incompatibilities, and software license violations) that not only extend beyond the traditional boundaries of individual projects but also involve different software artifacts. As a result, there is a need for a technology-independent representation of software dependency semantics and the ability to seamlessly integrate this representation with knowledge from other software artifacts. The Semantic Web and its supporting technology stack have been widely promoted to model, integrate, and support interoperability among heterogeneous data sources. This dissertation takes advantage of the Semantic Web and its enabling technology stack for knowledge modeling and integration. The thesis makes five major contributions: (1) We present a formal Software Build System Ontology (SBSON), which captures concepts and properties for software build and dependency management systems. This formal knowledge representation allows us to take advantage of Semantic Web inference services, forming the basis for a more flexible API dependency analysis compared to traditional proprietary analysis approaches. (2) We conduct a user survey of 53 open source developers to gain insights into how actual developers manage API breaking changes. (3) We introduce a novel approach that integrates our SBSON model with knowledge about source code usage and changes within the Maven ecosystem to support API consumers and producers in managing (assessing and minimizing) the impacts of breaking changes. (4) We introduce a Security Vulnerability Analysis Framework (SV-AF), which integrates build system, source code, versioning system, and vulnerability ontologies to trace and assess the impact of security vulnerabilities across project boundaries. (5) Finally, we introduce an Ontological Trustworthiness Assessment Model (OntTAM). OntTAM integrates our build, source code, vulnerability, and license ontologies to support a holistic analysis and assessment of quality attributes related to the trustworthiness of libraries and APIs in open source systems. Several case studies illustrate the applicability and flexibility of our modeling approach, demonstrating that it can seamlessly integrate and reuse knowledge extracted from existing build and dependency management systems with other heterogeneous data sources found in the software engineering domain. As part of our case studies, we also demonstrate how this unified knowledge model can enable new types of project dependency analysis.