12 research outputs found

    Breaking Bad? Semantic versioning and impact of breaking changes in Maven Central

    Just like any software, libraries evolve to incorporate new features, bug fixes, security patches, and refactorings. However, when a library evolves, it may break the contract previously established with its clients by introducing Breaking Changes (BCs) in its API. These changes might trigger compile-time, link-time, or run-time errors in client code. As a result, clients may hesitate to upgrade their dependencies, raising security concerns and making future upgrades even more difficult. Understanding how libraries evolve helps client developers to know which changes to expect and where to expect them, and library developers to understand how they might impact their clients. In the most extensive study to date, Raemaekers et al. investigate to what extent developers of Java libraries hosted on the Maven Central Repository (MCR) follow semantic versioning conventions to signal the introduction of BCs and how these changes impact client projects. Their results suggest that BCs are widespread without regard for semantic versioning, with a significant impact on clients. In this paper, we conduct an external and differentiated replication study of their work. We identify and address some limitations of the original protocol and expand the analysis to a new corpus spanning seven more years of the MCR. We also present a novel static analysis tool for Java bytecode, Maracas, which provides us with: (i) the set of all BCs between two versions of a library; and (ii) the set of locations in client code impacted by individual BCs. Our key findings, derived from the analysis of 119,879 library upgrades and 293,817 clients, contrast with the original study and show that 83.4% of these upgrades do comply with semantic versioning. Furthermore, we observe that the tendency to comply with semantic versioning has significantly increased over time. Finally, we find that most BCs affect code that is not used by any client, and that only 7.9% of all clients are affected by BCs. These findings should help (i) library developers to understand and anticipate the impact of their changes; (ii) library
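
For readers unfamiliar with the compliance criterion used above, here is a minimal Java sketch (hypothetical, not part of Maracas or the study's pipeline): an upgrade that introduces breaking changes complies with semantic versioning only if it bumps the major version.

    // SemVerCheck.java -- hypothetical sketch of semantic-versioning compliance:
    // an upgrade that introduces breaking changes (BCs) complies only if it bumps
    // the major version; non-breaking upgrades may bump minor or patch. Requires Java 16+.
    public class SemVerCheck {

        record Version(int major, int minor, int patch) {
            static Version parse(String s) {
                String[] p = s.split("\\.");
                return new Version(Integer.parseInt(p[0]),
                                   Integer.parseInt(p[1]),
                                   Integer.parseInt(p[2]));
            }
        }

        static boolean compliesWithSemVer(String oldV, String newV, boolean hasBreakingChanges) {
            Version o = Version.parse(oldV);
            Version n = Version.parse(newV);
            boolean majorBump = n.major() > o.major();
            // BCs require a major bump; without BCs any bump is acceptable.
            return !hasBreakingChanges || majorBump;
        }

        public static void main(String[] args) {
            System.out.println(compliesWithSemVer("1.4.2", "1.5.0", true));  // false: BC in a minor release
            System.out.println(compliesWithSemVer("1.4.2", "2.0.0", true));  // true:  BC in a major release
        }
    }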

    Using Genetic Programming to Build Self-Adaptivity into Software-Defined Networks

    Self-adaptation solutions need to periodically monitor, reason about, and adapt a running system. The adaptation step involves generating an adaptation strategy and applying it to the running system whenever an anomaly arises. In this article, we argue that, rather than generating individual adaptation strategies, the goal should be to adapt the control logic of the running system in such a way that the system itself would learn how to steer clear of future anomalies, without triggering self-adaptation too frequently. While the need for adaptation is never eliminated, especially noting the uncertain and evolving environment of complex systems, reducing the frequency of adaptation interventions is advantageous for various reasons, e.g., to increase performance and to make a running system more robust. We instantiate and empirically examine the above idea for software-defined networking, a key enabling technology for modern data centres and Internet of Things applications. Using genetic programming (GP), we propose a self-adaptation solution that continuously learns and updates the control constructs in the data-forwarding logic of a software-defined network. Our evaluation, performed using open-source synthetic and industrial data, indicates that, compared to a baseline adaptation technique that attempts to generate individual adaptations, our GP-based approach is more effective in resolving network congestion and, further, reduces the frequency of adaptation interventions over time. In addition, we show that, for networks with the same topology, reusing over larger networks the knowledge that is learned on smaller networks leads to significant improvements in the performance of our GP-based adaptation approach. Finally, we compare our approach against a standard data-forwarding algorithm from the network literature, demonstrating that our approach significantly reduces packet loss.
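
As a rough illustration of evolving the control logic itself rather than generating one-off adaptations, the following Java sketch runs a simplified (1+1) evolutionary loop over a single congestion threshold in a toy forwarding rule. It stands in for the tree-based genetic programming over control constructs described in the article; the traffic model and fitness function are invented for the sketch.

    // AdaptiveThreshold.java -- illustrative only: a simplified (1+1) evolutionary
    // loop that tunes a congestion threshold in a toy forwarding rule, standing in
    // for the tree-based GP over control constructs described in the article.
    import java.util.Random;

    public class AdaptiveThreshold {
        static final Random RNG = new Random(42);

        // Invented fitness: fraction of simulated traffic samples handled without
        // congesting the primary path and without wastefully rerouting light traffic.
        static double fitness(double threshold) {
            int ok = 0, trials = 1000;
            for (int i = 0; i < trials; i++) {
                double load = RNG.nextDouble();              // offered load on the primary path
                boolean reroute = load > threshold;          // control construct under evolution
                boolean congested = !reroute && load > 0.8;  // primary path saturates above 0.8
                boolean wasteful  = reroute && load < 0.5;   // rerouting light traffic wastes the backup path
                if (!congested && !wasteful) ok++;
            }
            return ok / (double) trials;
        }

        public static void main(String[] args) {
            double best = 0.5, bestFit = fitness(best);
            for (int gen = 0; gen < 50; gen++) {
                double candidate = Math.min(1.0, Math.max(0.0, best + RNG.nextGaussian() * 0.1));
                double f = fitness(candidate);               // noisy fitness, re-evaluated each generation
                if (f >= bestFit) { best = candidate; bestFit = f; }   // keep the fitter variant
            }
            System.out.printf("evolved threshold=%.2f fitness=%.3f%n", best, bestFit);
        }
    }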

    Visualization and analysis of software clones

    Code clones are identical or similar fragments of code in a software system. Copy-paste programming practices of developers, reusing existing code fragments instead of implementing from scratch, and limitations of both programming languages and developers are the primary reasons behind code cloning. Despite the maintenance implications of clones, it is not possible to conclude that cloning is harmful, because there are also benefits in using them (e.g., faster and independent development). As a result, researchers at least agree that clones need to be analyzed before aggressively refactoring them. Although a large number of state-of-the-art clone detectors are available today, handling raw clone data is challenging due to its textual nature and large volume. To address this issue, we propose a framework for large-scale clone analysis and develop a maintenance support environment based on the framework, called VisCad. To manage the large volume of clone data, VisCad employs the Visual Information Seeking Mantra: overview first, zoom and filter, then details on demand. With VisCad, users can analyze and identify distinctive code clones through a set of visualization techniques, metrics covering different clone relations, and data filtering operations. The loosely coupled architecture of VisCad allows users to work with any clone detection tool that reports the source coordinates of the found clones. This yields the opportunity to work with the clone detector of choice, which is important because each clone detector has its own strengths and weaknesses. In addition, we extend the support for clone evolution analysis, which is important to understand the cause and effect of changes at the clone level during the evolution of a software system. Such information can be used to make software maintenance decisions, such as when to refactor clones. We propose and implement a set of visualizations that allow users to analyze the evolution of clones from a coarse-grained to a fine-grained level. Finally, we use VisCad to extract both spatial and temporal clone data to predict changes to clones in a future release/revision of the software, which can be used to rank clone classes as another means of handling a large volume of clone data. We believe that VisCad makes clone comprehension easier and that it can be used as a test-bed to further explore code cloning, necessary in building a successful clone management system.
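
To illustrate the "overview first, zoom and filter" workflow on raw clone data, here is a hypothetical Java sketch; the CSV input format (classId,file,startLine,endLine) and the thresholds are assumptions made for the example, not VisCad's actual interface.

    // CloneFilter.java -- hypothetical sketch of the kind of filtering a tool like
    // VisCad applies to raw clone-detector output. The CSV format (classId,file,
    // startLine,endLine) is an assumed format, not VisCad's actual input.
    import java.nio.file.*;
    import java.util.*;
    import java.util.stream.*;

    public class CloneFilter {
        record Fragment(String classId, String file, int start, int end) {
            int size() { return end - start + 1; }
        }

        public static void main(String[] args) throws Exception {
            // args[0]: path to the clone report produced by a clone detector (assumed CSV).
            List<Fragment> fragments = Files.lines(Path.of(args[0]))
                .map(l -> l.split(","))
                .map(p -> new Fragment(p[0], p[1], Integer.parseInt(p[2]), Integer.parseInt(p[3])))
                .collect(Collectors.toList());

            // Overview: group fragments into clone classes.
            Map<String, List<Fragment>> classes =
                fragments.stream().collect(Collectors.groupingBy(Fragment::classId));

            // Zoom and filter: keep only classes with at least 3 fragments of 10+ lines each.
            classes.entrySet().stream()
                .filter(e -> e.getValue().size() >= 3)
                .filter(e -> e.getValue().stream().allMatch(f -> f.size() >= 10))
                .forEach(e -> System.out.printf("clone class %s: %d fragments%n",
                                                e.getKey(), e.getValue().size()));
        }
    }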

    The Software Heritage License Dataset (2022 Edition)

    Context: When software is released publicly, it is common to include with it either the full text of the license or licenses under which it is published, or a detailed reference to them. Therefore public licenses, including FOSS (free, open source software) licenses, are usually publicly available in source code repositories. Objective: To compile a dataset containing as many documents as possible that contain the text of software licenses, or references to the license terms. Once compiled, characterize the dataset so that it can be used for further research, or for practical purposes related to license analysis. Method: Retrieve from Software Heritage, the largest publicly available archive of FOSS source code, all versions of all files whose names are commonly used to convey licensing terms. All retrieved documents will be characterized in various ways, using automated and manual analyses. Results: The dataset consists of 6.9 million unique license files. Additional metadata about shipped license files is also provided, making the dataset ready to use in various contexts, including: file length measures, MIME type, SPDX license (detected using ScanCode), and oldest appearance. The results of a manual analysis of 8102 documents are also included, providing a ground truth for further analysis. The dataset is released as open data as an archive file containing all deduplicated license files, plus several portable CSV files with metadata, referencing files via cryptographic checksums. Conclusions: Thanks to the extensive coverage of Software Heritage, the dataset presented in this paper covers a very large fraction of all software licenses for public code. We have assembled a large body of software licenses, characterized it quantitatively and qualitatively, and validated that it is mostly composed of licensing information and includes almost all known license texts. The dataset can be used to conduct empirical studies on open source licensing, training of automated license classifiers, natural language processing (NLP) analyses of legal texts, as well as historical and phylogenetic studies on FOSS licensing. It can also be used in practice to improve tools detecting licenses in source code.
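
A small Java sketch of the retrieval criterion implied by the Method section: selecting files whose names commonly convey licensing terms. The name list and pattern below are plausible assumptions for illustration, not the exact list used to build the dataset.

    // LicenseFileNames.java -- illustrative sketch of a filename filter for files
    // that commonly convey licensing terms. The name list is assumed and
    // non-exhaustive, not the dataset's actual selection list.
    import java.util.*;
    import java.util.regex.*;

    public class LicenseFileNames {
        private static final Pattern LICENSE_NAME = Pattern.compile(
            "(?i)^(license|licence|copying|copyright|notice|unlicense)(\\.(txt|md|rst))?$");

        static boolean looksLikeLicenseFile(String fileName) {
            return LICENSE_NAME.matcher(fileName).matches();
        }

        public static void main(String[] args) {
            for (String name : List.of("LICENSE", "COPYING", "LICENSE.md", "readme.txt", "license.rst")) {
                System.out.println(name + " -> " + looksLikeLicenseFile(name));
            }
        }
    }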

    Assessing and improving the quality of model transformations

    Software is pervading our society more and more and is becoming increasingly complex. At the same time, software quality demands remain at the same high level. Model-driven engineering (MDE) is a software engineering paradigm that aims at dealing with this increasing software complexity and improving productivity and quality. Models play a pivotal role in MDE. The purpose of using models is to raise the level of abstraction at which software is developed to a level where concepts of the domain in which the software has to be applied, i.e., the target domain, can be expressed effectively. For that purpose, domain-specific languages (DSLs) are employed. A DSL is a language with a narrow focus, i.e., it is aimed at providing abstractions specific to the target domain. As a consequence, the application of models developed using DSLs is typically restricted to describing concepts existing in that target domain. Reuse of models such that they can be applied for different purposes, e.g., analysis and code generation, is one of the challenges that should be solved by applying MDE. Therefore, model transformations are typically applied to transform domain-specific models to other (equivalent) models suitable for different purposes. A model transformation is a mapping from a set of source models to a set of target models defined as a set of transformation rules. MDE is gradually being adopted by industry. Since MDE is becoming more and more important, model transformations are becoming more prominent as well. Model transformations are in many ways similar to traditional software artifacts. Therefore, they need to adhere to similar quality standards as well. The central research question addressed in this thesis is therefore as follows: How can the quality of model transformations be assessed and improved, in particular with respect to development and maintenance? Recall that model transformations facilitate reuse of models in a software development process. We have developed a model transformation that enables reuse of analysis models for code generation. The semantic domains of the source and target language of this model transformation are so far apart that straightforward transformation is impossible, i.e., a semantic gap has to be bridged. To deal with model transformations that have to bridge a semantic gap, the semantics of the source and target language as well as possible additional requirements should be well understood. When bridging a semantic gap is not straightforward, we recommend addressing a simplified version of the source metamodel first. Finally, the requirements on the transformation may, if possible, be relaxed to enable automated model transformation. Model transformations that need to transform between models in different semantic domains are expected to be more complex than those that merely transform syntax. The complexity of a model transformation has consequences for its quality. Quality, in general, is a subjective concept. Therefore, quality can be defined in different ways. We defined it in the context of model transformation. A model transformation can either be considered as a transformation definition or as the process of transforming a source model to a target model. Accordingly, model transformation quality can be defined in two different ways. The quality of the definition is referred to as its internal quality. The quality of the process of transforming a source model to a target model is referred to as its external quality.
There are also two ways to assess the quality of a model transformation (both internal and external). It can be assessed directly, i.e., by performing measurements on the transformation definition, or indirectly, i.e., by performing measurements in the environment of the model transformation. We mainly focused on direct assessment of internal quality. However, we also addressed external quality and indirect assessment. Given this definition of quality in the context of model transformations, techniques can be developed to assess it. Software metrics have been proposed for measuring various kinds of software artifacts. However, hardly any research has been performed on applying metrics for assessing the quality of model transformations. For four model transformation formalisms with different characteristics, viz., ASF+SDF, ATL, Xtend, and QVTO, we defined sets of metrics for measuring model transformations developed with these formalisms. While these metric sets can be used to indicate bad smells in the code of model transformations, they cannot be used for assessing quality yet. A relation has to be established between the metric sets and attributes of model transformation quality. For two of the aforementioned metric sets, viz., the ones for ASF+SDF and for ATL, we conducted an empirical study aiming at establishing such a relation. From these empirical studies we learned which metrics serve as predictors for different quality attributes of model transformations. Metrics can be used to quickly acquire insights into the characteristics of a model transformation. These insights enable increasing the overall quality of model transformations and thereby also their maintainability. To support maintenance, and also development in a traditional software engineering process, visualization techniques are often employed. For model transformations this appears to be a feasible approach as well. Currently, however, there are few visualization techniques available tailored towards analyzing model transformations. One of the most time-consuming processes during software maintenance is acquiring understanding of the software. We expect that this holds for model transformations as well. Therefore, we presented two complementary visualization techniques for facilitating model transformation comprehension. The first technique is aimed at visualizing the dependencies between the components of a model transformation. The second technique is aimed at analyzing the coverage of the source and target metamodels by a model transformation. The development of the metric sets, and in particular the empirical studies, has led to insights concerning the development of model transformations. Also, the proposed visualization techniques are aimed at facilitating the development of model transformations. We applied the insights acquired from the development of the metric sets, as well as the visualization techniques, in the development of a chain of model transformations that bridges a number of semantic gaps. We chose to solve this transformational problem not with one model transformation, but with a number of smaller model transformations. This should lead to smaller transformations, which are more understandable. The language on which the model transformations are defined was subject to evolution. In particular, the coverage visualization proved to be beneficial for the co-evolution of the model transformations.
Summarizing, we defined quality in the context of model transformations and addressed the necessity for a methodology to assess it. Therefore, we defined metric sets and performed empirical studies to validate whether they serve as predictors for model transformation quality. We also proposed a number of visualizations to increase model transformation comprehension. The insights acquired from developing the metric sets and the empirical studies, as well as the visualization tools, proved to be beneficial for developing model transformations.
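
As a toy illustration of what a transformation metric could look like, the following Java sketch counts rule and helper declarations in an ATL file. The thesis's metric sets are considerably richer, so treat this as a simplified stand-in rather than its actual tooling.

    // AtlMetrics.java -- toy sketch of simple size metrics for an ATL transformation:
    // counting "rule" and "helper" declarations. This is a simplification for
    // illustration, not the metric sets defined in the thesis.
    import java.nio.file.*;
    import java.util.regex.*;

    public class AtlMetrics {
        public static void main(String[] args) throws Exception {
            String source = Files.readString(Path.of(args[0]));   // args[0]: path to a .atl file

            long rules   = count(source, "(?m)^\\s*(lazy\\s+|unique\\s+lazy\\s+)?rule\\s+\\w+");
            long helpers = count(source, "(?m)^\\s*helper\\s");

            System.out.println("transformation rules: " + rules);
            System.out.println("helpers:              " + helpers);
        }

        private static long count(String text, String regex) {
            Matcher m = Pattern.compile(regex).matcher(text);
            long n = 0;
            while (m.find()) n++;
            return n;
        }
    }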

    Assisting Software Developers With License Compliance

    Open source licensing determines how open source systems are reused, distributed, and modified from a legal perspective. While it facilitates rapid development, the legal language of these licenses can be difficult for developers to understand. Because of such misunderstandings, systems can incorporate licensed code in a way that violates the terms of the license. Such incompatibilities between licenses can result in the inability to reuse a particular library without either relicensing the system or redesigning its architecture. Prior efforts have predominantly focused on license identification or on understanding the underlying phenomena without reasoning about compatibility on a broad scale. The work in this dissertation first investigates the rationale of developers and identifies the areas where developers struggle with respect to free/open source software licensing. First, we investigate the diffusion of licenses and the prevalence of license changes in a large-scale empirical study of 16,221 Java systems. We observed a clear lack of traceability and a lack of standardized licensing that led to difficulties and confusion for developers trying to reuse source code. We further investigated this difficulty by surveying the developers of the systems with license changes to understand why they first adopted a license and then changed licenses. Additionally, we performed an analysis on issue trackers and legal mailing lists to extract licensing bugs. From these works, we identified key areas in which developers struggled and needed support. While developers need support to identify license incompatibilities and to understand both the cause and implications of the incompatibilities, we observed that state-of-the-art license identification tools did not identify license exceptions. Since these exceptions directly modify the license terms (either the permissions granted by the license or the restrictions imposed by the license), we proposed an approach to complement current license identification techniques by classifying license exceptions. The approach relies on supervised machine learners to classify the licensing text and identify the particular license exception or the lack of one. Subsequently, we built an infrastructure to assist developers with evaluating license compliance warnings for their system. The infrastructure evaluates compliance across the dependency tree of a system to ensure it is compliant with all of the licenses of its dependencies. When an incompatibility is present, it reports the specific library or libraries and the conflicting license(s) so that developers can investigate these compliance warnings, which would otherwise prevent distribution of their software. We conduct a study on 121,094 open source projects spanning six programming languages, and we demonstrate that the infrastructure is able to identify license incompatibilities between these projects and their dependencies.
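
A hedged Java sketch of the compliance check described above: walk a dependency tree and flag dependency licenses that conflict with the project's license. The compatibility table is a tiny illustration, not legal advice and not the dissertation's actual rule set.

    // LicenseCompliance.java -- hypothetical sketch of checking a dependency tree
    // for license incompatibilities. The compatibility table covers a single
    // well-known case for illustration only.
    import java.util.*;

    public class LicenseCompliance {
        // For a given project license, the set of dependency licenses assumed to conflict with it.
        private static final Map<String, Set<String>> INCOMPATIBLE = Map.of(
            "Apache-2.0",   Set.of("GPL-2.0-only"),   // GPLv2 and Apache-2.0 terms cannot be combined
            "MIT",          Set.of(),
            "GPL-3.0-only", Set.of()
        );

        record Dependency(String name, String license, List<Dependency> children) {}

        static void check(String projectLicense, Dependency dep, List<String> warnings) {
            if (INCOMPATIBLE.getOrDefault(projectLicense, Set.of()).contains(dep.license())) {
                warnings.add(dep.name() + " (" + dep.license() + ") conflicts with " + projectLicense);
            }
            for (Dependency child : dep.children()) check(projectLicense, child, warnings); // walk the whole tree
        }

        public static void main(String[] args) {
            Dependency tree = new Dependency("lib-a", "MIT",
                List.of(new Dependency("lib-b", "GPL-2.0-only", List.of())));
            List<String> warnings = new ArrayList<>();
            check("Apache-2.0", tree, warnings);
            warnings.forEach(System.out::println);   // prints the lib-b conflict
        }
    }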

    Utilizing traceable software artifacts to improve bug localization

    The development of software systems is a very complex task. Quality assurance tries to prevent defects (software bugs) in deployed systems, but it is impossible to avoid bugs altogether, especially during development. Once a bug is observed, typically a bug report is written. It guides the responsible developer to locate the bug in the project's source code and, once found, to fix it. The bug reports, along with other development artifacts such as requirements and the source code, are stored in software repositories. The repositories also allow relationships (trace links) to be created among the contained artifacts. Establishing this traceability is demanded in many domains, such as safety-related ones like the automotive and aviation industries, or the development of medical devices. However, in large software systems with thousands of artifacts, especially source code files, manually locating a bug is time-consuming, error-prone, and requires extensive knowledge of the project. Thus, automating the bug localization process has been actively researched for many years. Further, manually creating and maintaining trace links is often considered a burden, and there is a need to automate this task as well.
Multiple studies have shown that traceability is beneficial for many software development tasks. This thesis presents a novel bug localization algorithm utilizing traceability. The project's artifacts and trace links are used to create a traceability graph. This graph is then analyzed to locate defective source code files for a given bug report. Since the existing trace link set of a project is possibly incomplete, another algorithm is proposed to augment missing links. The algorithm is fully automated, project independent, and derived from a project's development workflow. An evaluation on more than 32,000 bug reports from 27 open-source projects shows that incorporating traceability information into bug localization significantly improves bug localization performance compared to two state-of-the-art algorithms. Further, the trace link augmentation approach reliably constructs missing links and therefore simplifies the required trace maintenance.
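
To make the traceability-graph idea concrete, here is a toy Java sketch: source files are ranked by textual overlap with a bug report, with a boost for files reachable via trace links from requirements that the report also overlaps with. This illustrates the general idea only; it is not the algorithm evaluated in the thesis, and all artifact contents below are invented.

    // TraceBugLocator.java -- toy sketch of traceability-aware bug localization:
    // textual overlap with the bug report plus a boost through requirement trace links.
    import java.util.*;

    public class TraceBugLocator {
        static Set<String> terms(String text) {
            return new HashSet<>(Arrays.asList(text.toLowerCase().split("\\W+")));
        }

        static double overlap(Set<String> a, Set<String> b) {
            Set<String> common = new HashSet<>(a); common.retainAll(b);
            return a.isEmpty() || b.isEmpty() ? 0 : common.size() / (double) Math.min(a.size(), b.size());
        }

        public static void main(String[] args) {
            String bugReport = "login fails when the password contains unicode characters";

            Map<String, String> sourceFiles = Map.of(
                "LoginService.java",   "authenticate user password login session",
                "ReportRenderer.java", "render pdf report export");
            Map<String, String> requirements = Map.of(
                "REQ-12", "the system shall support unicode passwords during login");
            // Trace links: requirement -> implementing source files (normally stored in the repository).
            Map<String, Set<String>> traceLinks = Map.of("REQ-12", Set.of("LoginService.java"));

            Set<String> reportTerms = terms(bugReport);
            for (var file : sourceFiles.entrySet()) {
                double score = overlap(reportTerms, terms(file.getValue()));
                for (var req : requirements.entrySet()) {
                    boolean linked = traceLinks.getOrDefault(req.getKey(), Set.of()).contains(file.getKey());
                    if (linked) score += 0.5 * overlap(reportTerms, terms(req.getValue())); // graph boost
                }
                System.out.printf("%-20s %.2f%n", file.getKey(), score);
            }
        }
    }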

    Using Clustering Techniques to Guide Refactoring of Object-Oriented Classes

    Much of the cost of software development is maintenance. Well-structured software tends to be cheaper to maintain than poorly structured software, because it is easier to analyze and modify. The research described in this thesis concentrates on determining how to improve the structure of object-oriented classes, the fundamental unit of organization for object-oriented programs. Some refactoring tools can mechanically restructure object-oriented classes, given the appropriate inputs regarding which attributes and methods belong in the revised classes. We address the research question of determining what belongs in those classes, i.e., determining which methods and attributes most belong together and how those methods and attributes can be organized into classes. Clustering techniques can be useful for grouping entities that belong together; however, doing so requires matching an appropriate algorithm to the domain task and choosing appropriate inputs. This thesis identifies clustering techniques suitable for determining the redistribution of existing attributes and methods among object-oriented classes, and discusses the strengths and weaknesses of these techniques. It then describes experiments using these techniques as the basis for refactoring open-source Java classes, and the changes in class quality metrics that resulted. Based on these results and on others reported in the literature, it recommends particular clustering techniques for particular refactoring problems. These clustering techniques have been incorporated into an open-source refactoring tool that provides low-cost assistance to programmers maintaining object-oriented classes. Such maintenance can reduce the total cost of software development.
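
A minimal Java sketch of the clustering intuition: methods that use similar sets of attributes are candidates to be grouped into the same class. It uses Jaccard similarity with greedy merging; the thesis evaluates full clustering techniques, so treat this as an illustration under invented data.

    // MethodClustering.java -- toy sketch of clustering methods by shared attribute
    // usage (Jaccard similarity, greedy average-link merging). Illustration only;
    // the method/attribute data below is invented.
    import java.util.*;

    public class MethodClustering {
        static double jaccard(Set<String> a, Set<String> b) {
            Set<String> union = new HashSet<>(a); union.addAll(b);
            Set<String> inter = new HashSet<>(a); inter.retainAll(b);
            return union.isEmpty() ? 0 : inter.size() / (double) union.size();
        }

        // Average-link similarity between two clusters of methods.
        static double similarity(Set<String> c1, Set<String> c2, Map<String, Set<String>> usage) {
            double sum = 0; int n = 0;
            for (String m1 : c1) for (String m2 : c2) { sum += jaccard(usage.get(m1), usage.get(m2)); n++; }
            return n == 0 ? 0 : sum / n;
        }

        public static void main(String[] args) {
            // Hypothetical "method -> attributes it reads or writes", extracted from one class.
            Map<String, Set<String>> usage = new LinkedHashMap<>();
            usage.put("getBalance", Set.of("balance"));
            usage.put("deposit",    Set.of("balance", "history"));
            usage.put("renderIcon", Set.of("icon", "theme"));
            usage.put("applyTheme", Set.of("theme"));

            // Start with one cluster per method, then repeatedly merge a pair whose
            // similarity stays above a threshold.
            List<Set<String>> clusters = new ArrayList<>();
            usage.keySet().forEach(m -> clusters.add(new HashSet<>(Set.of(m))));

            boolean merged = true;
            while (merged) {
                merged = false;
                outer:
                for (int i = 0; i < clusters.size(); i++) {
                    for (int j = i + 1; j < clusters.size(); j++) {
                        if (similarity(clusters.get(i), clusters.get(j), usage) >= 0.3) {
                            clusters.get(i).addAll(clusters.remove(j));   // greedy merge
                            merged = true;
                            break outer;
                        }
                    }
                }
            }
            clusters.forEach(c -> System.out.println("suggested class: " + c));
        }
    }

On the invented data this yields two suggested classes, {getBalance, deposit} and {renderIcon, applyTheme}, mirroring how such a grouping can feed a mechanical refactoring tool.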