    Test Clone Detection via Assertion Fingerprints

    Large software systems require large test suites to achieve high coverage. Test suites often consist of closed unit tests, which are self-contained and take no input parameters. To achieve acceptable coverage with such tests, developers often clone existing tests, reproducing boilerplate and essential environment setup code as well as assertions. Existing technologies such as parameterized unit tests and theories could mitigate cloning in test suites: they give developers new ways to express refactorings. However, they do not help detect refactorable clones in the first place, which still requires tedious manual effort. This thesis proposes a novel technique, assertion fingerprints, for detecting clones based on the set of assertion/fail calls in test methods. Assertion fingerprints encode the control flow around the ordered set of assertions in a method. We have implemented clone set detection using assertion fingerprints and applied it to 10 test suites for open-source Java programs. We provide an empirical study and a qualitative analysis of our results. Assertion fingerprints enable the discovery of test clones that exhibit strong structural similarities and are amenable to refactoring. Our technique delivers an overall 75% true positive rate on our benchmarks and identifies 44% of the benchmark test methods as clones.
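    To make the idea concrete, here is a minimal, self-contained Java sketch (not the thesis tooling): two hypothetical JUnit-style test clones whose setup values differ but whose ordered assertion sequence, and hence their assertion fingerprint, is the same. The test bodies, the local assertEquals helper, and the flat fingerprint representation are all illustrative assumptions; the actual technique also encodes the control flow around the assertions.

```java
// Illustrative only: two hypothetical JUnit-style test clones that share an
// assertion fingerprint. A local assertEquals stands in for the JUnit API so
// the sketch runs on its own.
import java.util.List;

public class FingerprintSketch {

    // Test 1: checks a stack after two pushes.
    static void testStackAfterTwoPushes() {
        java.util.Deque<Integer> s = new java.util.ArrayDeque<>();
        s.push(1);
        s.push(2);
        assertEquals(2, s.size());
        assertEquals(2, (int) s.peek());
    }

    // Test 2: a clone of test 1 with different setup values. The ordered
    // assertion sequence (assertEquals, assertEquals) is identical, so both
    // methods map to the same fingerprint.
    static void testStackAfterTwoOtherPushes() {
        java.util.Deque<Integer> s = new java.util.ArrayDeque<>();
        s.push(10);
        s.push(20);
        assertEquals(2, s.size());
        assertEquals(20, (int) s.peek());
    }

    // Here a fingerprint is reduced to the ordered list of assertion kinds;
    // the real technique also encodes the surrounding control flow.
    static List<String> fingerprint() {
        return List.of("assertEquals", "assertEquals");
    }

    static void assertEquals(int expected, int actual) {
        if (expected != actual) throw new AssertionError(expected + " != " + actual);
    }

    public static void main(String[] args) {
        testStackAfterTwoPushes();
        testStackAfterTwoOtherPushes();
        System.out.println("shared fingerprint: " + fingerprint());
    }
}
```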

    Analyzing Clone Evolution for Identifying the Important Clones for Management

    Code clones (identical or similar code fragments in a code-base) have dual but contradictory impacts (i.e., both positive and negative) on the evolution and maintenance of a software system. Because of the negative impacts (such as high change-proneness, bug-proneness, and unintentional inconsistencies), software researchers consider code clones the number one bad smell in a code-base. Existing studies on clone management suggest managing code clones through refactoring and tracking. However, a software system's code-base may contain a huge number of code clones, and it is impractical to consider all of them for refactoring or tracking. In these circumstances, it is essential to identify the code clones that are particularly important for refactoring and tracking; however, no existing study has investigated this matter. We conduct our research with this emphasis and perform five studies on identifying important clones by analyzing clone evolution history. In our first study, we detect evolutionary coupling of code clones by automatically investigating clone evolution history across thousands of commits of software systems downloaded from on-line SVN repositories. By analyzing evolutionary coupling of code clones, we identify a particular clone change pattern, the Similarity Preserving Change Pattern (SPCP), such that code clones that evolve following this pattern should be considered important for refactoring. We call these important clones the SPCP clones, and we rank them by the strength of their evolutionary coupling. In our second study, we further analyze evolutionary coupling of code clones with the aim of assisting clone tracking. The purpose of clone tracking is to identify the co-change (i.e., changing together) candidates of code clones so that changes can be made consistently across the code-base. Our second study identifies and ranks the important co-change candidates by analyzing their evolutionary coupling. In our third study, we perform a deeper analysis of the SPCP clones and identify their cross-boundary evolutionary couplings. On the basis of such couplings, we separate the SPCP clones into two disjoint subsets: the non-cross-boundary SPCP clones, which can be considered important for refactoring, and the cross-boundary SPCP clones, which should be considered important for tracking. In our fourth study, we analyze the bug-proneness of different types of SPCP clones in order to identify which clone types have a high tendency to experience bug-fixes; such clone types can be given high priority for management (refactoring or tracking). In our last study, we analyze and compare the late-propagation tendencies of different types of code clones. Late propagation is commonly regarded as a harmful clone evolution pattern. Findings from this study can help prioritize clone types for management on the basis of their tendency to experience late propagations. We also find that late propagation can be considerably minimized by managing the SPCP clones. On the basis of our studies, we develop an automatic system called AMIC (Automatic Mining of Important Clones) that identifies the important clones for management (refactoring and tracking) and ranks them by their evolutionary coupling, bug-proneness, and late-propagation tendencies.
We believe that our findings have the potential to assist clone management by pinpointing the important clones to be managed, thus considerably reducing clone management effort.
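    As a rough illustration of the evolutionary-coupling idea (not the AMIC implementation), the following self-contained Java sketch counts, over a hypothetical commit history, how often pairs of clone fragments change together; pairs with high co-change counts would be the strongly coupled candidates the studies rank. All fragment ids and commits are invented.

```java
import java.util.*;

public class CoChangeSketch {
    public static void main(String[] args) {
        // Each set is the group of clone fragments changed by one commit
        // (fragment ids and history are invented for the example).
        List<Set<String>> commits = List.of(
            Set.of("cloneA", "cloneB"),
            Set.of("cloneA", "cloneB", "cloneC"),
            Set.of("cloneC"));

        // Count, for every pair of fragments, the commits that changed both.
        Map<String, Integer> coChange = new HashMap<>();
        for (Set<String> commit : commits) {
            List<String> changed = new ArrayList<>(commit);
            Collections.sort(changed);
            for (int i = 0; i < changed.size(); i++)
                for (int j = i + 1; j < changed.size(); j++)
                    coChange.merge(changed.get(i) + "+" + changed.get(j), 1, Integer::sum);
        }

        // Pairs with high co-change counts are strongly coupled candidates
        // for consistent co-updating (tracking) or for refactoring.
        coChange.entrySet().stream()
            .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
            .forEach(e -> System.out.println(
                e.getKey() + " changed together in " + e.getValue() + " commit(s)"));
    }
}
```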

    Towards an Integrated Metamodel-Based Approach to Software Refactoring


    Maßgeschneiderte Produktlinienextraktion (Tailored Product Line Extraction)

    Industry faces an increasing number of challenges regarding the functionality, efficiency, and reliability of software. A common approach to reducing the associated development effort and costs is the use of model-based languages, such as Matlab/Simulink and statecharts. While these languages help companies during the development of single systems, the high demand for customized software poses an increasing challenge: variants with high similarity and only slight differences have to be developed efficiently. As reimplementing complex functionality for each variant is not an option, copies of existing solutions are often modified for new customers. In the short run, this so-called clone-and-own approach saves costs, as existing solutions can easily be reused. However, it also involves risks: the relations between the copied systems are rarely documented, and errors have to be fixed for each variant in isolation. Thus, with a growing number of potentially large system copies, the resulting maintenance effort can become a problem. To overcome these problems, this thesis contributes an approach to semi-automatically migrate existing model variants to software product lines, which allow all variants to be generated from the identified reusable artifacts. As industry uses a variety of modeling languages, the approach focuses on easy adaptation to different languages. Furthermore, it can be custom-tailored to include domain knowledge or language-specific details in the variability identification. The first step of the approach performs a high-level analysis of the variants to identify outliers (e.g., variants that have diverged too much from the rest) and clusters of strongly related variants. The second step executes variability mining to identify the corresponding low-level variability relations (i.e., the common and varying parts) within these clusters. The third step uses these detailed variability relations for an automatic migration of the compared variants to a delta-oriented software product line. The approach is evaluated using publicly available case studies with an industrial background as well as model variants provided by an industry partner.
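    For intuition about the delta-oriented target representation of the third step, here is a minimal Java sketch under simplifying assumptions: a variant is derived by applying named deltas (element additions and removals) to a core model. The model elements and delta names are hypothetical, and a real delta language operates on model structure rather than flat sets of element names.

```java
import java.util.*;

public class DeltaSketch {
    // A delta names a set of model elements to add and a set to remove.
    record Delta(String name, Set<String> adds, Set<String> removes) {}

    // Deriving a variant = applying its deltas to the core model in order.
    static Set<String> derive(Set<String> core, List<Delta> deltas) {
        Set<String> variant = new LinkedHashSet<>(core);
        for (Delta d : deltas) {
            variant.removeAll(d.removes());
            variant.addAll(d.adds());
        }
        return variant;
    }

    public static void main(String[] args) {
        Set<String> core = Set.of("Sensor", "Controller", "Actuator");
        Delta comfort = new Delta("comfort", Set.of("ClimateControl"), Set.of());
        Delta basic   = new Delta("basic", Set.of(), Set.of("Actuator"));
        System.out.println(derive(core, List.of(comfort))); // extended variant
        System.out.println(derive(core, List.of(basic)));   // stripped-down variant
    }
}
```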

    Supporting Change in Product Lines Within the Context of Use Case-driven Development and Testing

    Product Line Engineering (PLE) is a crucial practice in many software development environments where systems are complex and developed for multiple customers with varying needs. At the same time, many business contexts are use case-driven: use cases are the main artifacts driving requirements elicitation and many other development activities. In these contexts, variability information is often not explicitly represented, which leads to ad-hoc change management for use cases, domain models, and test cases in product families. In this thesis, we address the problems of modeling variability in requirements with additional traceability to feature models, and of manual, error-prone requirements configuration and regression testing in product families. We provide the following contributions:
    - A modeling method for capturing variability information in product line use case and domain models that relies exclusively on commonly used artifacts in use case-driven development, thus avoiding unnecessary modeling overhead.
    - An approach for automated configuration of product-specific use case and domain models that guides customers in making configuration decisions and automatically generates use case diagrams, use case specifications, and domain models for configured products (see the sketch after this list).
    - A change impact analysis approach for evolving configuration decisions in product line use case models that automatically identifies the impact of decision changes on other decisions and incrementally reconfigures product-specific use case diagrams and specifications.
    - An approach for automated classification and prioritization of system test cases in a family of products that, for each new product, automatically classifies and prioritizes the system test cases of previous products in the product line, and provides guidance in modifying existing test cases to cover use case scenarios that have not been tested in the product line before.
    All our approaches have been developed and evaluated in close collaboration with our industry partner IEE.
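    As a simplified illustration of decision-based configuration (not the actual tool), the following Java sketch resolves hypothetical variation points against customer decisions to produce a product-specific set of use cases; all variation point, use case, and decision names are invented.

```java
import java.util.*;

public class UseCaseConfigSketch {
    public static void main(String[] args) {
        // Variation point -> use cases included when the decision is positive
        // (all names are invented for the example).
        Map<String, List<String>> variationPoints = Map.of(
            "RemoteDiagnostics", List.of("UploadLogs", "NotifyWorkshop"),
            "ComfortSeats", List.of("AdjustSeatProfile"));

        // Mandatory use cases are part of every product.
        List<String> product = new ArrayList<>(List.of("DetectOccupancy"));

        // One customer's configuration decisions.
        Map<String, Boolean> decisions = Map.of(
            "RemoteDiagnostics", true,
            "ComfortSeats", false);

        // Resolve each variation point according to the decision made for it.
        decisions.forEach((vp, chosen) -> {
            if (chosen) product.addAll(variationPoints.get(vp));
        });
        System.out.println("configured use cases: " + product);
    }
}
```

    A real configurator would additionally check decision constraints coming from the feature model (e.g., dependencies and exclusions between variation points) before including a variation point's use cases.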

    Combining SOA and BPM Technologies for Cross-System Process Automation

    This paper summarizes the results of an industry case study that introduced a cross-system business process automation solution based on a combination of SOA and BPM standard technologies (i.e., BPMN, BPEL, and WSDL). Besides discussing major weaknesses of the existing custom-built solution and comparing them against experiences with the developed prototype, the paper presents a course of action for transforming the current solution into the proposed one. This includes a general approach consisting of four distinct steps, as well as specific action items to be performed in every step. The discussion also covers language and tool support and the challenges arising from the transformation.

    Genetic Algorithms in Software Architecture Synthesis

    This thesis presents an approach for synthesizing software architectures with genetic algorithms. Previously in the literature, genetic algorithms have mostly been used to improve existing architectures; the method presented here, however, focuses on upstream design. The genetic construction of software architectures is based on a model that contains information on functional requirements only. Architecture styles and design patterns are used to transform this initial high-level model into a more detailed design. Quality attributes, here modifiability, efficiency, and complexity, are encoded in the algorithm's fitness function for evaluating the produced solutions. The final solution is given as a UML class diagram. While the main contribution is the method for architecture synthesis itself, basic tool support for the implementation is also presented. Two case studies are used for evaluation: one uses the sketch of an electronic home control system, a typical embedded system; the other is based on a robot war game simulator, a typical framework system. Evaluation is mostly based on fitness graphs and (subjective) evaluation of the produced class diagrams. In addition to the basic approach, variations and extensions regarding crossover and the fitness function have been made. While the standard algorithm uses random crossover, asexual reproduction and complementary crossover are also studied. Asexual crossover corresponds to real-life design situations, where two architectures are rarely combined. Complementary crossover, in turn, attempts to purposefully combine the good parts of two architectures. The fitness function is extended with the option to include modifiability scenarios, which enables more targeted design decisions, as critical parts of the architecture can be evaluated individually. In order to achieve a wider range of solutions that answer to competing quality demands, a multi-objective approach using Pareto optimality is given as an alternative to the single weighted fitness function. The multi-objective approach evaluates modifiability and efficiency, and outputs the class diagrams of the whole Pareto front of the last generation; thus, extremes for both quality attributes as well as middle-ground solutions can be compared. An experimental study is also conducted in which independent experts evaluate the produced solutions for the electronic home control system. Results show that genetic software architecture synthesis is indeed feasible, and that the quality of the solutions at this stage is roughly at the level of third-year software engineering students.
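    For readers unfamiliar with the machinery, here is a minimal, self-contained Java sketch of a weighted-fitness genetic loop of the kind described above: elitist selection, one-point crossover, and random mutation over a fixed-length genome of design-pattern choices. The genome encoding, the weights, and the fitness stubs are simplifying assumptions for illustration, not the thesis implementation.

```java
import java.util.*;

public class GaSynthesisSketch {
    static final int GENES = 8, PATTERNS = 4, POP = 20, GENERATIONS = 50;
    static final Random RNG = new Random(42);

    // Stub quality scores; a real fitness function would analyze the design
    // (e.g., pattern placement, call paths), not just count gene values.
    static double modifiability(int[] g) { return Arrays.stream(g).filter(p -> p == 1).count(); }
    static double efficiency(int[] g)    { return Arrays.stream(g).filter(p -> p == 2).count(); }
    static double complexity(int[] g)    { return Arrays.stream(g).distinct().count(); }

    // Single weighted fitness over the three quality attributes.
    static double fitness(int[] g) {
        return 1.0 * modifiability(g) + 1.0 * efficiency(g) - 0.5 * complexity(g);
    }

    static int[] randomGenome() {
        int[] g = new int[GENES];
        for (int i = 0; i < GENES; i++) g[i] = RNG.nextInt(PATTERNS);
        return g;
    }

    public static void main(String[] args) {
        List<int[]> pop = new ArrayList<>();
        for (int i = 0; i < POP; i++) pop.add(randomGenome());

        for (int gen = 0; gen < GENERATIONS; gen++) {
            pop.sort(Comparator.comparingDouble(GaSynthesisSketch::fitness).reversed());
            List<int[]> next = new ArrayList<>(pop.subList(0, POP / 2)); // elitist selection
            while (next.size() < POP) {
                int[] a = pop.get(RNG.nextInt(POP / 2));
                int[] b = pop.get(RNG.nextInt(POP / 2));
                int cut = RNG.nextInt(GENES);                      // one-point crossover
                int[] child = new int[GENES];
                for (int i = 0; i < GENES; i++) child[i] = i < cut ? a[i] : b[i];
                child[RNG.nextInt(GENES)] = RNG.nextInt(PATTERNS); // random mutation
                next.add(child);
            }
            pop = next;
        }
        pop.sort(Comparator.comparingDouble(GaSynthesisSketch::fitness).reversed());
        System.out.println("best fitness: " + fitness(pop.get(0))
            + " genome: " + Arrays.toString(pop.get(0)));
    }
}
```

    Replacing the single weighted sum with Pareto ranking over (modifiability, efficiency) would yield the multi-objective variant mentioned above, where the whole final Pareto front is reported instead of a single best solution.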