2,169 research outputs found

    Smart Contracts Software Metrics: a First Study

    © 2018 The Author(s). Smart contracts (SC) are software code that resides and runs on a blockchain. The code can be written in different languages, with the common purpose of implementing various kinds of transactions on the hosting blockchain. They are governed by the blockchain infrastructure and are designed to satisfy conditions typical of traditional contracts. The software must satisfy strongly context-dependent constraints that are quite different from those of traditional software. In particular, since the bytecode is uploaded to the hosting blockchain, size, computational resources, and interaction between different parts of the software are all limited, and even if the specific languages implement more or less the same constructs as traditional languages, there is not the same freedom as in normal software development. These constraints are expected to be reflected in SC software metrics, which should display values characteristic of the domain and different from those of more traditional software. We tested this hypothesis on the code of more than twelve thousand SCs written in Solidity and uploaded on the Ethereum blockchain. We downloaded the SCs from a public repository, computed the statistics of a set of software metrics related to SCs, and compared them to the metrics extracted from more traditional software projects. Our results show that Smart Contract metrics generally have more restricted ranges than the corresponding metrics in traditional software systems. Some of the stylized facts, like the power law in the tail of the distribution of some metrics, hold only approximately, but the lines of code follow a log-normal distribution, recalling the same behavior already found in traditional software systems.
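The log-normal claim for lines of code can be illustrated with a minimal sketch: on log-transformed LOC values, the log-normal parameters are just the mean and standard deviation of the logs. The data below is synthetic (drawn from a known log-normal), not from the study.

```python
import math
import random

def fit_lognormal(loc_values):
    """Estimate (mu, sigma) of a log-normal by taking moments of log-LOC."""
    logs = [math.log(v) for v in loc_values if v > 0]
    n = len(logs)
    mu = sum(logs) / n
    var = sum((x - mu) ** 2 for x in logs) / n
    return mu, math.sqrt(var)

# Synthetic check: draw LOC-like samples from a known log-normal.
random.seed(42)
sample = [random.lognormvariate(4.0, 1.2) for _ in range(20000)]
mu, sigma = fit_lognormal(sample)
```

On real contract code one would apply the same fit to the per-contract LOC counts and compare the fitted curve against the empirical histogram.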

    Power laws in software systems

    The main topic of my PhD has been the study of power laws in software systems, from the perspective of describing software quality. My PhD research contributes to a recent stream of studies in software engineering in which the investigation of power laws has become widely popular, since they appear in a remarkable variety of software quantities and properties: software metrics, software faults, refactoring, Java byte-code, module dependencies, software fractal dimension, lines of code, software packages, and so on. The common presence of power laws suggests that software systems belong to the much larger category of complex systems, where self-organization, fractality and emerging phenomena typically occur. Often my work involved the determination of a complex graph associated to the software system, defining the so-called “complex software network”. For such complex software networks I analyzed different network metrics and studied their relationships with software quality. In this PhD I took advantage of the theory of complex systems in order to study, explain and sometimes forecast properties and behavior of software systems. Thus my work involved the empirical study of many different statistical properties of software, in particular metrics, faults and refactorings; the construction and application of statistical models for explaining such statistical properties; the implementation and optimization of algorithms able to model their behavior; and the introduction of metrics borrowed from Social Network Analysis (SNA) for describing relationships and dependencies among software modules. More specifically, my research activity regarded the following topics. Bugs, power laws and software quality. In [1] [7] [16] [20] [21] [22] module faultness and its implications on software quality are investigated. 
I studied data mining from CVS repositories of two large OO projects, Eclipse and Netbeans, focusing on “fixing-issue” commits, and compared static traditional approaches, like Knowledge Engineering, to dynamic approaches based on Machine Learning techniques. The work compares for the first time the performance of Machine Learning (ML) techniques in automatically classifying “fixing-issue” commits among commit messages. Our study computes the precision and recall of different Machine Learning classifiers for the correct classification of issue-reporting commits. The results show that some ML classifiers can correctly classify up to 99.9% of such commits. In [22] Java software systems are treated as complex graphs, where a node represents a Java file, called a compilation unit (CU), and edges represent relations between them. The distribution of the number of bugs per CU exhibits a power-law behavior in the tail, as does the number of CUs influenced by a specific bug. Examining the evolution of software metrics across different releases makes it possible to understand how the relationships among CU metrics and CU faultness change over time. In [1] module faultness is further discussed from a statistical perspective, using as case studies five versions of Eclipse, to show how the log-normal, Double Pareto and Yule-Simon statistical distributions may fit the empirical bug distribution at least as well as the Weibull distribution proposed by Zhang. In particular, I discuss how some of these alternative distributions provide both a superior fit to empirical data and a theoretical motivation for modeling the bug generation process. Further studies reported in [3] present a model based on the Yule process, able to explain the evolution of some properties of large object-oriented software systems. Four system properties related to code production in four large object-oriented software systems, Eclipse, Netbeans, JDK and Ant, are analyzed. 
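The precision/recall evaluation of commit classifiers described above can be sketched with a deliberately naive keyword baseline; the classifier, messages and labels below are all hypothetical, standing in for the ML classifiers and repository data of the actual study.

```python
def classify(message):
    """Naive keyword baseline: flag a commit as bug-fixing if it mentions fix/bug/issue."""
    keywords = ("fix", "bug", "issue", "defect")
    return any(k in message.lower() for k in keywords)

def precision_recall(messages, labels):
    """Score the classifier against ground-truth labels (True = fixing-issue commit)."""
    tp = fp = fn = 0
    for msg, is_fix in zip(messages, labels):
        pred = classify(msg)
        if pred and is_fix:
            tp += 1
        elif pred and not is_fix:
            fp += 1
        elif not pred and is_fix:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

msgs = ["Fix NPE in parser", "Add logging", "Resolve issue #12",
        "Corrected typo in docs", "bugfix for cache", "Prefix all bug-tracker URLs"]
labels = [True, False, True, True, True, False]
p, r = precision_recall(msgs, labels)
```

A real ML classifier replaces `classify` with a trained model; the scoring loop stays the same.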
The properties analyzed, namely the naming of variables and methods, the calls to methods and the inheritance hierarchies, show a power-law distribution. A software simulation verifies the goodness of the model, finding a very good correspondence between the empirical data of subsequent software versions and the predictions of the model. In [18], [19] and [23] three algorithms for an efficient implementation of the preferential attachment mechanism lying at the core of the Yule process are developed, and their efficiency in generating power-law distributions for different properties of Object Oriented (OO) software systems is discussed. Software metrics and SNA metrics. In [2] [8] [13] [17] software metrics related to quality are analyzed, and some metrics borrowed from Social Network Analysis are applied to OO software graphs. In OO systems the modules are the classes, interconnected with each other by relationships like inheritance and dependency. It is therefore possible to represent OO systems as software networks, where the classes are the network nodes and the relationships among classes are the network edges. Social Network metrics, such as the EGO metrics, make it possible to identify the role of each node in the information flow through the network, being related to software modules and their dependencies. In [2] these metrics are compared with other traditional software metrics, like the Chidamber-Kemerer suite, and with software graph metrics. Examination of the empirical distributions of all the metrics across the software modules of several releases of two large Java systems systematically shows fat tails for all the metrics. Moreover, the various metric distributions look very similar and consistent across all system releases, and are also very similar in both systems. Analytical distribution functions suitable for describing and studying the observed distributions are also provided. 
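A standard way to quantify the power-law tails mentioned above is the continuous maximum-likelihood estimator for the tail exponent, alpha = 1 + n / sum(ln(x_i / x_min)). The sketch below checks it on synthetic Pareto data; the study's own fitting procedure may differ in detail.

```python
import math
import random

def powerlaw_alpha(data, xmin):
    """Continuous MLE for the tail exponent: alpha = 1 + n / sum(ln(x / xmin))."""
    tail = [x for x in data if x >= xmin]
    n = len(tail)
    return 1.0 + n / sum(math.log(x / xmin) for x in tail)

# Synthetic check: inverse-transform sampling from a Pareto tail with known alpha.
random.seed(7)
alpha_true, xmin = 2.5, 1.0
sample = [xmin * (1.0 - random.random()) ** (-1.0 / (alpha_true - 1.0))
          for _ in range(50000)]
alpha_hat = powerlaw_alpha(sample, xmin)
```

Applied to, e.g., bugs per compilation unit, the estimator is fitted only above a chosen `xmin`, since power-law behavior typically holds in the tail rather than over the whole range.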
The work in [17] presents an extensive analysis of software metrics for 111 object-oriented systems written in Java. For each system, we considered 18 traditional metrics, such as LOC and the Chidamber and Kemerer metrics, as well as metrics derived from complex network theory and social network analysis, computed at class level. Most metrics follow a leptokurtic distribution. Only a couple of metrics show clearly normal behavior, while some others are very irregular, and even bimodal. The statistics gathered make it possible to study and discuss the variability of metrics across different systems. In [8] a preliminary and exploratory analysis of the Eclipse subprojects is presented, using a joint application of SNA and traditional software metrics. The entire set of metrics has been summarized by performing a Principal Component Analysis (PCA), obtaining a very small number of independent principal components, which make it possible to represent the classes in a space where they show typical patterns. The preliminary results show how the joint application of traditional and network software metrics may be used to identify subprojects developed with similar functionalities and scopes. In [13] the software graphs of 96 systems of the Java Qualitas Corpus are analyzed, parsing the source code and identifying the dependencies among classes. Twelve software metrics were analyzed: nine borrowed from Social Network Analysis (SNA), and three more traditional software metrics, namely LOC, fan-in and fan-out. The results show that the metrics can be partitioned into groups for almost the whole Java Qualitas Corpus, and that such grouping can provide insights into the topology of software networks. For two systems, Eclipse and Netbeans, we also computed the number of bugs, identifying the bugs affecting each class, and found that some SNA metrics are highly correlated with bugs, while others are strongly anti-correlated. 
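The metric-vs-bugs correlations reported above reduce, in the simplest case, to computing a correlation coefficient between two per-class vectors. A minimal Pearson sketch on made-up data (the numbers are illustrative, not the study's):

```python
import math

def pearson(xs, ys):
    """Pearson correlation between a per-class metric and per-class bug counts."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-class data: a fan-in-like metric vs. bug count.
fan_in = [1, 2, 4, 8, 16, 32]
bugs = [0, 1, 1, 3, 5, 9]
r = pearson(fan_in, bugs)
```

Because metric distributions are fat-tailed, rank-based coefficients such as Spearman's are often preferred in practice over raw Pearson correlation.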
Software fractal dimension. In [6] [12] [14] [15] the self-similar structure of software networks is used to introduce the fractal dimension as a global software metric associated with software quality, at the system level and at the subproject level. In [6] the source code of various releases of two large OO Open Source (OS) Java software systems, Eclipse and Netbeans, is analyzed, investigating the complexity of each whole release and of its subprojects. In all the examined cases there exists a scaling region where it is possible to compute a self-similarity coefficient, the fractal dimension, using the box-counting method. The results show that this measure is clearly related to software quality, acting as a global software quality metric. In particular, we computed the defects of each software system and found a clear correlation between the number of defects in the system, or in a subproject, and its fractal dimension. This correlation exists across all the subprojects and also along the time evolution of the software systems, as new releases are delivered. In [14] software systems are considered as complex networks with a self-similar structure under a length-scale transformation. On such complex software networks a self-similarity coefficient, also known as the fractal dimension, is computed using the box-counting method. Several releases of the publicly available Eclipse software system were analyzed, calculating the fractal dimension for twenty randomly chosen sub-projects of every release, as well as for each release as a whole. Our results display an overall consistency among the sub-projects and among all the analyzed releases. The study finds a very good correlation between the fractal dimension and the number of bugs for Eclipse and for its twenty sub-projects. This result suggests that the fractal dimension could be considered a global quality metric for large software systems. 
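The box-counting method on a network covers the node set with "boxes" of bounded graph diameter and records how the number of boxes N_B shrinks as the box size l_B grows; the slope of log N_B versus log l_B gives (minus) the fractal dimension. Below is a greedy sketch, not the Merge/Coloring/Annealing algorithms of the cited works, run on a toy chain graph (for which the counts halve as l_B doubles, i.e. dimension 1).

```python
from collections import deque

def bfs_dists(adj, src):
    """Graph distances from src by breadth-first search."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def count_boxes(adj, lb):
    """Greedy box covering: each box holds uncovered nodes within distance < lb of a seed."""
    uncovered = set(adj)
    boxes = 0
    while uncovered:
        seed = min(uncovered)  # deterministic seed choice
        dist = bfs_dists(adj, seed)
        box = {v for v in uncovered if dist.get(v, lb) < lb}
        uncovered -= box
        boxes += 1
    return boxes

# Toy "dependency graph": a chain of 16 classes.
n = 16
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}
counts = {lb: count_boxes(adj, lb) for lb in (1, 2, 4, 8, 16)}
```

On a real software network one would fit the log-log slope over the scaling region only, since very small and very large box sizes fall outside it.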
Works [12] and [15] propose an algorithm for computing the fractal dimension of a software network, and compare its performance with two other algorithms. The objects of study are various large, object-oriented software systems. We built the associated graph for each system, also known as its software network, analyzing the binary relationships (dependencies) among classes. We found that the structure of such software networks is self-similar under a length-scale transformation. The fractal dimension of these networks is computed using a Merge algorithm, first devised by the authors; a Greedy Coloring algorithm, based on the equivalence with the graph coloring problem; and a Simulated Annealing algorithm, widely used for efficiently finding minima in multi-dimensional problems. Our study examines both efficiency and accuracy, showing that the Merge algorithm is the most efficient, while Simulated Annealing is the most accurate. The Greedy Coloring algorithm lies in between the two, with speed very close to the Merge algorithm and accuracy comparable to the Simulated Annealing algorithm. 1.b Further research activity. In [4] [9] [10] [11] the problem of software refactoring is analyzed. The study reported in [4] analyzes the effect of particular refactorings on class coupling for different releases of four Object Oriented (OO) Open Source (OS) Java software systems, Azureus, Jtopen, Jedit and Tomcat, taken as representative of general Java OS systems. Specifically, the “add parameter” to a method and “remove parameter” from a method refactorings, as defined in Fowler’s dictionary, may influence class coupling by changing the fan-in and fan-out of the classes they are applied to. The work investigates, both qualitatively and quantitatively, the global effect of applying such refactorings, providing best-fitting statistical distributions able to describe the changes in fan-in and fan-out couplings. 
A detailed analysis of the best-fitting parameters, and of how they change when refactoring occurs, has been performed, estimating the effect of refactoring on coupling before it is applied. Such estimates may help in determining refactoring costs and benefits. In [9] a study of the fan-in and fan-out metrics is performed from the perspective of two refactorings, “add parameter to” and “remove parameter from” a method, collecting these two refactorings from multiple releases of the Tomcat open source system. The results show significant differences in the profiles of the statistical distributions of fan-in and fan-out between refactored and non-refactored classes. A strong overarching theme emerged: developers seemed to focus on refactoring classes with relatively high values of both fan-in and fan-out, rather than classes with a high value of only one of the two. In [10] we consider for the first time how a single refactoring modified these metric values, what happened when refactorings were applied to a single class in unison and, finally, what influence a set of refactorings had on the shape of the FanIn and FanOut distributions. The results indicate that, on average, refactored classes tended to have larger FanIn and FanOut values than non-refactored classes. Where evidence of multiple (different) refactorings applied to the same class was found, the net effect (in terms of FanIn and FanOut coupling values) was negligible. In [11] it is shown that highly coupled classes were more prone to refactoring, particularly through a set of ‘core’ refactorings. However, wide variations were found across systems for our chosen measures of coupling, namely fan-in and fan-out. Specific individual refactorings were also explored to gain an understanding of why these differences may have occurred. 
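The fan-in/fan-out comparison between refactored and non-refactored classes can be sketched from a dependency edge list; the edges, class names and the choice of which class counts as "refactored" below are all hypothetical.

```python
from collections import defaultdict
from statistics import median

def fan_metrics(edges):
    """Fan-out = outgoing dependencies per class; fan-in = incoming ones."""
    fan_in, fan_out = defaultdict(int), defaultdict(int)
    for src, dst in edges:
        fan_out[src] += 1
        fan_in[dst] += 1
    return fan_in, fan_out

# Hypothetical class-dependency edges (caller -> callee).
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "C"), ("C", "E"), ("D", "E")]
fan_in, fan_out = fan_metrics(edges)

refactored = {"C"}  # pretend C had "add parameter" applied to it
others = {"A", "B", "D", "E"}
med_ref = median(fan_in[c] + fan_out[c] for c in refactored)
med_oth = median(fan_in[c] + fan_out[c] for c in others)
```

The toy data is built so the "refactored" class has higher total coupling than the rest, mirroring the pattern the studies report; real analyses compare the full fan-in and fan-out distributions, not a single median.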
An exploration of open questions is carried out through the extraction of fifty-two of the refactorings in Fowler’s catalog from versions of four open-source systems, comparing the coupling characteristics of each set of refactored classes with the corresponding set of non-refactored classes. In [7] I also presented some preliminary studies on the relationships between micro-patterns, more specifically anti-patterns, and software quality, while in [5] and [21] I analyzed the role of Agile methodologies in software production and their relationships with software quality and the presence of bugs.

    An empirical study of social networks metrics in object-oriented software

    We study the application to object-oriented software of new metrics derived from Social Network Analysis. Social Network metrics, such as the EGO metrics, make it possible to identify the role of each node in the information flow through the network, being related to software modules and their dependencies. These metrics are compared with other traditional software metrics, like the Chidamber-Kemerer suite, and with software graph metrics. We examine the empirical distributions of all the metrics, bugs included, across the software modules of several releases of two large Java systems, Eclipse and Netbeans. We provide analytical distribution functions suitable for describing and studying the observed distributions. We also study correlations among metrics and bugs. We found that the empirical distributions systematically show fat tails for all the metrics. Moreover, the various metric distributions look very similar and consistent across all system releases, and are also very similar in both of the studied systems. These features appear to be typical properties of these software metrics.
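EGO metrics describe the neighborhood of a single node: the ego network (the node plus its direct neighbors), the ties within it, and its density. A minimal sketch on a hypothetical undirected class graph (the graph and class names are illustrative only):

```python
def ego_metrics(adj, node):
    """EGO metrics for one node: ego-network size, ties within it, and density."""
    ego = {node} | adj[node]
    # Count each undirected tie once (u < v) among ego-network members.
    ties = sum(1 for u in ego for v in adj[u] if v in ego and u < v)
    size = len(ego)
    max_ties = size * (size - 1) // 2
    density = ties / max_ties if max_ties else 0.0
    return size, ties, density

# Hypothetical undirected class graph as adjacency sets.
adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
size, ties, density = ego_metrics(adj, "A")
```

In the software-network setting, a class whose ego network is large but sparse tends to act as a broker between otherwise unconnected modules, which is why such metrics relate to information flow.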

    Subtle Worlds: Exploring Relational Ecology with Clay

    The frameworks by which the globally-dominant culture expects individuals to understand and act within their environments are founded upon practices of oppression, division, and control, and have played an outsized role in the global crises faced in the modern era. Manifold currents of thought have developed alternatives to these trends from fields as diverse as Environmental Humanities, Feminist Theory, Post-Colonial Theory, Art Studies, Cybernetics and Systems Theory, Poetry, Political Science, and Science-Fiction, to name a few. Shared themes of these alternatives include the rekindling of respect for the agentic power of non-human things inside and outside human contexts, and the dissolution of socially constructed boundaries, both qualities that are unnecessarily diminished by the Scientism underpinning dominant cultural onto-epistemologies. Through art, a practice filled with the potential to transgress the boundaries of the dominant culture, this thesis explores relationships, community, identity, and presence as they ebb, flow, and shift amongst humans and non-humans alike. This exploration is centered on ceramics as an integrated collaboration between a multitude of forces, including the artist, audience, kilns, clay bodies, glazes, traditions, tools, and more, and reveals an expansive and emergent mesh of vibrant actors. By sitting with and validating more complex frameworks for understanding and relating to the worlds we exist in, it becomes possible to imagine a world in which vast choirs of influencing forces become clearer, allowing us to nurture an acceptance of the immense diversity of human, non-human, and non-living kin, prioritizing respect, curiosity, and reciprocity.

    Time evolution and distribution analysis of software bugs from a complex network perspective

    Successful software systems are constantly under development. Since they have to be updated when new features are introduced, bugs are fixed, and the system is kept up to date, they require continuous maintenance. Among these activities, bug fixing is one of the most relevant, because it is determinant for software quality. Unfortunately, software houses have limited time and developers to address all these issues before product delivery. For this reason, an efficient allocation of these resources is required to obtain the quality demanded by the market. The keyword for correct management of the software product process is measurement: as DeMarco states, "you cannot control what you cannot measure", and this thesis is mainly devoted to this aspect. This dissertation deals with software measures related to bug proneness and with the distribution analysis of software bugs. The aim is to describe the bug occurrence phenomenon, identify useful metrics related to software bug proneness, and finally characterize how the bug population is distributed and evolves, also discussing models able to explain this evolution. By studying the relationship between code evolution and bug distribution or bug proneness, we can foresee which software structure will emerge. Thus, this research provides information and guidelines to managers, helping them to plan, schedule activities, and allocate resources during software development.
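The kind of heavy-tailed bug distribution studied above can be checked empirically in a few lines of numpy; here synthetic Zipf-distributed counts stand in for real per-module bug data mined from an issue tracker:

```python
import numpy as np

# Synthetic per-module bug counts standing in for real mined data;
# a Zipf law is heavy-tailed by construction.
rng = np.random.default_rng(0)
bugs = rng.zipf(2.0, size=1000)

# Empirical complementary CDF: fraction of modules with at least k bugs.
# A fat tail shows up as a roughly straight line on a log-log plot.
ks = np.arange(1, 51)
ccdf = np.array([(bugs >= k).mean() for k in ks])

# With a heavy tail, a small share of the modules holds most of the bugs,
# which is exactly what makes targeted resource allocation worthwhile.
sorted_bugs = np.sort(bugs)[::-1]
top20_share = sorted_bugs[:200].sum() / bugs.sum()
```

If bugs were spread evenly, the top 20% of modules would hold 20% of the bugs; a markedly larger share is the signature of the concentration that bug-proneness metrics try to predict.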

    Statistical analysis and simulation of design models evolution

    Tools, algorithms and methods in the context of Model-Driven Engineering (MDE) have to be assessed, evaluated and tested with regard to different aspects such as correctness, quality, scalability and efficiency. Unfortunately, appropriate test models are scarcely available, and those which are accessible often lack the desired properties. Therefore, in practice one needs to resort to artificially generated test models. Many services and features of model versioning systems are motivated by the collaborative development paradigm. Testing such services requires not single models, but rather pairs of models, one derived from the other by applying a known sequence of edit steps. The edit operations used to modify the models should be the same as in usual development environments, e.g. adding, deleting and changing model elements in visual model editors. Existing model generators are motivated by the testing of model transformation engines; they do not consider the true nature of evolution, in which models evolve through iterative editing steps. They provide no or very little control over the generation process, and they can generate only single models rather than model histories. Moreover, the generation of stochastic and other properties of interest is also not supported by the existing approaches. Furthermore, blindly generating models through random application of edit operations does not yield useful models, since the generated models are not (stochastically) realistic and do not reflect true properties of evolution in real software systems. Unfortunately, little is known about how models of real software systems evolve over time, what the properties and characteristics of this evolution are, and how one can mathematically formulate and simulate it.
To address these problems, we introduce a new general approach which facilitates generating (stochastically) realistic test models for model differencing tools and tools for analyzing model histories. We propose a model generator which addresses the above deficiencies and generates or modifies models by applying proper edit operations. Fine control mechanisms for the generation process are devised, and the generator supports stochastic and other properties of interest in the generated models. It can also generate histories, i.e. related sequences, of models. Moreover, our approach provides a methodological framework for capturing, mathematically representing and simulating the evolution of real design models. The proposed framework captures the evolution in terms of the edit operations applied between revisions. Mathematically, the representation of the evolution is based on different statistical distributions as well as different time series models. Forecasting, simulation and generation of stochastically realistic test models are discussed in detail. As an application, the framework is applied to the evolution of design models obtained from a carefully selected set of Java systems. In order to study the evolution of design models, we analyzed 9 major Java projects with at least 100 revisions each. We reverse engineered the design models from the Java source code and compared consecutive revisions of the design models. The observed changes were expressed in terms of two sets of edit operations. The first set consists of 75 low-level graph edit operations, e.g. adding and deleting nodes and edges of the abstract syntax graph of the models. The second set consists of 188 high-level (user-level) edit operations which are more meaningful from a developer's point of view and are frequently found in visual model editors. A high-level operation typically comprises several low-level operations and is considered as one user action.
In our approach, we mathematically formulated the pairwise evolution, i.e. the changes between each two subsequent revisions, using statistical models (distributions). In this regard, we initially considered many distributions which could be promising for modeling the frequencies of the observed low-level and high-level changes. Six distributions were very successful in modeling the changes and were able to model the evolution with very good rates of success. To simulate the pairwise evolution, we studied random variate generation algorithms for our successful distributions in detail. For four of our distributions, for which no tailored algorithms existed, we generated their random variates indirectly. The chronological (historical) evolution of design models was modeled using three kinds of time series models, namely ARMA, GARCH and mixed ARMA-GARCH. The comparative performance of the time series models in handling the dynamics of evolution, as well as the accuracy of their forecasts, was studied in depth. Roughly speaking, our studies show that mixed ARMA-GARCH models are superior to the other models. Moreover, we discuss the simulation aspects of our proposed time series models in detail. The knowledge gained through the statistical analysis of the evolution was then used in our test model generator in order to generate more realistic test models for model differencing, model versioning and history analysis tools.
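The indirect generation of random variates mentioned above can be done by inverse-transform sampling; a minimal sketch, with an invented discrete distribution of per-revision edit-operation counts (the numbers are illustrative, not the fitted values from the thesis):

```python
import numpy as np

# Invented fitted probabilities for how many times some edit operation is
# applied between two revisions (illustrative values only).
counts = np.array([0, 1, 2, 3, 4, 5])
probs = np.array([0.40, 0.25, 0.15, 0.10, 0.07, 0.03])
cdf = np.cumsum(probs)

def draw(n, rng):
    """Inverse-transform sampling: push U(0,1) through the inverse CDF."""
    u = rng.random(n)
    # index of the first CDF value >= u, i.e. the inverse CDF of u
    return counts[np.searchsorted(cdf, u, side="left")]

rng = np.random.default_rng(42)
sample = draw(10_000, rng)
freq0 = (sample == 0).mean()  # should be close to probs[0] = 0.40
```

The same mechanism works for any distribution whose CDF can be evaluated, which is what makes it a fallback when no tailored generation algorithm exists.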
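As a minimal stand-in for the ARMA-GARCH machinery above (which in practice calls for a library such as statsmodels or arch), the AR part of an ARMA model can be fitted by ordinary least squares on a synthetic series of per-revision edit counts:

```python
import numpy as np

# Synthetic AR(1) series standing in for the number of edit operations
# applied at each revision of a model history: y[t] = c + phi*y[t-1] + eps.
rng = np.random.default_rng(1)
n, c, phi = 500, 10.0, 0.6
y = np.empty(n)
y[0] = c / (1 - phi)  # start at the stationary mean
for t in range(1, n):
    y[t] = c + phi * y[t - 1] + rng.normal(scale=2.0)

# Fit the AR(1) coefficients by ordinary least squares:
# regress y[t] on a constant and y[t-1].
X = np.column_stack([np.ones(n - 1), y[:-1]])
c_hat, phi_hat = np.linalg.lstsq(X, y[1:], rcond=None)[0]

# One-step-ahead forecast from the last observed revision.
forecast = c_hat + phi_hat * y[-1]
```

A GARCH component would additionally model the time-varying variance of the residuals, which is what lets the mixed ARMA-GARCH models capture bursts of editing activity.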

    Counting, grafting and evolving binary trees

    Binary trees are fundamental objects in models of evolutionary biology and population genetics. Here, we discuss some of their combinatorial and structural properties as they depend on the tree class considered. Furthermore, the process by which trees are generated determines the probability distribution in tree space. Yule trees, for instance, are generated by a pure birth process. When considered as unordered, they have neither a closed-form enumeration nor a simple probability distribution, but their ordered counterparts have both. This makes ordered Yule trees the object of choice when studying tree structure in the framework of evolving genealogies.
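The pure birth (Yule) process described above is easy to simulate: start from a single lineage and repeatedly split a uniformly chosen extant lineage. A minimal sketch, where the nested-tuple encoding of trees is an assumption made for illustration:

```python
import random

def yule_tree(n_leaves, rng):
    """Grow a random binary tree under the Yule (pure birth) model.

    Leaves are encoded as "*", internal nodes as (left, right) tuples.
    """
    tree = "*"
    leaf_paths = [()]  # paths (0 = left, 1 = right) to the extant lineages
    while len(leaf_paths) < n_leaves:
        path = rng.choice(leaf_paths)  # uniform choice over lineages = Yule
        leaf_paths.remove(path)
        tree = _split(tree, path)
        leaf_paths += [path + (0,), path + (1,)]
    return tree

def _split(tree, path):
    """Replace the leaf at `path` by a cherry of two fresh leaves."""
    if not path:
        return ("*", "*")
    left, right = tree
    if path[0] == 0:
        return (_split(left, path[1:]), right)
    return (left, _split(right, path[1:]))

def count_leaves(tree):
    return 1 if tree == "*" else count_leaves(tree[0]) + count_leaves(tree[1])

rng = random.Random(7)
t = yule_tree(6, rng)
```

Because the splitting lineage is chosen uniformly, repeated runs sample tree shapes with the Yule probabilities; the shapes come out ordered (left and right children are distinguished), which is exactly the setting in which enumeration and distribution take their simple closed forms.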
