Search CORE

15 research outputs found

An empirical study of social networks metrics in object-oriented software

Author: Alessandro Murgia
Giulio Concas
Michele Marchesi
Roberto Tonelli
Publication date: 01/01/2010

We study the application to object-oriented software of new metrics derived from Social Network Analysis. Social network metrics, such as the EGO metrics, make it possible to identify the role of each single node in the information flow through the network; here the nodes are software modules and the edges are their dependencies. These metrics are compared with other traditional software metrics, like the Chidamber-Kemerer suite, and with software graph metrics. We examine the empirical distributions of all the metrics, bugs included, across the software modules of several releases of two large Java systems, Eclipse and Netbeans. We provide analytical distribution functions suitable for describing and studying the observed distributions. We also study correlations among metrics and bugs. We found that the empirical distributions systematically show fat tails for all the metrics. Moreover, the various metric distributions look very similar and consistent across all system releases and are also very similar in both the studied systems. These features appear to be typical properties of these software metrics.
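The abstract reports fat-tailed metric distributions but does not name the fitted analytical forms. A common first check for a fat tail is the empirical complementary CDF on log-log axes plus a maximum-likelihood estimate of a power-law exponent. The sketch below assumes a pure power-law tail above a user-chosen `xmin` and uses the estimator of Clauset, Shalizi and Newman; the function names are hypothetical, and this is not presented as the fitting procedure used in the paper.

```python
import math
import random


def power_law_alpha(values, xmin, discrete=False):
    """Maximum-likelihood exponent alpha for a tail p(x) ~ x**(-alpha), x >= xmin.

    Continuous estimator from Clauset, Shalizi & Newman (2009); with
    discrete=True it uses their approximation for integer data (e.g. bug
    counts), which replaces xmin by xmin - 0.5 inside the logarithm.
    """
    tail = [x for x in values if x >= xmin]
    if not tail:
        raise ValueError("no observations at or above xmin")
    shift = xmin - 0.5 if discrete else xmin
    return 1.0 + len(tail) / sum(math.log(x / shift) for x in tail)


def ccdf(values):
    """Empirical complementary CDF as (x, fraction of values >= x) pairs."""
    xs = sorted(values)
    n = len(xs)
    return [(x, (n - i) / n) for i, x in enumerate(xs) if i == 0 or x != xs[i - 1]]


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic continuous power law with alpha = 2.5 and xmin = 1,
    # drawn by inverse-transform sampling: x = (1 - u) ** (-1 / (alpha - 1)).
    sample = [(1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(100_000)]
    print(round(power_law_alpha(sample, xmin=1.0), 2))
```

Choosing `xmin` and comparing the power law against alternatives such as the log-normal are separate steps that this sketch does not cover.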

Directory of Open Access Journals

Open Access Repository

Power laws in software systems

Author: Tonelli Roberto
Publication date: 06/03/2012

The main topic of my PhD has been the study of power laws in software systems from the perspective of describing software quality. My PhD research contributes to a recent stream of studies in software engineering in which the investigation of power laws in software systems has become widely popular, since power laws appear in an incredible variety of software quantities and properties, for example software metrics, software faults, refactorings, Java bytecode, module dependencies, software fractal dimension, lines of code and software packages. The common presence of power laws suggests that software systems belong to the much larger category of complex systems, where self-organization, fractality and emergent phenomena typically occur. My work often involved building a complex graph associated with the software system, the so-called “complex software network”. For such complex software networks I analyzed different network metrics and studied their relationships with software quality. In this PhD I took advantage of the theory of complex systems in order to study, explain and sometimes forecast properties and behavior of software systems. My work therefore involved the empirical study of many different statistical properties of software, in particular metrics, faults and refactorings; the construction and application of statistical models explaining such properties; the implementation and optimization of algorithms able to model their behavior; and the introduction of metrics borrowed from Social Network Analysis (SNA) for describing relationships and dependencies among software modules. More specifically, my research activity covered the following topics.

Bugs, power laws and software quality

In [1] [7] [16] [20] [21] [22] module faultiness and its implications for software quality are investigated.
I studied data mining from the CVS repositories of two large OO projects, Eclipse and Netbeans, focusing on “fixing-issue” commits, and compared static traditional approaches, like Knowledge Engineering, with dynamic approaches based on Machine Learning techniques. The work compares, for the first time, the performance of Machine Learning (ML) techniques in automatically classifying “fixing-issue” commits among commit messages. Our study calculates the precision and recall of different ML classifiers for the correct classification of issue-reporting commits. The results show that some ML classifiers can correctly classify up to 99.9% of such commits. In [22] Java software systems are treated as complex graphs, where each node represents a Java file, called a compilation unit (CU), and each edge represents a relation between two CUs. The distribution of the number of bugs per CU exhibits power-law behavior in the tail, as does the number of CUs affected by a specific bug. Examining the evolution of software metrics across different releases shows how the relationships between CU metrics and CU faultiness change over time. In [1] module faultiness is further discussed from a statistical perspective, using five versions of Eclipse as case studies, to show how the log-normal, Double Pareto and Yule-Simon statistical distributions may fit the empirical bug distribution at least as well as the Weibull distribution proposed by Zhang. In particular, I discuss how some of these alternative distributions provide both a superior fit to empirical data and a theoretical motivation for modeling the bug generation process. Further studies reported in [3] present a model based on the Yule process, able to explain the evolution of some properties of large object-oriented software systems. Four properties related to code production are analyzed in four large object-oriented software systems: Eclipse, Netbeans, JDK and Ant.
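The Yule process mentioned for [3] can be illustrated with Simon's model, a classic formulation of this kind of preferential attachment. This is an illustrative sketch under stated assumptions, not the model of [3] or the algorithms of [18], [19] and [23]: entities stand for, say, method names, and `p_new` is a hypothetical probability that a new name is introduced rather than an existing one reused. Keeping one list entry per occurrence turns proportional selection into a single uniform draw.

```python
import random
from collections import Counter


def simon_process(steps, p_new, seed=None):
    """Simulate Simon's model of a Yule-type preferential attachment process.

    At each step a brand-new entity appears with probability p_new;
    otherwise an existing entity is reused with probability proportional
    to the number of times it has already occurred.
    Returns a Counter mapping entity id -> number of occurrences.
    """
    rng = random.Random(seed)
    occurrences = [0]  # one entry per occurrence; entity 0 seeds the process
    next_id = 1
    for _ in range(steps):
        if rng.random() < p_new:
            occurrences.append(next_id)
            next_id += 1
        else:
            # A uniformly chosen past occurrence selects each entity with
            # probability proportional to its count, in O(1) per step.
            occurrences.append(rng.choice(occurrences))
    return Counter(occurrences)


if __name__ == "__main__":
    counts = simon_process(100_000, p_new=0.2, seed=1)
    print(len(counts), "entities; largest count:", max(counts.values()))
```

For this model Simon's analysis predicts a tail exponent of about 1 + 1/(1 - p_new) for the occurrence counts, so `p_new = 0.2` should give a tail close to x^(-2.25).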
The properties analyzed, namely the naming of variables and methods, method calls and inheritance hierarchies, show a power-law distribution. A software simulation verifies the goodness of the model, finding a very good correspondence between the empirical data of subsequent software versions and the predictions of the model. In [18], [19] and [23] three algorithms for an efficient implementation of the preferential attachment mechanism at the core of the Yule process are developed, and their efficiency in generating power-law distributions for different properties of Object Oriented (OO) software systems is discussed.

Software metrics and SNA metrics

In [2] [8] [13] [17] software metrics related to quality are analyzed, and some metrics borrowed from Social Network Analysis are applied to OO software graphs. In OO systems the modules are the classes, interconnected by relationships like inheritance and dependency. It is therefore possible to represent OO systems as software networks, where the classes are the network nodes and the relationships among classes are the network edges. Social network metrics, such as the EGO metrics, make it possible to identify the role of each single node in the information flow through the network, and hence of each software module within its dependencies. In [2] these metrics are compared with other traditional software metrics, like the Chidamber-Kemerer suite, and with software graph metrics. Examining the empirical distributions of all the metrics across the software modules of several releases of two large Java systems systematically reveals fat tails for all the metrics. Moreover, the various metric distributions look very similar and consistent across all system releases and are also very similar in both systems. Analytical distribution functions suitable for describing and studying the observed distributions are also provided.
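A minimal sketch of a few common ego-network measures (size, ties, pairs, density) on a class dependency graph. The abstract does not list which EGO metrics were used, so these UCINET-style definitions, the undirected view of dependencies and the `ego_metrics` name are all assumptions for illustration.

```python
from itertools import combinations


def ego_metrics(edges):
    """Size, ties, pairs and density of each node's ego network.

    edges: (source, target) dependency pairs between classes. Direction is
    ignored, and the ego itself and its own ties are excluded, so
    density = ties / pairs among the alters only.
    """
    adj = {}
    for a, b in edges:
        if a == b:
            continue  # a class depending on itself adds no alter
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    metrics = {}
    for ego, alters in adj.items():
        size = len(alters)
        # Count undirected ties between pairs of alters.
        ties = sum(1 for u, v in combinations(alters, 2) if v in adj[u])
        pairs = size * (size - 1) // 2
        metrics[ego] = {
            "size": size,
            "ties": ties,
            "pairs": pairs,
            "density": ties / pairs if pairs else 0.0,
        }
    return metrics


if __name__ == "__main__":
    deps = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
    for cls, m in sorted(ego_metrics(deps).items()):
        print(cls, m)
```

A low ego density for a class with many alters suggests it acts as a broker between otherwise unconnected classes, which is the kind of information-flow role the abstract refers to.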
The work in [17] presents an extensive analysis of software metrics for 111 object-oriented systems written in Java. For each system, we considered 18 traditional metrics, such as LOC and the Chidamber and Kemerer metrics, as well as metrics derived from complex network theory and social network analysis, all computed at class level. Most metrics follow a leptokurtic distribution. Only a couple of metrics show clearly normal behavior, while some others are very irregular, and even bimodal. The statistics gathered make it possible to study and discuss the variability of metrics across different systems. In [8] a preliminary, exploratory analysis of the Eclipse subprojects is presented, using a joint application of SNA and traditional software metrics. The entire set of metrics was summarized by performing a Principal Component Analysis (PCA), obtaining a very small number of independent principal components, which allow the classes to be represented in a space where they show typical patterns. The preliminary results show how the joint application of traditional and network software metrics may be used to identify subprojects developed with similar functionalities and scopes. In [13] the software graphs of 96 systems of the Java Qualitas Corpus are analyzed by parsing the source code and identifying the dependencies among classes. Twelve software metrics were analyzed: nine borrowed from Social Network Analysis (SNA) and three more traditional software metrics, namely LOC, Fan-in and Fan-out. The results show how the metrics can be partitioned into groups for almost the whole Java Qualitas Corpus, and that such grouping can provide insights into the topology of software networks. For two systems, Eclipse and Netbeans, we also computed the number of bugs, identifying the bugs affecting each class, and found that some SNA metrics are highly correlated with bugs, while others are strongly anti-correlated.
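The PCA step used in [8] to summarize the metric set can be sketched with NumPy: standardize each metric column, take the SVD, and project the classes onto the leading components. Standardizing the columns, and log-transforming the fat-tailed metrics in the usage example, are assumptions here; the abstract does not state the preprocessing, and `principal_components` is a hypothetical helper.

```python
import numpy as np


def principal_components(metrics, n_components=2):
    """PCA of a (classes x metrics) matrix via SVD of standardized columns.

    Returns (scores, explained_variance_ratio) for the first n_components.
    """
    X = np.asarray(metrics, dtype=float)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # constant metric: avoid division by zero
    Z = (X - X.mean(axis=0)) / std
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    scores = Z @ Vt[:n_components].T
    return scores, explained[:n_components]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical data: 200 classes, 6 metrics, two of them strongly related.
    X = rng.lognormal(size=(200, 6))
    X[:, 1] = 2 * X[:, 0] * rng.lognormal(sigma=0.05, size=200)
    scores, ratio = principal_components(np.log(X), n_components=3)
    print(scores.shape, np.round(ratio, 3))
```

Each row of `scores` places one class in the reduced space where, according to [8], typical patterns appear.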
Software fractal dimension

In [6] [12] [14] [15] the self-similar structure of software networks is used to introduce the fractal dimension as a global software metric associated with software quality, at the system level and at the subproject level. In [6] the source code of various releases of two large OO Open Source (OS) Java software systems, Eclipse and Netbeans, is analyzed, investigating the complexity of each whole release and of its subprojects. In all examined cases there is a scaling region where it is possible to compute a self-similarity coefficient, the fractal dimension, using “the box counting method”. Results show that this measure appears fairly related to software quality, acting as a global software quality metric. In particular, we computed the defects of each software system and found a clear correlation between the number of defects in the system, or in a subproject, and its fractal dimension. This correlation holds across all the subprojects and also along the time evolution of the software systems, as new releases are delivered. In [14] software systems are considered as complex networks which have a self-similar structure under a length-scale transformation. On such complex software networks a self-similarity coefficient, also known as the fractal dimension, is computed using “the box counting method”. Several releases of the publicly available Eclipse software system were analyzed, calculating the fractal dimension for twenty randomly chosen sub-projects in every release, as well as for each release as a whole. Our results display an overall consistency among the sub-projects and among all the analyzed releases. The study found a very good correlation between the fractal dimension and the number of bugs, both for Eclipse and for the twenty sub-projects. This result suggests that the fractal dimension could be considered as a global quality metric for large software systems.
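Box counting on a network can be sketched with the greedy-colouring box-covering formulation commonly attributed to Song et al.: for a box size l_B, any two nodes at distance at least l_B must land in different boxes, so a greedy colouring of that "too far apart" relation yields a box count N_B, and the fractal dimension d_B is the slope in N_B ~ l_B^(-d_B). This is a simplified O(N^2) sketch for small graphs, not the Merge algorithm or the exact implementations compared in [12] and [15]; the function names are hypothetical.

```python
import math
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source in an unweighted graph {node: neighbours}."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def box_count_greedy(adj, l_b):
    """Number of boxes N_B of size l_b found by greedy colouring.

    Nodes at distance >= l_b must not share a box, so they must get
    different colours; each colour class is one box. Every node must be a
    key of adj and neighbour lists must be symmetric (undirected graph).
    """
    nodes = sorted(adj)
    dist = {u: bfs_distances(adj, u) for u in nodes}
    colour = {}
    for u in nodes:
        # Colours of already-coloured nodes too far from u to share its box;
        # unreachable nodes count as infinitely far away.
        forbidden = {c for v, c in colour.items() if dist[u].get(v, math.inf) >= l_b}
        c = 0
        while c in forbidden:
            c += 1
        colour[u] = c
    return len(set(colour.values()))


def fractal_dimension(adj, sizes):
    """Least-squares slope d_B in N_B ~ l_B**(-d_B) over the given box sizes."""
    xs = [math.log(l) for l in sizes]
    ys = [math.log(box_count_greedy(adj, l)) for l in sizes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return -slope


if __name__ == "__main__":
    # A 10-node path behaves like a one-dimensional object.
    path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
    print([box_count_greedy(path, l) for l in (1, 2, 5)], fractal_dimension(path, [1, 2, 5]))
```

On the 10-node path this gives 10, 5 and 2 boxes and a fitted dimension of 1 (up to floating-point rounding), matching the intuition that a chain is one-dimensional. For real software networks the fit should be restricted to the scaling region mentioned above.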
Works [12] and [15] propose an algorithm for computing the fractal dimension of a software network and compare its performance with two other algorithms. The objects of study are various large object-oriented software systems. We built the associated graph for each system, also known as the software network, by analyzing the binary relationships (dependencies) among classes. We found that the structure of such software networks is self-similar under a length-scale transformation. The fractal dimension of these networks is computed using a Merge algorithm, first devised by the authors; a Greedy Coloring algorithm, based on the equivalence with the graph coloring problem; and a Simulated Annealing algorithm, widely used for efficiently finding minima in multi-dimensional problems. Our study examines both efficiency and accuracy, showing that the Merge algorithm is the most efficient, while Simulated Annealing is the most accurate. The Greedy Coloring algorithm lies between the two, with speed very close to that of the Merge algorithm and accuracy comparable to that of Simulated Annealing.

Further research activity

In [4] [9] [10] [11] the problem of software refactoring is analyzed. The study reported in [4] analyzes the effect of particular refactorings on class coupling for different releases of four Object Oriented (OO) Open Source (OS) Java software systems: Azureus, Jtopen, Jedit and Tomcat, taken as representative of Java OS systems in general. Specifically, the “add parameter” and “remove parameter” refactorings, as defined in Fowler’s catalog, may influence class coupling by changing the fan-in and fan-out of the classes they are applied to. The work investigates, both qualitatively and quantitatively, the global effect of applying such refactorings, providing best-fitting statistical distributions able to describe the changes in fan-in and fan-out coupling.
A detailed analysis of the best-fitting parameters and of their changes when refactoring occurs has been performed, estimating the effect of a refactoring on coupling before it is applied. Such estimates may help in determining refactoring costs and benefits. In [9] the fan-in and fan-out metrics are studied from the perspective of two refactorings, “add parameter to” and “remove parameter from” a method, collecting these two refactorings from multiple releases of the Tomcat open source system. Results show significant differences in the profiles of the statistical distributions of fan-in and fan-out between refactored and non-refactored classes. A strong overarching theme emerged: developers seemed to focus on refactoring classes with relatively high values of both fan-in and fan-out, rather than classes with a high value in only one of them. The work in [10] considers, for the first time, how a single refactoring modified these metric values, what happened when several refactorings were applied to a single class in unison, and, finally, what influence a set of refactorings had on the shape of the FanIn and FanOut distributions. Results indicated that, on average, refactored classes tended to have larger FanIn and FanOut values than non-refactored classes. Where evidence of multiple (different) refactorings applied to the same class was found, the net effect (in terms of FanIn and FanOut coupling values) was negligible. The work in [11] shows how highly coupled classes were more prone to refactoring, particularly through a set of ‘core’ refactorings. However, wide variations were found across systems for our chosen measures of coupling, namely fan-in and fan-out. Specific individual refactorings were also explored to understand why these differences may have occurred.
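The comparison between refactored and non-refactored classes starts from per-class fan-in and fan-out. A minimal sketch, assuming fan-in counts distinct classes that depend on a class and fan-out counts distinct classes it depends on (definitions vary in the literature), with hypothetical helper names and toy data:

```python
from statistics import mean


def fan_in_out(dependencies):
    """Map class -> (fan_in, fan_out) from (client, supplier) dependency pairs.

    Fan-in counts distinct classes that depend on a class; fan-out counts
    distinct classes it depends on. Self-dependencies are ignored, and a
    class that appears in no dependency is absent from the result.
    """
    fan_in, fan_out = {}, {}
    for client, supplier in set(dependencies):  # each distinct pair once
        if client == supplier:
            continue
        fan_out[client] = fan_out.get(client, 0) + 1
        fan_in[supplier] = fan_in.get(supplier, 0) + 1
    classes = set(fan_in) | set(fan_out)
    return {c: (fan_in.get(c, 0), fan_out.get(c, 0)) for c in classes}


def compare_groups(coupling, refactored):
    """Mean (fan_in, fan_out) of refactored vs. non-refactored classes."""
    groups = {"refactored": [], "not_refactored": []}
    for cls, values in coupling.items():
        groups["refactored" if cls in refactored else "not_refactored"].append(values)
    return {
        label: (mean(v[0] for v in vals), mean(v[1] for v in vals)) if vals else None
        for label, vals in groups.items()
    }


if __name__ == "__main__":
    deps = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "C"), ("A", "B")]
    coupling = fan_in_out(deps)
    print(sorted(coupling.items()))
    print(compare_groups(coupling, refactored={"A", "C"}))
```

Comparing means is only a first look; the studies above compare the shapes of whole distributions, which matters for fat-tailed coupling values.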
Open questions are explored by extracting fifty-two refactorings from Fowler’s catalog, drawn from versions of four open-source systems, and comparing the coupling characteristics of each set of refactored classes with the corresponding set of non-refactored classes. In [7] I also presented some preliminary studies on the relationships between micro patterns, more specifically anti-patterns, and software quality, while in [5] and [21] I analyzed the role of Agile methodologies in software production and their relationships with software quality and the presence of bugs.

UniCA Eprints

Power laws in software systems

Author: TONELLI ROBERTO
Publication venue: Università degli Studi di Cagliari
Publication date: 06/03/2012

Time evolution and distribution analysis of software bugs from a complex network perspective

Author: Murgia Alessandro

Successful software systems are constantly under development. Since they have to be updated when new features are introduced, bugs are fixed and the system is kept up to date, they require continuous maintenance. Among these activities, bug fixing is one of the most relevant, because it is decisive for software quality. Unfortunately, software houses have limited time and developers to address all these issues before product delivery. For this reason, an efficient allocation of these resources is required to reach the quality demanded by the market. The key to correctly managing the software production process is measurement. As DeMarco states, “you cannot control what you cannot measure”, and this thesis is mainly devoted to this aspect. This dissertation deals with software measures related to bug-proneness and with the distribution analysis of software bugs. The aim is to describe the bug occurrence phenomena, identify useful metrics related to software bug-proneness and finally characterize how the bug population is distributed and evolves, also discussing a model able to explain this evolution. By studying the relationship between code evolution and bug distribution or bug-proneness, we foresee which software structures will emerge. Thus, this research provides information and guidelines to managers, helping them plan and schedule activities and allocate resources during software development.

UniCA Eprints

Application of social networking algorithms in program analysis: understanding execution frequencies

Author: Rahman Minhazur
Publication venue: Colorado State University. Libraries
Publication date: 01/01/2011

2011 Summer. Includes bibliographical references. Some parts of a program may be used more commonly at runtime, while other parts may be used less commonly or not at all. In this exploratory study, we propose an approach to predict how frequently or rarely different parts of a program will be used at runtime without actually running the program. Knowing the most frequently executed parts can help identify the most critical and the most testable parts of a program. The portions predicted to be less commonly executed tend to be the hard-to-test parts of a program. Knowing the hard-to-test parts of a program can aid the early development of test cases. In our approach we statically analyse code or static models of code (like UML class diagrams), using quantified social networking measures and web structure mining measures. These measures assign ranks to different portions of code, which are used to predict the relative frequency with which a section of code will be used. We validated this rank ordering of predictions by running the program with a common set of use cases and identifying the actual rank ordering. We compared the predictions with other measures that use direct coupling or lines of code. We found that our predictions fared better, as they were statistically more correlated with the actual rank ordering than the other measures. We present a prototype tool, written as an Eclipse plugin, that implements and validates our approach. Given the source code of a Java program, our tool computes the values of the metrics required by our approach and ranks all classes in order of how frequently they are expected to be used. Our tool can also instrument the source code to log all the information needed at runtime to validate our predictions.
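PageRank is a standard web structure mining measure, and a rank correlation is one natural way to compare predicted and observed orderings. The sketch below ranks classes of a static call graph with power-iteration PageRank and compares that ranking with observed runtime counts using Spearman's correlation. The edge direction (caller to callee), the damping factor 0.85, the choice of Spearman's rho and all names and data are assumptions; the abstract does not specify which measures or statistics the thesis used.

```python
def pagerank(adj, damping=0.85, iterations=100):
    """Power-iteration PageRank on a directed graph {node: [successors]}.

    Dangling nodes (no successors) spread their rank uniformly.
    """
    nodes = set(adj) | {v for succ in adj.values() for v in succ}
    n = len(nodes)
    rank = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(iterations):
        dangling = sum(rank[u] for u in nodes if not adj.get(u))
        new = dict.fromkeys(nodes, (1 - damping) / n + damping * dangling / n)
        for u, succ in adj.items():
            if succ:
                share = damping * rank[u] / len(succ)
                for v in succ:
                    new[v] += share
        rank = new
    return rank


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Hypothetical static call graph (caller -> callees) and runtime call counts.
    calls = {"Main": ["Parser", "Log"], "Parser": ["Token", "Log"], "Token": ["Log"], "Log": []}
    observed = {"Main": 1, "Parser": 40, "Token": 900, "Log": 1500}
    pr = pagerank(calls)
    classes = sorted(pr)
    print(spearman([pr[c] for c in classes], [observed[c] for c in classes]))
```

Classes called from many places accumulate rank, which is the intuition behind using such measures as a proxy for execution frequency. `spearman` is undefined when either input is constant.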

Mountain Scholar (Digital Collections of Colorado and Wyoming)

Using Software Dependency to Bug Prediction

Author: Bing Li
Lulu He
Peng He
Yutao Ma
Publication venue: Hindawi Limited
Publication date: 01/01/2013

Software maintenance, especially bug prediction, plays an important role in evaluating software quality and balancing development costs. This study attempts to use several quantitative network metrics to explore their relationships with bug prediction in terms of software dependency. Our work consists of four main steps. First, we constructed software dependency networks for five dependency scenes at class-level granularity. Second, we used a set of nine representative and commonly used metrics (namely centrality, degree, PageRank and HITS, as well as modularity) to quantify the importance of each class. Third, we identified how these metrics were related to the proneness and severity of fixed bugs in Tomcat and Ant and determined the extent to which they were related. Finally, the significant metrics were used as predictors of bug proneness and severity. The results suggest that there is a statistically significant relationship between class importance and bug prediction. Furthermore, the betweenness centrality and out-degree metrics yield impressive accuracy for bug prediction and test prioritization. The best accuracy of our predictions for bug proneness and bug severity reaches 54.7% and 66.7% (top 50, Tomcat) and 63.8% and 48.7% (top 100, Ant), respectively, in these two cases.
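Betweenness centrality, one of the two best predictors reported above, can be computed with Brandes' algorithm. The sketch below also ranks classes and measures the fraction of the top-k that had bugs. Treating "accuracy" as precision at k is an assumption (the abstract does not define it), and the graph, bug set and helper names are hypothetical.

```python
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes' algorithm) for an
    unweighted directed graph {node: [successors]} without duplicate edges."""
    nodes = set(adj) | {v for succ in adj.values() for v in succ}
    bc = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # Phase 1: BFS from s, counting shortest paths (sigma) and predecessors.
        stack, queue = [], deque([s])
        preds = {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)
        dist = dict.fromkeys(nodes, -1)
        sigma[s], dist[s] = 1, 0
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj.get(v, ()):
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Phase 2: accumulate dependencies in order of decreasing distance.
        delta = dict.fromkeys(nodes, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


def top_k(scores, k):
    """The k highest-scoring nodes, ties broken by name for determinism."""
    return sorted(scores, key=lambda n: (-scores[n], str(n)))[:k]


def precision_at_k(scores, buggy, k):
    """Fraction of the k top-ranked classes that belong to the set buggy."""
    ranked = top_k(scores, k)
    return sum(c in buggy for c in ranked) / len(ranked) if ranked else 0.0


if __name__ == "__main__":
    # Hypothetical class dependency graph and set of classes with fixed bugs.
    deps = {"App": ["Service", "Config"], "Service": ["Dao", "Config"], "Dao": ["Db"], "Config": []}
    bc = betweenness(deps)
    out_degree = {c: len(deps.get(c, [])) for c in bc}
    print(top_k(bc, 2), precision_at_k(bc, buggy={"Service"}, k=2))
    print(top_k(out_degree, 2), precision_at_k(out_degree, buggy={"Service"}, k=2))
```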

Directory of Open Access Journals

Decision Modules in Models and Implementations

Author: Roubtsov Serguei
Roubtsova E.E.
Publication date: 01/01/2014

Open University of the Netherlands Research Portal

Repository TU/e

Assessing software quality by micro patterns detection

Author: Destefanis Giuseppe
Publication date: 23/04/2013

One of the goals of Software Engineering is to reduce, or at least to try to control, the defectiveness of software systems during the development phase. Software engineers need empirical evidence that software metrics are related to software quality. Unfortunately, software quality is quite an elusive concept, software being an immaterial entity that cannot be physically measured in traditional ways. In general, software quality means many things. In software, the narrowest sense of product quality is commonly recognized as the absence or low incidence of bugs in the product. It is also the most basic meaning of conformance to requirements, because if the software contains too many functional defects, the basic requirement of providing the desired function is not met. To increase overall customer satisfaction, as well as satisfaction with various quality attributes, the quality attributes must be taken into account in the planning and design of software. To improve quality during development, we need models of the development process, and within the process we need to select and deploy specific methods and approaches and employ proper tools and technologies. We need measures of the characteristics and quality parameters of the development process and its stages, as well as metrics and models that help ensure the development process is under control to meet the product’s quality objectives. Software quality metrics tend to measure whether software is well structured, neither too simple nor too complex, with cohesive modules that minimize their coupling. Many quality metrics have been proposed for software; depending on the paradigm and language used, there are metrics for structured programming, object-oriented programming, aspect-oriented programming, and so on. Using traditional metrics as quality indicators is very difficult.
The Lines of Code (LOC) metric (closely related to faults) is difficult to use: one cannot ask a team of developers to write classes with a predefined number of lines of code. Micro pattern metrics (micro patterns were introduced by Gil and Maman), which capture concepts of good or bad programming (like anti-patterns), can help developers focus on those classes that belong to fault-prone categories of micro patterns. The relationship between traditional metrics and micro patterns is useful for enabling these new metrics to evaluate software quality. Micro patterns are similar to design patterns, but they can be identified automatically and sit at a lower level of abstraction than design patterns. This thesis tackles the problem of measuring software quality in Object Oriented (OO) systems with novel approaches based on micro patterns, which can serve as useful quality metrics, by showing that certain categories of micro patterns are more fault-prone than others, and that classes that do not correspond to any category of micro patterns are more likely to be faulty. Many empirical studies have validated the CK suite with respect to fault-proneness and maintainability, showing an acceptable correlation between CK metric values and both software fault-proneness and difficulty of maintenance. In OO, micro patterns can help identify portions of code that should be improved (for example those where encapsulation is not respected), and highlight other portions that embody good design practices. Design patterns, defined in the early nineties, were an important breakthrough at the analysis and design level, but are difficult to support automatically at the coding level. There are tools claiming to help find the usage of design patterns in code, but in practice they are used in a very limited way.
By contrast, micro patterns are defined at the coding level and are relatively easy to recognize automatically, since they can be expressed as formal conditions on the structure of a class.

Thesis overview

The thesis is organized according to this scheme:
• Chapter 2 provides an overview of the concept of software metrics;
• Chapter 3 presents an overview of the design pattern catalogs;
• Chapter 4 discusses the micro patterns catalog using the definitions given by Gil and Maman;
• Chapter 5 discusses the interpretation of micro patterns given by Arcelli and Maggioni;
• Chapter 6 presents the study of the evolution of five particular micro patterns (anti-patterns) in different releases of the Eclipse and NetBeans systems, and the correlations between anti-patterns and faults. The analysis confirms previous findings regarding the high coverage of micro patterns over the system classes, and shows that anti-patterns not only represent bad Object Oriented programming practices, but may also be associated with the production of lower-quality software, since they show significantly higher fault-proneness;
• Chapter 7 presents a study that aims to show, through empirical studies of open source software systems, which categories of micro patterns are more correlated with faults. Gil and Maman demonstrated, and subsequent studies confirmed, that 75% of the classes of a software system are covered by micro patterns. This chapter also analyzes the relationship between faults and the remaining 25% of classes that do not match any micro pattern. We found that these classes are more likely to be fault-prone than the others. We also studied the correlation among all the micro patterns of the catalog, in order to verify the existence of relationships between them;
• Chapter 8 presents a study on micro patterns in different releases of two software systems developed with Object Oriented technologies and an Agile process. In this chapter we present some empirical results on two case studies of systems developed with Agile methodologies, and compare them with previous results obtained for non-Agile systems. In particular, we verified that the distribution of micro patterns in a software system developed using Agile methodologies does not differ from the distribution observed in other systems, and that micro pattern fault-proneness is about the same. We also analyzed how the distribution of micro patterns changes across different releases of the same software system. We demonstrate that there is a relationship between the number of faults and the classes that do not match any micro pattern: these classes are more likely to be fault-prone than the others, even in software developed with Agile methodologies;
• Chapter 9 presents the Java tool used to extract information about micro pattern distributions from the source code;
• Chapter 10 discusses related work in the field.
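Because micro patterns are formal conditions on class structure, detecting them reduces to simple predicates over a class model. The sketch below checks three patterns, paraphrased in simplified form (Designator: an interface with no members; Pool: only static final fields and no methods; Stateless: methods but no fields other than static final ones), and then compares the fault rate of matched and unmatched classes, in the spirit of Chapter 7. The `ClassInfo` model, the simplified rules and the toy fault data are assumptions; Gil and Maman's formal definitions include details (such as inherited members) omitted here, and this is not the Java tool of Chapter 9.

```python
from dataclasses import dataclass, field


@dataclass
class ClassInfo:
    """Hypothetical, simplified view of a compiled Java class."""
    name: str
    is_interface: bool = False
    fields: list = field(default_factory=list)   # (field name, set of modifiers)
    methods: list = field(default_factory=list)  # declared method names, no constructors


def micro_patterns(c):
    """Return the (simplified) micro patterns matched by class c."""
    only_static_final = all({"static", "final"} <= mods for _, mods in c.fields)
    found = set()
    if c.is_interface and not c.fields and not c.methods:
        found.add("Designator")  # interface with no members at all
    if not c.is_interface and c.fields and only_static_final and not c.methods:
        found.add("Pool")  # only static final fields, no methods
    if not c.is_interface and c.methods and only_static_final:
        found.add("Stateless")  # behaviour, but no state beyond static final fields
    return found


def fault_rate_by_coverage(classes, faulty):
    """Fraction of faulty classes among matched vs. unmatched classes."""
    def rate(names):
        return sum(n in faulty for n in names) / len(names) if names else None

    matched = [c.name for c in classes if micro_patterns(c)]
    unmatched = [c.name for c in classes if not micro_patterns(c)]
    return {"matched": rate(matched), "unmatched": rate(unmatched)}


if __name__ == "__main__":
    classes = [
        ClassInfo("Marker", is_interface=True),
        ClassInfo("Consts", fields=[("MAX", {"public", "static", "final"})]),
        ClassInfo("Helper", methods=["format"]),
        ClassInfo("Bean", fields=[("x", {"private"})], methods=["getX"]),
    ]
    for c in classes:
        print(c.name, sorted(micro_patterns(c)))
    print(fault_rate_by_coverage(classes, faulty={"Bean", "Helper"}))
```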

UniCA Eprints

Assessing software quality by micro patterns detection

Publication venue: Università degli Studi di Cagliari
Publication date: 23/04/2013