
    Comparing φ and the F-measure as Performance Metrics for Software-related Classifications

    Context: The F-measure has been widely used as a performance metric when selecting binary classifiers for prediction, but it has also been widely criticized, especially given the availability of alternatives such as φ (also known as the Matthews Correlation Coefficient). Objectives: Our goals are to (1) investigate possible issues related to the F-measure in depth and show how φ can address them, and (2) explore the relationships between the F-measure and φ. Method: Based on the definitions of φ and the F-measure, we derive a few mathematical properties of these two performance metrics and of the relationships between them. To demonstrate the practical effects of these mathematical properties, we illustrate the outcomes of an empirical study involving 70 Empirical Software Engineering datasets and 837 classifiers. Results: We show that φ can be defined as a function of Precision and Recall, which are the only two performance metrics used to define the F-measure, and the rate of actually positive software modules in a dataset. Also, φ can be expressed as a function of the F-measure and the rates of actual and estimated positive software modules. We derive the minimum and maximum value of φ for any given value of the F-measure, and the conditions under which both the F-measure and φ rank two classifiers in the same order. Conclusions: Our results show that φ is a sensible and useful metric for assessing the performance of binary classifiers. We also recommend that the F-measure should not be used by itself to assess the performance of a classifier; the rate of positives should always be reported as well, at least to assess if and to what extent a classifier performs better than random classification. The mathematical relationships described here can also be used to reinterpret the conclusions of previously published papers that relied mainly on the F-measure as a performance metric.
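    To make the relationship concrete, the following Python sketch computes the F-measure and φ (MCC) from a confusion matrix, and reconstructs the confusion matrix from Precision, Recall and the rate of actual positives, so that φ is obtained from exactly those three quantities. The function names and example values are illustrative only and are not taken from the paper.

```python
# Minimal sketch: F-measure and phi (Matthews Correlation Coefficient, MCC)
# computed from a confusion matrix, plus a reconstruction of the confusion
# matrix from Precision, Recall and the rate of actual positives.
# Function names and example values are illustrative, not from the paper.
from math import sqrt

def f_measure(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def phi(tp, fp, fn, tn):
    num = tp * tn - fp * fn
    den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0

def confusion_from_rates(precision, recall, pos_rate, n=1000):
    """Rebuild TP/FP/FN/TN counts (per n modules) from Precision, Recall and
    the rate of actually positive modules; phi is then fully determined by
    these three quantities."""
    actual_pos = pos_rate * n
    tp = recall * actual_pos
    fn = actual_pos - tp
    fp = tp / precision - tp
    tn = n - tp - fn - fp
    return tp, fp, fn, tn

# Same Precision and Recall (hence the same F-measure), but very different
# phi values depending on the rate of actual positives.
for pos_rate in (0.05, 0.5):
    tp, fp, fn, tn = confusion_from_rates(0.6, 0.6, pos_rate)
    print(pos_rate, round(f_measure(tp, fp, fn), 3), round(phi(tp, fp, fn, tn), 3))
```

    With Precision = Recall = 0.6, the F-measure is 0.6 in both cases, while φ is about 0.58 when 5% of the modules are actually positive and 0.20 when 50% are, illustrating why the abstract recommends reporting the rate of positives alongside the F-measure.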

    Intelligent Data-Driven Reverse Engineering of Software Design Patterns

    Recognising implemented instances of Design Patterns (DPs) in software design discloses and recovers a wealth of information about the intention of the original designers and the rationale for their design decisions. Because the documentation available for software systems, if any, is often poor and/or obsolete, recovering such information can be of great help for maintenance tasks. However, since DPs are abstractly and vaguely defined, a set of software classes with exactly the same relationships as expected for a DP instance may actually be only accidentally similar. On the other hand, a set of classes with relationships that are, to an extent, different from those typically expected can still be a true DP instance. The deciding factor is mainly whether or not the set of classes is actually intended to solve the design problem addressed by the DP, thus making the intent a fundamental and defining characteristic of DPs. Discerning the intent of potential instances requires building complex models that cannot be built using only the descriptions of DPs in books and catalogues. Accordingly, a paradigm shift in DP recognition towards fully machine-learning-based approaches is required. The problem is that no accurate and sufficiently large DP datasets exist, and it is difficult to manually construct one. Moreover, there is a lack of research on the feature set that should be used in DP recognition. The main aim of this thesis is to enable the required paradigm shift by laying down an accurate, comprehensive and information-rich foundation of feature and data sets. In order to achieve this aim, a large set of features is developed to cover a wide range of design aspects, with particular focus on design intent. This set serves as a global feature set from which different subsets can be objectively selected for different DPs. A new and feasible approach to DP dataset construction is designed and used to construct training datasets. The feature and data sets are then used experimentally to build and train DP classifiers. The results demonstrate the accuracy and utility of the sets introduced, and show that fully machine-learning-based approaches are capable of providing appropriate and well-equipped solutions for the problem of DP recognition.
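    The pipeline sketched in the abstract (pick a per-pattern subset of a global feature set, then train a classifier on labelled candidate instances) might look roughly like the following; the feature values, labels, feature count and choice of scikit-learn components are assumptions made for illustration, not details taken from the thesis.

```python
# Illustrative sketch only: per-pattern feature selection and classifier
# training for design-pattern (DP) recognition. Data layout, feature count
# and model choice are hypothetical, not taken from the thesis.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)

# Hypothetical candidate instances of one DP (e.g. Observer): each row is a
# candidate group of classes described by features drawn from a larger
# global feature set; y says whether it is a manually validated true instance.
X = rng.random((200, 40))          # 200 candidates, 40 global features
y = rng.integers(0, 2, size=200)   # placeholder labels

# Objectively pick a per-pattern subset of the global feature set, then train.
clf = Pipeline([
    ("select", SelectKBest(mutual_info_classif, k=10)),
    ("model", RandomForestClassifier(n_estimators=200, random_state=0)),
])

scores = cross_val_score(clf, X, y, cv=5, scoring="f1")
print("cross-validated F1:", scores.mean().round(3))
```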

    Modelling software quality: a multidimensional approach

    As society becomes ever more dependent on computer systems, there is more and more pressure on development teams to produce high-quality software. Many companies therefore rely on quality models, program suites that analyse and evaluate the quality of other programs, but building good quality models is hard because many questions concerning quality modelling have yet to be adequately addressed in the literature. We analysed quality modelling practices in a large organisation and identified three dimensions where research is needed: proper support for the subjective notion of quality, techniques to track the quality of evolving software, and the composition of quality judgements across different abstraction levels. To tackle subjectivity, we propose using Bayesian models, as these can deal with uncertain data. We applied our models to the problem of anti-pattern detection; in a study of two open-source systems, we found that our approach was superior to state-of-the-art rule-based techniques. To support software evolution, we treat the scores produced by quality models as signals and apply signal data-mining techniques to identify patterns in the evolution of quality, and we studied how anti-patterns are introduced into and removed from systems. Software is typically written as a hierarchy of components, yet quality models do not explicitly consider this hierarchy. In the last part of the dissertation, we present two-level quality models. These are composed of three parts: a component-level model, a second model that evaluates the importance of each component, and a container-level model that combines the contribution of the components with container attributes. This approach was tested on the prediction of class-level changes based on the quality and importance of the classes' components (their methods), and was shown to be more useful than traditional single-level approaches. Finally, we reapplied this two-level methodology to the problem of assessing web site navigability; our models could successfully distinguish award-winning sites from average sites picked at random. Throughout the dissertation, we present not only theoretical problems and solutions but also experiments that illustrate the pros and cons of our solutions. Our results show that there are considerable improvements to be had in all three proposed dimensions. In particular, our work on quality composition and importance modelling is the first to focus on this particular problem. We believe that our general two-level models are a starting point for more in-depth research.
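    One plausible reading of the two-level composition idea is sketched below: a component-level model scores each method, an importance model weights it, and a container-level model combines the weighted scores with a class-level attribute. The metrics, formulas and weights here are invented for illustration and are not the models used in the dissertation.

```python
# Minimal sketch of a two-level quality model: score methods, weight them by
# importance, then combine into a class-level score together with a
# class-level attribute. All scoring functions and weights are assumptions
# made for illustration, not the dissertation's actual models.
from dataclasses import dataclass

@dataclass
class Method:
    loc: int          # lines of code
    complexity: int   # cyclomatic complexity
    calls_in: int     # how often the method is called (proxy for importance)

def method_quality(m: Method) -> float:
    """Component-level model: crude 0..1 quality score (higher is better)."""
    return 1.0 / (1.0 + 0.01 * m.loc + 0.1 * m.complexity)

def method_importance(m: Method) -> float:
    """Importance model: weight methods by how much they are used."""
    return 1.0 + m.calls_in

def class_quality(methods: list[Method], coupling: int) -> float:
    """Container-level model: importance-weighted mean of method scores,
    combined with a class-level attribute (here, coupling)."""
    weights = [method_importance(m) for m in methods]
    weighted = sum(w * method_quality(m) for w, m in zip(weights, methods))
    aggregate = weighted / sum(weights)
    return aggregate / (1.0 + 0.05 * coupling)

methods = [Method(loc=120, complexity=15, calls_in=30),
           Method(loc=10, complexity=1, calls_in=0)]
print(round(class_quality(methods, coupling=4), 3))
```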

    Experimenting the Influence of Numerical Thresholds on Model-based Detection and Refactoring of Performance Antipatterns

    Performance antipatterns are well-known bad design practices that lead to software products suffering from poor performance. A number of performance antipatterns have been defined and classified, and refactoring actions have been suggested to remove them. In the last few years, we have dedicated some effort to the detection and refactoring of performance antipatterns in software models. A specific characteristic of performance antipatterns is that they contain numerical parameters that may represent thresholds referring to either performance indices (e.g., a device utilization) or design features (e.g., the number of interface operations of a software component). In this paper, we analyze the influence of such thresholds on the capability of detecting and refactoring performance antipatterns. In particular, (i) we analyze how the set of detected antipatterns may change while varying the threshold values, and (ii) we discuss the influence of thresholds on the complexity of refactoring actions. With the help of a leading example, we quantify this influence using precision and recall metrics.
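    A minimal sketch of the kind of analysis described, assuming a utilization-based detection rule: varying the threshold changes the set of flagged components, and precision and recall can then be computed against a reference set of validated antipattern instances. All names and numbers below are made up for illustration and are not taken from the paper.

```python
# Sketch: how a detection threshold changes the detected set, quantified with
# precision and recall against a reference (manually validated) set.
# Utilization figures, component names and thresholds are hypothetical.
utilization = {"cpu1": 0.92, "cpu2": 0.78, "disk": 0.55, "net": 0.85}
reference = {"cpu1", "net"}  # components truly involved in the antipattern

def detect(threshold: float) -> set[str]:
    """Flag every component whose utilization reaches the threshold."""
    return {name for name, u in utilization.items() if u >= threshold}

def precision_recall(detected: set[str], reference: set[str]) -> tuple[float, float]:
    tp = len(detected & reference)
    precision = tp / len(detected) if detected else 1.0
    recall = tp / len(reference) if reference else 1.0
    return precision, recall

for threshold in (0.6, 0.8, 0.9):
    detected = detect(threshold)
    p, r = precision_recall(detected, reference)
    print(f"threshold={threshold}: detected={sorted(detected)} "
          f"precision={p:.2f} recall={r:.2f}")
```

    In this toy setting, a low threshold over-detects (high recall, lower precision) and a high threshold under-detects (high precision, lower recall), which is the trade-off the paper quantifies on its example.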