
    Fine-grained code changes and bugs: Improving bug prediction

    Software development and, in particular, software maintenance are time consuming and require detailed knowledge of the structure and the past development activities of a software system. Limited resources and time constraints make the situation even more difficult. Therefore, a significant amount of research effort has been dedicated to learning software prediction models that allow project members to allocate and spend the limited resources efficiently on the (most) critical parts of their software system. Prominent examples are bug prediction models and change prediction models: bug prediction models identify the bug-prone modules of a software system that should be tested with care; change prediction models identify modules that change frequently and in combination with other modules, i.e., that are change coupled. By combining statistical methods, data mining approaches, and machine learning techniques, software prediction models provide a structured and analytical basis for decision making.

    Researchers have proposed a wide range of approaches to build effective prediction models that take into account multiple aspects of the software development process. They achieved especially good prediction performance, guiding developers towards those parts of their system where a large share of bugs can be expected. For that, they rely on change data provided by version control systems (VCSs). However, because current VCSs track code changes only at the file level and on a textual basis, most of those approaches suffer from coarse-grained and rather generic change information. More fine-grained change information, for instance at the level of source code statements, and the types of changes, e.g., whether a method was renamed or a condition expression was changed, are often not taken into account. Therefore, investigating the development process and the evolution of software at a fine-grained change level has recently received increasing attention in research.

    The key contribution of this thesis is to improve software prediction models by using fine-grained source code changes. These changes are based on the abstract syntax tree structure of source code and allow us to track code changes at the fine-grained level of individual statements. With a series of empirical studies using the change history of open-source projects, we show how prediction models can benefit, in terms of prediction performance and prediction granularity, from the more detailed change information.

    First, we compare fine-grained source code changes and code churn, i.e., lines modified, for bug prediction. The results with data from the Eclipse platform show that fine-grained source code changes significantly outperform code churn when classifying source files into bug-prone and not bug-prone, as well as when predicting the number of bugs in source files. Moreover, these results give insights into the relation between individual types of code changes, e.g., method declaration changes, and bugs. For instance, in our dataset method declaration changes exhibit a stronger correlation with the number of bugs than class declaration changes.

    Second, we leverage fine-grained source code changes to predict bugs at the method level. This is beneficial because files can grow arbitrarily large; if bugs are predicted at the level of files, a developer needs to manually inspect the methods of a file one by one until a particular bug is located.

    Third, we build models using source code properties, e.g., complexity, to predict whether a source file will be affected by a certain type of code change. Predicting the type of changes is of practical interest, for instance in the context of software testing, as different change types require different levels of testing: while for small statement changes local unit tests are mostly sufficient, API changes, e.g., method declaration changes, might require system-wide integration tests, which are more expensive. Hence, knowing (in advance) which types of changes will most likely occur in a source file can help to better plan and develop tests and, in case of limited resources, to prioritize among different types of testing.

    Finally, to assist developers in bug triaging, we compute prediction models based on the attributes of a bug report that estimate whether a bug will be fixed fast or will take more time to resolve.

    The results and findings of this thesis give evidence that fine-grained source code changes can improve software prediction models and provide more accurate results.
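    The thesis extracts such changes by comparing the abstract syntax trees of consecutive file versions. As a rough illustration only, the following Python sketch (using the standard `ast` module, not the Java-based tooling the thesis builds on) fingerprints statements and takes a multiset difference between two versions of a file; the helper names and the example snippets are invented for this sketch.

```python
# Minimal sketch of statement-level change extraction by diffing the ASTs of
# two versions of a source file. Statements are fingerprinted by node type
# and unparsed source text; the multiset difference approximates fine-grained
# insertions and deletions. Requires Python 3.9+ for ast.unparse.
import ast
from collections import Counter

def statement_fingerprints(source: str) -> Counter:
    """Collect (node type, source text) pairs for every statement."""
    fingerprints = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.stmt):
            fingerprints[(type(node).__name__, ast.unparse(node))] += 1
    return fingerprints

def fine_grained_changes(old_source: str, new_source: str):
    """Return statement-level insertions and deletions between two versions."""
    old, new = statement_fingerprints(old_source), statement_fingerprints(new_source)
    return new - old, old - new  # (inserted, deleted)

old = "def area(r):\n    return 3.14 * r * r\n"
new = "def area(r):\n    if r < 0:\n        raise ValueError(r)\n    return 3.14 * r * r\n"
inserted, deleted = fine_grained_changes(old, new)
# ['FunctionDef', 'If', 'Raise'] -- the changed def plus the new guard
print(sorted(kind for kind, _ in inserted))
```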

    Improving Program Comprehension and Recommendation Systems Using Developers' Context

    Software systems must be maintained and evolved to fix bugs and adapt to new technologies and requirements. Maintaining and evolving systems requires understanding them (i.e., program comprehension) prior to any modification. Program comprehension consumes about half of developers' time during software maintenance. Many factors impact program comprehension (e.g., developers, systems, tasks). The study of these factors aims (1) to identify and understand the problems faced by developers during maintenance tasks and (2) to build tools that support developers and improve their productivity.

    To evolve software, developers must explore the program, i.e., navigate through its entities, to find the subset of entities relevant to the task. Our thesis is that maintenance effort is not only correlated with the complexity of implementing the task; developers' exploration of the program also impacts it. To validate our thesis, we consider interaction trace data, which are logs of developers' activities collected during maintenance tasks. We study noise in interaction traces and show that these data miss about 6% of the time spent to perform a task and contain on average about 28% of activities wrongly recorded as modifications of the source code. We propose an approach to clean interaction traces that achieves up to 93% precision and 81% recall. Using our cleaning approach, we show that some previous works that use interaction traces have been impacted by noise, and we improve the recommendation of entities by up to 51% in precision and 57% in recall. We also show that effort is not only correlated with the complexity of implementing the task but is also impacted by developers' exploration strategies. Thus, we validate our thesis and conclude that we can clean interaction trace data and use them to assess developers' effort, understand how developers spend their effort, and assess the impact of their program exploration on their effort. Researchers and practitioners should be aware of the noise in interaction traces. We recommend that researchers clean interaction traces prior to use and compute the effort of maintenance tasks from them instead of estimating it from bug reports. We provide practitioners with an approach to improve the collection of interaction traces and the accuracy of recommendation systems; using it will likely improve developers' productivity.
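    To illustrate the kind of cleaning and evaluation the abstract describes, the sketch below reclassifies suspicious "edit" events in a simplified interaction trace and scores the heuristic against hand-labeled ground truth with precision and recall. The event format, the never-saved heuristic, and the sample trace are invented for this sketch; they are not the authors' actual algorithm, and real Mylyn traces are richer XML logs.

```python
# Simplified illustration: flag "edit" events that are likely noise (the
# entity is never saved anywhere in the trace) and compute precision/recall
# of the flags against hand-labeled ground truth.
from dataclasses import dataclass

@dataclass
class Event:
    kind: str       # "selection", "edit", "save", ...
    entity: str     # program entity the event targets
    is_noise: bool  # hand-labeled ground truth (used only for evaluation)

def flag_noise(events):
    """Mark an edit as noise if its entity is never saved in the trace."""
    saved = {e.entity for e in events if e.kind == "save"}
    return [e.kind == "edit" and e.entity not in saved for e in events]

def precision_recall(flags, events):
    tp = sum(f and e.is_noise for f, e in zip(flags, events))
    fp = sum(f and not e.is_noise for f, e in zip(flags, events))
    fn = sum(not f and e.is_noise for f, e in zip(flags, events))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

trace = [
    Event("selection", "Foo.java", False),
    Event("edit", "Foo.java", True),   # spurious edit: Foo.java is never saved
    Event("edit", "Bar.java", False),
    Event("save", "Bar.java", False),
]
print(precision_recall(flag_noise(trace), trace))  # (1.0, 1.0)
```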

    Software defect prediction using maximal information coefficient and fast correlation-based filter feature selection

    Software quality assurance aims to ensure that the applications being developed are as failure free as possible. Modern systems can be intricate, owing to the complexity of their information processes. Software fault prediction is an important quality assurance activity: it predicts the defect proneness of modules and classifies them accordingly, saving resources, time, and developers' effort. In this study, a model that selects relevant features for defect prediction was proposed. The literature review revealed that process metrics, which are extracted from version control systems and are based on the history of the source code over time, are better predictors of defects. Such metrics include, for example, the number of additions to and deletions from the source code, the number of distinct committers, and the number of modified lines. In this research, defect prediction was conducted on open source software (OSS) software product lines (SPLs); hence, process metrics were chosen. Data sets used in defect prediction may contain non-significant and redundant attributes that affect the accuracy of machine-learning algorithms. To improve the prediction accuracy of classification models, only features that are significant for the defect prediction process should be used. In machine learning, feature selection techniques are applied to identify the relevant data; feature selection is a pre-processing step that reduces the dimensionality of the data. Feature selection techniques include information-theoretic methods based on the entropy concept. This study evaluated the efficiency of such feature selection techniques and found that software defect prediction using significant attributes improves prediction accuracy. A novel model, MICFastCR, was developed that uses the Maximal Information Coefficient (MIC) to select significant attributes and the Fast Correlation-Based Filter (FCBF) to eliminate redundant attributes. Machine learning algorithms were then run to predict software defects. MICFastCR achieved the highest prediction accuracy as reported by various performance measures.
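    A compact sketch of the two-stage selection idea follows: score each feature's relevance to the defect label with MIC (via the `minepy` library), then remove redundancy FCBF-style by discarding any feature that is more strongly associated with an already-selected feature than with the label. This is a simplified reading of MICFastCR, not the published algorithm; the feature names, threshold, and synthetic data are invented for illustration.

```python
# Sketch of MIC-based relevance ranking with FCBF-style redundancy removal.
# Requires numpy and minepy (pip install minepy). Simplified reading of
# MICFastCR, not the published algorithm.
import numpy as np
from minepy import MINE

def mic(x, y):
    m = MINE(alpha=0.6, c=15)  # default MINE parameters
    m.compute_score(x, y)
    return m.mic()

def select_features(X, y, relevance_threshold=0.1):
    """Rank features by MIC with the label, then drop redundant ones."""
    relevance = [mic(X[:, j], y) for j in range(X.shape[1])]
    # Candidates in decreasing relevance order; weakly relevant ones dropped.
    order = [j for j in np.argsort(relevance)[::-1]
             if relevance[j] >= relevance_threshold]
    selected = []
    for j in order:
        # FCBF-style rule: keep j unless an already-selected feature k is a
        # better predictor of j than the label is (j is then redundant).
        if all(mic(X[:, j], X[:, k]) < relevance[j] for k in selected):
            selected.append(j)
    return selected

rng = np.random.default_rng(0)
churn = rng.poisson(5, 200).astype(float)      # e.g. lines added + deleted
committers = rng.poisson(2, 200).astype(float)
churn_copy = churn + rng.normal(0, 0.1, 200)   # nearly duplicate feature
y = (churn + committers + rng.normal(0, 1, 200) > 7).astype(float)
X = np.column_stack([churn, committers, churn_copy])
print(select_features(X, y))  # churn_copy should be dropped as redundant
```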

    Towards the exploration strategies by mining Mylyn's interaction histories

    When developers perform a maintenance task, they explore the program, i.e., move from one program entity to another. However, even though maintenance is a crucial task, the exploration strategies (ES) used by developers to navigate through the program entities remain unstudied. This lack of study prevents us from understanding how developers explore a program and perform a change task, from recommending strategies to developers, and (ultimately) from critically evaluating a developer's exploration performance. As a first step towards understanding ES, we mined interaction histories (IH) gathered using the Eclipse Mylyn plugin from developers performing change tasks on four open-source projects (ECF, Mylyn, PDE, and Eclipse Platform). An ES is defined and characterized by the way (how) developers navigate through the program entities. Using the Gini inequality index on the number of revisits of program entities, we observe that an ES can be either centralized (CES) or extended (EES). We automatically classified interaction histories as CES or EES and performed an empirical study to ascertain the effect of the ES on task duration and effort. We found that, although an EES requires more exploration effort than a CES, an EES is less time consuming than a CES. Extensive work (a large number of days spent performing a task) typically implies a CES. Our results show that developers who follow an EES investigate source code methodically, while developers who follow a CES explore source code opportunistically.
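    The classification hinges on the Gini inequality index over how often each entity is revisited: a high index means visits concentrate on a few entities (centralized), a low index means visits spread out (extended). Below is a small sketch of that computation; the 0.5 threshold and the sample revisit counts are invented for illustration, not taken from the paper.

```python
# Sketch: classify an interaction history as centralized (CES) or extended
# (EES) from the Gini index of per-entity revisit counts.
def gini(counts):
    """Gini inequality index of a list of non-negative counts."""
    xs = sorted(counts)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard formula based on the order statistics of the sample.
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n

def classify(revisits, threshold=0.5):
    return "CES" if gini(revisits) >= threshold else "EES"

centralized = [1, 1, 2, 30]  # most revisits hit a single entity
extended = [5, 6, 4, 5]      # revisits spread across entities
print(classify(centralized), classify(extended))  # CES EES
```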

    Ranking-based approaches for localizing faults


    Quality Assessment and Prediction in Software Product Lines

    At the heart of product line development is the assumption that, through structured reuse, later products will be of higher quality and require less time and effort to develop and test. This thesis presents empirical results from two case studies aimed at assessing the quality aspect of this claim and at exploring fault prediction in the context of software product lines. The first case study examines pre-release faults and change proneness of four products in PolyFlow, a medium-sized, industrial software product line; the second case study analyzes post-release faults using pre-release data over seven releases of four products in Eclipse, a very large, open source software product line.

    The goals of our research are (1) to determine the association between various software metrics, as well as their correlation with the number of faults at the component/package level; (2) to characterize the fault and change proneness of components/packages at various levels of reuse; (3) to explore the benefits of the structured reuse found in software product lines; and (4) to evaluate the effectiveness of predictive models, built on a variety of products in a software product line, in making accurate predictions of pre-release software faults (in the case of PolyFlow) and post-release software faults (in the case of Eclipse).

    The research results of both studies confirm, in a software product line setting, the findings of others that faults (both pre- and post-release) are more highly correlated with change metrics than with static code metrics, and are mostly contained in a small set of components/packages. The longitudinal aspect of our research indicates that new products do benefit from the development and testing of previous products. The results also indicate that pre-existing components/packages, including the common components/packages, undergo continuous change but tend to sustain low fault densities. However, this is not always true for newly developed components/packages. Finally, the results show that pre-release faults in the case of PolyFlow and post-release faults in the case of Eclipse can be predicted accurately from pre-release data, and furthermore, that these predictions benefit from information about additional products in the software product lines.
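    The central measurement behind goal (1) is rank correlation between per-package metrics and fault counts. Below is a minimal sketch of that comparison using Spearman's rho; the metric names and synthetic data are invented for illustration (the thesis computes such correlations on real PolyFlow and Eclipse data).

```python
# Sketch: compare how strongly a change metric and a static code metric
# rank-correlate with per-package fault counts. Data are synthetic and
# constructed so that faults track changes more than size.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
n_packages = 100
revisions = rng.poisson(10, n_packages)              # change metric
loc = rng.poisson(500, n_packages)                   # static code metric
faults = rng.poisson(revisions * 0.4 + loc * 0.002)  # faults follow changes

rho_change, _ = spearmanr(revisions, faults)
rho_static, _ = spearmanr(loc, faults)
print(f"change metric rho={rho_change:.2f}, static metric rho={rho_static:.2f}")
# With data generated this way, the change metric correlates more strongly,
# mirroring the finding reported for the real product-line data.
```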