
    Data Mining for Software Engineering

    Data mining for software engineering and humans in the loop

    The field of data mining for software engineering has grown over the last decade. It is concerned with using data mining to provide useful insights into how to improve software engineering processes and software itself, supporting decision-making. To that end, it uses data produced by software engineering processes and products during and after software development. Despite promising results, the role of software engineering practitioners within data mining approaches is frequently left undiscussed. This makes the adoption of data mining by software engineering practitioners difficult. Moreover, the fact that data mining approaches frequently ignore experts’ knowledge, together with their lack of transparency, can hinder the acceptance of data mining by software engineering practitioners. To overcome these problems, this position paper discusses the role of software engineering experts when adopting data mining approaches. It also argues that this role can be extended to increase experts’ involvement in the process of building data mining models. We believe that such extended involvement is likely not only to increase the acceptability of the resulting models to software engineers, but also to improve the models themselves. We also provide some recommendations aimed at increasing the success of experts’ involvement and model acceptability.

    Mining developer communication data streams

    This paper explores the concept of modelling a software development project as a process that produces a continuous stream of data. In the Jazz repository used in this research, one aspect of that data stream is developer communication. Such data can be used to create an evolving social network characterized by a range of metrics. This paper applies data stream mining techniques to identify the metrics most useful for predicting build outcomes. Results are presented from applying the Hoeffding Tree classification method in conjunction with the Adaptive Sliding Window (ADWIN) method for detecting concept drift. The results indicate that only a small number of the available metrics have any significance for predicting the outcome of a build.
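The two stream-mining techniques named above can be sketched in plain Python. This is a minimal, hypothetical illustration, not the code used in the study: `hoeffding_bound` and `should_split` show the split test a Hoeffding Tree applies once examples have accumulated at a leaf, and `SimpleAdwin` is a simplified, quadratic-time variant of ADWIN that drops the oldest items from its window whenever two sub-windows have significantly different means. The cut threshold assumes the form used in the basic ADWIN description (δ divided by the window length); practical implementations compress the window into histograms instead of storing every value.

```python
import math
from collections import deque


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """With probability 1 - delta, the true mean of a variable with range
    `value_range` lies within this epsilon of the mean of n observations."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, value_range: float,
                 delta: float, n: int) -> bool:
    """Hoeffding Tree split test: split on the best attribute once its gain
    advantage over the runner-up exceeds the Hoeffding bound."""
    return best_gain - second_gain > hoeffding_bound(value_range, delta, n)


class SimpleAdwin:
    """Simplified ADWIN (assumption: the O(W) per-check form). Stores the
    whole window and shrinks it when two sub-windows disagree."""

    def __init__(self, delta: float = 0.002):
        self.delta = delta
        self.window = deque()  # most recent values, oldest on the left

    def update(self, x: float) -> bool:
        """Add a value in [0, 1]; return True if drift was detected."""
        self.window.append(x)
        drift = False
        # Drop the oldest values until no split point shows a significant change.
        while self._cut_found():
            self.window.popleft()
            drift = True
        return drift

    def _cut_found(self) -> bool:
        values = list(self.window)
        n = len(values)
        if n < 2:
            return False
        total = sum(values)
        delta_prime = self.delta / n
        head_sum = 0.0
        # Try every split into an older part W0 and a newer part W1.
        for i in range(1, n):
            head_sum += values[i - 1]
            n0, n1 = i, n - i
            mean0 = head_sum / n0
            mean1 = (total - head_sum) / n1
            m = 1.0 / (1.0 / n0 + 1.0 / n1)  # harmonic-mean-style size term
            eps_cut = math.sqrt(math.log(4.0 / delta_prime) / (2.0 * m))
            if abs(mean0 - mean1) > eps_cut:
                return True
        return False
```

For information gain over c classes the range R is usually taken as log2(c), so a binary build outcome (success or failure) gives R = 1. Feeding `SimpleAdwin` a 0/1 stream of prediction errors is one way to signal that the tree should be revised.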

    git2net - Mining Time-Stamped Co-Editing Networks from Large git Repositories

    Data from software repositories have become an important foundation for the empirical study of software engineering processes. A recurring theme in the repository mining literature is the inference of developer networks capturing, e.g., collaboration, coordination, or communication from the commit history of projects. Most of the studied networks are based on the co-authorship of software artefacts defined at the level of files, modules, or packages. While this approach has led to insights into the social aspects of software development, it neglects the detailed information on code changes and code ownership contained in the commit log of software projects, e.g., which exact lines of code have been authored by which developers. Addressing this issue, we introduce git2net, a scalable Python tool that facilitates the extraction of fine-grained co-editing networks in large git repositories. It uses text mining techniques to analyse the detailed history of textual modifications within files. This information allows us to construct directed, weighted, and time-stamped networks, where a link signifies that one developer has edited a block of source code originally written by another developer. Our tool is applied in case studies of an Open Source and a commercial software project. We argue that it opens up a massive new source of high-resolution data on human collaboration patterns.
    Comment: MSR 2019, 12 pages, 10 figures
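The notion of a co-editing link can be illustrated with a small, hypothetical sketch; it is not git2net's implementation or API. It assumes the history of a single file is already available as a chronological list of `(timestamp, author, lines)` snapshots, tracks which author last wrote each line, and uses Python's standard `difflib.SequenceMatcher` to find lines that were replaced or deleted. Each resulting tuple `(editor, original_author, timestamp, n_lines)` is one directed, weighted, time-stamped link; pointing edges from editor to original author is a choice made for this sketch.

```python
import difflib
from collections import Counter


def co_editing_edges(commits):
    """Build time-stamped co-editing edges for one file.

    `commits` is a chronological list of (timestamp, author, file_lines)
    tuples (a hypothetical input format). Returns a list of
    (editor, original_author, timestamp, n_lines) tuples: `editor` replaced
    or deleted `n_lines` lines last written by `original_author`.
    """
    owners = []  # owners[i] is the author who last wrote line i
    lines = []   # current file content
    edges = []
    for timestamp, author, new_lines in commits:
        matcher = difflib.SequenceMatcher(a=lines, b=new_lines, autojunk=False)
        new_owners = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged lines keep their previous owners.
                new_owners.extend(owners[i1:i2])
                continue
            if tag in ("replace", "delete"):
                # Count overwritten lines per original author; ignore self-edits.
                counts = Counter(o for o in owners[i1:i2] if o != author)
                for original, n in counts.items():
                    edges.append((author, original, timestamp, n))
            # Inserted or replacement lines now belong to the current author.
            new_owners.extend([author] * (j2 - j1))
        owners, lines = new_owners, list(new_lines)
    return edges


def weighted_network(edges):
    """Collapse time-stamped edges into a static weighted network."""
    weights = Counter()
    for editor, original, _timestamp, n in edges:
        weights[(editor, original)] += n
    return dict(weights)
```

Keeping the timestamps on each edge is what allows temporal analyses; `weighted_network` shows the aggregated view that discards them.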

    Mining Competences of Expert Estimators

    This paper reports on a study conducted to identify which competences of employees engaged in software development projects are responsible for reliable effort estimation. Carrying out assigned project tasks draws on different human characteristics, and effort estimation is an integral part of the development process. Competences are defined as the knowledge, skills, and abilities required to perform job assignments. As input data, we used the company's internal classification and collection of employee competences, together with data sets of task effort estimates from ten projects executed in a company department specialized in developing IT solutions for the telecom domain. The modeling techniques were proven data mining methods: neural network and decision tree algorithms. The results provided a mapping of competences to effort estimates and represent valuable discovered knowledge that can be used in practice to select and evaluate expert effort estimators.

    Data mining use for learning process design of an information source locator agent

    The aim of this work is to present an application of data mining to software engineering. We describe the use of data mining in parts of the design process of an agent-based architecture for a dynamic decision support system. The main function of this system is to direct users' information requirements to the domains most likely to answer them. For that purpose, a strategy is developed that gives the system the capacity to analyze an information requirement and determine to which domains it should be directed. To learn from errors made during operation, a learning mechanism based on case-based reasoning (CBR) techniques is also proposed. On the one hand, data mining techniques make it possible to define a discriminating function that classifies the system's domains into two groups: those that can probably answer the information requirement made to the system, and those that cannot. On the other hand, applying data mining to the case base allows the specification of rules that establish relationships among the stored cases, with the aim of inferring possible causes of error in the domain classification. In this way, a learning mechanism is designed to update the knowledge base and thus improve the existing classification with respect to the values assigned to the discriminating function.
    Track: Learning and Pattern Recognition. Red de Universidades con Carreras en Informática (RedUNCI)
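The abstract does not specify which discriminating function was used. As a hypothetical illustration only, a two-class Fisher linear discriminant can play that role: each domain is described by a feature vector (here two made-up scores, for example a past answer rate and a keyword overlap), and a domain is placed in the "can probably answer" group when its projection exceeds the projected midpoint between the two group means. The function names and features are assumptions of this sketch.

```python
def fit_fisher_discriminant(can_answer, cannot_answer):
    """Fit a two-class Fisher linear discriminant on 2-D feature vectors
    (hypothetical features per domain).

    Returns (w, threshold); a domain x is placed in the "can answer" group
    when w[0]*x[0] + w[1]*x[1] > threshold.
    """
    def mean(points):
        n = len(points)
        return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

    ma, mb = mean(can_answer), mean(cannot_answer)

    # Pooled within-class scatter matrix S_w (2x2).
    s = [[0.0, 0.0], [0.0, 0.0]]
    for points, m in ((can_answer, ma), (cannot_answer, mb)):
        for p in points:
            d = (p[0] - m[0], p[1] - m[1])
            for r in range(2):
                for c in range(2):
                    s[r][c] += d[r] * d[c]

    # w = S_w^{-1} (ma - mb), using the closed-form 2x2 inverse.
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    diff = (ma[0] - mb[0], ma[1] - mb[1])
    w = ((s[1][1] * diff[0] - s[0][1] * diff[1]) / det,
         (-s[1][0] * diff[0] + s[0][0] * diff[1]) / det)

    # Threshold: projection of the midpoint between the two group means.
    mid = ((ma[0] + mb[0]) / 2, (ma[1] + mb[1]) / 2)
    threshold = w[0] * mid[0] + w[1] * mid[1]
    return w, threshold


def can_probably_answer(w, threshold, x):
    """Apply the discriminating function to one domain's feature vector."""
    return w[0] * x[0] + w[1] * x[1] > threshold
```

In a CBR setting like the one described, misclassified domains recorded in the case base could be added to the training groups and the function refitted.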

    Défis 2025

    New paradigms, languages, modeling, verification, and testing approaches, and new tools in the field of programming and software, are expected to emerge over the next ten years, whether to make life easier for designers and maintainers of computer systems, to model software and make it reliable, or to anticipate technological change. This text summarizes the work on the challenges facing the Programming and Software Engineering field on the 2025 horizon. This work was presented and discussed during the national days of the Research Group on Programming and Software Engineering in June 2014 and at a one-day meeting in September 2014 in Paris.