98 research outputs found

    Towards Identifying Paid Open Source Developers - A Case Study with Mozilla Developers

Open source development contains contributions from both hired and volunteer software developers. Identifying this status is important when we consider the transferability of research results to the closed source software industry, which includes no volunteer developers. While many studies have taken the employment status of developers into account, this information is often gathered manually due to the lack of accurate automatic methods. In this paper, we present an initial step towards predicting paid and unpaid open source development using machine learning, and we compare our results with automatic techniques used in prior work. Relying on source code repository metadata from Mozilla and manually collected employment status, we built a dataset of the most active developers, both volunteers and developers hired by Mozilla. We define a set of metrics based on developers' usual commit time patterns and use different classification methods (logistic regression, classification tree, and random forest). The results show that our proposed method identifies paid and unpaid commits with an AUC of 0.75 using random forest, which is higher than the AUC of 0.64 obtained with the best of the previously used automatic methods.
Comment: International Conference on Mining Software Repositories (MSR) 201
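The commit-time idea translates into a few lines of scikit-learn. The following is a minimal sketch, not the authors' pipeline: the data is fabricated with the office-hours skew the paper exploits, and only two features (commit hour and weekday) stand in for the paper's metric set.

```python
# Sketch: classify commits as paid/unpaid from commit-time features.
# All data here is synthetic; it only mimics the office-hours intuition.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
paid = rng.integers(0, 2, n)                 # 1 = paid, 0 = volunteer (fabricated)
hour = np.where(paid == 1,
                rng.normal(13, 3, n),        # paid: office-hours peak
                rng.normal(20, 4, n)) % 24   # volunteer: evening peak
weekday = np.where(paid == 1,
                   rng.integers(0, 5, n),    # paid: Mon-Fri
                   rng.integers(0, 7, n))    # volunteer: any day
X = np.column_stack([hour, weekday])

X_tr, X_te, y_tr, y_te = train_test_split(X, paid, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```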

    20-MAD -- 20 Years of Issues and Commits of Mozilla and Apache Development

Data from long-lived and high-profile projects is valuable for research on successful software engineering in the wild. A dataset that links the different software repositories of such projects enables deeper investigations. This paper presents 20-MAD, a dataset linking the commit and issue data of Mozilla and Apache projects. It includes over 20 years of information about 765 projects, 3.4M commits, 2.3M issues, and 17.3M issue comments, and its compressed size is over 6 GB. The data contains all the typical information about source code commits (e.g., lines added and removed, message, and commit time) and issues (status, severity, votes, and summary). The issue comments have been pre-processed for natural language processing and sentiment analysis; this includes emoticons and valence and arousal scores. Linking code repository and issue tracker information allows studying individuals across the two types of repositories and also provides more accurate time zone information for issue trackers. To our knowledge, this is the largest linked dataset, in size and in project lifetime, that is not based on GitHub.
Comment: 17th International Conference on Mining Software Repositories, 2020
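To make the linking concrete, here is a minimal Polars sketch of joining commit records to issue records. The column names (project, issue_id, status, and so on) are illustrative assumptions, not the actual 20-MAD schema.

```python
# Hypothetical mini-version of the commit/issue linking 20-MAD enables;
# the real dataset's schema and linking keys may differ.
import polars as pl

commits = pl.DataFrame({
    "project": ["foo", "foo"],
    "issue_id": [101, 102],
    "author": ["alice", "bob"],
    "commit_time": ["2015-03-01T10:00:00", "2015-03-02T21:30:00"],
})
issues = pl.DataFrame({
    "project": ["foo", "foo"],
    "issue_id": [101, 102],
    "status": ["RESOLVED", "OPEN"],
    "severity": ["major", "minor"],
})

# Join commits to their issues so the same individual can be studied
# in both repository types, as the abstract describes.
linked = commits.join(issues, on=["project", "issue_id"], how="inner")
print(linked)
```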

    Software evolvability - empirically discovered evolvability issues and human evaluations

Evolution of a software system can take decades and can cost up to several billion euros. Software evolvability refers to how easily software is understood, modified, adapted, corrected, and developed. It has been estimated that software evolvability can explain 25% to 38% of the costs of software evolution. Prior research has presented software evolvability criteria and quantified the criteria using source code metrics. However, empirical observations of software evolvability issues and human evaluations of them have largely been ignored. This dissertation empirically studies human evaluations and observations of software evolvability issues. The work uses both qualitative and quantitative research methods. Empirical data was collected from controlled experiments with student subjects and by observing issues discovered in real industrial settings. This dissertation presents a new classification for software evolvability issues. The information provided by the classification is extended by a detailed analysis of evolvability issues discovered in code reviews and their distributions across issue types. Furthermore, this work studies human evaluations of software evolvability; more specifically, it focuses on the interrater agreement of the evaluations, the effect of demographics, the evolvability issues that humans find most significant, and the relationship between human evaluations and evaluations based on source code metrics. The results show that code review performed after light functional testing reveals three times as many evolvability issues as functional defects. We also discovered a new evolvability issue type called "solution approach", which indicates a need to rethink the current solution rather than reorganize it; we are not aware of any prior research in the software engineering domain that presents or discusses such issues. We found weak evidence that software evolvability evaluations are more affected by a person's role in the organization and their relationship (authorship) to the code than by education and work experience. Comparison of code metrics and human evaluations revealed that metrics cannot detect all human-found evolvability issues.
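One common way to quantify interrater agreement, a focal point of the dissertation, is Cohen's kappa. The sketch below applies scikit-learn's implementation to fabricated reviewer labels; it is not tied to the dissertation's actual analysis or statistic.

```python
# Illustrative only: Cohen's kappa on hypothetical issue-type labels
# assigned by two reviewers to the same five code fragments.
from sklearn.metrics import cohen_kappa_score

rater_a = ["structure", "naming", "solution approach", "naming", "structure"]
rater_b = ["structure", "naming", "structure", "naming", "structure"]

# 1.0 = perfect agreement, 0.0 = agreement expected by chance alone.
print(cohen_kappa_score(rater_a, rater_b))
```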

    LogLead -- Fast and Integrated Log Loader, Enhancer, and Anomaly Detector

This paper introduces LogLead, a tool designed for efficient log analysis benchmarking. LogLead combines three essential steps in log processing: loading, enhancing, and anomaly detection. The tool leverages Polars, a high-speed DataFrame library. We currently have loaders for eight publicly available systems (HDFS, Hadoop, BGL, Thunderbird, Spirit, Liberty, TrainTicket, and GC Webshop). We have multiple enhancers: three parsers (Drain, Spell, LenMa), BERT embedding creation, and other log representation techniques such as bag-of-words. LogLead integrates five supervised and four unsupervised machine learning algorithms from SKLearn for anomaly detection. By integrating diverse datasets, log representation methods, and anomaly detectors, LogLead facilitates comprehensive benchmarking in log analysis research. We show that log loading from raw file to dataframe is over 10x faster with LogLead than with past solutions. We demonstrate a roughly 2x improvement in Drain parsing speed by off-loading log message normalization to LogLead. Our brief benchmarking on HDFS indicates that log representations extending beyond the bag-of-words approach offer limited additional benefits. Tool URL: https://github.com/EvoTestOps/LogLead
Comment: 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)
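For a rough impression of the load, enhance, and detect pipeline, the sketch below writes it directly against Polars and scikit-learn rather than LogLead's own API (see the repository above for the real interface). The log lines and labels are fabricated.

```python
# Not LogLead's API: a plain Polars + scikit-learn rendition of the
# load -> represent (bag-of-words) -> detect steps the tool integrates.
import polars as pl
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy log messages with anomaly labels (fabricated; in HDFS, labels
# actually apply to block-level event sequences, not single lines).
df = pl.DataFrame({
    "message": [
        "Received block blk_1 of size 67108864",
        "Deleting block blk_2 file /data/blk_2",
        "Exception in receiveBlock for block blk_3",
        "Received block blk_4 of size 67108864",
    ],
    "anomaly": [0, 0, 1, 0],
})

# Enhance: bag-of-words representation of the raw messages.
X = CountVectorizer().fit_transform(df["message"].to_list())

# Detect: any SKLearn classifier can act as the supervised detector.
clf = LogisticRegression().fit(X, df["anomaly"].to_list())
print(clf.predict(X))
```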

    How to Configure Masked Event Anomaly Detection on Software Logs?

Software log anomaly detection with masked event prediction has various technical approaches with countless configurations and parameters. Our objective is to provide a baseline of settings for similar studies in the future. The models we use are the N-Gram model, a classic approach in the field of natural language processing (NLP), and two deep learning (DL) models: long short-term memory (LSTM) and convolutional neural network (CNN). We used four datasets: Profilence, BlueGene/L (BGL), Hadoop Distributed File System (HDFS), and Hadoop. Other settings are the size of the sliding window, which determines how many surrounding events are used to predict a given event; the mask position (the position within the window being predicted); the use of only unique sequences; and the portion of data used for training. The results show clear indications of settings that generalize across datasets. The performance of the DL models does not deteriorate as the window size increases, while the N-Gram model performs worse with large window sizes on the BGL and Profilence datasets. Despite the popularity of next event prediction, the results show that in this context it is better not to predict events at the edges of the subsequence, i.e., the first or last event; the best result comes from predicting the fourth event when the window size is five. Regarding the amount of data used for training, the results show differences across datasets and models; for example, the N-Gram model appears more sensitive to a lack of data than the DL models. Overall, for similar experimental setups we suggest the following general baseline: window size 10, mask position second to last, do not filter out non-unique sequences, and use half of the total data for training.
Comment: Accepted to the New Ideas and Emerging Results (NIER) track of the 38th IEEE International Conference on Software Maintenance and Evolution (ICSME)
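The masked-window setup is easy to reproduce in miniature. Below is a toy N-gram-style lookup predictor over a fabricated event sequence, with window size 5 and the fourth event masked; the paper's actual models (N-Gram, LSTM, CNN) are more elaborate than this.

```python
# Sketch of masked event prediction: learn which event appears inside a
# given context of surrounding events, then predict by majority vote.
from collections import Counter, defaultdict

events = list("abcabdabcabcabd")   # toy event sequence; real data is log events
window, mask_pos = 5, 3            # mask the fourth event (0-based index 3)

table = defaultdict(Counter)
for i in range(len(events) - window + 1):
    w = events[i:i + window]
    context = tuple(w[:mask_pos] + w[mask_pos + 1:])  # surrounding events
    table[context][w[mask_pos]] += 1                  # count the masked event

# Predict the most frequent event seen for each context
# (evaluated on the training windows purely for illustration).
correct = total = 0
for i in range(len(events) - window + 1):
    w = events[i:i + window]
    context = tuple(w[:mask_pos] + w[mask_pos + 1:])
    pred = table[context].most_common(1)[0][0]
    correct += pred == w[mask_pos]
    total += 1
print(f"accuracy: {correct / total:.2f}")
```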

    A Replicated Study on Duplicate Detection: Using Apache Lucene to Search Among Android Defects

Context: Duplicate detection is a fundamental part of issue management. Systems able to predict whether a new defect report will be closed as a duplicate may decrease costs by limiting rework and by collecting related pieces of information. Goal: Our work explores using Apache Lucene for large-scale duplicate detection based on textual content. We also evaluate the previous claim that results improve if the title is weighted as more important than the description. Method: We conduct a conceptual replication of a well-cited study conducted at Sony Ericsson, using Lucene to search the public Android defect repository. In line with the original study, we explore how varying the weighting of the title and the description affects accuracy. Results: We show that Lucene obtains the best results when the defect report title is weighted three times higher than the description, a bigger difference than previously acknowledged. Conclusions: Our work shows the potential of Lucene as a scalable solution for duplicate detection.
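The title-versus-description weighting can be mimicked outside Lucene. The sketch below uses TF-IDF cosine similarity with a 3x title boost on fabricated defect reports; Lucene's field boosting and scoring differ, so this only illustrates the weighting idea.

```python
# Not Lucene: a TF-IDF stand-in for per-field boosts, scoring duplicate
# candidates with the title weighted three times higher than the description.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

reports = [  # hypothetical defect reports
    {"title": "Crash when rotating screen", "desc": "App dies on rotate in settings"},
    {"title": "Screen rotation crash", "desc": "Crash after rotating the device"},
    {"title": "Battery drains fast", "desc": "Phone loses charge overnight"},
]
query = {"title": "Crash on screen rotate", "desc": "Rotating causes a crash"}

vec = TfidfVectorizer().fit([r["title"] + " " + r["desc"] for r in reports])

def score(q, r, title_weight=3.0):
    # Weighted sum of per-field similarities, mimicking field boosts.
    t = cosine_similarity(vec.transform([q["title"]]), vec.transform([r["title"]]))[0, 0]
    d = cosine_similarity(vec.transform([q["desc"]]), vec.transform([r["desc"]]))[0, 0]
    return title_weight * t + d

ranked = sorted(reports, key=lambda r: score(query, r), reverse=True)
print(ranked[0]["title"])  # most likely duplicate
```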

    Supporting Regression Test Scoping with Visual Analytics

Test managers have to repeatedly select test cases for test activities during the evolution of large software systems. Researchers have widely studied automated test scoping, but have not fully investigated decision support with human interaction. We previously proposed introducing visual analytics for this purpose. Aim: In this empirical study we investigate how to design such decision support. Method: We explored the use of visual analytics, using heat maps of historical test data for test scoping support, by letting test managers evaluate prototype visualizations in three focus groups with a total of nine industrial test experts. Results: All test managers in the study found the visual analytics useful for supporting test planning. However, our results show that different tasks and contexts require different types of visualizations. Conclusion: Important properties for test planning support are the ability to overview testing from different perspectives, the ability to filter and zoom to compare subsets of the testing with respect to various attributes, and the ability to manipulate the subset under analysis by selecting and deselecting test cases. Our results may be used to support the introduction of visual test analytics in practice.
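The kind of heat map the focus groups evaluated can be sketched in a few lines of matplotlib; the data below is fabricated, with rows as test cases, columns as releases, and color encoding failure rate.

```python
# Illustrative heat map of historical test outcomes, the style of view
# the prototype visualizations used; all values here are random.
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
fail_rate = rng.random((8, 6))     # rows: test cases, cols: releases

fig, ax = plt.subplots()
im = ax.imshow(fail_rate, cmap="Reds", aspect="auto")
ax.set_xlabel("Release")
ax.set_ylabel("Test case")
fig.colorbar(im, label="Failure rate")
plt.show()
```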