98 research outputs found
Towards Identifying Paid Open Source Developers - A Case Study with Mozilla Developers
Open source development contains contributions from both hired and volunteer
software developers. Identifying this status is important when considering
the transferability of research results to the closed source software
industry, which includes no volunteer developers. While many studies have
taken the employment status of developers into account, this information is
often gathered manually due to the lack of accurate automatic methods. In this
paper, we present an initial step towards predicting paid and unpaid open
source development using machine learning and compare our results with
automatic techniques used in prior work. Relying on source code repository
metadata from Mozilla and manually collected employment status, we built a
dataset of the most active developers, both volunteer and hired by Mozilla. We
define a set of metrics based on developers' usual commit time pattern and use
different classification methods (logistic regression, classification tree, and
random forest). The results show that our proposed method identifies paid and
unpaid commits with an AUC of 0.75 using random forest, which is higher than
the AUC of 0.64 obtained with the best of the previously used automatic
methods.
Comment: International Conference on Mining Software Repositories (MSR) 201
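The classification setup the abstract describes can be sketched as follows. This is a minimal illustration with synthetic data: the feature names (commit hour, weekday flag) and the toy labeling rule are assumptions, not the paper's actual dataset; only the model family (random forest) and the AUC metric follow the abstract.

```python
# Sketch: classifying paid vs. volunteer commits from commit-time features.
# Features and labels below are synthetic; only the random forest + AUC
# evaluation mirrors the study.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
# Hypothetical features: commit hour (0-23) and weekday flag (1 = Mon-Fri).
hours = rng.integers(0, 24, n)
weekday = rng.integers(0, 2, n)
# Toy label: paid developers tend to commit during office hours on weekdays.
paid = ((hours >= 9) & (hours <= 17) & (weekday == 1)).astype(int)
paid ^= rng.random(n) < 0.2  # add label noise

X = np.column_stack([hours, weekday])
X_train, X_test, y_train, y_test = train_test_split(X, paid, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
print(f"AUC: {auc:.2f}")
```

On real commit data the feature set would of course be richer (the paper mentions metrics based on developers' usual commit time patterns), but the train/evaluate skeleton is the same.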
20-MAD -- 20 Years of Issues and Commits of Mozilla and Apache Development
Data from long-lived and high-profile projects is valuable for research on
successful software engineering in the wild. Having a dataset that links the
different software repositories of such projects enables deeper
investigations. This paper presents 20-MAD, a dataset linking the commit and
issue data of Mozilla and Apache projects. It includes over 20 years of
information about 765 projects, 3.4M commits, 2.3M issues, and 17.3M issue
comments, and its compressed size is over 6 GB. The data contains all the
typical information about source code commits (e.g., lines added and removed,
message and commit time) and issues (status, severity, votes, and summary). The
issue comments have been pre-processed for natural language processing and
sentiment analysis. This includes emoticons and valence and arousal scores.
Linking code repository and issue tracker information allows studying
individuals across both types of repositories and provides more accurate time
zone information for issue trackers. To our knowledge, this is the largest
linked dataset, in size and in project lifetime, that is not based on GitHub.
Comment: 17th International Conference on Mining Software Repositories, 202
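The kind of commit-to-issue linking such a dataset enables can be sketched with a plain join. The column names and rows below are hypothetical; 20-MAD's actual schema may differ.

```python
# Sketch: linking commit records to issue records by issue ID, the kind of
# join a dataset like 20-MAD enables. Schema and rows are invented.
import pandas as pd

commits = pd.DataFrame({
    "commit_hash": ["a1", "b2", "c3"],
    "issue_id": [101, 102, 101],
    "lines_added": [10, 5, 3],
})
issues = pd.DataFrame({
    "issue_id": [101, 102],
    "status": ["FIXED", "OPEN"],
    "severity": ["major", "minor"],
})

# Left join: every commit keeps its row, enriched with issue fields.
linked = commits.merge(issues, on="issue_id", how="left")
print(linked[["commit_hash", "issue_id", "status"]])
```

With linked tables like this, one can follow the same individual across the code repository and the issue tracker, as the abstract describes.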
Software evolvability - empirically discovered evolvability issues and human evaluations
Evolution of a software system can take decades and can cost up to several billion Euros. Software evolvability refers to how easily software is understood, modified, adapted, corrected, and developed. It has been estimated that software evolvability can explain 25% to 38% of the costs of software evolution. Prior research has presented software evolvability criteria and quantified the criteria utilizing source code metrics. However, the empirical observations of software evolvability issues and human evaluations of them have largely been ignored.
This dissertation empirically studies human evaluations and observations of software evolvability issues. This work utilizes both qualitative and quantitative research methods. Empirical data was collected from controlled experiments with student subjects, and by observing issues that were discovered in real industrial settings.
This dissertation presents a new classification for software evolvability issues. The information provided by the classification is extended by a detailed analysis of evolvability issues discovered in code reviews and their distribution across issue types. Furthermore, this work studies human evaluations of software evolvability; more specifically, it focuses on the interrater agreement of the evaluations, the effect of demographics, the evolvability issues that humans find most significant, and the relationship between human evaluations and evaluations based on source code metrics.
The results show that code review performed after light functional testing reveals three times as many evolvability issues as functional defects. We also discovered a new evolvability issue type called "solution approach", which indicates a need to rethink the current solution rather than reorganize it; we are not aware of prior research in the software engineering domain that presents or discusses such issues. We found weak evidence that software evolvability evaluations are affected more by a person's role in the organization and relationship (authorship) to the code than by education and work experience. Comparison of code metrics and human evaluations revealed that metrics cannot detect all human-found evolvability issues.
LogLead -- Fast and Integrated Log Loader, Enhancer, and Anomaly Detector
This paper introduces LogLead, a tool designed for efficient log analysis
benchmarking. LogLead combines three essential steps in log processing:
loading, enhancing, and anomaly detection. The tool leverages Polars, a
high-speed DataFrame library. LogLead currently provides loaders for eight
publicly available systems (HDFS, Hadoop, BGL, Thunderbird, Spirit, Liberty,
TrainTicket, and GC Webshop). Its enhancers include three parsers (Drain,
Spell, LenMa), BERT embedding creation, and other log representation
techniques such as bag-of-words. For anomaly detection, LogLead integrates
five supervised and four unsupervised machine learning algorithms from
scikit-learn. By
integrating diverse datasets, log representation methods and anomaly detectors,
LogLead facilitates comprehensive benchmarking in log analysis research. We
show that log loading from raw file to dataframe is over 10x faster with
LogLead compared to past solutions. We demonstrate roughly 2x improvement in
Drain parsing speed by off-loading log message normalization to LogLead. Our
brief benchmarking on HDFS indicates that log representations extending beyond
the bag-of-words approach offer limited additional benefits. Tool URL:
https://github.com/EvoTestOps/LogLead
Comment: 2024 IEEE International Conference on Software Analysis, Evolution
and Reengineering (SANER
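The bag-of-words representation plus a supervised scikit-learn detector, the combination the abstract benchmarks, can be sketched directly with scikit-learn alone. The log lines and labels below are invented, and LogLead's own API is not used here.

```python
# Sketch: bag-of-words log representation fed to a supervised anomaly
# detector, mirroring the kind of pipeline LogLead integrates. Data is toy.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

logs = [
    "block received from node-1",
    "block received from node-2",
    "exception writing block to disk",
    "block received from node-3",
    "exception writing block to disk",
    "block received from node-1",
]
anomaly = [0, 0, 1, 0, 1, 0]  # 1 = anomalous line

# Bag-of-words: each log line becomes a sparse token-count vector.
X = CountVectorizer().fit_transform(logs)
clf = LogisticRegression().fit(X, anomaly)
pred = clf.predict(X).tolist()
print(pred)
```

The finding that richer representations add little over bag-of-words on HDFS suggests this simple vectorization is a sensible default starting point.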
How to Configure Masked Event Anomaly Detection on Software Logs?
Software log anomaly detection with masked event prediction has various
technical approaches with countless configurations and parameters. Our
objective is to provide a baseline of settings for similar studies in the
future. The models we use are the N-Gram model, which is a classic approach in
the field of natural language processing (NLP), and two deep learning (DL)
models: long short-term memory (LSTM) and convolutional neural network (CNN).
We use four datasets: Profilence, BlueGene/L (BGL), Hadoop Distributed File
System (HDFS), and Hadoop. Other settings are the size of the sliding window,
which determines how many surrounding events are used to predict a given
event; the mask position (the position within the window being predicted);
the use of only unique sequences; and the portion of data that
is used for training. The results show clear indications of settings that can
be generalized across datasets. The performance of the DL models does not
deteriorate as the window size increases, while the N-Gram model shows worse
performance with large window sizes on the BGL and Profilence datasets. Despite
the popularity of Next Event Prediction, the results show that in this context
it is better not to predict events at the edges of the subsequence, i.e., first
or last event, with the best result coming from predicting the fourth event
when the window size is five. Regarding the amount of data used for training,
the results show differences across datasets and models. For example, the
N-Gram model appears to be more sensitive toward the lack of data than the DL
models. Overall, for similar experimental setups we suggest the following
general baseline: Window size 10, mask position second to last, do not filter
out non-unique sequences, and use half of the total data for training.
Comment: Accepted to the New Ideas and Emerging Results (NIER) track of the
38th IEEE International Conference on Software Maintenance and Evolution
(ICSME
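The masked event prediction setup with a count-based N-Gram-style model can be sketched as below. Window size and mask position follow the paper's terminology (here window 5, predicting the fourth event, i.e., mask position 3); the event sequences and the counting model itself are illustrative assumptions, not the paper's implementation.

```python
# Sketch: masked event prediction with a simple count-based (N-Gram style)
# model. Sequences and model details are toy illustrations.
from collections import Counter, defaultdict

def train(sequences, window=5, mask_pos=3):
    """Count which event appears at mask_pos given the surrounding events."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(len(seq) - window + 1):
            win = seq[i:i + window]
            context = tuple(win[:mask_pos] + win[mask_pos + 1:])
            counts[context][win[mask_pos]] += 1
    return counts

def predict(counts, context):
    """Return the most frequent masked event for this context, if seen."""
    if context not in counts:
        return None
    return counts[context].most_common(1)[0][0]

seqs = [list("abcde"), list("abcde"), list("abxde")]
model = train(seqs, window=5, mask_pos=3)
print(predict(model, tuple("abce")))  # events surrounding the masked slot
```

Predicting an interior position (rather than the first or last event, as in next event prediction) gives the model context on both sides of the mask, which is the intuition behind the paper's finding.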
A Replicated Study on Duplicate Detection: Using Apache Lucene to Search Among Android Defects
Context: Duplicate detection is a fundamental part of issue management. Systems able to predict whether a new defect report will be closed as a duplicate may decrease costs by limiting rework and collecting related pieces of information. Goal: Our work explores using Apache Lucene for large-scale duplicate detection based on textual content. We also evaluate the previous claim that results improve if the title is weighted as more important than the description. Method: We conduct a conceptual replication of a well-cited study conducted at Sony Ericsson, using Lucene for searching in the public Android defect repository. In line with the original study, we explore how varying the weighting of the title and the description affects accuracy. Results: We show that Lucene obtains the best results when the defect report title is weighted three times higher than the description, a bigger difference than previously acknowledged. Conclusions: Our work shows the potential of using Lucene as a scalable solution for duplicate detection.
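The idea of weighting the title three times higher than the description can be approximated without Lucene, for illustration, with TF-IDF cosine similarity. The defect reports below are invented, and repeating the title is only a rough stand-in for Lucene's field boosting; Lucene's actual scoring differs.

```python
# Sketch: approximating "title weighted 3x higher than description" with
# TF-IDF similarity instead of Lucene field boosts. Reports are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def weighted_text(title, description, title_weight=3):
    # Repeating the title approximates boosting its term frequencies.
    return " ".join([title] * title_weight + [description])

reports = [
    ("camera crashes on start", "app closes when opening the camera"),
    ("battery drains fast", "phone battery empties within hours"),
]
query = ("camera crash", "the camera application crashes immediately")

corpus = [weighted_text(t, d) for t, d in reports] + [weighted_text(*query)]
tfidf = TfidfVectorizer().fit_transform(corpus)
sims = cosine_similarity(tfidf[-1], tfidf[:-1]).ravel()
best = int(sims.argmax())
print(best)  # index of the most similar (candidate duplicate) report
```

In Lucene itself the same effect would be achieved by boosting the title field in the query rather than by duplicating text.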
Supporting Regression Test Scoping with Visual Analytics
Test managers have to repeatedly select test cases for test activities during the evolution of large software systems. Researchers have widely studied automated test scoping but have not fully investigated decision support with human interaction. We previously proposed introducing visual analytics for this purpose. Aim: In this empirical study we investigate how to design such decision support. Method: We explored the use of visual analytics using heat maps of historical test data for test scoping support by letting test managers evaluate prototype visualizations in three focus groups with nine industrial test experts in total. Results: All test managers in the study found the visual analytics useful for supporting test planning. However, our results show that different tasks and contexts require different types of visualizations. Conclusion: Important properties for test planning support are the ability to overview testing from different perspectives, the ability to filter and zoom to compare subsets of the testing with respect to various attributes, and the ability to manipulate the subset under analysis by selecting and deselecting test cases. Our results may be used to support the introduction of visual test analytics in practice.
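The test-history matrix behind such heat maps, and the filter/zoom operation the study's test managers asked for, can be sketched with numpy alone (no plotting). Test names, releases, and the verdict encoding below are invented.

```python
# Sketch: a test-case x release verdict matrix, the data model behind a
# test-history heat map. Names and encoding are hypothetical.
import numpy as np

test_cases = ["login", "checkout", "search"]
releases = ["R1", "R2", "R3", "R4"]
# 1 = pass, 0 = fail, -1 = not executed (hypothetical encoding).
history = np.array([
    [1, 1, 0, 1],
    [1, -1, 1, 1],
    [0, 0, 1, -1],
])

# Filter/zoom: narrow the view to test cases that failed at least once.
failed_some = [t for t, row in zip(test_cases, history) if (row == 0).any()]
print(failed_some)
```

A heat map visualization would render `history` directly, with the filtering above selecting which rows to show.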