Worse Than Spam: Issues In Sampling Software Developers
Background: Reaching out to professional software developers is a crucial
part of empirical software engineering research. One important method to
investigate the state of practice is survey research. As drawing a random
sample of professional software developers for a survey is rarely possible,
researchers rely on various sampling strategies. Objective: In this paper, we
report on our experience with different sampling strategies we employed,
highlight ethical issues, and motivate the need to maintain a collection of key
demographics about software developers to ease the assessment of the external
validity of studies. Method: Our report is based on data from two studies we
conducted in the past. Results: Contacting developers over public media proved
to be the most effective and efficient sampling strategy. However, we not only
describe the perspective of researchers who are interested in reaching goals
like a large number of participants or a high response rate, but we also shed
light onto ethical implications of different sampling strategies. We present
one specific ethical guideline and point to debates in other research
communities to start a discussion in the software engineering research
community about which sampling strategies should be considered ethical.
Comment: 6 pages, 2 figures, Proceedings of the 2016 ACM/IEEE International
Symposium on Empirical Software Engineering and Measurement (ESEM 2016), ACM,
201
Search Heuristics, Case-Based Reasoning and Software Project Effort Prediction
This paper reports on the use of search techniques to help optimise a case-based reasoning (CBR) system for predicting software project effort. A major problem, common to ML techniques in general, has been dealing with large numbers of case features, some of which can hinder the prediction process. Unfortunately, searching for the optimal feature subset is a combinatorial problem and therefore NP-hard. This paper examines the use of random searching, hill climbing and forward sequential selection (FSS) to tackle this problem. Results from examining a set of real software project data show that even random searching was better than using all available features (average error 35.6% rather than 50.8%). Hill climbing and FSS both produced results substantially better than the random search (15.3% and 13.1% respectively), but FSS was more computationally efficient. Providing a description of the fitness landscape of a problem along with search results is a step towards the classification of search problems and their assignment to optimum search techniques. This paper attempts to describe the fitness landscape of this problem by combining the results from random searches and hill climbing, as well as using multi-dimensional scaling to aid visualisation. Amongst other findings, the visualisation results suggest that some form of heuristic-based initialisation might prove useful for this problem.
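As a rough illustration of the forward sequential selection idea mentioned in this abstract, the sketch below greedily adds one feature at a time to a nearest-neighbour effort predictor and stops when the error no longer improves. It is a minimal sketch under stated assumptions, not the authors' implementation: the names `Case`, `loo_error` and `forward_sequential_selection`, the use of a single nearest neighbour, unscaled squared Euclidean distance, and leave-one-out mean relative error are all hypothetical choices made for the example.

```python
from typing import List, Sequence, Tuple

# Hypothetical case representation: (feature vector, actual effort).
Case = Tuple[Sequence[float], float]


def loo_error(cases: List[Case], subset: Sequence[int]) -> float:
    """Mean relative error of leave-one-out 1-nearest-neighbour prediction,
    measuring distance only over the feature indices in `subset`."""
    total = 0.0
    for i, (x, effort) in enumerate(cases):
        best_dist, predicted = float("inf"), 0.0
        for j, (y, other_effort) in enumerate(cases):
            if i == j:
                continue  # leave the case being predicted out of the case base
            dist = sum((x[f] - y[f]) ** 2 for f in subset)
            if dist < best_dist:
                best_dist, predicted = dist, other_effort
        total += abs(predicted - effort) / effort
    return total / len(cases)


def forward_sequential_selection(
    cases: List[Case], n_features: int
) -> Tuple[List[int], float]:
    """Greedily add the feature that lowers the error most; stop when none does."""
    selected: List[int] = []
    best_error = float("inf")
    while len(selected) < n_features:
        candidates = [f for f in range(n_features) if f not in selected]
        error, feature = min((loo_error(cases, selected + [f]), f) for f in candidates)
        if error >= best_error:
            break  # no remaining feature improves the prediction
        selected.append(feature)
        best_error = error
    return selected, best_error
```

Each round evaluates at most `n_features` candidate subsets, so the search costs on the order of `n_features` squared evaluations instead of the `2 ** n_features` needed to enumerate every subset.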
Searching for test data with feature diversity
There is an implicit assumption in software testing that more diverse and
varied test data is needed for effective testing and to achieve different types
and levels of coverage. Generic approaches based on information theory to
measure and thus, implicitly, to create diverse data have also been proposed.
However, if the tester is able to identify features of the test data that are
important for the particular domain or context in which the testing is being
performed, the use of generic diversity measures such as these may be neither
sufficient nor efficient for creating test inputs that show diversity in terms
of these features. Here we investigate different approaches to find data that
are diverse according to a specific set of features, such as length, depth of
recursion etc. Even though these features will be less general than measures
based on information theory, their use may provide a tester with more direct
control over the type of diversity that is present in the test data. Our
experiments are carried out in the context of a general test data generation
framework that can generate both numerical and highly structured data. We
compare random sampling for feature-diversity to different approaches based on
search and find a hill climbing search to be efficient. The experiments
highlight many trade-offs that need to be taken into account when searching
for diversity. We argue that recurrent test data generation motivates building
statistical models that can then help to more quickly achieve feature
diversity.
Comment: This version was submitted on April 14th 201
Evaluating Random Mutant Selection at Class-Level in Projects with Non-Adequate Test Suites
Mutation testing is a standard technique to evaluate the quality of a test
suite. Due to its computationally intensive nature, many approaches have been
proposed to make this technique feasible in real case scenarios. Among these
approaches, uniform random mutant selection has been demonstrated to be simple
and promising. However, work in this area analyzes mutant samples at the project
level, mainly on projects with adequate test suites. In this paper, we address this
lack of empirical validation by analyzing random mutant selection at the class
level on projects with non-adequate test suites. First, we show that uniform
random mutant selection underachieves the expected results. Then, we propose a
new approach named weighted random mutant selection which generates more
representative mutant samples. Finally, we show that representative mutant
samples are larger for projects with high test adequacy.
Comment: EASE 2016, Article 11, 10 pages
Is "Better Data" Better than "Better Data Miners"? (On the Benefits of Tuning SMOTE for Defect Prediction)
We report and fix an important systematic error in prior studies that ranked
classifiers for software analytics. Those studies did not (a) assess
classifiers on multiple criteria and they did not (b) study how variations in
the data affect the results. Hence, this paper applies (a) multi-criteria tests
while (b) fixing the weaker regions of the training data (using SMOTUNED, which
is a self-tuning version of SMOTE). This approach leads to large increases in
the performance of software defect predictors. When applied in a 5*5
cross-validation study for 3,681 Java classes (containing over a million lines
of code) from open source systems, SMOTUNED increased AUC and recall by 60% and
20% respectively. These improvements are independent of the classifier used to
predict quality. The same kind of improvement was observed when a
comparative analysis of SMOTE and SMOTUNED was done against the most recent
class imbalance technique. In conclusion, for software analytic tasks like
defect prediction, (1) data pre-processing can be more important than
classifier choice, (2) ranking studies are incomplete without such
pre-processing, and (3) SMOTUNED is a promising candidate for pre-processing.
Comment: 10 pages + 2 references. Accepted to International Conference on
Software Engineering (ICSE), 201
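For readers unfamiliar with SMOTE, the pre-processing step this abstract builds on, the sketch below shows the basic oversampling idea: each synthetic minority-class example is placed at a random point on the line between a minority example and one of its nearest minority neighbours. It is a simplified sketch of plain SMOTE, not SMOTUNED and not the authors' code; the function name `smote` and the parameters `n_synthetic`, `k` and `seed` are assumptions made for the example, and the abstract does not say which settings SMOTUNED tunes.

```python
import random
from typing import List, Sequence


def smote(
    minority: List[Sequence[float]], n_synthetic: int, k: int = 5, seed: int = 0
) -> List[List[float]]:
    """Create `n_synthetic` new minority examples by interpolation.

    Assumes `minority` holds at least two examples of the minority class.
    """
    rng = random.Random(seed)
    synthetic: List[List[float]] = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # The k nearest minority neighbours of `base` (squared Euclidean distance).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to mate
        synthetic.append([a + gap * (b - a) for a, b in zip(base, mate)])
    return synthetic
```

The synthetic examples would be added to the training data only, so that the test folds of a cross-validation still reflect the original class distribution.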