280,052 research outputs found

    Worse Than Spam: Issues In Sampling Software Developers

    Full text link
    Background: Reaching out to professional software developers is a crucial part of empirical software engineering research. One important method to investigate the state of practice is survey research. As drawing a random sample of professional software developers for a survey is rarely possible, researchers rely on various sampling strategies. Objective: In this paper, we report on our experience with different sampling strategies we employed, highlight ethical issues, and motivate the need to maintain a collection of key demographics about software developers to ease the assessment of the external validity of studies. Method: Our report is based on data from two studies we conducted in the past. Results: Contacting developers over public media proved to be the most effective and efficient sampling strategy. However, we not only describe the perspective of researchers who are interested in reaching goals like a large number of participants or a high response rate, but we also shed light onto ethical implications of different sampling strategies. We present one specific ethical guideline and point to debates in other research communities to start a discussion in the software engineering research community about which sampling strategies should be considered ethical.Comment: 6 pages, 2 figures, Proceedings of the 2016 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM 2016), ACM, 201

    Search Heuristics, Case-Based Reasoning and Software Project Effort Prediction

    Get PDF
    This paper reports on the use of search techniques to help optimise a case-based reasoning (CBR) system for predicting software project effort. A major problem, common to ML techniques in general, has been dealing with large numbers of case features, some of which can hinder the prediction process. Unfortunately searching for the optimal feature subset is a combinatorial problem and therefore NP-hard. This paper examines the use of random searching, hill climbing and forward sequential selection (FSS) to tackle this problem. Results from examining a set of real software project data show that even random searching was better than using all available for features (average error 35.6% rather than 50.8%). Hill climbing and FSS both produced results substantially better than the random search (15.3 and 13.1% respectively), but FSS was more computationally efficient. Providing a description of the fitness landscape of a problem along with search results is a step towards the classification of search problems and their assignment to optimum search techniques. This paper attempts to describe the fitness landscape of this problem by combining the results from random searches and hill climbing, as well as using multi-dimensional scaling to aid visualisation. Amongst other findings, the visualisation results suggest that some form of heuristic-based initialisation might prove useful for this problem

    Searching for test data with feature diversity

    Full text link
    There is an implicit assumption in software testing that more diverse and varied test data is needed for effective testing and to achieve different types and levels of coverage. Generic approaches based on information theory to measure and thus, implicitly, to create diverse data have also been proposed. However, if the tester is able to identify features of the test data that are important for the particular domain or context in which the testing is being performed, the use of generic diversity measures such as this may not be sufficient nor efficient for creating test inputs that show diversity in terms of these features. Here we investigate different approaches to find data that are diverse according to a specific set of features, such as length, depth of recursion etc. Even though these features will be less general than measures based on information theory, their use may provide a tester with more direct control over the type of diversity that is present in the test data. Our experiments are carried out in the context of a general test data generation framework that can generate both numerical and highly structured data. We compare random sampling for feature-diversity to different approaches based on search and find a hill climbing search to be efficient. The experiments highlight many trade-offs that needs to be taken into account when searching for diversity. We argue that recurrent test data generation motivates building statistical models that can then help to more quickly achieve feature diversity.Comment: This version was submitted on April 14th 201

    Evaluating Random Mutant Selection at Class-Level in Projects with Non-Adequate Test Suites

    Full text link
    Mutation testing is a standard technique to evaluate the quality of a test suite. Due to its computationally intensive nature, many approaches have been proposed to make this technique feasible in real case scenarios. Among these approaches, uniform random mutant selection has been demonstrated to be simple and promising. However, works on this area analyze mutant samples at project level mainly on projects with adequate test suites. In this paper, we fill this lack of empirical validation by analyzing random mutant selection at class level on projects with non-adequate test suites. First, we show that uniform random mutant selection underachieves the expected results. Then, we propose a new approach named weighted random mutant selection which generates more representative mutant samples. Finally, we show that representative mutant samples are larger for projects with high test adequacy.Comment: EASE 2016, Article 11 , 10 page

    Is "Better Data" Better than "Better Data Miners"? (On the Benefits of Tuning SMOTE for Defect Prediction)

    Full text link
    We report and fix an important systematic error in prior studies that ranked classifiers for software analytics. Those studies did not (a) assess classifiers on multiple criteria and they did not (b) study how variations in the data affect the results. Hence, this paper applies (a) multi-criteria tests while (b) fixing the weaker regions of the training data (using SMOTUNED, which is a self-tuning version of SMOTE). This approach leads to dramatically large increases in software defect predictions. When applied in a 5*5 cross-validation study for 3,681 JAVA classes (containing over a million lines of code) from open source systems, SMOTUNED increased AUC and recall by 60% and 20% respectively. These improvements are independent of the classifier used to predict for quality. Same kind of pattern (improvement) was observed when a comparative analysis of SMOTE and SMOTUNED was done against the most recent class imbalance technique. In conclusion, for software analytic tasks like defect prediction, (1) data pre-processing can be more important than classifier choice, (2) ranking studies are incomplete without such pre-processing, and (3) SMOTUNED is a promising candidate for pre-processing.Comment: 10 pages + 2 references. Accepted to International Conference of Software Engineering (ICSE), 201
    • …
    corecore