
    Test Smells 20 Years Later: Detectability, Validity, and Reliability

    No full text
    Test smells aim to capture design issues in test code that reduce its maintainability. These have been extensively studied and generally found to be quite prevalent in both human-written and automatically generated test cases. However, most evidence of prevalence is based on specific static detection rules. Although those rules are based on the original, conceptual definitions of the various test smells, recent empirical studies indicate that developers perceive warnings raised by detection tools as overly strict and non-representative of the maintainability and quality of test suites. This leads us to re-assess the detection accuracy of test smell detection tools and to investigate the prevalence and detectability of test smells more broadly. Specifically, we construct a hand-annotated dataset spanning hundreds of test suites, both written by developers and generated by two test generation tools (EvoSuite and JTExpert), and perform a multi-stage, cross-validated manual analysis to identify the presence of six types of test smells in them. We then use this manual labeling to benchmark the performance and external validity of two test smell detection tools: one widely used in prior work and one recently introduced with the express goal of matching developer perceptions of test smells. Our results primarily show that the current vocabulary of test smells is highly mismatched to real concerns: multiple smells were ubiquitous in developer-written tests but virtually never correlated with semantic or maintainability flaws; machine-generated tests often scored better but in reality suffered from a host of problems not well captured by current test smells. Current test smell detection strategies poorly characterized the issues in these automatically generated test suites; in particular, the older tool's detection strategies misclassified over 70% of test smells, both missing real instances (false negatives) and marking many smell-free tests as smelly (false positives).
    We identify common patterns in these tests that can be used to improve the tools, refine and update the definitions of certain test smells, and highlight as yet uncharacterized issues. Our findings suggest the need for (i) more appropriate metrics to match development practice and (ii) more accurate detection strategies, to be evaluated primarily in industrial contexts.
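
    To make the false-positive/false-negative framing concrete, here is a minimal Python sketch of how one detector's per-test warnings for a single smell type might be scored against hand-annotated labels. The names `compare` and `DetectionCounts` and the test identifiers are hypothetical. The abstract does not state the exact metric behind the "over 70%" figure, so the precision and recall computed here are illustrative rather than the authors' measure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectionCounts:
    """Confusion counts for one smell type: tool warnings vs. manual labels."""
    tp: int  # tool flagged the test and annotators judged it smelly
    fp: int  # tool flagged the test but annotators judged it smell-free
    fn: int  # tool stayed silent but annotators found the smell
    tn: int  # tool stayed silent and annotators judged it smell-free

    @property
    def precision(self) -> float:
        flagged = self.tp + self.fp
        return self.tp / flagged if flagged else 0.0

    @property
    def recall(self) -> float:
        actual = self.tp + self.fn
        return self.tp / actual if actual else 0.0


def compare(tool_flags: dict[str, bool], manual_labels: dict[str, bool]) -> DetectionCounts:
    """Score tool output against hand-annotated ground truth (hypothetical helper).

    Both dicts map a test-case identifier to whether it exhibits the smell.
    Only test cases present in manual_labels are counted; a test the tool
    never reported is treated as not flagged.
    """
    tp = fp = fn = tn = 0
    for test_id, smelly in manual_labels.items():
        flagged = tool_flags.get(test_id, False)
        if flagged and smelly:
            tp += 1
        elif flagged:
            fp += 1
        elif smelly:
            fn += 1
        else:
            tn += 1
    return DetectionCounts(tp, fp, fn, tn)


if __name__ == "__main__":
    manual = {"t1": True, "t2": False, "t3": True, "t4": False}
    tool = {"t1": True, "t2": True, "t3": False, "t4": False}
    counts = compare(tool, manual)
    print(counts, counts.precision, counts.recall)
```

    With these toy labels the tool gets one test right in each direction, raises one false alarm (t2) and misses one real smell (t3), giving precision and recall of 0.5. Keeping false positives and false negatives as separate counts, rather than a single accuracy number, mirrors the abstract's point that the older tool erred in both directions.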

    Evidence for a role for cyclic AMP in modulating the action of 5-HT and an excitatory neuropeptide, FLP17A, in the pharyngeal muscle of Caenorhabditis elegans

    No full text
    The feeding activity of the nematode Caenorhabditis elegans is regulated by an anatomically well-defined network of 20 enteric neurones that employs small-molecule and neuropeptidergic signalling. Two of the most potent excitatory agents are 5-HT and the neuropeptide FLP17A. Here we have examined the role of cAMP in modulating their excitatory actions by pharmacologically manipulating the level of cAMP. Application of the membrane-permeable cAMP analogue dibutyryl-cAMP (1 μM) enhanced the excitatory response to both FLP17A and 5-HT. Furthermore, the adenylyl cyclase activator forskolin (50 nM) significantly enhanced the excitatory response to both FLP17A and 5-HT. The phosphodiesterase inhibitor ibudilast (10 μM) enhanced the excitatory response to FLP17A. The protein kinase inhibitor H-9 dihydrochloride (10 μM) significantly reduced the excitatory response to 5-HT; H-9 dihydrochloride also had a direct effect on pharyngeal activity. The effect of FLP17A and 5-HT on two mutants, egl-8 (loss-of-function phospholipase Cβ) and egl-30 (loss-of-function Gαq), was also investigated. Both mutants have a lower pharyngeal pumping rate than wild type, which must be considered when interpreting the effects of these mutations on the excitatory responses to FLP17A and 5-HT. However, even taking into account the lower basal activity of these mutants, it is clear that the percentage increase in pharyngeal pumping rate induced by FLP17A is greatly reduced in both mutants compared to wild type. In the case of 5-HT, the effect of the mutant backgrounds on the response was less pronounced. Overall, the data support a role for cAMP in modulating the excitatory action of both FLP17A and 5-HT on C. elegans pharyngeal pumping and furthermore implicate an EGL-30-dependent pathway in the regulation of the response to FLP17A.

    Revisiting Test Smells in Automatically Generated Tests: Limitations, Pitfalls, and Opportunities

    No full text
    Test smells attempt to capture design issues in test code that reduce its maintainability. Previous work found such smells to be highly common in automatically generated test cases but based this result on specific static detection rules. Although these rules are based on the original definition of “test smells”, a recent empirical study showed that developers perceive them as overly strict and non-representative of the maintainability and quality of test suites. This leads us to investigate how effective such test smell detection tools are on automatically generated test suites. In this paper, we build a dataset of 2,340 test cases automatically generated by EvoSuite for 100 Java classes. We perform a multi-stage, cross-validated manual analysis to identify six types of test smells and label their instances. We benchmark the performance of two test smell detection tools: one widely used in prior work, and one recently introduced with the express goal of matching developer perceptions of test smells. Our results show that these test smell detection strategies poorly characterized the issues in automatically generated test suites; the older tool’s detection strategies, especially, misclassified over 70% of test smells, both missing real instances (false negatives) and marking many smell-free tests as smelly (false positives). We identify common patterns in these tests that can be used to improve the tools, refine and update the definitions of certain test smells, and highlight as yet uncharacterized issues. Our findings suggest the need for (i) more appropriate metrics to match development practice and (ii) more accurate detection strategies, to be evaluated primarily in industrial contexts.
    Virtual/online event due to COVID-19