Reflection-aware static regression test selection
Regression test selection (RTS) aims to speed up regression testing by rerunning only the tests that are affected by code changes. RTS can be performed using dynamic or static analysis techniques. A recent study showed that static and dynamic RTS can perform similarly for some medium-sized Java projects. However, the results also showed that static RTS can sometimes be unsafe, failing to select some tests that dynamic RTS selects, and reflection was the only cause of unsafety among the evaluated projects. In this thesis, we investigate five techniques (three purely static and two hybrid static-dynamic) to make static RTS safe with respect to reflection. We implemented four of these reflection-aware techniques as extensions to a reflection-unaware (RU) static RTS technique in a tool called STARTS; we evaluated the fifth technique but have not yet fully implemented it. To assess the reflection-aware techniques, we measured the benefits and costs of the four implemented techniques by comparing their end-to-end times with the RU technique and with RetestAll, which runs all tests after every code change. We also compared the safety and precision of all five static RTS techniques relative to Ekstazi, a state-of-the-art dynamic RTS technique. Our results on 805 revisions of 22 open-source Java projects show that all of the evaluated reflection-aware techniques can make static RTS safe with respect to reflection, but their costs vary widely. The best purely static technique in our study is based on border analysis with minimal border methods, which avoids analyzing the JDK and saves, on average, 14.1% of the end-to-end time of RetestAll. Furthermore, the results show that a hybrid technique based on per-test analysis is very promising in terms of safety and precision. On the other hand, the worst techniques were based on string analysis; these techniques are imprecise and often select all tests for rerun.
Taken together, these results show the need for more research into purely static techniques for making static RTS reflection-aware.
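The class-level selection idea underlying static RTS can be sketched as follows. The dependency graph and class names here are invented for illustration; tools like STARTS derive the graph from compiled bytecode rather than from a hand-written mapping, and reflection-aware variants add edges that plain static analysis would miss.

```python
# Illustrative class-level static RTS: select only tests whose transitive
# static dependency closure contains a changed class.

def affected_tests(tests, deps, changed):
    """tests: iterable of test class names.
    deps: dict mapping a class to the classes it directly depends on.
    changed: set of classes modified since the last test run."""
    selected = []
    for test in tests:
        # Depth-first walk of the test's static dependency closure.
        stack, seen = [test], set()
        while stack:
            cls = stack.pop()
            if cls in seen:
                continue
            seen.add(cls)
            stack.extend(deps.get(cls, ()))
        if seen & changed:
            selected.append(test)
    return selected

deps = {
    "FooTest": ["Foo"],
    "BarTest": ["Bar"],
    "Foo": ["Util"],
}
# Only FooTest transitively reaches the changed class Util.
print(affected_tests(["FooTest", "BarTest"], deps, {"Util"}))  # ['FooTest']
```

A reflection-unaware analysis would miss an edge created by, e.g., `Class.forName("Util")`, which is precisely the unsafety the thesis addresses.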
Ensemble learning of model hyperparameters and spatiotemporal data for calibration of low-cost PM2.5 sensors.
The PM2.5 air quality index (AQI) measurements from government-built supersites are accurate but cannot provide dense coverage of monitoring areas. Low-cost PM2.5 sensors can be deployed as a fine-grained internet-of-things (IoT) complement to government facilities. Calibration of low-cost sensors against high-accuracy supersites is thus essential. Moreover, the imputation method used for missing values in the training data may affect the calibration result, the calibration model requires hyperparameter optimization for best performance, and the factors affecting PM2.5 concentrations, such as climate, geographical landscape, and anthropogenic activities, are uncertain in the spatial and temporal dimensions. In this paper, an ensemble learning approach for imputation method selection, calibration model hyperparameterization, and spatiotemporal training data composition is proposed. Three government supersites in central Taiwan are chosen for the deployment of low-cost sensors, and hourly PM2.5 measurements are collected for 60 days to conduct the experiments. Three optimizers, the Sobol sequence, the Nelder-Mead method, and particle swarm optimization (PSO), are compared across various versions of the ensembles. The best calibration results are obtained using PSO, with improvement ratios of 4.92%, 52.96%, and 56.85% with respect to R2, RMSE, and NME, respectively.
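As a minimal illustration of the calibration step, assuming a simple linear relation between low-cost sensor readings and the reference supersite (the paper's ensemble additionally tunes imputation, hyperparameters, and spatiotemporal training data composition, e.g. via PSO; all numbers below are invented):

```python
# Ordinary least-squares calibration of a low-cost sensor against a
# co-located reference: fit reference ~= a * sensor + b.

def fit_linear(sensor, reference):
    n = len(sensor)
    mx = sum(sensor) / n
    my = sum(reference) / n
    a = sum((x - mx) * (y - my) for x, y in zip(sensor, reference)) / \
        sum((x - mx) ** 2 for x in sensor)
    b = my - a * mx
    return a, b

# Hypothetical hourly PM2.5 readings (ug/m3): raw sensor vs. supersite.
raw = [12.0, 20.0, 35.0, 50.0]
ref = [10.0, 16.0, 27.0, 38.0]
a, b = fit_linear(raw, ref)
calibrated = [a * x + b for x in raw]
```

Real calibration models in this line of work are typically nonlinear and include covariates such as humidity and temperature; the linear fit is only the simplest possible stand-in.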
Applying test case prioritization to software microbenchmarks
Regression testing comprises techniques applied during software evolution to uncover faults effectively and efficiently. While regression testing is widely studied for functional tests, performance regression testing, e.g., with software microbenchmarks, is hardly investigated. Applying test case prioritization (TCP), a regression testing technique, to software microbenchmarks may help capture large performance regressions sooner in new versions. This may be especially beneficial for microbenchmark suites, because they take considerably longer to execute than unit test suites. However, it is unclear whether traditional unit-testing TCP techniques work equally well for software microbenchmarks. In this paper, we empirically study coverage-based TCP techniques, employing total and additional greedy strategies, applied to software microbenchmarks along multiple parameterization dimensions, leading to 54 unique technique instantiations. We find that TCP techniques have a mean APFD-P (average percentage of fault-detection on performance) effectiveness between 0.54 and 0.71 and are able to capture the three largest performance changes after executing 29% to 66% of the whole microbenchmark suite. Our efficiency analysis reveals that the runtime overhead of TCP varies considerably depending on the exact parameterization. The most effective technique has an overhead of 11% of the total microbenchmark suite execution time, making TCP a viable option for performance regression testing. The results demonstrate that the total strategy is superior to the additional strategy. Finally, dynamic-coverage techniques should be favored over static-coverage techniques due to their acceptable analysis overhead; however, in settings where the time for prioritization is limited, static-coverage techniques provide an attractive alternative.
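The total and additional greedy strategies compared above can be sketched as follows; the benchmark names and coverage sets are invented for illustration.

```python
# Coverage-based TCP: "total" orders by overall coverage size, while
# "additional" repeatedly picks the item covering the most new elements.

def total_greedy(cov):
    # cov: dict mapping a benchmark to its set of covered code elements.
    return sorted(cov, key=lambda b: len(cov[b]), reverse=True)

def additional_greedy(cov):
    remaining, covered, order = dict(cov), set(), []
    while remaining:
        # Pick the benchmark that adds the most not-yet-covered elements.
        best = max(remaining, key=lambda b: len(remaining[b] - covered))
        covered |= remaining.pop(best)
        order.append(best)
    return order

cov = {
    "benchA": {"m1", "m2", "m3"},
    "benchB": {"m1", "m2"},
    "benchC": {"m4"},
}
print(total_greedy(cov))       # ['benchA', 'benchB', 'benchC']
print(additional_greedy(cov))  # ['benchA', 'benchC', 'benchB']
```

The two strategies disagree on benchC: additional greedy promotes it because it is the only benchmark covering m4, even though it covers the fewest elements in total.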
Fidelity metrics for virtual environment simulations based on spatial memory awareness states
This paper describes a methodology based on human judgments of memory awareness states for assessing the simulation fidelity of a virtual environment (VE) in relation to its real scene counterpart. To demonstrate the distinction between task performance-based approaches and additional human evaluation of cognitive awareness states, a photorealistic VE was created. Resulting scenes displayed on a head-mounted display (HMD), with or without head tracking, and on a desktop monitor were then compared to the real-world task situation they represented, investigating spatial memory after exposure. Participants described how they completed their spatial recollections by selecting one of four choices of awareness states after retrieval, in an initial test and in a retention test a week after exposure to the environment. These reflected the level of visual mental imagery involved during retrieval, the familiarity of the recollection, and also included guesses, even if informed. Experimental results revealed variations in the distribution of participants’ awareness states across conditions while, in certain cases, task performance failed to reveal any. Experimental conditions that incorporated head tracking were not associated with visually induced recollections. Generally, simulation of task performance does not necessarily lead to simulation of the awareness states involved when completing a memory task. The general premise of this research focuses on how tasks are achieved, rather than only on what is achieved. The extent to which judgments of human memory recall, memory awareness states, and presence in the physical and virtual environments are similar provides a fidelity metric of the simulation in question.