Measurement-based probabilistic (MBP) methods such as Extreme Value Theory (EVT) and Markov's Inequality have been used to derive probabilistic Worst-Case Execution Time (pWCET) estimates. The reliability and accuracy of pWCET techniques have usually been evaluated on medium to large sample sizes, N ∈ [10³, 10⁵]. However, a growing number of works advocate containing the cost of the test campaign by reducing the number of executions (i.e. the sample size) required by pWCET analysis. Some scenarios, for example, impose inherent limits on the collection of timing measurements because of the cost and availability of appropriate testing facilities. In this work, we analyze the impact of small sample sizes on MBP methods. Our analysis shows that classical EVT models for tail estimation require a threshold that estimates where the tail of the distribution begins. In low-sample scenarios, the uncertainty in determining this threshold can compromise the reliability of EVT estimates. We also assess the impact of small samples on RESTK, a time forecast method based on Markov's Inequality. Our results with synthetic data and representative kernels show that RESTK provides the best trade-off between trustworthiness and tightness for small samples, partly because, unlike EVT, it does not rely on the selection of any threshold.

The research leading to these results has received funding from
the European Union’s Horizon Europe Programme under the SAFEXPLAIN Project (www.safeexplain.eu), grant agreement num. 101069595. The authors also appreciate the support given to the Research Group SSAS (Code: 2021 SGR 00637) by the Research and University Department of the Generalitat de Catalunya.

Peer Reviewed

Postprint (author's final draft)