Bookmaker Consensus and Agreement for the UEFA Champions League 2008/09
Bookmakers' odds are an easily available source of "prospective" information that is often employed for forecasting the outcome of sports events. To investigate the statistical properties of odds quoted by a variety of bookmakers for a number of potential outcomes of a sports event, a class of mixed-effects models is explored, providing information about both consensus and (dis)agreement across bookmakers. In an empirical study for the UEFA Champions League, the most prestigious football club competition in Europe, model selection yields a simple and intuitive model with team-specific means capturing consensus and team-specific standard deviations reflecting agreement across bookmakers. The resulting consensus forecast performs well in practice, exhibiting high correlation with the actual tournament outcome. Furthermore, the teams' agreement can be shown to be strongly correlated with the predicted consensus and can thus be incorporated in a more parsimonious model for agreement while preserving the same consensus fit.
Series: Research Report Series / Department of Statistics and Mathematics
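To make the selected model concrete, here is a minimal sketch in LaTeX notation; the symbols and the normal-on-log-odds form are my reading of the abstract, not necessarily the paper's exact parameterization. Bookmaker j's quoted log-odds for team i get a team-specific mean (consensus) and a team-specific error variance (agreement), and the parsimonious variant ties the latter to the former:

```latex
% team-specific consensus mean and agreement variance (notation assumed)
\mathrm{logodds}_{ij} = \mu_i + \varepsilon_{ij},
  \qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma_i^2)
% parsimonious variant: agreement modeled through the consensus itself
\log \sigma_i = \gamma_0 + \gamma_1 \mu_i
```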
Predictive bookmaker consensus model for the UEFA Euro 2016
From 10 June to 10 July 2016, the best European football teams will meet in France to determine the European Champion in the UEFA European Championship 2016 tournament (Euro 2016 for short). For the first time, 24 teams compete, expanding the format from the 16 teams of the previous five Euro tournaments. To forecast the winning probability of each team, a predictive model based on bookmaker odds from 19 online bookmakers is employed. The favorite is the host France with a forecasted winning probability of 21.5%, followed by the current World Champion Germany with a winning probability of 20.1%. The defending European Champion Spain follows after some gap with 13.7%, and all remaining teams are predicted to have lower chances, with England (9.2%) and Belgium (7.7%) being the "best of the rest". Furthermore, by complementing the bookmaker consensus results with simulations of the whole tournament, predicted pairwise probabilities for each possible game at the Euro 2016 are obtained, along with "survival" probabilities for each team proceeding to the different stages of the tournament. For example, it can be determined that it is much more likely that top favorites France and Germany meet in the semifinal (7.8%) rather than in the final at the Stade de France (4.2%) - which would be a re-match of the friendly game that was played on 13 November 2015 during the terrorist attacks in Paris and that France won 2-0. Hence it is perhaps better that the tournament draw favors a match in the semifinal at Marseille (with an almost even winning probability of 50.5% for France). The most likely final is then that either of the two teams plays against the defending champion Spain, with a probability of 5.7% for France vs. Spain and 5.4% for Germany vs. Spain, respectively. All forecasts are the result of an aggregation of quoted winning odds for each team in the Euro 2016: these are first adjusted for profit margins ("overrounds"), averaged on the log-odds scale, and then transformed back to winning probabilities. Moreover, team abilities (or strengths) are approximated by an "inverse" procedure of tournament simulations, yielding estimates of probabilities for all possible pairwise matches at all stages of the tournament. This technique correctly predicted the winners of the 2010 FIFA World Cup and the Euro 2012; for the Euro 2008 it missed the winner but correctly predicted the final, and at the 2014 FIFA World Cup it correctly predicted three out of four semifinalists (Leitner, Zeileis, and Hornik 2008, 2010a,b; Zeileis, Leitner, and Hornik 2012, 2014).
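The aggregation pipeline named at the end of the abstract (invert the quoted odds, strip the overround, average on the log-odds scale, transform back) is mechanical enough to illustrate in code. Below is a minimal Python sketch under simplifying assumptions of mine: decimal odds, a purely multiplicative overround removed by renormalization, and equal weight for all bookmakers; the quoted odds are made up, and the papers' exact margin adjustment may differ.

```python
import numpy as np

def consensus_probabilities(odds):
    """Aggregate quoted decimal winning odds (one row per bookmaker,
    one column per team) into consensus winning probabilities."""
    odds = np.asarray(odds, dtype=float)
    raw = 1.0 / odds                                  # raw implied probabilities
    adjusted = raw / raw.sum(axis=1, keepdims=True)   # strip the overround
    logodds = np.log(adjusted / (1.0 - adjusted))     # move to log-odds scale
    mean_logodds = logodds.mean(axis=0)               # per-team consensus
    p = 1.0 / (1.0 + np.exp(-mean_logodds))          # back to probabilities
    return p / p.sum()                                # renormalize across teams

# hypothetical three-bookmaker, four-team example
quoted = [[4.5, 5.0, 7.0, 11.0],
          [4.6, 4.8, 7.5, 10.0],
          [4.4, 5.2, 7.2, 12.0]]
print(consensus_probabilities(quoted))
```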
Home victory for Brazil in the 2014 FIFA World Cup
After 36 years, the FIFA World Cup returns to South America, with the 2014 event being hosted in Brazil (after 1978 in Argentina). And as in all previous South American FIFA World Cups, a South American team is expected to take the victory: using a bookmaker consensus rating - obtained by aggregating winning odds from 22 online bookmakers - the clear favorite is the host Brazil with a forecasted winning probability of 22.5%, followed by three serious contenders. Neighboring Argentina is the expected runner-up with a winning probability of 15.8%, ahead of Germany with 13.4% and Spain with 11.8%. All other competitors have much lower winning probabilities, with the "best of the rest" being the "insider tip" Belgium with a predicted 4.8%. Furthermore, by complementing the bookmaker consensus results with simulations of the whole tournament, predicted pairwise probabilities for each possible game at the FIFA World Cup are obtained, along with "survival" probabilities for each team proceeding to the different stages of the tournament. For example, it can be inferred that the most likely final is a match between neighbors Brazil and Argentina (6.5%), with the odds somewhat in favor of Brazil winning such a final (a winning probability of 57.8%). However, this outcome is by no means certain and many other courses of the tournament are not unlikely, as will be presented here. All forecasts are the result of an aggregation of quoted winning odds for each team in the 2014 FIFA World Cup: these are first adjusted for profit margins ("overrounds"), averaged on the log-odds scale, and then transformed back to winning probabilities. Moreover, team abilities (or strengths) are approximated by an "inverse" procedure of tournament simulations, yielding estimates of probabilities for all possible pairwise matches at all stages of the tournament. This technique correctly predicted the EURO 2008 final (Leitner, Zeileis, and Hornik 2008), with better results than other rating/forecast methods (Leitner, Zeileis, and Hornik 2010a), and correctly predicted Spain as the 2010 FIFA World Champion (Leitner, Zeileis, and Hornik 2010b) and EURO 2012 Champion (Leitner, Zeileis, and Hornik 2012).
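The "survival" probabilities come from simulating the whole tournament from pairwise win probabilities. A minimal Monte Carlo sketch of a knockout bracket is given below; the bracket, the pairwise matrix, and the function name are hypothetical, and the actual procedure also covers the group stage and the "inverse" fitting of team abilities.

```python
import numpy as np

rng = np.random.default_rng(2014)

def simulate_knockout(bracket, pairwise, n_sims=10_000):
    """Monte Carlo 'survival' probabilities for a single-elimination
    bracket. pairwise[i, j] is the probability that team i beats
    team j; bracket lists team indices in draw order (adjacent teams
    meet in round 1). Column r of the result estimates the probability
    of surviving round r (last column: winning the tournament)."""
    n_rounds = int(np.log2(len(bracket)))
    reach = np.zeros((len(pairwise), n_rounds))
    for _ in range(n_sims):
        alive = list(bracket)
        for r in range(n_rounds):
            winners = []
            for a, b in zip(alive[::2], alive[1::2]):
                winners.append(a if rng.random() < pairwise[a, b] else b)
                reach[winners[-1], r] += 1
            alive = winners
    return reach / n_sims

# hypothetical 4-team bracket with made-up pairwise win probabilities
pairwise = np.array([[0.50, 0.60, 0.65, 0.70],
                     [0.40, 0.50, 0.55, 0.60],
                     [0.35, 0.45, 0.50, 0.55],
                     [0.30, 0.40, 0.45, 0.50]])
print(simulate_knockout([0, 1, 2, 3], pairwise))
```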
History Repeating: Spain Beats Germany in the EURO 2012 Final
Four years after the last European football championship (EURO) in Austria and Switzerland, the two finalists of the EURO 2008 - Spain and Germany - are again the clear favorites for the EURO 2012 in Poland and Ukraine. Using a bookmaker consensus rating - obtained by aggregating winning odds from 23 online bookmakers - the forecast winning probability for Spain is 25.8%, followed by Germany with 22.2%, while all other competitors have much lower winning probabilities (the Netherlands are in third place with a predicted 11.3%). Furthermore, by complementing the bookmaker consensus results with simulations of the whole tournament, we can infer that the probability for a rematch between Spain and Germany in the final is 8.9%, with the odds just slightly in favor of Spain prevailing again in such a final (a winning probability of 52.9%). Thus, one can conclude that - based on bookmakers' expectations - it seems most likely that history repeats itself and Spain defends its European championship title against Germany. However, this outcome is by no means certain and many other courses of the tournament are not unlikely, as will be presented here. All forecasts are the result of an aggregation of quoted winning odds for each team in the EURO 2012: these are first adjusted for profit margins ("overrounds"), averaged on the log-odds scale, and then transformed back to winning probabilities. Moreover, team abilities (or strengths) are approximated by an "inverse" procedure of tournament simulations, yielding estimates of all pairwise probabilities (for matches between each pair of teams) as well as probabilities to proceed to the various stages of the tournament. This technique correctly predicted the EURO 2008 final (Leitner, Zeileis, and Hornik 2008), with better results than other rating/forecast methods (Leitner, Zeileis, and Hornik 2010a), and correctly predicted Spain as the 2010 FIFA World Champion (Leitner, Zeileis, and Hornik 2010b). Compared to the EURO 2008 forecasts, there are many parallels but two notable differences: first, the gap between Spain/Germany and all remaining teams is much larger; second, the odds for the predicted final were slightly in favor of Germany in 2008, whereas this year the situation is reversed.
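As a quick sanity check on how the two headline numbers combine, the unconditional probability of the "history repeating" scenario follows from the product rule:

```latex
P(\text{Spain beats Germany in the final})
  = \underbrace{0.089}_{\text{final is Spain vs.\ Germany}}
    \times \underbrace{0.529}_{\text{Spain wins given that final}}
  \approx 0.047
```

So even the single most likely path to the title carries only about a 4.7% chance, consistent with the caveat that many other courses of the tournament are not unlikely.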
Software Microbenchmarking in the Cloud. How Bad is it Really?
Rigorous performance engineering traditionally assumes measuring on bare-metal environments to control for as many confounding factors as possible. Unfortunately, some researchers and practitioners might not have the access, knowledge, or funds to operate dedicated performance-testing hardware, making public clouds an attractive alternative. However, shared public cloud environments are inherently unpredictable in terms of the system performance they provide. In this study, we explore the effects of cloud environments on the variability of performance test results and to what extent slowdowns can still be reliably detected even in a public cloud. We focus on software microbenchmarks as an example of performance tests and execute extensive experiments on three well-known public cloud services (AWS, GCE, and Azure), using three different cloud instance types per service. We also compare the results to a hosted bare-metal offering from IBM Bluemix. In total, we gathered more than 4.5 million unique microbenchmarking data points from benchmarks written in Java and Go. We find that the variability of results differs substantially between benchmarks and instance types (with coefficients of variation ranging from 0.03% to more than 100%). However, executing test and control experiments on the same instances (in randomized order) allows us to detect slowdowns of 10% or less with high confidence, using state-of-the-art statistical tests (i.e., Wilcoxon rank-sum and overlapping bootstrapped confidence intervals). Finally, our results indicate that the Wilcoxon rank-sum test manages to detect smaller slowdowns in cloud environments than the confidence-interval approach.
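To make the detection setup concrete, here is a minimal Python sketch of the two statistical checks named in the abstract, applied to synthetic timing data with an injected 10% slowdown; the sample sizes, the percentile-bootstrap variant, and all the data are assumptions of mine, not the study's exact protocol.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)

def bootstrap_ci(x, stat=np.median, n_boot=10_000, alpha=0.05):
    """Percentile bootstrap confidence interval for a statistic."""
    boots = [stat(rng.choice(x, size=len(x), replace=True))
             for _ in range(n_boot)]
    return np.quantile(boots, [alpha / 2, 1 - alpha / 2])

# hypothetical test/control execution times (seconds) measured on the
# same cloud instance in randomized order
control = rng.normal(1.00, 0.05, size=50)
test    = rng.normal(1.10, 0.05, size=50)   # injected 10% slowdown

# Wilcoxon rank-sum (Mann-Whitney U): are test times stochastically larger?
stat, p = mannwhitneyu(test, control, alternative="greater")
print(f"rank-sum p-value: {p:.4f}")

# CI-overlap check: disjoint bootstrapped intervals signal a slowdown
lo_c, hi_c = bootstrap_ci(control)
lo_t, hi_t = bootstrap_ci(test)
print("slowdown detected:", lo_t > hi_c)
```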
Applying test case prioritization to software microbenchmarks
Regression testing comprises techniques that are applied during software evolution to uncover faults effectively and efficiently. While regression testing is widely studied for functional tests, performance regression testing, e.g., with software microbenchmarks, is hardly investigated. Applying test case prioritization (TCP), a regression testing technique, to software microbenchmarks may help capture large performance regressions sooner upon new versions. This may be especially beneficial for microbenchmark suites, because they take considerably longer to execute than unit test suites. However, it is unclear whether traditional unit testing TCP techniques work equally well for software microbenchmarks. In this paper, we empirically study coverage-based TCP techniques, employing total and additional greedy strategies, applied to software microbenchmarks along multiple parameterization dimensions, leading to 54 unique technique instantiations. We find that TCP techniques have a mean APFD-P (average percentage of fault-detection on performance) effectiveness between 0.54 and 0.71 and are able to capture the three largest performance changes after executing 29% to 66% of the whole microbenchmark suite. Our efficiency analysis reveals that the runtime overhead of TCP varies considerably depending on the exact parameterization. The most effective technique has an overhead of 11% of the total microbenchmark suite execution time, making TCP a viable option for performance regression testing. The results demonstrate that the total strategy is superior to the additional strategy. Finally, dynamic-coverage techniques should be favored over static-coverage techniques due to their acceptable analysis overhead; however, in settings where the time for prioritization is limited, static-coverage techniques provide an attractive alternative.
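The total and additional greedy strategies compared in the study are standard coverage-based heuristics; a minimal Python sketch, with hypothetical benchmark names and coverage sets, is shown below. The real techniques additionally vary coverage type (dynamic vs. static) and granularity, which this sketch omits.

```python
def total_greedy(coverage):
    """Total strategy: order benchmarks by absolute coverage size."""
    return sorted(coverage, key=lambda b: len(coverage[b]), reverse=True)

def additional_greedy(coverage):
    """Additional strategy: repeatedly pick the benchmark that covers
    the most not-yet-covered units, resetting once nothing new is left."""
    remaining, covered, order = dict(coverage), set(), []
    while remaining:
        best = max(remaining, key=lambda b: len(remaining[b] - covered))
        if not remaining[best] - covered:
            if not covered:              # leftover benchmarks add nothing new
                order.extend(remaining)
                break
            covered = set()              # reset and re-rank the rest
            continue
        covered |= remaining.pop(best)
        order.append(best)
    return order

# hypothetical coverage sets: benchmark -> set of covered methods
cov = {"benchA": {1, 2, 3}, "benchB": {3, 4}, "benchC": {4}}
print(total_greedy(cov))       # ['benchA', 'benchB', 'benchC']
print(additional_greedy(cov))  # ['benchA', 'benchB', 'benchC']
```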