
    Probabilistic Tools for the Analysis of Randomized Optimization Heuristics

    This chapter collects several probabilistic tools that have proved useful in the analysis of randomized search heuristics. This includes classic material like the Markov, Chebyshev and Chernoff inequalities, but also lesser-known topics like stochastic domination and coupling, and Chernoff bounds for geometrically distributed random variables and for negatively correlated random variables. Most of the results presented here have appeared previously; some, however, only in recent conference publications. While the focus is on collecting tools for the analysis of randomized search heuristics, many of these may also be useful in the analysis of classic randomized algorithms or discrete random structures. (91 pages.)
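
    For orientation, two of the classic bounds mentioned above can be stated in their standard textbook form (standard statements, not quoted from the chapter itself):

    \[
    \Pr[X \ge a] \;\le\; \frac{\mathbb{E}[X]}{a} \qquad \text{(Markov's inequality, for } X \ge 0,\ a > 0\text{)},
    \]
    \[
    \Pr[X \ge (1+\delta)\mu] \;\le\; e^{-\delta^2\mu/3}, \qquad \Pr[X \le (1-\delta)\mu] \;\le\; e^{-\delta^2\mu/2}, \qquad 0 < \delta \le 1,
    \]
    where \(X\) is a sum of independent random variables taking values in \([0,1]\) and \(\mu = \mathbb{E}[X]\) (a multiplicative Chernoff bound).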

    Optimal high-dimensional and nonparametric distributed testing under communication constraints

    We derive minimax testing errors in a distributed framework where the data is split over multiple machines and their communication to a central machine is limited to b bits. We investigate both the d- and infinite-dimensional signal detection problems under Gaussian white noise. We also derive distributed testing algorithms that reach the theoretical lower bounds. Our results show that distributed testing is subject to fundamentally different phenomena that are not observed in distributed estimation. Among our findings, we show that testing protocols that have access to shared randomness can perform strictly better in some regimes than those that do not. We also observe that consistent nonparametric distributed testing is always possible, even with as little as 1 bit of communication, and that the corresponding test outperforms the best local test using only the information available at a single local machine. Furthermore, we derive adaptive nonparametric distributed testing strategies and the corresponding theoretical lower bounds.
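
    As a rough illustration of the kind of protocol studied (a toy sketch, not the authors' construction), the snippet below simulates a 1-bit distributed test of a d-dimensional Gaussian mean: each machine sends a single bit indicating whether its local statistic exceeds a null quantile, and the central machine rejects when the count of positive bits is improbably large under the null.

        import numpy as np
        from scipy import stats

        def one_bit_distributed_test(X, alpha=0.05):
            """Toy 1-bit protocol. X has shape (m, n, d): m machines, n obs of dim d.
            Each machine sends one bit (local chi-square-type statistic above its null
            95th percentile); the centre rejects if the bit count is unusually high."""
            m, n, d = X.shape
            # Local statistic: n * ||sample mean||^2, which is chi2(d) under H0: mean = 0.
            local_stats = n * (X.mean(axis=1) ** 2).sum(axis=1)
            bits = local_stats > stats.chi2.ppf(0.95, df=d)   # one bit per machine
            # Under H0 the bit count is Binomial(m, 0.05); reject on an unlikely excess.
            return bits.sum() > stats.binom.ppf(1 - alpha, m, 0.05)

        rng = np.random.default_rng(0)
        m, n, d = 50, 100, 5
        null_data = rng.normal(0.0, 1.0, size=(m, n, d))
        signal_data = rng.normal(0.2, 1.0, size=(m, n, d))    # small common mean shift
        print("reject under H0:", one_bit_distributed_test(null_data))
        print("reject under H1:", one_bit_distributed_test(signal_data))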

    Bootstrap percolation in inhomogeneous random graphs

    A bootstrap percolation process on a graph G is an "infection" process which evolves in rounds. Initially, there is a subset of infected nodes, and in each subsequent round every uninfected node which has at least r infected neighbours becomes infected and remains so forever. The parameter r > 1 is fixed. We consider this process in the case where the underlying graph is an inhomogeneous random graph whose kernel is of rank 1. Assuming that initially every vertex is infected independently with probability p > 0, we provide a law of large numbers for the number of vertices that will have been infected by the end of the process. We also focus on a special case of such random graphs which exhibit a power-law degree distribution with exponent in (2,3). The first two authors have shown the existence of a critical function a_c(n) with a_c(n) = o(n) and the following property. Let n be the number of vertices of the underlying random graph and let a(n) be the number of vertices that are initially infected. Assume that a set of a(n) vertices is chosen randomly and becomes externally infected. If a(n) << a_c(n), then the process does not evolve at all, with high probability as n grows, whereas if a(n) >> a_c(n), then with high probability the final set of infected vertices is linear in n. Using the techniques of the previous theorem, we give the precise asymptotic fraction of vertices which will eventually be infected when a(n) >> a_c(n) but a(n) = o(n). Note that this corresponds to the case where p approaches 0. (42 pages.)
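
    A minimal simulation of the process described above is sketched below; the rank-1 (Chung-Lu-type) kernel, the weight distribution and the parameter choices are illustrative assumptions, not the paper's exact setup.

        import numpy as np

        def chung_lu_graph(weights, rng):
            """Rank-1 inhomogeneous random graph: edge {i, j} appears independently
            with probability min(1, w_i * w_j / sum(w))."""
            n, W = len(weights), weights.sum()
            probs = np.minimum(1.0, np.outer(weights, weights) / W)
            upper = np.triu(rng.random((n, n)) < probs, k=1)
            adj = upper | upper.T
            return [np.flatnonzero(adj[i]) for i in range(n)]

        def bootstrap_percolation(neighbours, initially_infected, r):
            """Rounds: an uninfected node with >= r infected neighbours becomes infected."""
            infected = set(initially_infected)
            changed = True
            while changed:
                changed = False
                for v in range(len(neighbours)):
                    if v not in infected and sum(u in infected for u in neighbours[v]) >= r:
                        infected.add(v)
                        changed = True
            return infected

        rng = np.random.default_rng(1)
        n, p, r = 2000, 0.05, 2
        weights = (1 - rng.random(n)) ** (-1 / 1.5)   # tail exponent 1.5 -> degree exponent ~2.5, in (2,3)
        neighbours = chung_lu_graph(weights, rng)
        seeds = np.flatnonzero(rng.random(n) < p)     # each vertex initially infected with probability p
        final = bootstrap_percolation(neighbours, seeds, r)
        print(len(seeds), "initially infected,", len(final), "infected at the end")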

    Change-point Problem and Regression: An Annotated Bibliography

    The problems of identifying changes at unknown times and of estimating the location of changes in stochastic processes are referred to as the change-point problem or, in the Eastern literature, as "disorder". The change-point problem, first introduced in the quality control context, has since developed into a fundamental problem in the areas of statistical control theory, stationarity of a stochastic process, estimation of the current position of a time series, testing and estimation of change in the patterns of a regression model, and most recently in the comparison and matching of DNA sequences in microarray data analysis. Numerous methodological approaches have been implemented in examining change-point models. Maximum-likelihood estimation, Bayesian estimation, isotonic regression, piecewise regression, quasi-likelihood and non-parametric regression are among the methods which have been applied to resolving challenges in change-point problems. Grid-searching approaches have also been used to examine the change-point problem. Statistical analysis of change-point problems depends on the method of data collection. If the data collection is ongoing until some random time, then the appropriate statistical procedure is called sequential. If, however, a large finite set of data is collected with the purpose of determining whether at least one change-point occurred, then this may be referred to as non-sequential. Not surprisingly, both the former and the latter have a rich literature, with much of the earlier work focusing on sequential methods inspired by applications in quality control for industrial processes. In the regression literature, the change-point model is also referred to as two- or multiple-phase regression, switching regression, segmented regression, two-stage least squares (Shaban, 1980), or broken-line regression. The area of the change-point problem has been the subject of intensive research in the past half-century. The subject has evolved considerably and found applications in many different areas. It is practically impossible to summarize all of the research carried out over the past 50 years on the change-point problem. We have therefore confined ourselves to those articles on change-point problems which pertain to regression. The important branch of sequential procedures in change-point problems has been left out entirely. We refer the readers to the seminal review papers by Lai (1995, 2001). The so-called structural change models, which occupy a considerable portion of the research in the area of change-point, particularly among econometricians, have not been fully considered. We refer the reader to Perron (2005) for an updated review in this area. Articles on change-point in time series are considered only if the methodologies presented in the paper pertain to regression analysis.
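
    One of the simplest ideas mentioned above, a grid search over candidate change points in a two-phase (segmented) regression that minimizes the residual sum of squares, can be sketched as follows; this is a generic illustration rather than any particular method from the bibliography.

        import numpy as np

        def two_phase_changepoint(x, y, min_seg=3):
            """Grid search for a single change point in piecewise-linear regression:
            fit separate least-squares lines before and after each candidate split and
            keep the split with the smallest total residual sum of squares."""
            def rss(xs, ys):
                _, residuals, *_ = np.polyfit(xs, ys, deg=1, full=True)
                return residuals[0] if residuals.size else 0.0
            best_k, best_rss = None, np.inf
            for k in range(min_seg, len(x) - min_seg):
                total = rss(x[:k], y[:k]) + rss(x[k:], y[k:])
                if total < best_rss:
                    best_k, best_rss = k, total
            return best_k, best_rss

        rng = np.random.default_rng(2)
        x = np.arange(100, dtype=float)
        y = np.where(x < 60, 0.5 * x, 30 + 2.0 * (x - 60)) + rng.normal(0, 1, 100)  # break at x = 60
        k, _ = two_phase_changepoint(x, y)
        print("estimated change point near x =", x[k])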

    Electronic voting : 6th International Joint Conference, E-Vote-ID 2021, virtual event, October 5-8, 2021

    This book constitutes the proceedings of the 6th International Joint Conference on Electronic Voting, E-Vote-ID 2021, held online (due to COVID-19) in Bregenz, Austria, in October 2021. The 14 full papers presented were carefully reviewed and selected from 55 submissions. The conference collected the most relevant debates on the development of electronic voting, from aspects relating to security and usability through to practical experiences and applications of voting systems, as well as legal, social and political aspects.

    Cable Layout Optimization Problems in the Context of Renewable Energy Sources


    Hypothesis testing and confidence sets: why Bayesian not frequentist, and how to set a prior with a regulatory authority

    We marshal the arguments for preferring Bayesian hypothesis testing and confidence sets to frequentist ones. We define admissible solutions to inference problems, noting that Bayesian solutions are admissible. We give seven weaker common-sense criteria for solutions to inference problems, all failed by these frequentist methods but satisfied by any admissible method. We note that pseudo-Bayesian methods, made by handicapping Bayesian methods to satisfy criteria on the type I error rate, are frequentist rather than Bayesian in nature. We give four examples showing the differences between Bayesian and frequentist methods: the first accessible to those with no calculus; the second illustrating dramatically, in the abstract, what is wrong with these frequentist methods; the third showing that the same problems arise, albeit to a lesser extent, in everyday statistical problems; and the fourth illustrating how, on some real-life inference problems, Bayesian methods require less data than fixed sample-size (resp. pseudo-Bayesian) frequentist hypothesis testing by factors exceeding 3000 (resp. 300) without recourse to informative priors. To address the issue of different parties with opposing interests reaching agreement on a prior, we illustrate the beneficial effects of a Bayesian "let the data decide" policy both on results under a wide variety of conditions and on the motivation to reach a common prior by consent. We show that in general the frequentist confidence level contains less relevant Shannon information than the Bayesian posterior, and give an example where no deterministic frequentist critical regions give any relevant information even though the Bayesian posterior contains up to the maximum possible amount. In contrast, use of the Bayesian prior allows construction of non-deterministic critical regions for which the Bayesian posterior can be recovered from the frequentist confidence. (123 pages, 61 figures, 11 tables.)
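
    For readers unfamiliar with the Bayesian side of this comparison, the posterior quantities involved are the standard ones (general definitions, not specific to this paper):

    \[
    \Pr(H_0 \mid x) \;=\; \frac{\pi_0\, p(x \mid H_0)}{\pi_0\, p(x \mid H_0) + \pi_1\, p(x \mid H_1)},
    \qquad
    \frac{\Pr(H_0 \mid x)}{\Pr(H_1 \mid x)} \;=\; \frac{\pi_0}{\pi_1} \cdot \frac{p(x \mid H_0)}{p(x \mid H_1)},
    \]
    where \(\pi_0, \pi_1\) are the prior probabilities of the two hypotheses; the posterior odds equal the prior odds multiplied by the Bayes factor.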

    Extreme-Value Theory for Large Fork-Join Queues


    Statistical modelling of environmental extremes

    This thesis is concerned with the development of theory and statistical methodologies that may be used to analyse environmental extremes. As extreme environmental events are often associated with large economic costs and loss of human life, accurate statistical modelling of such events is crucial in order to be able to accurately estimate their frequency and intensity. A key feature of environmental time series is that they display serial correlation, which must be modelled in order for valid inferences to be drawn. One line of research in this thesis is the development of flexible time series models that may be used to simulate the behaviour of an environmental process after entering an extreme state. This allows us to estimate quantities such as the mean duration of an extreme event. We illustrate our modelling approach and methodology by simulating the behaviour of daily maximum temperature in Orléans, France, over a three-week period given that the temperature exceeds 35°C at the start of the period. Much of extreme value theory for time series has been developed under the assumption of strict stationarity, a mathematically convenient but often unrealistic assumption for environmental data. Our second project extends some well-known classical results for strictly stationary time series to a more general setting that allows for non-stationarity. We show that for weakly dependent time series with common marginal distributions, the distribution of the sample maximum at large thresholds is characterized by a parameter that plays an analogous role to the extremal index of a stationary time series, and may be estimated similarly. Our results are applied to the particular case where non-stationarity arises through periodicity in the dependence structure, as may be expected in certain environmental time series. We also show how our results may be further generalized to allow for different marginal distributions. Another strand of research in this thesis concerns the detection and quantification of changes in the distribution of the annual maximum daily maximum temperature (TXx) in a large gridded data set of European daily temperature during the years 1950-2018. We model TXx throughout Europe using a generalized extreme value distribution, with the log of the atmospheric concentration of CO2 as a covariate. It is commonplace in the geoscientific literature for such models to be fit separately at each spatial location over the domain of interest. To reflect the fact that nearby locations are expected to be similarly affected by any climate change, we instead consider models that incorporate spatial dependence, and thus increase efficiency in parameter estimation compared to separate model fits. We find strong evidence for shifts towards hotter temperatures throughout Europe. Averaged across our spatial domain, the 100-year return temperature based on the 2018 climate is approximately 2°C hotter than that based on the 1950 climate. Our final project concerns the evaluation of bias in climate model output and how such biases contribute to biases in hazard indices. Based on copula theory we develop a multivariate bias-assessment framework, which allows us to disentangle the biases in hazard indicators in terms of biases in the underlying univariate drivers and their statistical dependence. Based on this framework, we dissect biases in fire and heat stress hazards in a set of global climate models by considering two simplified hazard indicators: the wet-bulb globe temperature (WBGT) and the Chandler burning index (CBI).
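
    The basic building block used above (a generalized extreme value fit to annual maxima, from which a 100-year return level is read off) can be sketched as follows; this minimal example assumes a stationary GEV with simulated data and no covariates or spatial pooling, unlike the models developed in the thesis.

        import numpy as np
        from scipy.stats import genextreme

        # Simulated stand-in for 69 annual maxima (e.g. one grid cell, 1950-2018).
        rng = np.random.default_rng(3)
        annual_maxima = genextreme.rvs(c=0.1, loc=32.0, scale=1.5, size=69, random_state=rng)

        # Fit a stationary GEV and compute the 100-year return level (the 1 - 1/100 quantile
        # of the annual-maximum distribution).
        shape, loc, scale = genextreme.fit(annual_maxima)
        return_level_100 = genextreme.ppf(1 - 1 / 100, shape, loc=loc, scale=scale)
        print(f"estimated 100-year return temperature: {return_level_100:.1f} °C")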