Localized Cumulative Distributions and a Multivariate Generalization of the Cramér-von Mises Distance
This paper is concerned with distances for comparing multivariate random vectors, with a special focus on the case that at least one of the random vectors is of discrete type, i.e., assumes values from a discrete set only. The first contribution is a new type of characterization of multivariate random quantities, the so-called Localized Cumulative Distribution (LCD), which, in contrast to the conventional definition of a cumulative distribution, is unique and symmetric. Based on the LCDs of the random vectors under consideration, the second contribution is the definition of generalized distance measures suitable for the multivariate case. These distances are used for both analysis and synthesis purposes. Analysis is concerned with assessing whether a given sample stems from a given continuous distribution. Synthesis is concerned with both density estimation, i.e., calculating a suitable continuous approximation of a given sample, and density discretization, i.e., approximating a given continuous random vector by a discrete one.
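The LCD idea can be sketched empirically: instead of the one-sided cumulative distribution, one evaluates the probability mass of a symmetric box around each location, at several box widths. The following is an illustrative sketch only, not the paper's construction; the function names `empirical_lcd` and `lcd_distance` are hypothetical, and the paper's distance is integral-based rather than the grid sum used here.

```python
import random

def empirical_lcd(sample, m, b):
    """Empirical Localized Cumulative Distribution: the fraction of sample
    points falling in the symmetric box [m - b, m + b] in every coordinate.
    Unlike the one-sided CDF, this characterization is symmetric in m."""
    d = len(m)
    count = sum(
        all(abs(x[k] - m[k]) <= b for k in range(d)) for x in sample
    )
    return count / len(sample)

def lcd_distance(sample_a, sample_b, grid, widths):
    """Squared-difference distance between two empirical LCDs, summed over
    a finite grid of locations and widths (a crude discretized stand-in
    for the paper's Cramér-von Mises-type integral distance)."""
    return sum(
        (empirical_lcd(sample_a, m, b) - empirical_lcd(sample_b, m, b)) ** 2
        for m in grid for b in widths
    )

random.seed(0)
a = [(random.random(), random.random()) for _ in range(200)]
b = [(random.random(), random.random()) for _ in range(200)]
grid = [(i / 4, j / 4) for i in range(5) for j in range(5)]
d = lcd_distance(a, b, grid, widths=[0.1, 0.25, 0.5])
print(round(d, 4))
```

Note that this works unchanged whether the samples are draws from continuous distributions or discrete point masses, which is the motivation for the LCD in the first place.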
Sample dispersion is better than sample discrepancy for classification
We want to generate learning data within the context of active learning. First, we recall theoretical results proposing discrepancy as a criterion for generating samples in regression. Surprisingly, we show that these theoretical results about low-discrepancy sequences in regression problems are not adequate for classification problems. Second, we propose dispersion as a criterion for generating data. Finally, we present numerical experiments that agree well with the theory.
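Dispersion, in contrast to discrepancy, measures the radius of the largest empty ball in the sample: how far any location in the domain can be from its nearest sample point. A minimal sketch (assuming the Euclidean notion in 2-D; the paper's exact definition may differ) estimates it on a fine grid:

```python
import math

def dispersion(points, resolution=50):
    """Approximate the dispersion (covering radius) of a point set in
    [0,1]^2: the largest distance from any location in the unit square
    to its nearest sample point, estimated over a finite grid of
    candidate locations. Brute force; illustrative only."""
    worst = 0.0
    for i in range(resolution + 1):
        for j in range(resolution + 1):
            x = (i / resolution, j / resolution)
            nearest = min(math.dist(x, p) for p in points)
            worst = max(worst, nearest)
    return worst

# A regular 4x4 grid of cell centers: every location is within
# 0.125 * sqrt(2) of some sample point.
grid4 = [((i + 0.5) / 4, (j + 0.5) / 4) for i in range(4) for j in range(4)]
print(round(dispersion(grid4), 3))
```

Low dispersion guarantees that every region of the input space, hence every potential decision-boundary region, is probed by some labeled point, which is the intuition behind preferring it to discrepancy for classification.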
Constructing Low Star Discrepancy Point Sets with Genetic Algorithms
Geometric discrepancies are standard measures to quantify the irregularity of
distributions. They are an important notion in numerical integration. One of
the most important discrepancy notions is the so-called \emph{star
discrepancy}. Roughly speaking, a point set of low star discrepancy value
allows for a small approximation error in quasi-Monte Carlo integration. It is
thus the most studied discrepancy notion.
In this work we present a new algorithm to compute point sets of low star
discrepancy. The two components of the algorithm (for the optimization and the
evaluation, respectively) are based on evolutionary principles. Our algorithm
clearly outperforms existing approaches. To the best of our knowledge, it is
also the first algorithm which can be adapted easily to optimize inverse star
discrepancies. (Extended abstract appeared at GECCO 2013; v2 corrects three numbers in a table.)
Heuristic Approaches to Obtain Low-Discrepancy Point Sets via Subset Selection
Building upon the exact methods presented in our earlier work [J. Complexity,
2022], we introduce a heuristic approach for the star discrepancy subset
selection problem. The heuristic gradually improves the current-best subset by
replacing one of its elements at a time. While we prove that the heuristic does
not necessarily return an optimal solution, we obtain very promising results
for all tested dimensions. For example, for moderate point set sizes in dimension 6, we obtain point sets with star
discrepancy up to 35% better than that of the first points of the Sobol'
sequence. Our heuristic works in all dimensions, the main limitation being the
precision of the discrepancy calculation algorithms.
We also provide a comparison with a recent energy functional introduced by
Steinerberger [J. Complexity, 2019], showing that our heuristic performs better
on all tested instances.
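The one-element-exchange idea behind the heuristic can be sketched as follows. This is an illustration of the general scheme under stated assumptions, not the paper's exact procedure; `swap_heuristic` and its parameters are hypothetical names, and the 2-D discrepancy evaluation is the standard critical-box brute force:

```python
import random
from itertools import product

def star_disc(points):
    """Star discrepancy of a small 2-D set, via the finitely many
    critical anchored boxes built from the point coordinates."""
    n = len(points)
    xs = sorted({p[0] for p in points} | {1.0})
    ys = sorted({p[1] for p in points} | {1.0})
    return max(
        max(x * y - sum(p[0] < x and p[1] < y for p in points) / n,
            sum(p[0] <= x and p[1] <= y for p in points) / n - x * y)
        for x, y in product(xs, ys)
    )

def swap_heuristic(superset, k, iterations=300, seed=1):
    """Subset-selection heuristic (sketch): keep a size-k subset of the
    superset and repeatedly try replacing one of its elements with a
    random outside point, accepting any swap that lowers the star
    discrepancy. May stop at a local optimum rather than the true one."""
    rng = random.Random(seed)
    subset = rng.sample(superset, k)
    best = star_disc(subset)
    for _ in range(iterations):
        out = rng.randrange(k)
        cand = rng.choice(superset)
        if cand in subset:
            continue
        trial = subset[:out] + [cand] + subset[out + 1:]
        d = star_disc(trial)
        if d < best:
            subset, best = trial, d
    return subset, best

rng = random.Random(0)
cloud = [(rng.random(), rng.random()) for _ in range(60)]
subset, d = swap_heuristic(cloud, k=12)
print(round(d, 4))
```

As the abstract notes, the quality of any such heuristic is ultimately capped by the precision of the discrepancy evaluation it calls in the inner loop.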
Entropy, Randomization, Derandomization, and Discrepancy
The star discrepancy is a measure of how uniformly distributed a finite point set is in the d-dimensional unit cube. It is related to high-dimensional numerical integration of certain function classes, as expressed by the Koksma-Hlawka inequality. A sharp version of this inequality states that the worst-case error of approximating the integral of functions from the unit ball of some Sobolev space by an equal-weight cubature is exactly the star discrepancy of the set of sample points. In many applications, e.g., in physics, quantum chemistry, or finance, it is essential to approximate high-dimensional integrals. Thus, with regard to the Koksma-Hlawka inequality, the following three questions are very important: (i) What are good bounds, with explicitly given dependence on the dimension d, for the smallest possible discrepancy of any n-point set for moderate n? (ii) How can we efficiently construct point sets that satisfy such bounds? (iii) How can we efficiently calculate the discrepancy of given point sets? We discuss these questions and survey and explain some approaches to tackle them relying on metric entropy, randomization, and derandomization.
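The equal-weight cubature appearing in the Koksma-Hlawka inequality is simply the average of function values over the sample points, so a low-discrepancy point set translates directly into a small integration error. A minimal one-dimensional illustration (assuming the base-2 van der Corput sequence as the low-discrepancy set):

```python
import random

def van_der_corput(k, base=2):
    """k-th element of the base-2 van der Corput low-discrepancy
    sequence, obtained by reversing the digits of k about the point."""
    v, denom = 0.0, 1
    while k:
        k, r = divmod(k, base)
        denom *= base
        v += r / denom
    return v

def cubature(points, f):
    """Equal-weight cubature: the quadrature rule whose worst-case
    error the Koksma-Hlawka inequality bounds via the discrepancy."""
    return sum(f(x) for x in points) / len(points)

f = lambda x: x * x          # integral over [0, 1] is exactly 1/3
n = 1024
qmc_pts = [van_der_corput(i) for i in range(n)]
random.seed(0)
mc_pts = [random.random() for _ in range(n)]

qmc_err = abs(cubature(qmc_pts, f) - 1 / 3)
mc_err = abs(cubature(mc_pts, f) - 1 / 3)
print(round(qmc_err, 6), round(mc_err, 6))
```

For n a power of two, the first n van der Corput points are exactly the equispaced fractions i/n in permuted order, so the quasi-Monte Carlo error here is of order 1/n, versus the typical 1/sqrt(n) behavior of plain Monte Carlo.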