New sampling strategies when searching for robust solutions
Many real-world optimisation problems involve uncertainties, and in such situations it is often desirable to identify robust solutions that perform well over the possible future scenarios. In this paper, we focus on input uncertainty, such as in manufacturing, where the actual manufactured product may differ from the specified design but should still function well. Estimating a solution's expected fitness in such a case is challenging, especially if the fitness function is expensive to evaluate and its analytic form is unknown. One option is to average over a number of scenarios, but this is computationally expensive. The archive sample approximation method reduces the required number of fitness evaluations by re-using previous evaluations stored in an archive. The main challenge in applying this method lies in determining the locations of the additional samples drawn in each generation to enrich the information in the archive and reduce the estimation error. In this paper, we use the Wasserstein distance metric to approximate the possible benefit of a potential sample location on the estimation error, and we propose new sampling strategies based on this metric. Contrary to previous studies, we consider a sample's contribution for the entire population rather than inspecting each individual separately. This also allows us to dynamically adjust the number of samples to be collected in each generation. An empirical comparison with several previously proposed archive-based sample approximation methods demonstrates the superiority of our approaches.
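The Wasserstein-based criterion described in this abstract can be illustrated with a toy one-dimensional sketch: measure the distance between the archive's empirical distribution and the target disturbance distribution, then score a candidate sample by how much adding it would reduce that distance. The function names and the quantile-grid approximation below are illustrative choices, not the paper's implementation.

```python
import numpy as np

def wasserstein_1d(a, b, n_grid=200):
    """Empirical 1-Wasserstein distance between two 1-D sample sets,
    approximated as the mean absolute gap between quantile functions
    evaluated on a common grid (so the sets may differ in size)."""
    q = np.linspace(0.0, 1.0, n_grid)
    return np.mean(np.abs(np.quantile(np.sort(a), q) -
                          np.quantile(np.sort(b), q)))

def sample_benefit(archive, candidate, target):
    """Hypothetical scoring rule: the reduction in distance to the
    target distribution if `candidate` were added to the archive."""
    before = wasserstein_1d(archive, target)
    after = wasserstein_1d(np.append(archive, candidate), target)
    return before - after
```

A sampling strategy of the kind the abstract describes would evaluate such a benefit score at candidate locations each generation and draw new samples where the predicted error reduction is largest.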
Optimal sampling strategies for multiscale stochastic processes
In this paper, we determine which non-random sampling of fixed size gives the best linear predictor of the sum of a finite spatial population. We employ different multiscale superpopulation models and use the minimum mean-squared error as our optimality criterion. In multiscale superpopulation tree models, the leaves represent the units of the population, interior nodes represent partial sums of the population, and the root node represents the total sum of the population. We prove that the optimal sampling pattern varies dramatically with the correlation structure of the tree nodes. While uniform sampling is optimal for trees with "positive correlation progression", it provides the worst possible sampling with "negative correlation progression". As an analysis tool, we introduce and study a class of independent innovations trees that are of interest in their own right. We derive a fast water-filling algorithm to determine the optimal sampling of the leaves to estimate the root of an independent innovations tree.

Comment: Published at http://dx.doi.org/10.1214/074921706000000509 in the IMS Lecture Notes--Monograph Series (http://www.imstat.org/publications/lecnotes.htm) by the Institute of Mathematical Statistics (http://www.imstat.org)
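The multiscale tree structure this abstract describes (leaves are the population units, interior nodes are partial sums, the root is the population total) can be sketched directly. The toy builder below assumes a binary tree over a power-of-two population; it only illustrates the data structure, not the paper's linear predictor or water-filling algorithm.

```python
import numpy as np

def partial_sum_tree(leaves):
    """Build the multiscale sum tree: level 0 holds the population
    units, each interior node is the sum of its two children, and
    the single node at the top level is the population total.
    Assumes len(leaves) is a power of two (illustrative only)."""
    levels = [np.asarray(leaves, dtype=float)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append(prev[0::2] + prev[1::2])  # pairwise sums
    return levels  # levels[-1][0] is the root (total sum)
```

Sampling a subset of leaves and linearly predicting the root of such a tree is the estimation problem whose optimal design the paper characterises.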
Recovery Conditions and Sampling Strategies for Network Lasso
The network Lasso is a recently proposed convex optimization method for machine learning from massive network-structured datasets, i.e., big data over networks. It is a variant of the well-known least absolute shrinkage and selection operator (Lasso), which underlies many methods in learning and signal processing that involve sparse models. Highly scalable implementations of the network Lasso can be obtained by state-of-the-art proximal methods, e.g., the alternating direction method of multipliers (ADMM). By generalizing the compatibility condition put forward by van de Geer and Buehlmann as a powerful tool for the analysis of the plain Lasso, we derive a sufficient condition on the underlying network topology, the network compatibility condition (NCC), under which the network Lasso accurately learns a clustered underlying graph signal. This condition relates the location of the sampled nodes to the clustering structure of the network. In particular, the NCC informs the choice of which nodes to sample or, in machine learning terms, which data points provide the most information if labeled.

Comment: nominated as student paper award finalist at Asilomar 2017. arXiv admin note: substantial text overlap with arXiv:1704.0210
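For a scalar graph signal, the network Lasso combines a squared loss on the sampled (labeled) nodes with a weighted total-variation penalty over the edges, which pushes connected nodes toward a common clustered value. The sketch below only evaluates that objective, with hypothetical argument names; it is not a solver (a scalable solver would use ADMM, as the abstract notes).

```python
import numpy as np

def network_lasso_objective(x, y, sampled, edges, weights, lam):
    """Network Lasso objective for a scalar signal x over graph nodes:
    squared error against labels y on the sampled nodes, plus
    lam times the weighted sum of |x_i - x_j| over the edges."""
    loss = sum((x[i] - y[i]) ** 2 for i in sampled)
    tv = sum(w * abs(x[i] - x[j]) for (i, j), w in zip(edges, weights))
    return loss + lam * tv
```

The NCC the paper derives concerns exactly the interplay visible here: which nodes appear in `sampled` relative to how the edge weights partition the graph into clusters.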
Counting Methods and Sampling Strategies Determining Pedestrian Numbers
1.1.1 Any new road, road improvement or traffic management scheme could affect pedestrian journeys in its locality or elsewhere. Some journeys may be affected directly, with severance caused where the new road or road improvement cuts across a pedestrian route; others may be affected indirectly, with a new road causing changes in traffic levels elsewhere. To enable effects on pedestrians to be given proper weight when decisions are taken, techniques are required that forecast the effects of a scheme on the number and quality of pedestrian journeys. This is particularly true in urban areas, since effects on pedestrians may be one of the main benefits or disbenefits of measures to relieve urban traffic.
(Continues...)
Simple coarse graining and sampling strategies for image recognition
A conceptually simple way to classify images is to directly compare test-set data and training-set data. The accuracy of this approach is limited by the method of comparison used and by the extent to which the training-set data cover configuration space. Here we show that this coverage can be substantially increased using simple strategies of coarse graining (replacing groups of images by their centroids) and stochastic sampling (using distinct sets of centroids in combination). We use the MNIST and Fashion-MNIST data sets to show that coarse graining can convert a subset of training images into many fewer image centroids, with no loss of accuracy in the classification of test-set images by direct (nearest-neighbor) classification. Distinct batches of centroids can be used in combination as a means of stochastically sampling configuration space, and can classify test-set data more accurately than the unaltered training set can. The approach works most naturally with multiple processors in parallel.
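The coarse-graining and nearest-centroid steps can be sketched in a few lines. Here synthetic Gaussian vectors stand in for MNIST images, and the grouping into fixed blocks of five is an arbitrary illustrative choice, not the paper's procedure for forming centroid batches.

```python
import numpy as np

def coarse_grain(images, groups):
    """Replace each group of training images by its centroid (mean)."""
    return np.array([images[list(g)].mean(axis=0) for g in groups])

def nn_classify(x, centroids, labels):
    """Label a test image by its nearest centroid (Euclidean distance)."""
    return labels[np.argmin(np.linalg.norm(centroids - x, axis=1))]

# Synthetic stand-in for two image classes (the paper uses MNIST).
rng = np.random.default_rng(0)
images = np.vstack([rng.normal(0.0, 0.1, size=(10, 4)),   # class 0
                    rng.normal(1.0, 0.1, size=(10, 4))])  # class 1
# Coarse-grain each class's ten images into two centroids of five each.
groups = [range(0, 5), range(5, 10), range(10, 15), range(15, 20)]
centroids = coarse_grain(images, groups)
labels = np.array([0, 0, 1, 1])
```

The stochastic-sampling step described in the abstract would repeat this with distinct random groupings and combine the resulting centroid batches, which parallelises naturally across processors.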