
    New sampling strategies when searching for robust solutions

    Many real-world optimisation problems involve uncertainties, and in such situations it is often desirable to identify robust solutions that perform well over the possible future scenarios. In this paper, we focus on input uncertainty, such as in manufacturing, where the actual manufactured product may differ from the specified design but should still function well. Estimating a solution's expected fitness in such a case is challenging, especially if the fitness function is expensive to evaluate and its analytic form is unknown. One option is to average over a number of scenarios, but this is computationally expensive. The archive sample approximation method reduces the required number of fitness evaluations by re-using previous evaluations stored in an archive. The main challenge in the application of this method lies in determining the locations of additional samples drawn in each generation to enrich the information in the archive and reduce the estimation error. In this paper, we use the Wasserstein distance metric to approximate the possible benefit of a potential sample location on the estimation error, and propose new sampling strategies based on this metric. Contrary to previous studies, we consider a sample's contribution for the entire population, rather than inspecting each individual separately. This also allows us to dynamically adjust the number of samples to be collected in each generation. An empirical comparison with several previously proposed archive-based sample approximation methods demonstrates the superiority of our approaches.
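    To make the sampling criterion concrete, the sketch below scores a candidate sample location by the reduction in 1-D Wasserstein distance it would bring, summed over the whole population. This is a minimal illustration under assumed names and a scalar design space, not the paper's implementation; the exact scoring rule in the paper may differ.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def sample_benefit(candidate, archive, target, population):
    """Approximate benefit of evaluating `candidate`: how much it shrinks
    the Wasserstein distance between the disturbances implied by archive
    re-use and the target disturbance distribution, aggregated over all
    individuals (a population-level view, as in the abstract)."""
    benefit = 0.0
    for x in population:
        before = archive - x                      # disturbances seen from x
        after = np.append(before, candidate - x)  # ...plus the new sample
        benefit += (wasserstein_distance(before, target)
                    - wasserstein_distance(after, target))
    return benefit

# Toy 1-D usage: pick the most informative of ten candidate locations.
rng = np.random.default_rng(0)
archive = rng.normal(0.0, 1.0, size=20)      # previously evaluated points
population = rng.uniform(-1.0, 1.0, size=5)  # current population
target = rng.normal(0.0, 0.3, size=200)      # samples of the disturbance law
candidates = rng.uniform(-2.0, 2.0, size=10)
best = max(candidates, key=lambda c: sample_benefit(c, archive, target, population))
```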

    Benchmarking API Costs of Network Sampling Strategies

    Optimal sampling strategies for multiscale stochastic processes

    In this paper, we determine which non-random sampling of fixed size gives the best linear predictor of the sum of a finite spatial population. We employ different multiscale superpopulation models and use the minimum mean-squared error as our optimality criterion. In multiscale superpopulation tree models, the leaves represent the units of the population, interior nodes represent partial sums of the population, and the root node represents the total sum of the population. We prove that the optimal sampling pattern varies dramatically with the correlation structure of the tree nodes. While uniform sampling is optimal for trees with "positive correlation progression", it provides the worst possible sampling with "negative correlation progression". As an analysis tool, we introduce and study a class of independent innovations trees that are of interest in their own right. We derive a fast water-filling algorithm to determine the optimal sampling of the leaves to estimate the root of an independent innovations tree. (Comment: published at http://dx.doi.org/10.1214/074921706000000509 in the IMS Lecture Notes--Monograph Series (http://www.imstat.org/publications/lecnotes.htm) by the Institute of Mathematical Statistics (http://www.imstat.org).)
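    The abstract does not spell out the tree-specific algorithm, so the sketch below shows only the standard water-filling primitive it is named after: spread a fixed budget across items by bisecting on a common "water level". Function and parameter names are assumptions for illustration.

```python
import numpy as np

def water_fill(floors, budget, tol=1e-9):
    """Textbook water-filling: return allocations p_i = max(0, mu - floors[i])
    whose sum equals `budget`, found by bisection on the water level mu.
    `floors` plays the role of per-item costs; the paper's tree algorithm
    applies the same principle to sampling effort over the leaves."""
    lo, hi = 0.0, float(np.max(floors)) + budget
    while hi - lo > tol:
        mu = 0.5 * (lo + hi)
        if np.sum(np.maximum(0.0, mu - floors)) > budget:
            hi = mu  # level too high: allocation overshoots the budget
        else:
            lo = mu
    return np.maximum(0.0, 0.5 * (lo + hi) - floors)

# Example: spread 4 units of sampling effort over 5 leaves.
alloc = water_fill(np.array([0.5, 1.0, 1.5, 2.0, 3.0]), budget=4.0)
```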

    Recovery Conditions and Sampling Strategies for Network Lasso

    The network Lasso is a recently proposed convex optimization method for machine learning from massive network-structured datasets, i.e., big data over networks. It is a variant of the well-known least absolute shrinkage and selection operator (Lasso), which underlies many methods in learning and signal processing involving sparse models. Highly scalable implementations of the network Lasso can be obtained by state-of-the-art proximal methods, e.g., the alternating direction method of multipliers (ADMM). By generalizing the concept of the compatibility condition put forward by van de Geer and Buehlmann as a powerful tool for the analysis of the plain Lasso, we derive a sufficient condition on the underlying network topology, the network compatibility condition (NCC), such that the network Lasso accurately learns a clustered underlying graph signal. This network compatibility condition relates the location of the sampled nodes to the clustering structure of the network. In particular, the NCC informs the choice of which nodes to sample, or in machine learning terms, which data points provide the most information if labeled. (Comment: nominated as student paper award finalist at Asilomar 2017. arXiv admin note: substantial text overlap with arXiv:1704.0210)
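    For orientation, here is the objective the analysis concerns, written out for scalar node signals with squared loss on the sampled nodes. This is an assumed minimal form, not the paper's code: node losses can be any convex function, and real solvers use ADMM as noted above.

```python
import numpy as np

def network_lasso_objective(x, y, sampled, edges, weights, lam):
    """Network Lasso for scalar node signals:
        sum_{i in sampled} (x[i] - y[i])**2
        + lam * sum_{(i,j) in edges} w_ij * |x[i] - x[j]|
    The edge term is the TV-like penalty that encourages clustered signals."""
    fit = sum((x[i] - y[i]) ** 2 for i in sampled)
    penalty = sum(w * abs(x[i] - x[j]) for (i, j), w in zip(edges, weights))
    return fit + lam * penalty

# Toy chain graph: nodes 0 and 2 are sampled (labeled), node 1 is not.
x = np.array([1.0, 0.8, 0.2])
y = {0: 1.0, 2: 0.0}
obj = network_lasso_objective(x, y, sampled=[0, 2],
                              edges=[(0, 1), (1, 2)],
                              weights=[1.0, 1.0], lam=0.5)
```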

    Counting Methods and Sampling Strategies Determining Pedestrian Numbers

    1.1.1 Any new road, road improvement or traffic management scheme could affect pedestrian journeys in its locality or elsewhere. Some journeys may be affected directly, with severance caused where the new road or road improvement cuts across a pedestrian route; others may be affected indirectly, where a new road causes changes in traffic levels elsewhere. To enable effects on pedestrians to be given proper weight when decisions are taken, techniques are required that forecast the effects of the scheme on the number and quality of pedestrian journeys. This is particularly true in urban areas, since effects on pedestrians may be one of the main benefits or disbenefits of measures to relieve urban traffic. (Continues…)

    Simple coarse graining and sampling strategies for image recognition

    A conceptually simple way to classify images is to directly compare test-set data and training-set data. The accuracy of this approach is limited by the method of comparison used, and by the extent to which the training-set data cover configuration space. Here we show that this coverage can be substantially increased using simple strategies of coarse graining (replacing groups of images by their centroids) and stochastic sampling (using distinct sets of centroids in combination). We use the MNIST and Fashion-MNIST data sets to show that coarse graining can be used to convert a subset of training images into many fewer image centroids, with no loss of accuracy of classification of test-set images by direct (nearest-neighbor) classification. Distinct batches of centroids can be used in combination as a means of stochastically sampling configuration space, and can classify test-set data more accurately than can the unaltered training set. The approach works most naturally with multiple processors in parallel.
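    A minimal sketch of the two ideas in combination, assuming k-means to form the per-class centroids (the abstract says only that groups of images are replaced by their centroids) and scikit-learn's 1-NN classifier; all parameter values are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

def coarse_grain(X, y, per_class, seed=0):
    """Replace each class's images by `per_class` k-means centroids,
    shrinking the training set while aiming to keep 1-NN accuracy."""
    Xc, yc = [], []
    for label in np.unique(y):  # assumes integer class labels
        km = KMeans(n_clusters=per_class, n_init=10,
                    random_state=seed).fit(X[y == label])
        Xc.append(km.cluster_centers_)
        yc.append(np.full(per_class, label))
    return np.vstack(Xc), np.concatenate(yc)

def ensemble_predict(X_train, y_train, X_test, n_batches=5, per_class=40):
    """Stochastic sampling: distinct centroid batches (different seeds)
    each cast a 1-NN vote; the majority vote is the prediction."""
    votes = np.stack([
        KNeighborsClassifier(n_neighbors=1)
        .fit(*coarse_grain(X_train, y_train, per_class, seed=s))
        .predict(X_test)
        for s in range(n_batches)])
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```

    Each batch here runs independently, which is what makes the scheme map naturally onto multiple processors in parallel, as the abstract notes.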