
    A Definition of Non-Stationary Bandits

    Despite the subject of non-stationary bandit learning having attracted much recent attention, we have yet to identify a formal definition of non-stationarity that can consistently distinguish non-stationary bandits from stationary ones. Prior work has characterized non-stationary bandits as bandits for which the reward distribution changes over time. We demonstrate that this definition can ambiguously classify the same bandit as both stationary and non-stationary; the ambiguity arises from the existing definition's dependence on the latent sequence of reward distributions. Moreover, the definition has given rise to two widely used notions of regret: the dynamic regret and the weak regret. In some bandits, these notions fail to reflect qualitative differences in agent performance. Additionally, this definition of non-stationary bandits has led to the design of agents that explore excessively. We introduce a formal definition of non-stationary bandits that resolves these issues. Our new definition provides a unified approach, applicable seamlessly to both Bayesian and frequentist formulations of bandits. Furthermore, our definition ensures that two bandits offering agents indistinguishable experiences are classified consistently, as either both stationary or both non-stationary. This advancement provides a more robust framework for non-stationary bandit learning.
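
    For concreteness, the two regret notions mentioned above are conventionally defined as follows; the notation is a standard assumption, not quoted from the paper. Writing \mathcal{A} for the action set, T for the horizon, \mu_t(a) for the mean reward of action a at time t, and A_t for the agent's action at time t,

        \mathrm{Regret}_{\mathrm{dynamic}}(T) = \sum_{t=1}^{T} \Big( \max_{a \in \mathcal{A}} \mu_t(a) - \mu_t(A_t) \Big),
        \qquad
        \mathrm{Regret}_{\mathrm{weak}}(T) = \max_{a \in \mathcal{A}} \sum_{t=1}^{T} \mu_t(a) \; - \; \sum_{t=1}^{T} \mu_t(A_t).

    The dynamic regret benchmarks the agent against the best action at every round, while the weak regret benchmarks it against the single best fixed action in hindsight, which is why the two can disagree about how well an agent is doing in a changing environment.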

    Non-Stationary Bandit Learning via Predictive Sampling

    Thompson sampling has proven effective across a wide range of stationary bandit environments. However, as we demonstrate in this paper, it can perform poorly when applied to non-stationary environments. We show that such failures arise because, when exploring, the algorithm does not differentiate actions based on how quickly the acquired information loses its usefulness due to non-stationarity. Building on this insight, we propose predictive sampling, an algorithm that deprioritizes acquiring information that quickly loses usefulness. A theoretical guarantee on the performance of predictive sampling is established through a Bayesian regret bound. We provide versions of predictive sampling whose computations scale tractably to complex bandit environments of practical interest. Through numerical simulations, we demonstrate that predictive sampling outperforms Thompson sampling in all non-stationary environments examined.
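
    For reference, the sketch below shows the standard Thompson-sampling baseline on a Bernoulli bandit; it is not the paper's predictive-sampling algorithm, and the two-armed environment, Beta(1, 1) priors, and horizon are illustrative assumptions. A comment marks the behaviour that breaks down under non-stationarity.

import numpy as np

def thompson_sampling(true_probs, horizon, seed=0):
    """Standard Thompson sampling for a stationary Bernoulli bandit with Beta(1, 1) priors."""
    rng = np.random.default_rng(seed)
    k = len(true_probs)
    alpha = np.ones(k)  # posterior success counts + 1
    beta = np.ones(k)   # posterior failure counts + 1
    total_reward = 0.0
    for _ in range(horizon):
        # Sample a plausible mean for each arm from its posterior and play the argmax.
        # In a non-stationary environment the true probabilities would drift, yet this
        # posterior keeps weighting stale observations -- the failure mode predictive
        # sampling addresses by deprioritizing information that quickly loses usefulness.
        sampled_means = rng.beta(alpha, beta)
        arm = int(np.argmax(sampled_means))
        reward = float(rng.random() < true_probs[arm])
        alpha[arm] += reward
        beta[arm] += 1.0 - reward
        total_reward += reward
    return total_reward

# Example: two arms with fixed success probabilities 0.3 and 0.7.
print(thompson_sampling([0.3, 0.7], horizon=1000))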

    UPSCALE: Unconstrained Channel Pruning

    As neural networks grow in size and complexity, inference speeds decline. To combat this, one of the most effective compression techniques -- channel pruning -- removes channels from weights. However, for multi-branch segments of a model, channel removal can introduce inference-time memory copies. In turn, these copies increase inference latency -- so much so that the pruned model can be slower than the unpruned model. As a workaround, pruners conventionally constrain certain channels to be pruned together. This fully eliminates memory copies but, as we show, significantly impairs accuracy. We now have a dilemma: remove constraints but increase latency, or add constraints and impair accuracy. In response, our insight is to reorder channels at export time, (1) reducing latency by reducing memory copies and (2) improving accuracy by removing constraints. Using this insight, we design a generic algorithm, UPSCALE, to prune models with any pruning pattern. By removing constraints from existing pruners, we improve ImageNet accuracy for post-training pruned models by 2.1 points on average -- benefiting DenseNet (+16.9), EfficientNetV2 (+7.9), and ResNet (+6.2). Furthermore, by reordering channels, UPSCALE improves inference speeds by up to 2x over a baseline export.
    Comment: 29 pages, 26 figures, accepted to ICML 2023.
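
    The sketch below is a toy illustration of the export-time reordering idea described above, not the UPSCALE implementation; the channel indices and permutation are hypothetical. It shows why unconstrained pruning forces a gather (a memory copy) at a branch point, and how a one-time channel permutation lets each branch take a zero-copy slice instead.

import numpy as np

# Toy NCHW activation produced by a layer that feeds two pruned branches.
x = np.random.rand(1, 8, 16, 16)

kept_a = [0, 2, 5, 7]   # channels branch A keeps after pruning (hypothetical)
kept_b = [1, 2, 6, 7]   # channels branch B keeps after pruning (hypothetical)

# Naive export of unconstrained pruning: each branch gathers its own channels
# at inference time, and fancy indexing copies the data on every forward pass.
branch_a_in = x[:, kept_a]
branch_b_in = x[:, kept_b]

# Reordering idea: pick a permutation in which each branch's kept channels form
# one contiguous block (A-only, shared, B-only, dropped). In a real export the
# producing layer's weights would be permuted once so its output already has
# this layout; here the permutation is applied to the activation to simulate it.
order = [0, 5, 2, 7, 1, 6, 3, 4]
x_reordered = x[:, order]

branch_a_view = x_reordered[:, 0:4]   # basic slice -> a view, no memory copy
branch_b_view = x_reordered[:, 2:6]   # overlapping slice covers the shared channels

print(branch_a_view.base is x_reordered, branch_b_view.base is x_reordered)  # True True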

    Prediction and optimization of a desulphurization system using CMAC neural network and genetic algorithm

    In this paper, taking the desulphurization ratio and economic cost as two objectives, a ten-input, two-output prediction model was built and validated for a desulphurization system. A cerebellar model articulation controller (CMAC) neural network and a genetic algorithm (GA) were used for model building and cost optimization, respectively. In the model-building process, grey relational entropy analysis and the uniform design method were used to screen the input variables and to study the model parameters, respectively. Traditional regression analysis and the proposed location-number analysis method were adopted to analyze the output errors of the experiment group and to predict the results of the test group. The results show that the regression analyses fit the experiment-group results closely, while their fitting accuracies for the test group vary considerably. For the location-number analysis, a power function relating output errors to location numbers fit both the experiment-group and test-group data for SO2 well. The prediction model was initialized with the location-number analysis method and then validated, and a cost-optimization case was subsequently carried out with the GA. The result shows that, under the same constraints, the optimal cost obtained from the GA is more than 30% lower than that of the original optimal operating parameters.
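
    As a rough sketch of the optimization step, a genetic algorithm can search the operating parameters of a fitted prediction model for the lowest predicted cost subject to a minimum desulphurization ratio. The surrogate below is a hypothetical stand-in for the trained CMAC model, and the parameter scaling, constraint, and GA settings are assumptions, not values from the paper.

import numpy as np

rng = np.random.default_rng(42)

def surrogate(params):
    """Hypothetical stand-in for the trained ten-input, two-output prediction model.

    Takes operating parameters scaled to [0, 1] and returns
    (desulphurization_ratio, cost); the paper uses a CMAC neural network here.
    """
    ratio = 0.7 + 0.25 * params.mean()
    cost = 100.0 + 80.0 * np.sum(params ** 2)
    return ratio, cost

def fitness(params, min_ratio=0.9):
    ratio, cost = surrogate(params)
    penalty = 1e4 * max(0.0, min_ratio - ratio)   # penalize constraint violations
    return cost + penalty                         # lower is better

def genetic_algorithm(dim=10, pop_size=40, generations=200,
                      crossover_rate=0.9, mutation_rate=0.1):
    pop = rng.random((pop_size, dim))
    for _ in range(generations):
        scores = np.array([fitness(p) for p in pop])
        # Tournament selection: keep the better of two randomly paired individuals.
        i, j = rng.integers(pop_size, size=(2, pop_size))
        parents = np.where((scores[i] < scores[j])[:, None], pop[i], pop[j])
        # Uniform crossover between each parent and a shifted copy of the parent pool.
        mates = np.roll(parents, 1, axis=0)
        swap = rng.random((pop_size, dim)) < 0.5 * crossover_rate
        children = np.where(swap, mates, parents)
        # Gaussian mutation, clipped back to the [0, 1] box.
        mutate = rng.random((pop_size, dim)) < mutation_rate
        children = np.clip(children + mutate * rng.normal(0.0, 0.1, (pop_size, dim)), 0.0, 1.0)
        pop = children
    best = min(pop, key=fitness)
    return best, surrogate(best)

best_params, (best_ratio, best_cost) = genetic_algorithm()
print(f"predicted ratio={best_ratio:.3f}, predicted cost={best_cost:.1f}")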