A Definition of Non-Stationary Bandits
Despite the subject of non-stationary bandit learning having attracted much
recent attention, we have yet to identify a formal definition of
non-stationarity that can consistently distinguish non-stationary bandits from
stationary ones. Prior work has characterized non-stationary bandits as bandits
for which the reward distribution changes over time. We demonstrate that this
definition can ambiguously classify the same bandit as both stationary and
non-stationary; this ambiguity arises in the existing definition's dependence
on the latent sequence of reward distributions. Moreover, the definition has
given rise to two widely used notions of regret: the dynamic regret and the
weak regret. These notions are not indicative of qualitative agent performance
in some bandits. Additionally, this definition of non-stationary bandits has
led to the design of agents that explore excessively. We introduce a formal
definition of non-stationary bandits that resolves these issues. Our new
definition provides a unified approach, applicable seamlessly to both Bayesian
and frequentist formulations of bandits. Furthermore, our definition ensures
consistent classification of two bandits offering agents indistinguishable
experiences, categorizing them as either both stationary or both
non-stationary. This advancement provides a more robust framework for
non-stationary bandit learning.
Non-Stationary Bandit Learning via Predictive Sampling
Thompson sampling has proven effective across a wide range of stationary
bandit environments. However, as we demonstrate in this paper, it can perform
poorly when applied to non-stationary environments. We show that such failures
are attributed to the fact that, when exploring, the algorithm does not
differentiate actions based on how quickly the information acquired loses its
usefulness due to non-stationarity. Building upon this insight, we propose
predictive sampling, an algorithm that deprioritizes acquiring information that
quickly loses usefulness. A theoretical guarantee on the performance of
predictive sampling is established through a Bayesian regret bound. We provide
versions of predictive sampling for which computations tractably scale to
complex bandit environments of practical interest. Through numerical
simulations, we demonstrate that predictive sampling outperforms Thompson
sampling in all non-stationary environments examined.
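The abstract's central insight -- that exploration should down-weight information which quickly loses its usefulness -- can be illustrated with a simpler, related scheme: Thompson sampling over Beta posteriors whose evidence counts decay toward the prior each step. This is a hedged sketch, not the paper's predictive sampling algorithm; the class name, the discount factor `gamma`, and the Bernoulli-bandit setting are all illustrative assumptions.

```python
import random

class DiscountedThompsonAgent:
    """Bernoulli-bandit agent: Thompson sampling with Beta posteriors whose
    evidence counts decay each step, so stale observations are down-weighted.
    (Illustrative stand-in only; predictive sampling itself is more refined.)"""

    def __init__(self, n_arms, gamma=0.95):
        self.gamma = gamma            # how quickly old evidence loses usefulness
        self.alpha = [1.0] * n_arms   # Beta "success" counts, including prior
        self.beta = [1.0] * n_arms    # Beta "failure" counts, including prior

    def select(self):
        # Sample one plausible mean reward per arm; play the best sample.
        samples = [random.betavariate(a, b)
                   for a, b in zip(self.alpha, self.beta)]
        return max(range(len(samples)), key=lambda i: samples[i])

    def update(self, arm, reward):
        # Decay all counts toward the prior, then add the new observation.
        for i in range(len(self.alpha)):
            self.alpha[i] = 1.0 + self.gamma * (self.alpha[i] - 1.0)
            self.beta[i] = 1.0 + self.gamma * (self.beta[i] - 1.0)
        self.alpha[arm] += reward
        self.beta[arm] += 1 - reward
```

With `gamma` close to 1 the agent behaves like ordinary Thompson sampling; smaller values make it forget faster, trading statistical efficiency for responsiveness to change.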
UPSCALE: Unconstrained Channel Pruning
As neural networks grow in size and complexity, inference speeds decline. To
combat this, one of the most effective compression techniques -- channel
pruning -- removes channels from weights. However, for multi-branch segments of
a model, channel removal can introduce inference-time memory copies. In turn,
these copies increase inference latency -- so much so that the pruned model can
be slower than the unpruned model. As a workaround, pruners conventionally
constrain certain channels to be pruned together. This fully eliminates memory
copies but, as we show, significantly impairs accuracy. We now have a dilemma:
Remove constraints but increase latency, or add constraints and impair
accuracy. In response, our insight is to reorder channels at export time, (1)
reducing latency by reducing memory copies and (2) improving accuracy by
removing constraints. Using this insight, we design a generic algorithm UPSCALE
to prune models with any pruning pattern. By removing constraints from existing
pruners, we improve ImageNet accuracy for post-training pruned models by 2.1
points on average -- benefiting DenseNet (+16.9), EfficientNetV2 (+7.9), and
ResNet (+6.2). Furthermore, by reordering channels, UPSCALE improves inference
speeds by up to 2x over a baseline export.
Comment: 29 pages, 26 figures, accepted to ICML 202
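The export-time reordering idea can be sketched in a toy form: permute output channels once, at export, so the surviving channels sit contiguously, turning an inference-time index gather (a memory copy) into a zero-copy slice. This is an illustrative simplification under an assumed name (`export_with_reorder`), not the UPSCALE algorithm itself, which handles multi-branch segments and arbitrary pruning patterns.

```python
import numpy as np

def export_with_reorder(weight, kept):
    """Toy export-time channel reordering: move surviving output channels to
    the front so downstream layers can read them with a contiguous slice
    (a view, no copy) instead of an index gather (a copy).
    `weight` has shape (out_channels, in_channels); `kept` lists survivors."""
    dropped = [c for c in range(weight.shape[0]) if c not in set(kept)]
    perm = list(kept) + dropped      # kept channels first, in order
    reordered = weight[perm]         # apply the permutation once, at export
    return reordered, perm

w = np.arange(12.0).reshape(6, 2)    # 6 output channels, 2 input channels
kept = [1, 3, 4]
w2, perm = export_with_reorder(w, kept)
pruned_view = w2[: len(kept)]        # contiguous slice: no gather needed
assert np.array_equal(pruned_view, w[kept])
```

The design point mirrors the abstract's dilemma: the gather is what introduces inference-time memory copies, and a one-time permutation at export removes it without constraining which channels the pruner may remove.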
Prediction and optimization of a desulphurization system using CMAC neural network and genetic algorithm
In this paper, taking the desulphurizing ratio and economic cost as two objectives, a ten-input two-output prediction model was built and validated for a desulphurization system. A cerebellar model articulation controller (CMAC) neural network and a genetic algorithm (GA) were used for model building and cost optimization, respectively. In the model-building process, grey relation entropy analysis and the uniform design method were used to screen the input variables and study the model parameters, respectively. Traditional regression analysis and the proposed location number analysis method were adopted to analyze the output errors of the experiment group and predict the results of the test group. Results show that the regression analyses fit the experiment-group results closely, while their fitting accuracies for the test group differ considerably. In the location number analysis, a power function between output errors and location numbers fit the data of both the experiment group and the test group well for SO2. The prediction model was initialized by the location number analysis method, then validated, and a cost optimization case was subsequently performed with the GA. The result shows that the optimal cost obtained from the GA could be reduced by more than 30% compared with the original optimal operating parameters under the same constraints.
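As a rough illustration of the GA half of the pipeline, the sketch below minimizes a stand-in cost function with a small real-coded genetic algorithm (elitist selection, uniform crossover, Gaussian mutation, bound clamping). The function `genetic_minimize`, its hyperparameters, and the toy quadratic cost are assumptions for illustration only; in the paper the GA is coupled to the CMAC-based prediction model rather than a known analytic cost.

```python
import random

def genetic_minimize(cost, bounds, pop_size=40, generations=60, mut=0.1):
    """Minimal real-coded GA: keep the better half of the population each
    generation, breed children by uniform crossover, and apply Gaussian
    mutation clamped to the parameter bounds. `cost` maps a parameter vector
    to a scalar; `bounds` is a list of (lo, hi) pairs, one per parameter.
    (Sketch only -- not the paper's exact GA configuration.)"""
    def rand_ind():
        return [random.uniform(lo, hi) for lo, hi in bounds]

    pop = [rand_ind() for _ in range(pop_size)]
    for _ in range(generations):
        elite = sorted(pop, key=cost)[: pop_size // 2]   # elitist selection
        children = []
        while len(children) < pop_size - len(elite):
            a, b = random.sample(elite, 2)
            child = [x if random.random() < 0.5 else y   # uniform crossover
                     for x, y in zip(a, b)]
            for i, (lo, hi) in enumerate(bounds):        # Gaussian mutation
                if random.random() < mut:
                    step = random.gauss(0, 0.1 * (hi - lo))
                    child[i] = min(hi, max(lo, child[i] + step))
            children.append(child)
        pop = elite + children
    return min(pop, key=cost)

random.seed(0)  # for a reproducible run of this sketch
best = genetic_minimize(lambda p: (p[0] - 1) ** 2 + (p[1] + 2) ** 2,
                        [(-5.0, 5.0), (-5.0, 5.0)])
```

In the paper's setting, the `cost` callable would wrap the trained CMAC predictor evaluated under the operating constraints, and the returned vector would be the optimized operating parameters.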