Deep reinforcement learning for workload balance and due date control in wafer fabs
Semiconductor wafer fabrication facilities (wafer fabs) often prioritize two operational objectives: work-in-process (WIP) and due date. WIP-oriented and due date-oriented dispatching rules are two commonly used methods to achieve workload balance and on-time delivery, respectively. However, achieving both objectives simultaneously often requires sophisticated heuristics. In this paper, we propose a novel approach using deep Q-network reinforcement learning (DRL) for dispatching in wafer fabs. The DRL approach differs from traditional dispatching methods by using dispatch agents at work-centers to observe state changes in the wafer fab. The agents train their deep Q-networks by taking the states as inputs, allowing them to select the most appropriate dispatch action. Additionally, the reward function integrates workload and due date information at both local and global levels. Compared to the traditional WIP-oriented and due date-oriented rules, as well as heuristic-based rules from the literature, the DRL approach produces better global performance with regard to workload balance and on-time delivery.
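The abstract describes a reward function that combines workload and due date information at local and global levels. A minimal sketch of such a scalarized reward is below; the function name, arguments, and weights are illustrative assumptions, not the paper's actual formulation.

```python
# Hypothetical sketch: a dispatch reward mixing workload-balance and
# due-date terms at local (work-center) and global (fab) levels.
# All names and weights here are illustrative assumptions.
def dispatch_reward(local_wip_dev, global_wip_dev,
                    local_lateness, global_lateness,
                    w_wip=0.5, w_due=0.5):
    """Higher reward for balanced WIP (small deviation from target)
    and for low lateness against due dates."""
    wip_term = -(local_wip_dev + global_wip_dev)    # penalize imbalance
    due_term = -(local_lateness + global_lateness)  # penalize lateness
    return w_wip * wip_term + w_due * due_term
```

A DQN agent at each work-center would receive this scalar after every dispatch action, so that improving either objective (at either level) increases the learning signal.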
A general theory of intertemporal decision-making and the perception of time
Animals and humans make decisions based on their expected outcomes. Since relevant outcomes are often delayed, perceiving delays and choosing between earlier versus later rewards (intertemporal decision-making) is an essential component of animal behavior. The myriad observations made in experiments studying intertemporal decision-making and time perception have not yet been rationalized within a single theory. Here we present a theory, Training-Integrated Maximized Estimation of Reinforcement Rate (TIMERR), that explains a wide variety of behavioral observations made in intertemporal decision-making and the perception of time. Our theory postulates that animals make intertemporal choices to optimize expected reward rates over a limited temporal window; this window includes a past integration interval (over which the experienced reward rate is estimated) and the expected delay to future reward. Using this theory, we derive a mathematical expression for the subjective representation of time. A unique contribution of our work is in finding that the past integration interval directly determines the steepness of temporal discounting and the nonlinearity of time perception. In so doing, our theory provides a single framework to understand both intertemporal decision-making and time perception.
Comment: 37 pages, 4 main figures, 3 supplementary figures
Neurobehavioral measurements of natural and opioid reward value
In the last decade, (non)prescription opioid abuse, opioid use disorder (OUD) diagnoses, and opioid-related overdoses have risen and represent a significant public health concern. One way of understanding OUD is as a disorder of choice, in which opioid rewards are chosen at the expense of other nondrug rewards. Characterizing OUD as a disorder of choice is important because it implicates decision-making processes, such as the valuation of opioid rewards, as therapeutic targets. However, reward-value measurement and interpretation have traditionally differed between substance abuse research and related fields such as economics, animal behavior, and neuroeconomics, and may be less effective for understanding how opioid rewards are valued. The present research therefore used choice procedures in line with behavioral/neuroeconomic studies to determine whether drug-associated decision-making could be predicted from economic choice theories. In Experiment 1, rats completed an isomorphic food-food probabilistic choice task with dynamic, unpredictable changes in reward probability that required constant updating of reward values. After initial training, the reward magnitude of one choice was subsequently increased from one to two to three pellets. Additionally, rats were split into Signaled and Unsignaled groups to understand how cues modulate reward value. After each choice, the Unsignaled group received distinct choice-dependent cues that were uninformative of the choice outcome. The Signaled group also received uninformative cues on one option, but the alternative choice produced reward-predictive cues that signaled the trial outcome as a win or loss. Choice data were analyzed at a molar level using matching equations and at a molecular level using reinforcement learning (RL) models to determine how probability, reward magnitude, and reward-associated cues affected choice.
Experiment 2 used an allomorphic drug-versus-food procedure in which the food reward for one option was replaced by a self-administered remifentanil (REMI) infusion at doses of 1, 3, and 10 μg/kg. Finally, Experiment 3 assessed the potential for both REMI and food reward value to be commonly scaled within the brain by examining changes in nucleus accumbens (NAc) oxygen (O2) dynamics. Results showed that increasing reward probability, reward magnitude, and the presence of reward-associated cues all independently increased the propensity to choose the associated alternative, including REMI drug choices. Additionally, both molar matching and molecular RL models successfully parameterized rats' decision dynamics. O2 dynamics were generally consistent with the idea of a common value signal for REMI and food, with changes in O2 signaling scaling with REMI reward magnitude. Finally, RL model-derived reward prediction errors significantly correlated with peak O2 activity at reward delivery, suggesting a possible neurological mechanism of value updating. Results are discussed in terms of their implications for current conceptualizations of substance use disorders, including a potential need to change the discourse surrounding how substance use disorders are modeled experimentally. Overall, the present research provides evidence that a choice model of substance use disorders may be a viable alternative to the disease model and could facilitate future treatment options centered around economic principles.
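The "molecular level" RL models mentioned above typically fit trial-by-trial choices with a prediction-error value update and a softmax choice rule. A minimal generic sketch of that model class follows; the specific parameterization used in this dissertation is not given in the abstract, so parameter names and defaults here are assumptions.

```python
import math
import random

def softmax_choice(q, beta=3.0, rng=random):
    """Sample an action index from a softmax over action values q;
    beta (inverse temperature) controls choice determinism."""
    exps = [math.exp(beta * v) for v in q]
    z = sum(exps)
    r, acc = rng.random() * z, 0.0
    for i, e in enumerate(exps):
        acc += e
        if r <= acc:
            return i
    return len(q) - 1

def update(q, choice, reward, alpha=0.2):
    """Update the chosen option's value by a reward prediction error
    (RPE) scaled by learning rate alpha; returns the RPE."""
    rpe = reward - q[choice]
    q[choice] += alpha * rpe
    return rpe
```

Fitting alpha and beta to each rat's choice sequence is what "parameterized rats' decision dynamics" refers to in this model class, and the per-trial RPE is the quantity that was correlated with peak O2 activity.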
A Local Circuit Model of Learned Striatal and Dopamine Cell Responses under Probabilistic Schedules of Reward
Before choosing, it helps to know both the expected value signaled by a predictive cue and the associated uncertainty that the reward will be forthcoming. Recently, Fiorillo et al. (2003) found that the dopamine (DA) neurons of the SNc exhibit sustained responses related to the uncertainty that a cue will be followed by reward, in addition to phasic responses related to reward prediction errors (RPEs). This suggests that cue-dependent anticipations of the timing, magnitude, and uncertainty of rewards are learned and reflected in components of the DA signals broadcast by SNc neurons. What is the minimal local circuit model that can explain such multifaceted reward-related learning? A new computational model shows how learned uncertainty responses emerge robustly on single trials along with phasic RPE responses, such that both types of DA responses exhibit the empirically observed dependence on conditional probability, expected value of reward, and time since onset of the reward-predicting cue. The model includes three major pathways for computing: immediate expected values of cues, timed predictions of reward magnitudes (and RPEs), and the uncertainty associated with these predictions. The first two model pathways refine those previously modeled by Brown et al. (1999). A third, newly modeled, pathway is formed by medium spiny projection neurons (MSPNs) of the matrix compartment of the striatum, whose axons co-release GABA and a neuropeptide, substance P, both at synapses with GABAergic neurons in the SNr and with the dendrites (in SNr) of DA neurons whose somas are in ventral SNc. Co-release enables efficient computation of sustained DA uncertainty responses that are a non-monotonic function of the conditional probability that a reward will follow the cue.
The new model's incorporation of a striatal microcircuit allowed it to reveal that variability in striatal cholinergic transmission can explain observed differences between monkeys in the amplitude of the non-monotonic uncertainty function. Involvement of matrix MSPNs and striatal cholinergic transmission implies a relation between uncertainty in the cue-reward contingency and the action-selection functions of the basal ganglia. The model synthesizes anatomical, electrophysiological, and behavioral data regarding the midbrain DA system in a novel way, by relating the ability to compute uncertainty, in parallel with other aspects of reward contingencies, to the unique distribution of SP inputs in ventral SN.
National Science Foundation (SBE-354378); Higher Educational Council of Turkey; Canakkale Onsekiz Mart University of Turkey
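The two DA response components the abstract distinguishes can be illustrated with simple proxies: the phasic response as an outcome RPE, and the sustained response as a signal that is non-monotonic in reward probability and maximal at p = 0.5. Using the outcome variance p(1 - p) as that proxy is an assumption for illustration; the model itself computes the uncertainty signal through a striato-nigral circuit, not this formula.

```python
# Illustrative proxies for the two DA response components discussed
# in the abstract (Fiorillo et al., 2003). Names are assumptions.
def phasic_rpe(p, reward_delivered, magnitude=1.0):
    """Phasic DA response at outcome time: reward-prediction error,
    i.e. delivered reward minus the cue-conditioned expectation."""
    expected = p * magnitude
    outcome = magnitude if reward_delivered else 0.0
    return outcome - expected

def sustained_uncertainty(p, magnitude=1.0):
    """Sustained response proportional to outcome variance p(1 - p):
    non-monotonic in p and maximal at p = 0.5 (an illustrative proxy
    for the circuit-computed uncertainty signal)."""
    return magnitude ** 2 * p * (1.0 - p)
```

This reproduces the qualitative signature the model must capture: zero uncertainty at p = 0 and p = 1, a peak at intermediate probability, and phasic responses that shrink as the cue becomes more reliable.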
Automating Staged Rollout with Reinforcement Learning
Staged rollout is a strategy of incrementally releasing software updates to portions of the user population in order to accelerate defect discovery without incurring catastrophic outcomes such as system-wide outages. Some past studies have examined how to quantify and automate staged rollout, but stop short of simultaneously considering multiple product or process metrics explicitly. This paper demonstrates the potential to automate staged rollout with multi-objective reinforcement learning in order to dynamically balance stakeholder needs such as time to deliver new features and downtime incurred by failures due to latent defects.
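One common way a multi-objective RL agent balances such stakeholder metrics is by scalarizing them into a single weighted reward. The sketch below shows that idea only; the metric names and weights are hypothetical and are not taken from the paper.

```python
# Hypothetical scalarized multi-objective reward for a staged-rollout
# agent: reward defect discovery, penalize downtime and delivery delay.
# Names and weights are illustrative stakeholder trade-offs.
def rollout_reward(defects_found, downtime, delivery_delay,
                   w=(1.0, 2.0, 0.5)):
    """Combine the competing rollout metrics into one scalar that an
    RL agent choosing the next rollout fraction could maximize."""
    w_defect, w_down, w_delay = w
    return (w_defect * defects_found
            - w_down * downtime
            - w_delay * delivery_delay)
```

Changing the weight vector shifts the learned policy between aggressive rollout (fast feature delivery, more exposure to latent defects) and conservative rollout.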
Artificial Intelligence in Radiotherapy Treatment Planning: Present and Future.
Treatment planning is an essential step of the radiotherapy workflow. It has become more sophisticated over the past couple of decades with the help of computer science, enabling planners to design highly complex radiotherapy plans that minimize normal tissue damage while preserving sufficient tumor control. As a result, treatment planning has become more labor intensive, requiring hours or even days of planner effort to optimize an individual patient case in a trial-and-error fashion. More recently, artificial intelligence has been utilized to automate and improve various aspects of medical science. For radiotherapy treatment planning, many algorithms have been developed to better support planners. These algorithms focus on automating the planning process and/or optimizing dosimetric trade-offs, and they have already had a great impact on improving treatment planning efficiency and plan quality consistency. In this review, the smart planning tools in current clinical use are summarized in 3 main categories: automated rule implementation and reasoning, modeling of prior knowledge in clinical practice, and multicriteria optimization. Novel artificial intelligence-based treatment planning applications, such as deep learning-based algorithms and emerging research directions, are also reviewed. Finally, the challenges of artificial intelligence-based treatment planning are discussed for future work.
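Of the three tool categories listed, multicriteria optimization is the most directly algorithmic: a plan is scored by trading off target coverage against dose to organs at risk. A toy weighted-sum objective of that kind is sketched below; the function and weights are illustrative assumptions, not any clinical system's formulation.

```python
# Toy multicriteria plan objective: reward tumor dose coverage,
# penalize dose to each organ at risk (OAR). Illustrative only.
def plan_score(tumor_coverage, oar_doses, oar_weights):
    """Weighted-sum trade-off between tumor control (coverage term)
    and normal tissue sparing (per-OAR dose penalties)."""
    penalty = sum(w * d for w, d in zip(oar_weights, oar_doses))
    return tumor_coverage - penalty
```

In practice a planner (or an automated tool) navigates such trade-offs by adjusting the weights and re-optimizing, which is the trial-and-error loop these tools aim to shorten.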