Application of the cross-entropy method to the buffer allocation problem in a simulation-based environment
The buffer allocation problem (BAP) is a well-known difficult problem in the design of production lines. We present a stochastic algorithm for solving the BAP, based on the cross-entropy method, a new paradigm for stochastic optimization. The algorithm involves the following iterative steps: (a) the generation of buffer allocations according to a certain random mechanism, followed by (b) the modification of this mechanism on the basis of cross-entropy minimization. Through various numerical experiments we demonstrate the efficiency of the proposed algorithm and show that the method can quickly generate (near-)optimal buffer allocations for fairly large production lines.
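The two iterative steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the multinomial sampling mechanism, the smoothing update, and the `toy_throughput` score (a stand-in for a production-line simulation, with a made-up target allocation) are all assumptions chosen for illustration.

```python
import random
from collections import Counter


def toy_throughput(alloc):
    # Hypothetical surrogate for a simulated line: the score peaks at an
    # arbitrary "ideal" allocation. A real study would run a simulation here.
    target = (2, 5, 3)
    return -sum((a - t) ** 2 for a, t in zip(alloc, target))


def sample_allocation(probs, total, rng):
    # Step (a): place each of `total` buffer units at a location drawn from probs.
    picks = rng.choices(range(len(probs)), weights=probs, k=total)
    counts = Counter(picks)
    return tuple(counts.get(i, 0) for i in range(len(probs)))


def cross_entropy_bap(score, n_locations, total, n_samples=200,
                      elite_frac=0.1, smoothing=0.7, n_iters=30, seed=0):
    rng = random.Random(seed)
    probs = [1.0 / n_locations] * n_locations  # start from a uniform mechanism
    best_alloc, best_score = None, float("-inf")
    n_elite = max(1, int(elite_frac * n_samples))
    for _ in range(n_iters):
        samples = [sample_allocation(probs, total, rng) for _ in range(n_samples)]
        samples.sort(key=score, reverse=True)
        elite = samples[:n_elite]
        if score(elite[0]) > best_score:
            best_alloc, best_score = elite[0], score(elite[0])
        # Step (b): the cross-entropy-minimizing multinomial is the empirical
        # share of buffer units per location among the elite samples.
        elite_freq = [sum(a[i] for a in elite) / (n_elite * total)
                      for i in range(n_locations)]
        # Smoothed update keeps every probability strictly positive.
        probs = [smoothing * f + (1 - smoothing) * p
                 for f, p in zip(elite_freq, probs)]
    return best_alloc, best_score, probs


if __name__ == "__main__":
    print(cross_entropy_bap(toy_throughput, n_locations=3, total=10))
```

In this sketch the total number of buffer units is fixed by construction, so every sampled allocation is feasible and no repair step is needed.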
Energy Sharing for Multiple Sensor Nodes with Finite Buffers
We consider the problem of finding optimal energy sharing policies that
maximize the network performance of a system comprising multiple sensor
nodes and a single energy harvesting (EH) source. Sensor nodes periodically
sense the random field and generate data, which is stored in the corresponding
data queues. The EH source harnesses energy from ambient energy sources and the
generated energy is stored in an energy buffer. Sensor nodes receive energy for
data transmission from the EH source. The EH source has to efficiently share
the stored energy among the nodes in order to minimize the long-run average
delay in data transmission. We formulate the problem of energy sharing between
the nodes in the framework of average cost infinite-horizon Markov decision
processes (MDPs). We develop efficient energy sharing algorithms, namely
Q-learning algorithm with exploration mechanisms based on the ε-greedy
method as well as upper confidence bound (UCB). We extend these algorithms by
incorporating state and action space aggregation to tackle state-action space
explosion in the MDP. We also develop a cross entropy based method that
incorporates policy parameterization in order to find near optimal energy
sharing policies. Through simulations, we show that our algorithms yield energy
sharing policies that outperform the heuristic greedy method.
Comment: 38 pages, 10 figures
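The two exploration mechanisms named in this abstract, ε-greedy and upper confidence bound (UCB), can be sketched for a tabular Q-learner as follows. This is a generic illustration, not the paper's algorithm: the function names are hypothetical, values are treated as rewards to maximize (for example, negative delay), and the update uses a discounted target for simplicity, whereas the paper works with an average-cost formulation.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng):
    # With probability epsilon explore uniformly; otherwise pick the greedy action.
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def ucb_action(q_values, counts, c=1.0):
    # Try every action once before applying the confidence bonus.
    for action, n in enumerate(counts):
        if n == 0:
            return action
    total = sum(counts)
    # Rarely tried actions receive a larger exploration bonus.
    return max(
        range(len(q_values)),
        key=lambda a: q_values[a] + c * math.sqrt(math.log(total) / counts[a]),
    )


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.95):
    # One tabular Q-learning step (discounted variant, an assumption of this sketch).
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
```

The ε-greedy rule needs only the value estimates, while the UCB rule also needs per-action visit counts for the current state, which is one reason state aggregation helps when the state space is large.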
The Project Scheduling Problem with Non-Deterministic Activities Duration: A Literature Review
Purpose: The goal of this article is to provide an extensive literature review of the models and solution procedures proposed by researchers interested in the Project Scheduling Problem with non-deterministic activity durations. Design/methodology/approach: This paper presents an exhaustive literature review, identifying the existing models in which activity durations were treated as uncertain or random parameters. The Scopus database was used to retrieve articles published since 1996. The articles were selected on the basis of reviews of abstracts, methodologies, and conclusions. The results were classified according to the following characteristics: year of publication, mathematical representation of the activity durations, solution techniques applied, and type of problem solved. Findings: Genetic Algorithms (GA) were identified as the main solution technique employed by researchers, and the Resource-Constrained Project Scheduling Problem (RCPSP) as the most studied type of problem. In addition, the application of new solution techniques and the possibility of incorporating traditional methods into new PSP variants were identified as research trends. Originality/value: This literature review contains not only a descriptive analysis of the published articles but also a statistical information section that examines the state of research activity on the Project Scheduling Problem with non-deterministic activity durations.
Peer Reviewed
Joint QoS-Aware Scheduling and Precoding for Massive MIMO Systems via Deep Reinforcement Learning
The rapid development of mobile networks has increased the demand for
high-data-rate, low-latency, and high-reliability applications in
fifth-generation (5G) and beyond-5G (B5G) mobile networks. Concurrently,
massive multiple-input-multiple-output (MIMO) technology is essential to
realizing this vision and requires coordination with resource management functions
to deliver a high-quality user experience. Though conventional cross-layer adaptation
algorithms have been developed to schedule and allocate network resources, the
complexity of resulting rules is high with diverse quality of service (QoS)
requirements and B5G features. In this work, we consider a joint user
scheduling, antenna allocation, and precoding problem in a massive MIMO system.
Instead of directly assigning resources, such as the number of antennas, the
allocation process is transformed into a deep reinforcement learning (DRL)
based dynamic algorithm selection problem for efficient Markov decision process
(MDP) modeling and policy training. Specifically, the proposed utility function
integrates QoS requirements and constraints toward a long-term system-wide
objective that matches the MDP return. The componentized action structure with
action embedding further incorporates the resource management process into the
model. Simulations show 7.2% and 12.5% more satisfied users against static
algorithm selection and related works under demanding scenarios.
Stacked Auto Encoder Based Deep Reinforcement Learning for Online Resource Scheduling in Large-Scale MEC Networks
An online resource scheduling framework is proposed for minimizing the sum of weighted task latency for all the Internet-of-Things (IoT) users, by optimizing offloading decision, transmission power, and resource allocation in the large-scale mobile-edge computing (MEC) system. Toward this end, a deep reinforcement learning (DRL)-based solution is proposed, which includes the following components. First, a related and regularized stacked autoencoder (2r-SAE) with unsupervised learning is applied to perform data compression and representation for high-dimensional channel quality information (CQI) data, which can reduce the state space for DRL. Second, we present an adaptive simulated annealing approach (ASA) as the action search method of DRL, in which an adaptive h-mutation is used to guide the search direction and an adaptive iteration is proposed to enhance the search efficiency during the DRL process. Third, a preserved and prioritized experience replay (2p-ER) is introduced to assist the DRL to train the policy network and find the optimal offloading policy. The numerical results are provided to demonstrate that the proposed algorithm can achieve near-optimal performance while significantly decreasing the computational time compared with existing benchmarks.
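To make the idea of simulated annealing as an action search method concrete, here is a minimal sketch over binary offloading decisions. It shows plain simulated annealing with single-bit flips and geometric cooling, not the adaptive h-mutation or adaptive iteration scheme of the paper. The `toy_latency` cost, its per-user numbers, and all parameter defaults are hypothetical.

```python
import math
import random


def toy_latency(decision):
    # Hypothetical cost: each user either computes locally (fixed latency) or
    # offloads to the edge, where latency grows with the number of offloaders.
    local = [4.0, 3.0, 5.0, 2.0]
    n_offloaded = sum(decision)
    return sum(1.0 * n_offloaded if d else l for d, l in zip(decision, local))


def anneal_offloading(cost, n_users, n_steps=500, t_start=2.0,
                      cooling=0.99, seed=0):
    rng = random.Random(seed)
    current = [rng.randint(0, 1) for _ in range(n_users)]
    current_cost = cost(current)
    best, best_cost = list(current), current_cost
    temp = t_start
    for _ in range(n_steps):
        # Propose a neighbor by flipping one user's offloading decision.
        candidate = list(current)
        i = rng.randrange(n_users)
        candidate[i] = 1 - candidate[i]
        delta = cost(candidate) - current_cost
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, current_cost + delta
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
        temp *= cooling  # geometric cooling schedule
    return best, best_cost


if __name__ == "__main__":
    print(anneal_offloading(toy_latency, n_users=4))
```

In a DRL setting, a search of this kind would typically start from the policy network's proposed action rather than a random one, and the cost would come from the system model instead of a fixed table.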