5,192 research outputs found

    Application of the cross-entropy method to the buffer allocation problem in a simulation-based environment

    Get PDF
    The buffer allocation problem (BAP) is a well-known difficult problem in the design of production lines. We present a stochastic algorithm for solving the BAP, based on the cross-entropy method, a new paradigm for stochastic optimization. The algorithm involves the following iterative steps: (a) the generation of buffer allocations according to a certain random mechanism, followed by (b) the modification of this mechanism on the basis of cross-entropy minimization. Through various numerical experiments we demonstrate the efficiency of the proposed algorithm and show that the method can quickly generate (near-)optimal buffer allocations for fairly large production lines

    Energy Sharing for Multiple Sensor Nodes with Finite Buffers

    Full text link
    We consider the problem of finding optimal energy sharing policies that maximize the network performance of a system comprising of multiple sensor nodes and a single energy harvesting (EH) source. Sensor nodes periodically sense the random field and generate data, which is stored in the corresponding data queues. The EH source harnesses energy from ambient energy sources and the generated energy is stored in an energy buffer. Sensor nodes receive energy for data transmission from the EH source. The EH source has to efficiently share the stored energy among the nodes in order to minimize the long-run average delay in data transmission. We formulate the problem of energy sharing between the nodes in the framework of average cost infinite-horizon Markov decision processes (MDPs). We develop efficient energy sharing algorithms, namely Q-learning algorithm with exploration mechanisms based on the ϵ\epsilon-greedy method as well as upper confidence bound (UCB). We extend these algorithms by incorporating state and action space aggregation to tackle state-action space explosion in the MDP. We also develop a cross entropy based method that incorporates policy parameterization in order to find near optimal energy sharing policies. Through simulations, we show that our algorithms yield energy sharing policies that outperform the heuristic greedy method.Comment: 38 pages, 10 figure

    The Project Scheduling Problem with Non-Deterministic Activities Duration: A Literature Review

    Get PDF
    Purpose: The goal of this article is to provide an extensive literature review of the models and solution procedures proposed by many researchers interested on the Project Scheduling Problem with nondeterministic activities duration. Design/methodology/approach: This paper presents an exhaustive literature review, identifying the existing models where the activities duration were taken as uncertain or random parameters. In order to get published articles since 1996, was employed the Scopus database. The articles were selected on the basis of reviews of abstracts, methodologies, and conclusions. The results were classified according to following characteristics: year of publication, mathematical representation of the activities duration, solution techniques applied, and type of problem solved. Findings: Genetic Algorithms (GA) was pointed out as the main solution technique employed by researchers, and the Resource-Constrained Project Scheduling Problem (RCPSP) as the most studied type of problem. On the other hand, the application of new solution techniques, and the possibility of incorporating traditional methods into new PSP variants was presented as research trends. Originality/value: This literature review contents not only a descriptive analysis of the published articles but also a statistical information section in order to examine the state of the research activity carried out in relation to the Project Scheduling Problem with non-deterministic activities duration.Peer Reviewe

    Joint QoS-Aware Scheduling and Precoding for Massive MIMO Systems via Deep Reinforcement Learning

    Full text link
    The rapid development of mobile networks proliferates the demands of high data rate, low latency, and high-reliability applications for the fifth-generation (5G) and beyond (B5G) mobile networks. Concurrently, the massive multiple-input-multiple-output (MIMO) technology is essential to realize the vision and requires coordination with resource management functions for high user experiences. Though conventional cross-layer adaptation algorithms have been developed to schedule and allocate network resources, the complexity of resulting rules is high with diverse quality of service (QoS) requirements and B5G features. In this work, we consider a joint user scheduling, antenna allocation, and precoding problem in a massive MIMO system. Instead of directly assigning resources, such as the number of antennas, the allocation process is transformed into a deep reinforcement learning (DRL) based dynamic algorithm selection problem for efficient Markov decision process (MDP) modeling and policy training. Specifically, the proposed utility function integrates QoS requirements and constraints toward a long-term system-wide objective that matches the MDP return. The componentized action structure with action embedding further incorporates the resource management process into the model. Simulations show 7.2% and 12.5% more satisfied users against static algorithm selection and related works under demanding scenarios

    Stacked Auto Encoder Based Deep Reinforcement Learning for Online Resource Scheduling in Large-Scale MEC Networks

    Get PDF
    An online resource scheduling framework is proposed for minimizing the sum of weighted task latency for all the Internet-of-Things (IoT) users, by optimizing offloading decision, transmission power, and resource allocation in the large-scale mobile-edge computing (MEC) system. Toward this end, a deep reinforcement learning (DRL)-based solution is proposed, which includes the following components. First, a related and regularized stacked autoencoder (2r-SAE) with unsupervised learning is applied to perform data compression and representation for high-dimensional channel quality information (CQI) data, which can reduce the state space for DRL. Second, we present an adaptive simulated annealing approach (ASA) as the action search method of DRL, in which an adaptive h -mutation is used to guide the search direction and an adaptive iteration is proposed to enhance the search efficiency during the DRL process. Third, a preserved and prioritized experience replay (2p-ER) is introduced to assist the DRL to train the policy network and find the optimal offloading policy. The numerical results are provided to demonstrate that the proposed algorithm can achieve near-optimal performance while significantly decreasing the computational time compared with existing benchmarks
    corecore