
    Machine Learning for Ad Publishers in Real Time Bidding


    An Automated Deep Reinforcement Learning Pipeline for Dynamic Pricing

A dynamic pricing problem is difficult because of the highly dynamic environment and unknown demand distributions. In this article, we propose a deep reinforcement learning (DRL) framework: a pipeline that automatically defines the DRL components for solving a dynamic pricing problem. An automated DRL pipeline is necessary because the DRL framework can be designed in numerous ways, and manually finding an optimal configuration is tedious. This level of automation makes DRL for dynamic pricing usable by nonexperts. Our DRL pipeline covers three steps of DRL design: Markov decision process modeling, algorithm selection, and hyperparameter optimization. It starts by transforming the available information into a state representation and defining the reward function using a reward-shaping approach. The hyperparameters are then tuned with a novel hyperparameter optimization method that integrates Bayesian optimization with the selection operator of the genetic algorithm. We apply our DRL pipeline to reserve price optimization problems in online advertising as a case study, and show that the DRL configuration it produces yields a pricing policy whose revenue is significantly higher than that of the benchmark methods. The evaluation is carried out in a simulation of the real-time bidding environment developed to make exploration possible for the reinforcement learning agent.
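The abstract describes the tuning step only at a high level, so the following is a minimal sketch of how a hybrid tuner that combines Bayesian-style candidate proposal with a genetic-algorithm selection operator might be structured. The search space, surrogate, and objective are illustrative placeholders, not the authors' implementation.

```python
# Hypothetical sketch: alternate surrogate-guided proposal (Bayesian-optimization style)
# with a GA-style selection operator over evaluated configurations.
import random

SEARCH_SPACE = {"lr": (1e-5, 1e-2), "gamma": (0.90, 0.999)}  # illustrative hyperparameters

def sample_config():
    return {k: random.uniform(lo, hi) for k, (lo, hi) in SEARCH_SPACE.items()}

def evaluate(config):
    # Placeholder for training the DRL pricing agent and returning its revenue.
    return -((config["lr"] - 1e-3) ** 2) - (config["gamma"] - 0.99) ** 2

def surrogate_score(config, history):
    # Stand-in for a Gaussian-process posterior: crude similarity to the best config so far.
    if not history:
        return 0.0
    best = max(history, key=lambda h: h[1])[0]
    return -sum((config[k] - best[k]) ** 2 for k in config)

def select(population, k):
    # GA-style truncation selection: keep only the k best evaluated configurations.
    return sorted(population, key=lambda h: h[1], reverse=True)[:k]

history = [(c, evaluate(c)) for c in (sample_config() for _ in range(8))]
for _ in range(20):
    candidates = [sample_config() for _ in range(16)]
    proposal = max(candidates, key=lambda c: surrogate_score(c, history))  # surrogate proposes
    history.append((proposal, evaluate(proposal)))                          # true evaluation
    history = select(history, 8)                                            # selection operator

print("best config:", max(history, key=lambda h: h[1]))
```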

    Fast reinforcement learning for decentralized MAC optimization

In this paper, we propose a novel decentralized framework for optimizing the transmission strategy of the Irregular Repetition Slotted ALOHA (IRSA) protocol in sensor networks. We consider a hierarchical communication framework that ensures adaptivity to changing network conditions and does not require centralized control. The proposed solution is inspired by the reinforcement learning literature, and in particular by Q-learning. To deal with sensor nodes' limited lifetime and communication range, we let each node decide how many packet replicas to transmit based only on its own buffer state. We show that this information is sufficient and can help avoid packet collisions and significantly improve throughput. We formulate the problem in the decentralized partially observable Markov decision process (Dec-POMDP) framework, in which each node decides independently of the others how many packet replicas to transmit. We enhance the proposed Q-learning-based method with the concept of virtual experience, and we prove theoretically and demonstrate experimentally that convergence time is thereby significantly reduced. The experiments show that our method leads to large throughput gains, in particular when network traffic is heavy, and scales well with the size of the network. To understand how the nature of the problem affects the learning dynamics and vice versa, we investigate the waterfall effect, a severe degradation in performance above a particular traffic load that is typical of codes-on-graphs, and show that our algorithm learns to alleviate it.
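As a rough illustration of the per-node decision rule described above, here is a minimal tabular Q-learning sketch in which a node chooses its number of packet replicas from its own buffer state and reuses one observed outcome as "virtual experience" for equivalent states. The reward model, the toy buffer dynamics, and the set of equivalent states are assumptions for illustration; the paper's IRSA simulation is not reproduced.

```python
# Minimal sketch, under stated assumptions: decentralized replica selection via Q-learning,
# with virtual-experience updates that reuse one sampled outcome across equivalent states.
import random
from collections import defaultdict

MAX_REPLICAS = 4
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
Q = defaultdict(lambda: [0.0] * (MAX_REPLICAS + 1))  # Q[buffer_state][num_replicas]

def choose_action(buffer_state):
    # Epsilon-greedy choice of how many replicas to transmit.
    if random.random() < EPSILON:
        return random.randint(0, MAX_REPLICAS)
    qs = Q[buffer_state]
    return qs.index(max(qs))

def update(buffer_state, action, reward, next_state):
    # Standard Q-learning update for the transition actually experienced.
    target = reward + GAMMA * max(Q[next_state])
    Q[buffer_state][action] += ALPHA * (target - Q[buffer_state][action])

def virtual_update(action, reward, equivalent_states):
    # Virtual experience: the sampled (action, reward) outcome is assumed equally valid
    # for a set of equivalent buffer states, so all of them are updated from one sample.
    for s in equivalent_states:
        next_s = max(s - reward, 0)  # toy model: delivered packets drain the buffer
        update(s, action, reward, next_s)
```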

    Evolving Neural Networks through a Reverse Encoding Tree

NeuroEvolution is one of the most competitive evolutionary learning frameworks for designing novel neural networks for specific tasks, such as logic circuit design and digital gaming. However, the application of benchmark methods such as NeuroEvolution of Augmenting Topologies (NEAT) remains a challenge in terms of computational cost and search-time inefficiency. This paper advances a method that incorporates a type of topological edge coding, named Reverse Encoding Tree (RET), for efficiently evolving scalable neural networks. Using RET, two approaches -- NEAT with Binary search encoding (Bi-NEAT) and NEAT with Golden-Section search encoding (GS-NEAT) -- have been designed to solve problems in benchmark continuous learning environments such as logic gates, CartPole, and Lunar Lander, and tested against classical NEAT and FS-NEAT as baselines. Additionally, we conduct a robustness test to evaluate the resilience of the proposed NEAT algorithms. The results show that the two proposed strategies deliver improved performance, characterized by (1) a higher accumulated reward within a finite number of time steps, (2) fewer episodes needed to solve problems in the targeted environments, and (3) adaptive robustness under noisy perturbations; they outperform the baselines in all tested cases. Our analysis also demonstrates that RET opens up potential future research directions in dynamic environments. Code is available from https://github.com/HaolingZHANG/ReverseEncodingTree.
Comment: Accepted to IEEE Congress on Evolutionary Computation (IEEE CEC) 2020. Lecture Presentation.
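The abstract names a Golden-Section search encoding without spelling it out, so the sketch below shows only the standard golden-section search routine that the name refers to, maximizing a scalar fitness proxy over one encoded dimension. The fitness function and interval are placeholders, not the RET implementation from the linked repository.

```python
# Standard golden-section search (maximization form) over a unimodal scalar function.
import math

INV_PHI = (math.sqrt(5) - 1) / 2  # 1/phi, about 0.618

def golden_section_maximize(fitness, lo, hi, tol=1e-4):
    a, b = lo, hi
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    while b - a > tol:
        if fitness(c) >= fitness(d):
            b, d = d, c                     # maximum lies in [a, d]; old c becomes new d
            c = b - INV_PHI * (b - a)
        else:
            a, c = c, d                     # maximum lies in [c, b]; old d becomes new c
            d = a + INV_PHI * (b - a)
    return (a + b) / 2

# Toy usage: locate the encoding value that maximizes a placeholder fitness proxy.
best = golden_section_maximize(lambda x: -(x - 0.3) ** 2, 0.0, 1.0)
print(round(best, 3))  # approximately 0.3
```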