9 research outputs found

    Toward Robust Long Range Policy Transfer

    Full text link
    Humans can master a new task within a few trials by drawing upon skills acquired through prior experience. To mimic this capability, hierarchical models combining primitive policies learned from prior tasks have been proposed. However, these methods fall short of the human range of transferability. We propose a method that leverages the hierarchical structure to alternately train the combination function and adapt the set of diverse primitive policies, efficiently producing a range of complex behaviors on challenging new tasks. We also design two regularization terms to improve the diversity and utilization rate of the primitives in the pre-training phase. We demonstrate that our method outperforms other recent policy transfer methods by combining and adapting these reusable primitives in tasks with continuous action spaces. The experimental results further show that our approach provides a broader transfer range, and an ablation study shows that the regularization terms are critical for long-range policy transfer. Finally, we show that our method consistently outperforms other methods when the quality of the primitives varies. Comment: Accepted by AAAI 202
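
    The abstract describes alternately training a combination function over a set of frozen primitive policies, with regularization that keeps the primitives diverse. The following is a minimal, hypothetical sketch of that idea, not the authors' code: a softmax gate mixes the primitives' continuous actions, and an illustrative penalty rewards pairwise action diversity. All names, dimensions, and the exact regularizer are assumptions.

    import torch
    import torch.nn as nn

    class Gate(nn.Module):
        """Maps a state to mixture weights over K frozen primitive policies."""
        def __init__(self, state_dim, num_primitives):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim, 64), nn.Tanh(),
                nn.Linear(64, num_primitives),
            )

        def forward(self, state):
            return torch.softmax(self.net(state), dim=-1)   # weights sum to 1

    def composite_action(state, primitives, gate):
        """Weighted combination of the primitives' continuous actions."""
        weights = gate(state)                                    # (K,)
        actions = torch.stack([p(state) for p in primitives])    # (K, act_dim)
        return (weights.unsqueeze(-1) * actions).sum(dim=0)      # (act_dim,)

    def diversity_penalty(state, primitives):
        """Illustrative regularizer: push primitives toward distinct actions."""
        actions = torch.stack([p(state) for p in primitives])    # (K, act_dim)
        return -torch.cdist(actions, actions).mean()             # lower = more diverse

    # Purely illustrative usage with random linear "primitives":
    primitives = [nn.Linear(8, 2) for _ in range(4)]
    gate = Gate(state_dim=8, num_primitives=4)
    action = composite_action(torch.randn(8), primitives, gate)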

    DeepSoCS: A Neural Scheduler for Heterogeneous System-on-Chip (SoC) Resource Scheduling

    Full text link
    In this paper, we present a novel scheduling solution for a class of System-on-Chip (SoC) systems in which heterogeneous chip resources (DSP, FPGA, GPU, etc.) must be efficiently scheduled for continuously arriving hierarchical jobs whose tasks are represented by a directed acyclic graph. Heuristic algorithms have traditionally been used across many resource scheduling domains, and Heterogeneous Earliest Finish Time (HEFT) has long been the dominant state-of-the-art technique for heterogeneous resource scheduling. Despite their long-standing popularity, HEFT-like algorithms are known to be vulnerable to even small amounts of noise in the environment. Our Deep Reinforcement Learning (DRL)-based SoC Scheduler (DeepSoCS), which learns the "best" task ordering under dynamic environment changes, overcomes the brittleness of rule-based schedulers such as HEFT and achieves significantly higher performance across different job types. We describe the DeepSoCS design process using a real-time heterogeneous SoC scheduling emulator, discuss the major challenges, and present two novel neural network design features that lead to outperforming HEFT: (i) hierarchical job- and task-graph embedding; and (ii) efficient use of real-time task information in the state space. Furthermore, we introduce effective techniques to address two fundamental challenges in our environment: delayed consequences and joint actions. Through an extensive simulation study, we show that DeepSoCS achieves significantly better job execution time than HEFT, with a higher level of robustness under realistic noise conditions. We conclude with a discussion of potential improvements to the DeepSoCS neural scheduler. Comment: 18 pages, Accepted by Electronics 202
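
    As an illustration only, not the DeepSoCS architecture, the sketch below shows the general shape of a learned task-ordering step: a small network scores the ready tasks of a DAG from a few hand-picked features, and the top-scoring task is dispatched to the earliest-available heterogeneous resource. The feature layout, dictionary structures, and network size are assumptions.

    import torch
    import torch.nn as nn

    scorer = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 1))

    def schedule_step(ready_tasks, resource_free_at, now):
        """Pick the next (task, resource) pair.

        ready_tasks: list of dicts with per-task features (assumed layout).
        resource_free_at: dict mapping resource name -> time it becomes free.
        """
        feats = torch.tensor(
            [[t["exec_time"], t["out_degree"], t["depth"], now - t["release_time"]]
             for t in ready_tasks],
            dtype=torch.float32)
        scores = scorer(feats).squeeze(-1)            # one learned score per ready task
        task = ready_tasks[int(scores.argmax())]      # learned task ordering
        resource = min(resource_free_at, key=resource_free_at.get)  # earliest-free resource
        return task, resource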

    Comyco: Quality-Aware Adaptive Video Streaming via Imitation Learning

    Full text link
    Learning-based Adaptive Bit Rate (ABR) methods, which aim to learn strong strategies without presumptions, have become a research hotspot for adaptive streaming. However, they typically suffer from several issues, namely low sample efficiency and lack of awareness of video quality information. In this paper, we propose Comyco, a video quality-aware ABR approach that substantially improves on learning-based methods by tackling these issues. Comyco trains its policy by imitating expert trajectories given by an instant solver, which not only avoids redundant exploration but also makes better use of the collected samples. Meanwhile, Comyco attempts to pick the chunk with the higher perceptual video quality rather than the higher bitrate. To achieve this, we construct Comyco's neural network architecture, video datasets, and QoE metrics around video quality features. Using trace-driven and real-world experiments, we demonstrate significant improvements in Comyco's sample efficiency compared to prior work: a 1700x reduction in the number of samples required and a 16x reduction in training time. Moreover, the results show that Comyco outperforms previously proposed methods, with improvements in average QoE of 7.5%-16.79%. In particular, Comyco surpasses the state-of-the-art approach Pensieve by 7.37% in average video quality under the same rebuffering time. Comment: ACM Multimedia 201
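
    The core training idea, imitating expert trajectories produced by an instant solver, can be sketched as a plain supervised update. The following is a hypothetical illustration, not Comyco's network or feature set: the policy's logits over quality levels are pushed toward the expert's choice with a cross-entropy loss. The feature size and number of quality levels are assumptions.

    import torch
    import torch.nn as nn

    NUM_LEVELS = 6   # assumed number of selectable quality levels
    policy = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, NUM_LEVELS))
    optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)

    def imitation_update(states, expert_choices):
        """One supervised step toward the expert's chunk choices.

        states: (B, 16) tensor of observed throughput/buffer/quality features.
        expert_choices: (B,) tensor of quality-level indices from the instant solver.
        """
        logits = policy(states)
        loss = nn.functional.cross_entropy(logits, expert_choices)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()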

    Learning Scheduling Algorithms for Data Processing Clusters

    Full text link
    Efficiently scheduling data processing jobs on distributed compute clusters requires complex algorithms. Current systems, however, use simple generalized heuristics and ignore workload characteristics, since developing and tuning a scheduling policy for each workload is infeasible. In this paper, we show that modern machine learning techniques can generate highly efficient policies automatically. Our system, Decima, uses reinforcement learning (RL) and neural networks to learn workload-specific scheduling algorithms without any human instruction beyond a high-level objective such as minimizing average job completion time. Off-the-shelf RL techniques, however, cannot handle the complexity and scale of the scheduling problem. To build Decima, we had to develop new representations for jobs' dependency graphs, design scalable RL models, and invent RL training methods for dealing with continuous stochastic job arrivals. Our prototype integration with Spark on a 25-node cluster shows that Decima improves the average job completion time over hand-tuned scheduling heuristics by at least 21%, achieving up to a 2x improvement during periods of high cluster load.
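
    To give a feel for the dependency-graph representation the abstract refers to, here is a minimal sketch, with assumptions throughout and not Decima's actual embedding: each DAG node's embedding sums messages from its children, and a linear layer then scores each stage for the scheduling policy.

    import torch
    import torch.nn as nn

    FEAT = 8                                            # assumed per-node feature size
    msg = nn.Sequential(nn.Linear(FEAT, FEAT), nn.ReLU())
    score = nn.Linear(FEAT, 1)

    def embed_dag(node_feats, children):
        """node_feats: (N, FEAT) tensor; children[i] lists the child indices of node i."""
        emb = [None] * len(node_feats)

        def visit(i):
            if emb[i] is None:
                child_msgs = [msg(visit(c)) for c in children[i]]
                agg = torch.stack(child_msgs).sum(0) if child_msgs else torch.zeros(FEAT)
                emb[i] = node_feats[i] + agg            # node embedding absorbs its subtree
            return emb[i]

        for i in range(len(node_feats)):
            visit(i)
        return torch.stack(emb)                         # (N, FEAT)

    def stage_scores(node_feats, children):
        """Per-stage scores that a scheduling policy could turn into a dispatch decision."""
        return score(embed_dag(node_feats, children)).squeeze(-1)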

    Evaluation of blood glucose level control in Type 1 diabetic patients using online and offline reinforcement learning

    Get PDF
    Patients with Type 1 diabetes are required to closely monitor their blood glucose levels and administer insulin to manage them. Automated glucose control methods that eliminate the need for human intervention have been proposed, and recently reinforcement learning, a type of machine learning algorithm, has been used as an effective control method in simulated environments. Currently, the methods used by diabetes patients, such as the basal-bolus regimen and continuous glucose monitors, have limitations and still require manual intervention. PID controllers are widely used for their simplicity and robustness, but they are sensitive to external factors that affect their effectiveness. Existing work in the research literature has mainly focused on improving the accuracy of these control algorithms, yet there is still room for improvement in adaptability to individual patients. The next phase of research aims to further optimize current methods and adapt the algorithms to better control blood glucose levels. Machine learning proposals have partially paved the way, but they can produce generic models with limited adaptability. One potential solution is to use reinforcement learning (RL) to train the algorithms on individual patient data. In this thesis, we propose a closed-loop control of blood glucose levels based on deep reinforcement learning. We describe an initial evaluation of several alternatives conducted on a realistic simulator of the glucoregulatory system and propose a particular implementation strategy based on reducing the frequency of the observations and rewards passed to the agent and using a simple reward function. We train agents with this strategy for three groups of patient classes, evaluate them, and compare them with alternative control baselines. Our results show that our method with Proximal Policy Optimization outperforms traditional baselines as well as similar recent proposals, achieving longer periods of safe and low-risk glycemic state. Extending this contribution, we note that the practical application of blood glucose control algorithms would require trial-and-error interaction with patients, which is a limitation for training the system effectively. As an alternative, offline reinforcement learning does not require interaction with subjects, and preliminary research suggests that promising results can be achieved with datasets collected offline, as with classical machine learning algorithms. However, the application of offline reinforcement learning to glucose control has not yet been evaluated. Thus, in this thesis, we comprehensively evaluate two offline reinforcement learning algorithms for blood glucose control and examine their potential and limitations. We assess the impact on training and performance of the method used to generate the training datasets, the type of trajectories employed (sequences of states, actions, and rewards experienced by an agent in an environment over time, from a single method or mixed), the quality of those trajectories, and the size of the datasets, and we compare the algorithms with commonly used baselines such as PID and Proximal Policy Optimization.
Our results demonstrate that one of the offline reinforcement learning algorithms evaluated, Trajectory Transformer, is able to perform at the same level as the baselines, but without the need for interaction with real patients during training. Escuela Internacional de Doctorado de la Universidad Politécnica de Cartagena. Universidad Politécnica de Cartagena. Programa de Doctorado en Tecnologías de la Información y las Comunicaciones.
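
    The implementation strategy mentioned above, reduced observation frequency and a simple reward function, could look roughly like the sketch below. The thresholds, penalties, and sampling factor are assumptions, not the thesis' exact formulation.

    def glucose_reward(glucose_mg_dl):
        """Piecewise reward over blood glucose (mg/dL); thresholds are illustrative."""
        if 70 <= glucose_mg_dl <= 180:
            return 1.0       # safe glycemic range
        if glucose_mg_dl < 70:
            return -2.0      # hypoglycemia, penalized more heavily
        return -1.0          # hyperglycemia

    def downsample(observations, every_n=6):
        """Keep one observation out of every `every_n`, e.g. 5-minute CGM readings
        reduced to 30-minute agent steps."""
        return observations[::every_n]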