11 research outputs found

    Deep reinforcement learning for workload balance and due date control in wafer fabs

    Semiconductor wafer fabrication facilities (wafer fabs) often prioritize two operational objectives: work-in-process (WIP) and due date. WIP-oriented and due date-oriented dispatching rules are two commonly used methods to achieve workload balance and on-time delivery, respectively. However, achieving both objectives simultaneously often requires sophisticated heuristics. In this paper, we propose a novel approach using deep-Q-network reinforcement learning (DRL) for dispatching in wafer fabs. The DRL approach differs from traditional dispatching methods by using dispatch agents at work-centers to observe state changes in the wafer fabs. The agents train their deep-Q-networks by taking the states as inputs, allowing them to select the most appropriate dispatch action. Additionally, the reward function is integrated with workload and due date information on both local and global levels. Compared to the traditional WIP- and due date-oriented rules, as well as heuristic-based rules in the literature, the DRL approach is able to produce better global performance with regard to workload balance and on-time delivery.
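    As a rough illustration of the mechanism this abstract describes, the sketch below pairs a small deep Q-network with an epsilon-greedy choice among candidate dispatch rules and a reward mixing local and global workload and due-date terms. The state features, rule set, weights, and network size are illustrative assumptions (PyTorch), not the paper's actual configuration.

    import random
    import torch
    import torch.nn as nn

    STATE_DIM = 8                        # e.g. local WIP level, queue slack, global WIP balance
    ACTIONS = ["FIFO", "EDD", "SPT"]     # candidate dispatch rules the agent chooses between

    class QNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                                     nn.Linear(64, len(ACTIONS)))
        def forward(self, s):
            return self.net(s)

    def reward(local_wip_dev, global_wip_dev, local_tardiness, global_tardiness,
               w=(0.25, 0.25, 0.25, 0.25)):
        # Combine local and global workload-balance and due-date terms (illustrative weights).
        return -(w[0] * local_wip_dev + w[1] * global_wip_dev
                 + w[2] * local_tardiness + w[3] * global_tardiness)

    qnet = QNet()
    opt = torch.optim.Adam(qnet.parameters(), lr=1e-3)

    def select_action(state, eps=0.1):
        # Epsilon-greedy choice of a dispatch rule for the current work-center state.
        if random.random() < eps:
            return random.randrange(len(ACTIONS))
        with torch.no_grad():
            return int(qnet(state).argmax())

    # One illustrative temporal-difference update on a synthetic transition.
    s = torch.randn(STATE_DIM)
    a = select_action(s)
    r = reward(0.3, 0.5, 1.2, 0.8)
    s_next = torch.randn(STATE_DIM)
    target = r + 0.99 * qnet(s_next).max().detach()
    loss = (qnet(s)[a] - target) ** 2
    opt.zero_grad(); loss.backward(); opt.step()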

    Sheet-Metal Production Scheduling Using AlphaGo Zero

    This work investigates the applicability of a reinforcement learning (RL) approach, specifically AlphaGo Zero (AZ), for optimizing sheet-metal (SM) production schedules with respect to tardiness and material waste. SM production scheduling is a complex job shop scheduling problem (JSSP) with dynamic operation times, routing flexibility and supplementary constraints. SM production systems are capable of processing a large number of highly heterogeneous jobs simultaneously. While very large relative to the JSSP literature, the SM-JSSP instances investigated in this work are small relative to the SM production reality. Given the high dimensionality of the SM-JSSP, computation of an optimal schedule is not tractable. Simple heuristic solutions often deliver bad results. We use AZ to selectively search the solution space. To this end, a single player AZ version is pretrained using supervised learning on schedules generated by a heuristic, fine-tuned using RL and evaluated through comparison with a heuristic baseline and Monte Carlo Tree Search. It will be shown that AZ outperforms the other approaches. The work’s scientific contribution is twofold: On the one hand, a novel scheduling problem is formalized such that it can be tackled using RL approaches. On the other hand, it is proved that AZ can be successfully modified to provide a solution for the problem at hand, whereby a new line of research into real-world applications of AZ is opened
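    A minimal sketch of the kind of selection step an AlphaGo-Zero-style search performs when adapted to single-player scheduling: each candidate action is the next operation to dispatch, and a PUCT rule trades the value estimate off against a policy-network prior. The node structure, priors, and exploration constant are illustrative assumptions, not the paper's implementation.

    import math

    class Node:
        def __init__(self, prior):
            self.prior = prior        # policy-network prior P(s, a)
            self.visits = 0           # visit count N(s, a)
            self.value_sum = 0.0      # accumulated value estimates
            self.children = {}        # action -> Node

        def q(self):
            return self.value_sum / self.visits if self.visits else 0.0

    def puct_select(node, c_puct=1.5):
        # Pick the child maximizing Q + U, where U favors high-prior, rarely visited actions.
        total = sum(ch.visits for ch in node.children.values())
        def score(item):
            _, ch = item
            u = c_puct * ch.prior * math.sqrt(total + 1) / (1 + ch.visits)
            return ch.q() + u
        return max(node.children.items(), key=score)

    # Illustrative root: three schedulable operations with priors from a policy network.
    root = Node(prior=1.0)
    root.children = {"op_A": Node(0.5), "op_B": Node(0.3), "op_C": Node(0.2)}
    action, child = puct_select(root)
    print("operation selected for expansion:", action)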

    Reinforcement Learning Based Production Control of Semi-automated Manufacturing Systems

    In an environment marked by an increasing speed of change, industrial companies have to be able to quickly adapt to new market demands and innovative technologies. This leads to a need for continuous adaptation of existing production systems and the optimization of their production control. To tackle this problem, digitalization of production systems has become essential for new and existing systems. Digital twins based on simulations of real production systems allow the simplification of analysis processes and, thus, a better understanding of the systems, which leads to broad optimization possibilities. In parallel, machine learning methods can be integrated to process the numerical data and discover new production control strategies. In this work, these two methods are combined to derive a production control logic in a semi-automated production system based on the chaku-chaku principle. A reinforcement learning method is integrated into the digital twin to autonomously learn a superior production control logic for the distribution of tasks between the different workers on a production line. An analysis of the influence of different reward shaping and hyper-parameter choices on the quality and stability of the results shows that a well-configured policy-based algorithm enables efficient management of the workers and the derivation of an optimal production control logic for the production system. The algorithm manages to define a control logic that leads to an increase in productivity while keeping task assignment stable, so that a transfer to daily business is possible. The approach is validated in the digital twin of a real assembly line of an automotive supplier. The results obtained suggest a new approach to optimizing production control in production lines. Production control is centered directly on the workers’ routines and controlled by artificial intelligence infused with a global overview of the entire production system.
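    The sketch below illustrates, under assumed state features and shaping weights, how a policy-based agent of the kind described could assign the next task to one of several workers, with a shaped reward trading productivity against unstable reassignments. It is a minimal illustration (PyTorch, REINFORCE-style update), not the validated configuration.

    import torch
    import torch.nn as nn

    N_WORKERS = 3
    STATE_DIM = 6   # e.g. worker utilizations, buffer levels, current positions on the line

    policy = nn.Sequential(nn.Linear(STATE_DIM, 32), nn.ReLU(),
                           nn.Linear(32, N_WORKERS), nn.Softmax(dim=-1))
    opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

    def shaped_reward(throughput_gain, reassignment_penalty, alpha=1.0, beta=0.2):
        # Reward productivity while penalizing unstable task switching (illustrative shaping terms).
        return alpha * throughput_gain - beta * reassignment_penalty

    # One illustrative policy-gradient update on a synthetic decision.
    state = torch.randn(STATE_DIM)
    dist = torch.distributions.Categorical(policy(state))
    worker = dist.sample()                                   # which worker takes the next task
    r = shaped_reward(throughput_gain=0.8, reassignment_penalty=1.0)
    loss = -dist.log_prob(worker) * r
    opt.zero_grad(); loss.backward(); opt.step()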

    Proposal to improve bottle production planning by applying an MPS and Deep-Learning-based forecasts at a water production and bottling company in Callao

    Over the last three years, sales of bottled water for human consumption in Peru have grown across the channels supplied by producers, namely supermarkets, grocery stores, markets and convenience stores, as evidenced by the 3.9% growth of this productive sector at the close of 2017. Nationwide, the revenue of the large supermarket chains has reached 14,000,000 PEN, a 5.3% increase over 2016. This shows that the sector is growing, driven by high consumption of bottled water. The company under analysis produces bottled water under its own brand and as a contract manufacturer for its main clients, Cencosud Retail S.A. and Supermercados Peruanos S.A. It is a young, medium-sized company with 43 employees on its payroll, operating nationwide with a significant market share, which since 2017 has experienced strong growth after the client Supermercados Peruanos S.A. prioritized its production with the company. Its products hold a 69% share at the supermarket level, and in 2018 its market positioning stood at around 10.4%, sharing the national market with large bottlers such as AB InBev, CBC Perú, Arca Continental and ISM. In the first quarter of 2019 its product portfolio was composed as follows: Bells 48.04%, Wong 12.32%, Metro 28.11%, Selfie 11.53%, which represents an 8% sales increase over the previous quarter that is expected to be sustained. The increase in orders for the 2.5 L Bells presentation, combined with the lack of demand planning and poor warehouse management, has started to cause order-fulfillment problems, 17% on average, including incomplete supply to clients, late-delivery penalties, stock-outs of finished product and low product quality, among others. The company's critical processes lie in the production and logistics areas. First, the methodologies currently used to estimate demand do not reflect reality; consequently, production is based on current orders. No demand forecast useful for production is maintained, and there is no inventory system. As a result, orders go unfulfilled, a problem aggravated by the sales growth of recent years. According to data from the last quarter of 2019, the amount of treated water required across the different presentations amounted to 538,312.2 L. Second, there is considerable waste at different stages of the production process: the use of treated water is only about 40% efficient, and the filling, sealing and packing stages show an average waste level of 8.8% of the resources used at each stage, driven by excessive overtime and by running the equipment until failure. Monthly overtime in the last quarter of 2019 reached 628.5 hours across different positions, meaning these additional hours are needed to complete the orders.
This leads to the conclusion that planning, inventories and work pace must be re-engineered. We propose to solve the company's current problems by implementing a Master Production Schedule (MPS) based on demand analysis supported by data-analytics tools, including a Deep Learning LSTM model, and consequently to develop proper production planning, establish an inventory system and increase productivity through takt time.
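The sketch below illustrates the forecasting building block of such a proposal: an LSTM trained on a synthetic demand series predicts the next period's demand, which would then feed the corresponding MPS row. The window size, network size, and data are illustrative assumptions (PyTorch), not the company's figures.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    # Synthetic weekly demand series (arbitrary scale) standing in for real sales data.
    demand = torch.sin(torch.linspace(0, 12, 120)) * 50 + 100
    mean, std = demand.mean(), demand.std()
    series = (demand - mean) / std

    WINDOW = 8
    X = torch.stack([series[i:i + WINDOW] for i in range(len(series) - WINDOW)])
    y = series[WINDOW:]

    class Forecaster(nn.Module):
        def __init__(self):
            super().__init__()
            self.lstm = nn.LSTM(input_size=1, hidden_size=16, batch_first=True)
            self.head = nn.Linear(16, 1)
        def forward(self, x):
            out, _ = self.lstm(x.unsqueeze(-1))       # (batch, window, hidden)
            return self.head(out[:, -1]).squeeze(-1)  # one-step-ahead prediction

    model = Forecaster()
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(200):
        loss = nn.functional.mse_loss(model(X), y)
        opt.zero_grad(); loss.backward(); opt.step()

    # De-normalized forecast for the next period, which would feed the MPS row for that period.
    forecast = model(series[-WINDOW:].unsqueeze(0)).item() * std.item() + mean.item()
    print(f"forecast demand for next period: {forecast:.1f}")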

    An intelligent resource allocation decision support system with Q-learning

    Master's thesis (Master of Engineering)

    Heuristics and policies for online pickup and delivery problems

    Master's thesis. In the last few decades, increased attention has been dedicated to a specific subclass of Vehicle Routing Problems due to its significant importance in several transportation areas such as taxi companies, courier companies, transportation of people, organ transportation, etc. These problems are characterized by their dynamicity, as the demands are, in general, unknown in advance and the corresponding locations are paired. This thesis addresses a version of such Dynamic Pickup and Delivery Problems, motivated by a problem arising in an Australian courier company, which operates in Sydney, Melbourne and Brisbane, where almost every day more than a thousand transportation orders arrive and need to be accommodated. The firm has a fleet of almost two hundred vehicles of various types, mostly operating within the city areas. Thus, whenever new orders arrive at the system, the dispatchers face a complex decision regarding the allocation of the new customers within the distribution routes (already existing or new), taking into account a complex multi-level objective function. The thesis thus focuses on the process of learning simple dispatch heuristics, and lays the foundations of a recommendation system able to rank such heuristics. We implemented eight of these, observing different characteristics of the current fleet and orders. The system incorporates an artificial neural network that is trained on two hundred days of past data and is supervised by schedules produced by an oracle, Indigo, a system able to produce suboptimal solutions to problem instances. The system opens the possibility for many dispatch policies to be implemented based on this rule ranking, and helps dispatchers to manage the vehicles of the fleet. It also provides results for the human resources required each day and within the different periods of the day. We complement the quite promising results obtained with a discussion of future additions and improvements such as channel fleet management, traffic consideration, and learning hyper-heuristics to control simple rule sequences. The thesis work was partially supported by National ICT Australia according to the Visitor Research Agreement contract between NICTA and Martin Damyanov Aleksandro
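    The sketch below illustrates the rule-ranking idea under stated assumptions: a small neural network scores a fixed set of dispatch heuristics from fleet and order features, trained in a supervised way against choices derived from an oracle's schedules. The feature set, heuristic names, and data are placeholders (PyTorch), not the thesis's actual implementation.

    import torch
    import torch.nn as nn

    HEURISTICS = ["nearest_vehicle", "cheapest_insertion", "least_loaded",
                  "earliest_deadline", "same_zone", "new_route",
                  "min_detour", "sweep"]          # stand-ins for the eight implemented rules
    FEATURE_DIM = 10                              # e.g. fleet load, order urgency, distances

    ranker = nn.Sequential(nn.Linear(FEATURE_DIM, 64), nn.ReLU(),
                           nn.Linear(64, len(HEURISTICS)))
    opt = torch.optim.Adam(ranker.parameters(), lr=1e-3)

    # Synthetic supervised batch: the label is the index of the heuristic whose dispatch
    # best matches the oracle's schedule for that situation.
    features = torch.randn(32, FEATURE_DIM)
    oracle_choice = torch.randint(0, len(HEURISTICS), (32,))
    loss = nn.functional.cross_entropy(ranker(features), oracle_choice)
    opt.zero_grad(); loss.backward(); opt.step()

    # Ranking for a new order situation: higher score means the heuristic is recommended first.
    scores = ranker(torch.randn(FEATURE_DIM))
    ranking = [HEURISTICS[int(i)] for i in scores.argsort(descending=True)]
    print(ranking)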

    Resource Allocation in SDN/NFV-Enabled Core Networks

    For next generation core networks, communication, storage and computing resources are expected to be integrated into one unified, programmable and flexible infrastructure. Software-defined networking (SDN) and network function virtualization (NFV) are two key enablers. SDN decouples the network control and forwarding functions, which facilitates network management and enables network programmability. NFV allows the network functions to be virtualized and placed on high capacity servers located anywhere in the network, not only on dedicated devices in current networks. Driven by SDN and NFV platforms, the future network architecture is expected to feature centralized network management, virtualized function chaining, reduced capital and operational costs, and enhanced service quality. The combination of SDN and NFV provides a potential technical route to advance future communication networks. It is imperative to efficiently manage, allocate and optimize the heterogeneous resources, including computing, storage, and communication resources, for the customized services to achieve better quality-of-service (QoS) provisioning. This thesis presents in-depth research on efficient resource allocation for SDN/NFV-enabled core networks across multiple aspects and dimensions. Typically, the resource allocation task involves three aspects. Given the traffic metrics, QoS requirements, and resource constraints of the substrate network, we first need to compose a virtual network function (VNF) chain to form a virtual network (VN) topology. Then, virtual resources allocated to each VNF or virtual link need to be optimized in order to minimize the provisioning cost while satisfying the QoS requirements. Next, we need to embed the virtual network (i.e., VNF chain) onto the substrate network, in which we need to assign the physical resources in an economical way to meet the resource demands of VNFs and links. This involves determining the locations of NFV nodes to host the VNFs and the routing from source to destination. Finally, we need to schedule the VNFs for multiple services to minimize the service completion time and maximize the network performance. In this thesis, we study resource allocation in SDN/NFV-enabled core networks from the aforementioned three aspects. First, we jointly study how to design the topology of a VN and embed the resultant VN onto a substrate network with the objective of minimizing the embedding cost while satisfying the QoS requirements. In VN topology design, optimizing the resource requirement for each virtual node and link is necessary. Without topology optimization, the resources assigned to the virtual network may be insufficient or redundant, leading to degraded service quality or increased embedding cost. The joint problem is formulated as a Mixed Integer Nonlinear Program (MINLP), where queueing theory is utilized as the methodology to analyze the network delay and help to define the optimal set of physical resource requirements at network elements. Two algorithms are proposed to obtain the optimal/near-optimal solutions of the MINLP model. Second, we address the multi-SFC embedding problem using a game theoretical approach, considering the heterogeneity of NFV nodes, the effect of processing-resource sharing among various VNFs, and the capacity constraints of NFV nodes. 
In the proposed resource constrained multi-SFC embedding game (RC-MSEG), each SFC is treated as a player whose objective is to minimize the overall latency experienced by the supported service flow, while satisfying the capacity constraints of all its NFV nodes. Due to processing-resource sharing, additional delay is incurred and integrated into the overall latency for each SFC. The capacity constraints of NFV nodes are considered by adding a penalty term into the cost function of each player, and are guaranteed by a prioritized admission control mechanism. We first prove that the proposed game RC-MSEG is an exact potential game admitting at least one pure Nash Equilibrium (NE) and has the finite improvement property (FIP). Then, we design two iterative algorithms, namely, the best response (BR) algorithm with fast convergence and the spatial adaptive play (SAP) algorithm with great potential to obtain the best NE of the proposed game. Third, the VNF scheduling problem is investigated to minimize the makespan (i.e., overall completion time) of all services, while satisfying their different end-to-end (E2E) delay requirements. The problem is formulated as a mixed integer linear program (MILP) which is NP-hard with exponentially increasing computational complexity as the network size expands. To solve the MILP with high efficiency and accuracy, the original problem is reformulated as a Markov decision process (MDP) problem with variable action set. Then, a reinforcement learning (RL) algorithm is developed to learn the best scheduling policy by continuously interacting with the network environment. The proposed learning algorithm determines the variable action set at each decision-making state and accommodates different execution time of the actions. The reward function in the proposed algorithm is carefully designed to realize delay-aware VNF scheduling. To sum up, it is of great importance to integrate SDN and NFV in the same network to accelerate the evolution toward software-enabled network services. We have studied VN topology design, multi-VNF chain embedding, and delay-aware VNF scheduling to achieve efficient resource allocation in different dimensions. The proposed approaches pave the way for exploiting network slicing to improve resource utilization and facilitate QoS-guaranteed service provisioning in SDN/NFV-enabled networks
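    As a rough illustration of the scheduling formulation in the third part, the sketch below applies tabular Q-learning to an MDP whose action set varies with the decision state: only the VNFs that are currently schedulable are valid actions, and the reward penalizes elapsed time as a proxy for makespan. The states, durations, and reward are illustrative, not the thesis's model.

    import random
    from collections import defaultdict

    Q = defaultdict(float)          # Q[(state, action)], state = frozenset of schedulable VNFs
    ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

    def valid_actions(state):
        # Variable action set: only VNFs whose predecessors have finished can be scheduled now.
        return sorted(state)

    def choose(state):
        acts = valid_actions(state)
        if random.random() < EPS:
            return random.choice(acts)
        return max(acts, key=lambda a: Q[(state, a)])

    def update(state, action, reward, next_state):
        # Standard one-step Q-learning backup over whatever actions the next state allows.
        best_next = max((Q[(next_state, a)] for a in valid_actions(next_state)), default=0.0)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

    # Illustrative transition: two VNFs are ready; the scheduled one takes 3 time units, and
    # the negative elapsed time steers the policy toward a small makespan (delay-aware terms
    # for E2E requirements could be added to the reward here).
    s = frozenset({"fw", "nat"})
    a = choose(s)
    s_next = frozenset({"nat", "ids"})
    update(s, a, reward=-3.0, next_state=s_next)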

    Adaptive Order Dispatching based on Reinforcement Learning: Application in a Complex Job Shop in the Semiconductor Industry

    Driven by market requirements, today's production systems tend toward ever smaller lot sizes, higher product variety, and more complex material flow systems. These developments call existing production control methods into question. In the course of digitalization, data-based machine learning algorithms offer an alternative approach to optimizing production processes. Recent research results show the high performance of reinforcement learning (RL) methods across a broad spectrum of applications. In the field of production control, however, only a few authors have addressed them so far. A comprehensive investigation of different RL approaches as well as an application in practice have not yet been carried out. Among the tasks of production planning and control, order dispatching ensures high performance and flexibility of production processes in order to achieve high capacity utilization and short cycle times. Motivated by complex job shop systems such as those found in the semiconductor industry, this work closes the research gap and addresses the application of RL for adaptive order dispatching. Incorporating real system data enables the system behavior to be captured more accurately than with static heuristics or mathematical optimization methods. In addition, manual effort is reduced by drawing on the inference capabilities of RL. The presented methodology focuses on the modeling and implementation of RL agents as the dispatching decision unit. Known challenges of RL modeling with regard to the state, action, and reward function are examined. The modeling alternatives are analyzed on the basis of two real production scenarios of a semiconductor manufacturer. The results show that RL agents can learn adaptive control strategies and outperform existing rule-based benchmark heuristics. Extending the state representation clearly improves performance when it is related to the reward objectives. The reward can be designed so that it enables the optimization of multiple objectives. Finally, specific RL agent configurations not only achieve high performance in one scenario but also prove robust to changing system properties. This research thus makes a substantial contribution toward self-optimizing and autonomous production systems. Production engineers have to assess the potential of data-based, learning methods to remain competitive in terms of flexibility while keeping the effort for designing, operating and monitoring production control systems in a reasonable balance.
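    A minimal sketch of how the dispatching decision unit described above can be framed as an RL environment: the state aggregates queue features, an action selects one waiting order, and the reward mixes utilization and due-date terms so that several objectives can be optimized jointly. The feature names, weights, and the random stand-in policy are illustrative assumptions, not the thesis's model.

    import random

    class DispatchEnv:
        def __init__(self, orders):
            self.orders = list(orders)           # waiting orders: (processing_time, slack)

        def state(self):
            # Extended state representation: queue length plus an aggregate due-date feature.
            n = len(self.orders)
            avg_slack = sum(s for _, s in self.orders) / n if n else 0.0
            return (n, avg_slack)

        def step(self, action_idx, w_util=0.5, w_due=0.5):
            proc, slack = self.orders.pop(action_idx)
            utilization_gain = proc                    # machine kept busy for proc time units
            lateness_penalty = max(0.0, proc - slack)  # proxy for missing the due date
            reward = w_util * utilization_gain - w_due * lateness_penalty
            return self.state(), reward, not self.orders

    env = DispatchEnv(orders=[(4.0, 6.0), (2.0, 1.0), (5.0, 8.0)])
    total = 0.0
    while True:
        action = random.randrange(len(env.orders))   # stand-in for the learned RL policy
        _, r, done = env.step(action)
        total += r
        if done:
            break
    print("episode reward:", round(total, 2))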