52 research outputs found

    Reinforcement learning for business modeling.

    This chapter summarizes reinforcement learning theory, emphasizing its relationship with dynamic programming (reinforcement learning algorithms may replace dynamic programming when a full model of the environment is not available) and analysing its limitations as a tool for modelling human and organizational behaviour.
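
    The chapter's central point, that RL substitutes sampling for an explicit model, can be made concrete with a toy comparison. The sketch below (ours, not the chapter's) solves the same two-state MDP with value iteration, which needs the full transition model, and with tabular Q-learning, which only sees sampled transitions; the MDP and all parameters are illustrative.

```python
import random

N_STATES, N_ACTIONS, GAMMA = 2, 2, 0.9
# Full model: P[s][a] = (next_state, reward); deterministic for brevity.
P = {0: {0: (0, 0.0), 1: (1, 1.0)},
     1: {0: (0, 0.0), 1: (1, 1.0)}}

def value_iteration(sweeps=200):
    """Dynamic programming: requires the full model P up front."""
    V = [0.0] * N_STATES
    for _ in range(sweeps):
        V = [max(P[s][a][1] + GAMMA * V[P[s][a][0]] for a in range(N_ACTIONS))
             for s in range(N_STATES)]
    return V

def q_learning(steps=5000, alpha=0.1, eps=0.1, seed=0):
    """Model-free: learns from sampled (s, a, r, s') transitions only."""
    rng = random.Random(seed)
    Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    s = 0
    for _ in range(steps):
        if rng.random() < eps:
            a = rng.randrange(N_ACTIONS)
        else:
            a = max(range(N_ACTIONS), key=lambda i: Q[s][i])
        s2, r = P[s][a]  # here the simulator stands in for the real environment
        Q[s][a] += alpha * (r + GAMMA * max(Q[s2]) - Q[s][a])
        s = s2
    return [max(row) for row in Q]

print(value_iteration())  # -> approximately [10.0, 10.0]
print(q_learning())       # converges to roughly the same state values
```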

    Adaptive Order Dispatching based on Reinforcement Learning: Application in a Complex Job Shop in the Semiconductor Industry

    Driven by market demands, today's production systems tend toward ever smaller lot sizes, higher product variety, and greater complexity of material flow systems. These developments call existing production control methods into question. In the course of digitalization, data-based machine learning algorithms offer an alternative approach to optimizing production processes. Recent research results show the high performance of reinforcement learning (RL) methods across a broad range of applications. In the field of production control, however, only a few authors have addressed them so far; a comprehensive investigation of different RL approaches, as well as an application in practice, has not yet been carried out. Among the tasks of production planning and control, order dispatching ensures high performance and flexibility of production processes in order to achieve high capacity utilization and short lead times. Motivated by complex job shop systems such as those found in the semiconductor industry, this work closes the research gap and addresses the application of RL for adaptive order dispatching. Incorporating real system data enables a more accurate capture of system behavior than static heuristics or mathematical optimization procedures; in addition, manual effort is reduced by drawing on the inference capabilities of RL. The presented methodology focuses on the modeling and implementation of RL agents as the dispatching decision unit. Known challenges of RL modeling with respect to the state, action, and reward function are examined. The modeling alternatives are analyzed on the basis of two real production scenarios of a semiconductor manufacturer. The results show that RL agents can learn adaptive control strategies and outperform existing rule-based benchmark heuristics. Extending the state representation clearly improves performance when it is related to the reward objectives. The reward can be designed to enable the optimization of multiple objectives. Finally, specific RL agent configurations not only achieve high performance in one scenario but also prove robust to changing system properties. This research thus makes a substantial contribution toward self-optimizing and autonomous production systems. Production engineers must assess the potential of data-based learning methods in order to remain competitive in terms of flexibility while keeping the effort for designing, operating, and monitoring production control systems in a reasonable balance.
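
    To illustrate the kind of modeling decisions the dissertation describes (state, action, and reward for a dispatching agent), here is a minimal, hypothetical environment sketch. The features, reward weights, and class names are our assumptions, not the thesis implementation.

```python
import random

class DispatchingEnv:
    """Toy single-machine dispatching environment (illustrative only)."""
    def __init__(self, n_lots=5, seed=0):
        rng = random.Random(seed)
        # Each lot: (processing_time, due_date)
        self.queue = [(rng.randint(1, 5), rng.randint(5, 20)) for _ in range(n_lots)]
        self.clock = 0

    def state(self):
        # State representation: per-lot features plus the current time.
        return tuple(self.queue), self.clock

    def step(self, lot_idx):
        proc, due = self.queue.pop(lot_idx)
        self.clock += proc
        tardiness = max(0, self.clock - due)
        # Multi-objective reward: finish lots quickly and avoid tardiness.
        reward = -proc - 2.0 * tardiness
        return self.state(), reward, not self.queue

env = DispatchingEnv()
total, done = 0.0, False
while not done:
    # Placeholder policy: earliest due date; an RL agent would learn this
    # choice from the state instead of following a fixed rule.
    action = min(range(len(env.queue)), key=lambda i: env.queue[i][1])
    _, reward, done = env.step(action)
    total += reward
print("episode return:", total)
```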

    Application of a Reinforcement Learning-based Automated Order Release in Production

    The importance of job shop production is increasing as customers demand products in a larger number of variants and in small quantities. However, this also raises the requirements for production planning and control. One approach to meeting logistical targets and customer needs is to focus on dynamic planning systems, which can reduce ad-hoc control interventions in running production. In particular, the release of orders at the beginning of the production process has a high influence on planning quality. Previous approaches used advanced methods such as combinations of reinforcement learning (RL) and simulation to improve specific production environments, which are sometimes highly simplified and not practical enough. This paper presents a practice-based application of an automated, RL-based order release procedure using real-world production scenarios as an example. Both the training environment and the data processing method are introduced. Three aspects are primarily addressed to achieve a higher practical orientation: a more realistic problem size compared to previous approaches, a higher customer orientation through an objective regarding adherence to delivery dates, and a control application for the development and performance evaluation of the considered algorithms against known order release strategies. Follow-up research will refine the objective function, continue to scale up the problem size, and evaluate the algorithm's scheduling results under changes in the system.
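
    As an illustration of the delivery-date-adherence objective mentioned above, the following hedged sketch shows one common way such a reward can be shaped; the weights and functional form are our assumptions, not the paper's actual objective.

```python
# Hypothetical due-date-adherence reward for an order release decision:
# releasing too early ties up WIP, releasing too late misses the date.
def release_reward(completion_time: float, due_date: float,
                   earliness_weight: float = 0.1,
                   tardiness_weight: float = 1.0) -> float:
    """Reward is highest when an order finishes close to its due date."""
    earliness = max(0.0, due_date - completion_time)
    tardiness = max(0.0, completion_time - due_date)
    return -(earliness_weight * earliness + tardiness_weight * tardiness)

print(release_reward(95.0, 100.0))   # slightly early: small penalty (-0.5)
print(release_reward(110.0, 100.0))  # late: larger penalty (-10.0)
```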

    Proposal for a Temporary Production Capacity Expansion Policy in a Semiconductor Fab

    Master's thesis -- Seoul National University Graduate School: College of Engineering, Department of Industrial Engineering, February 2021. Advisor: 박건수. Due to the instability of semiconductor process capacity, production capacity can temporarily become insufficient relative to the capacity allocated by the initial plan. In response, production managers request capacity from other lines with compatible equipment, typically for as large a volume of WIP as possible. Because the processes are connected in sequence, this decision can worsen WIP balancing across the entire line; it becomes especially problematic when the machine group concerned is a bottleneck process group. This study therefore builds a FAB simulator upon a WIP balancing scheduler centered on the bottleneck process group, together with a machine disruption model, and proposes capacity expansion policies learned in this environment by reinforcement learning algorithms. These policies performed better than policies imitating human decisions in terms of throughput and machine efficiency.
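
    The decision the thesis studies, whether to borrow compatible capacity from another line, can be caricatured with tabular Q-learning on a toy bottleneck station. Everything below (WIP levels, costs, arrival ranges) is invented for illustration and is not the thesis's FAB simulator.

```python
import random

ACTIONS = [0, 1, 2]           # number of compatible machines to borrow
CAPACITY_PER_MACHINE = 4      # lots processed per period per borrowed machine
BORROW_COST = 1.5             # penalty for disrupting the lending line

def simulate_period(wip, borrowed, rng):
    """One review period of a toy bottleneck station."""
    base_capacity = 8
    throughput = min(wip, base_capacity + borrowed * CAPACITY_PER_MACHINE)
    wip = wip - throughput + rng.randint(4, 12)   # random arrivals
    reward = throughput - BORROW_COST * borrowed  # throughput vs. disruption
    return min(wip, 30), reward

def train(episodes=2000, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = [[0.0] * len(ACTIONS) for _ in range(31)]  # state = clipped WIP level
    for _ in range(episodes):
        wip = rng.randint(0, 30)
        for _ in range(20):
            if rng.random() < eps:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: Q[wip][i])
            wip2, r = simulate_period(wip, ACTIONS[a], rng)
            Q[wip][a] += alpha * (r + gamma * max(Q[wip2]) - Q[wip][a])
            wip = wip2
    return Q

Q = train()
# High WIP should make borrowing attractive; low WIP should not.
print("best action at WIP 5:", max(range(3), key=lambda i: Q[5][i]))
print("best action at WIP 30:", max(range(3), key=lambda i: Q[30][i]))
```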

    Demystifying reinforcement learning approaches for production scheduling

    Recent years have seen a sharp rise in interest in Reinforcement Learning (RL) approaches for production scheduling, because RL is seen as an advantageous compromise between the two most typical scheduling solution approaches, namely priority rules and exact approaches. However, there are many variations of both production scheduling problems and RL solutions. Additionally, the RL production scheduling literature is characterized by a lack of standardization, which leaves the field shrouded in mysticism. The burden of showcasing the exact situations where RL outshines other approaches still lies with the research community. To pave the way towards this goal, we make the following four contributions to the scientific community, aiding in the process of RL demystification. First, we develop a standardization framework for RL scheduling approaches using a comprehensive literature review as a conduit. Secondly, we design and implement FabricatioRL, an open-source benchmarking simulation framework for production scheduling that covers a vast array of scheduling problems and ensures experiment reproducibility. Thirdly, we create a set of baseline scheduling algorithms sharing some of the RL advantages; this set of RL-competitive algorithms consists of a Constraint Programming (CP) meta-heuristic developed by us, CP3, and two simulation-based approaches, namely a novel approach we call Simulation Search, and Monte Carlo Tree Search. Fourth and finally, we use FabricatioRL to build benchmarking instances for two popular stochastic production scheduling problems and run fully reproducible experiments on them, pitting Double Deep Q-Networks (DDQN) and AlphaGo Zero (AZ) against the chosen baselines and priority rules. Our results show that AZ manages to marginally outperform priority rules and DDQN, but fails to outperform our competitive baselines.
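
    For readers unfamiliar with one of the benchmarked agents, the defining step of Double DQN is the decoupled target: the online network selects the successor action and the target network evaluates it. The sketch below is a generic illustration of that target computation, not code from FabricatioRL.

```python
import numpy as np

def double_dqn_targets(q_online_next: np.ndarray, q_target_next: np.ndarray,
                       rewards: np.ndarray, dones: np.ndarray,
                       gamma: float = 0.99) -> np.ndarray:
    """q_*_next: (batch, n_actions) Q-values for the successor states."""
    best_actions = q_online_next.argmax(axis=1)  # selection: online network
    evaluated = q_target_next[np.arange(len(rewards)), best_actions]  # evaluation: target network
    return rewards + gamma * (1.0 - dones) * evaluated

rewards = np.array([1.0, 0.0])
dones = np.array([0.0, 1.0])
q_on = np.array([[0.2, 0.8], [0.5, 0.1]])
q_tg = np.array([[0.3, 0.6], [0.4, 0.2]])
print(double_dqn_targets(q_on, q_tg, rewards, dones))  # -> [1.594, 0.0]
```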

    Design and implementation of Q-learning-based Artificial Intelligence algorithms for production scheduling in a company in the ceramic sector

    In the Operations Planning and Control System (SPCO), production scheduling falls within the scope of short-term decisions and involves a very high level of complexity. Most production scheduling problems can be considered NP-hard combinatorial problems, so optimal solutions are not usually a reasonable option in a realistic setting. The student will analyze the existing alternatives in the field of metaheuristic algorithms and review the proposals that the field of artificial intelligence has made in recent years for production scheduling problems. Based on the conclusions obtained, a set of candidate algorithms expected to offer good solutions will be proposed and adapted to the context of a ceramic tile manufacturer. The algorithms will be programmed in Python using the Anaconda environment, parameterized through a Design of Experiments, and then compared with winning algorithms from the analyzed literature using a standard data set. In the final block of the work, a realistic case based on data from a ceramics company will be posed, and various production scheduling scenarios will be solved with the chosen winning algorithm, with the expectation of confirming an improvement over the previously used methods.

    Navarro Aláez, A. (2020). Diseño e implementación de algoritmos de Inteligencia Artificial basados en Q-learning para la programación de la producción en una empresa del sector cerámico. http://hdl.handle.net/10251/148026
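
    To make the Q-learning angle of this work concrete, here is a small, hypothetical sequencing example: states are sets of unscheduled jobs, actions pick the next job, and the reward is negative tardiness. The job data and hyperparameters are invented and do not come from the thesis.

```python
import random

JOBS = {0: (4, 5), 1: (2, 6), 2: (6, 12), 3: (3, 7)}  # job id: (proc_time, due_date)

def q_learning_sequence(episodes=3000, alpha=0.2, gamma=1.0, eps=0.2, seed=1):
    rng = random.Random(seed)
    Q = {}  # frozenset of remaining jobs -> {job: value}
    for _ in range(episodes):
        remaining, clock = frozenset(JOBS), 0
        while remaining:
            qs = Q.setdefault(remaining, {j: 0.0 for j in remaining})
            if rng.random() < eps:
                j = rng.choice(sorted(remaining))
            else:
                j = max(qs, key=qs.get)
            proc, due = JOBS[j]
            clock += proc
            reward = -max(0, clock - due)  # negative tardiness of the chosen job
            nxt = remaining - {j}
            future = max(Q[nxt].values()) if nxt in Q else 0.0
            qs[j] += alpha * (reward + gamma * future - qs[j])
            remaining = nxt
    # Greedy rollout of the learned policy.
    seq, remaining = [], frozenset(JOBS)
    while remaining:
        qs = Q.get(remaining, {j: 0.0 for j in remaining})
        j = max(qs, key=qs.get)
        seq.append(j)
        remaining -= {j}
    return seq

print("learned sequence:", q_learning_sequence())
```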

    Designing an adaptive production control system using reinforcement learning

    Modern production systems face enormous challenges as rising customer requirements result in increasingly complex production systems. Operational efficiency in this competitive industry is ensured by an adequate production control system that manages all operations to optimize key performance indicators. Currently, control systems are mostly based on static, model-based heuristics, which require significant human domain knowledge and hence do not match the dynamic environment of manufacturing companies. Data-driven reinforcement learning (RL) has shown compelling results in applications such as board and computer games, as well as in first production applications. This paper addresses the design of RL to create an adaptive production control system, using the real-world example of order dispatching in a complex job shop. As RL algorithms are “black box” approaches, they inherently prohibit a comprehensive understanding. Furthermore, experience with advanced RL algorithms is still limited to individual successful applications, which limits the transferability of results. In this paper, we examine how the design of the RL state, action, and reward function affects performance. In analyzing the results, we identify robust RL designs. This makes RL an advantageous control system for highly dynamic and complex production systems, especially when domain knowledge is limited.
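
    The paper's search for robust RL designs can be summarized as a worst-case screening across scenarios. The toy harness below illustrates that idea with hypothetical results; the design names, numbers, and threshold are ours, not the paper's.

```python
import statistics

# Hypothetical evaluation results: design -> performance per scenario.
results = {
    "utilization_only": [0.92, 0.61, 0.88],
    "lateness_only":    [0.70, 0.72, 0.69],
    "combined":         [0.85, 0.81, 0.84],
}

def robust_designs(results, min_worst_case=0.75):
    """A design counts as 'robust' if even its worst scenario stays above a floor."""
    return {d: (min(v), statistics.mean(v))
            for d, v in results.items() if min(v) >= min_worst_case}

print(robust_designs(results))  # only 'combined' survives the worst-case floor
```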

    An intelligent resource allocation decision support system with Q-learning

    Master's thesis (Master of Engineering).

    Setup Change Scheduling Under Due-date Constraints Using Deep Reinforcement Learning with Self-supervision

    Doctoral dissertation -- Seoul National University Graduate School: College of Engineering, 산업·조선공학부, August 2021. Advisor: 박종헌. Setup change scheduling under due-date constraints has attracted much attention from academia and industry due to its practical applications. In a real-world manufacturing system, however, solving the scheduling problem becomes challenging, since urgent and frequent changes in product demand, due-dates, and initial machine status must be addressed. In this thesis, we propose a scheduling framework based on deep reinforcement learning (RL) with self-supervision, in which trained neural networks (NNs) can solve unseen scheduling problems without re-training even when such changes occur. Specifically, we propose state and action representations whose dimensions are independent of production requirements and job due-dates while accommodating family setups. At the same time, an NN architecture with parameter sharing is utilized to improve training efficiency. Finally, we devise an additional self-supervised loss specific to the scheduling problem for training an NN scheduler that is robust to variations in the numbers of machines and jobs and in the distribution of production plans. We carried out extensive experiments on large-scale datasets that simulate a real-world wafer preparation facility and a semiconductor packaging line. The results demonstrate that the proposed method outperforms recent metaheuristic, rule-based, and other RL-based methods in terms of schedule quality and the computation time needed to obtain a schedule. In addition, we investigated the individual contributions of the state representation, parameter sharing, and self-supervision to the performance improvements.
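
    The combination described above, a TD loss plus a scheduling-specific self-supervised term, can be sketched generically as a weighted sum of two regression losses. This is our simplified reading; the thesis's actual preference-score loss is more elaborate, and the weighting and preference values below are invented.

```python
import numpy as np

def td_loss(q_pred: np.ndarray, targets: np.ndarray) -> float:
    """Standard DQN regression loss on the TD targets."""
    return float(np.mean((q_pred - targets) ** 2))

def self_supervised_loss(q_pred: np.ndarray, preference: np.ndarray) -> float:
    """Auxiliary term: pushes Q-values toward an intrinsic preference score
    computed from the state itself (e.g. favoring actions that avoid setups)."""
    return float(np.mean((q_pred - preference) ** 2))

def total_loss(q_pred, targets, preference, beta=0.3) -> float:
    # beta balances the TD objective against the self-supervised signal.
    return td_loss(q_pred, targets) + beta * self_supervised_loss(q_pred, preference)

q_pred = np.array([0.5, 1.2, -0.3])
targets = np.array([0.8, 1.0, 0.0])
preference = np.array([0.6, 1.1, -0.2])  # hypothetical setup-avoidance scores
print(total_loss(q_pred, targets, preference))
```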