5 research outputs found

    A dynamic scheduling method with Conv-Dueling and generalized representation based on reinforcement learning

    In modern industrial manufacturing, uncertain dynamic disturbances affecting machines and jobs can disrupt the original production plan. This research focuses on dynamic multi-objective flexible scheduling problems with multi-constraint relationships among machines, jobs, and uncertain disturbance events. The possible disturbance events include job insertion, machine breakdown, and processing time change. The paper proposes a conv-dueling network model, a multidimensional state representation of the job processing information, and multiple scheduling objectives: minimizing makespan and delay time while maximizing the completion punctuality rate. We design a multidimensional state space that includes job and machine processing information, an efficient and complete scheduling action space for the intelligent agent, and a compound scheduling reward function that combines the main task and the branch task. The unsupervised training of the network model uses the dueling double deep Q-network (D3QN) algorithm. Finally, based on the multi-constraint and multi-disturbance production environment information, the multidimensional state representation matrix of the job is used as input, and the optimal scheduling rules are output after feature extraction and decision making by the conv-dueling network model. This study carries out simulation experiments on 50 test cases. The results show that the proposed conv-dueling network model converges quickly under the DQN, DDQN, and D3QN algorithms and has good stability and universality. The experimental results indicate that the proposed scheduling algorithm outperforms DQN, DDQN, and single scheduling algorithms on all three scheduling objectives. It also demonstrates high robustness and excellent overall scheduling performance.
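
The abstract above names D3QN but does not spell out its formulas. As a rough illustration only, the sketch below uses the commonly cited textbook forms: the dueling aggregation Q(s, a) = V(s) + A(s, a) - mean(A) and the double-DQN target, in which the online network selects the next action and the target network scores it. The function names, the plain-list inputs, and the idea of treating each action as a dispatching rule are assumptions made for this example, not details taken from the paper.

```python
from typing import List, Sequence


def dueling_q_values(state_value: float, advantages: Sequence[float]) -> List[float]:
    """Combine V(s) and A(s, a) into Q(s, a) using the mean-subtracted form."""
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + adv - mean_advantage for adv in advantages]


def double_dqn_target(
    reward: float,
    gamma: float,
    next_q_online: Sequence[float],
    next_q_target: Sequence[float],
    done: bool,
) -> float:
    """Bootstrapped target: the online net picks the action, the target net scores it."""
    if done:
        return reward  # no bootstrapping past a terminal state
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


if __name__ == "__main__":
    # Hypothetical numbers: three candidate dispatching rules as actions.
    print(dueling_q_values(state_value=1.0, advantages=[0.5, -0.5, 0.0]))
    print(double_dqn_target(1.0, 0.5, [0.2, 0.8, 0.1], [1.0, 2.0, 3.0], done=False))
```

In a full implementation the value and advantage streams would come from a neural network (here, presumably the convolutional layers of the conv-dueling model); plain numbers stand in for those outputs so the arithmetic is easy to follow.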

    Solving a dynamic job scheduling problem using reinforcement learning (Resolució d’un problema de planificació dinàmica de treballs mitjançant aprenentatge per reforç)

    The Job Shop Scheduling problem, in which the optimal order or sequence for processing a set of jobs on a set of machines must be determined, has received a great deal of attention in the field of industrial organization. A multitude of heuristics and models have been proposed to solve it. These algorithms, however, tend to offer solutions to a simplified version of the problem, solving it under a series of restrictions and limitations that limit their practical implementation. Moreover, their usefulness is reduced when the problem is treated as dynamic rather than static, with stochastic events such as a machine failure or the arrival of a new job. Exact mathematical models cease to be a feasible option as the dimensions of the problem (the number of machines and jobs) grow. This document proposes a solution to the problem of assigning jobs to machines in a dynamic environment in which stochastic events are present, such as the failure or stoppage of a machine and the arrival time of jobs at the production area. The problem is addressed by means of Reinforcement Learning, a branch of artificial intelligence based on trial and error and experience.

    Setup Change Scheduling Under Due-date Constraints Using Deep Reinforcement Learning with Self-supervision

    Doctoral thesis, Seoul National University Graduate School: College of Engineering, Department of Industrial Engineering and Naval Architecture, 2021.8. 박종헌.
    Setup change scheduling under due-date constraints has attracted much attention from academia and industry due to its practical applications. In a real-world manufacturing system, however, solving the scheduling problem becomes challenging because it must address urgent and frequent changes in the demand and due dates of products and in the initial machine status. In this thesis, we propose a scheduling framework based on deep reinforcement learning (RL) with self-supervision, in which trained neural networks (NNs) are able to solve unseen scheduling problems without re-training even when such changes occur. Specifically, we propose state and action representations whose dimensions are independent of the production requirements and due dates of jobs while accommodating family setups. At the same time, an NN architecture with parameter sharing is used to improve training efficiency. Finally, we devise an additional self-supervised loss specific to the scheduling problem in order to train an NN scheduler that is robust to variations in the numbers of machines and jobs and in the distribution of production plans. We carried out extensive experiments on large-scale datasets that simulate a real-world wafer preparation facility and a semiconductor packaging line. Experimental results demonstrate that the proposed method outperforms recent metaheuristic, rule-based, and other RL-based methods in terms of schedule quality and the computation time needed to obtain a schedule. In addition, we investigated the individual contributions of the state representation, parameter sharing, and self-supervision to the performance improvements.
    Contents:
    Chapter 1 Introduction: 1.1 Research motivation and background; 1.2 Research objectives and contributions; 1.3 Thesis organization
    Chapter 2 Background: 2.1 Scheduling problems under due-date constraints with sequence-dependent setups (2.1.1 Scheduling problems under due-date constraints; 2.1.2 Parallel machine scheduling with family setups; 2.1.3 Job shop scheduling with setup constraints); 2.2 Reinforcement learning-based scheduling (2.2.1 Theoretical background; 2.2.2 Manufacturing line scheduling using reinforcement learning; 2.2.3 Deep reinforcement learning in scheduling problems); 2.3 Deep reinforcement learning with self-supervision
    Chapter 3 Problem definition: 3.1 Parallel machine scheduling problem (3.1.1 Parallel machine scheduling problem for minimizing tardiness; 3.1.2 Mixed-integer programming model; 3.1.3 Example process); 3.2 Job shop scheduling problem (3.2.1 Flexible job shop scheduling for maximizing input quantity; 3.2.2 Example process)
    Chapter 4 Parallel machine scheduling using deep reinforcement learning with self-supervision: 4.1 MDP model (4.1.1 Action definition; 4.1.2 State representation; 4.1.3 Reward definition; 4.1.4 State transition; 4.1.5 Example); 4.2 Neural network training (4.2.1 Deep neural network architecture; 4.2.2 Loss function; 4.2.3 DQN training procedure; 4.2.4 DQN evaluation procedure); 4.3 Self-supervision in scheduling problems (4.3.1 Intrinsic reward design; 4.3.2 Preference score design for setup scheduling); 4.4 DQN training with self-supervision (4.4.1 Self-supervised loss function; 4.4.2 Training procedure)
    Chapter 5 Job shop scheduling using deep reinforcement learning with self-supervision: 5.1 Scheduling framework (5.1.1 Bottleneck process definition; 5.1.2 Dispatching rules; 5.1.3 Discrete-event simulator; 5.1.4 Scheduler training); 5.2 Input policy and self-supervision; 5.3 MDP model modifications (5.3.1 Action definition; 5.3.2 State representation; 5.3.3 Reward definition)
    Chapter 6 Experiments and results: 6.1 Parallel machine scheduling problem (6.1.1 Datasets; 6.1.2 Experimental settings; 6.1.3 Comparison of total tardiness performance; 6.1.4 Performance comparison by state representation); 6.2 Job shop scheduling problem (6.2.1 Datasets; 6.2.2 Experimental settings; 6.2.3 Comparison of input quantity performance; 6.2.4 Performance comparison by action definition); 6.3 Effects of self-supervision (6.3.1 Datasets; 6.3.2 Experimental settings; 6.3.3 Effect of self-supervision with and without parameter sharing; 6.3.4 Performance evaluation on datasets different from those used in training)
    Chapter 7 Conclusion and future research directions: 7.1 Conclusion; 7.2 Future research directions
    References; Abstract; Acknowledgements

    Design and implementation of Q-learning-based Artificial Intelligence algorithms for production scheduling in a ceramics company

    In the Operations Planning and Control System (SPCO), production scheduling falls within the scope of short-term decisions and involves a very high level of complexity. Most production scheduling problems can be considered NP-hard combinatorial problems, so optimal solutions are not usually a reasonable option in a realistic environment. The student must analyze the existing alternatives in the field of metaheuristic algorithms and must also review the proposals that the field of artificial intelligence has made in recent years for production scheduling problems. Based on the conclusions obtained, the student will propose a set of candidate algorithms likely to offer good solutions and adapt them to the context of a ceramic tile manufacturer. The algorithms will be programmed in Python using the Anaconda programming environment. The algorithms will be parameterized through a Design of Experiments and then compared with other winning algorithms from the reviewed literature using a standard data set. In the last part of the work, the student will pose a realistic case based on data from a ceramics company and solve various production scheduling scenarios, in the hope of confirming an improvement over the methods previously used.
    Navarro Aláez, A. (2020). Diseño e implementación de algoritmos de Inteligencia Artificial basados en Q-learning para la programación de la producción en una empresa del sector cerámico. http://hdl.handle.net/10251/148026 TFG
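
The entry above names Q-learning in its title without describing the update rule. For orientation, here is a minimal sketch of the standard tabular Q-learning update, Q(s, a) ← Q(s, a) + α (r + γ max Q(s', ·) − Q(s, a)). The state labels, the use of dispatching-rule names as actions, and the values of `alpha` and `gamma` are hypothetical choices for this example; the thesis may define states, actions, and rewards quite differently.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Sequence, Tuple

QTable = DefaultDict[Tuple[Hashable, str], float]


def q_learning_update(
    q_table: QTable,
    state: Hashable,
    action: str,
    reward: float,
    next_state: Hashable,
    actions: Sequence[str],
    alpha: float = 0.1,
    gamma: float = 0.9,
) -> float:
    """Apply one tabular Q-learning step in place and return the new Q(s, a)."""
    # Greedy estimate of the value of the next state.
    best_next = max(q_table[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q_table[(state, action)] += alpha * (td_target - q_table[(state, action)])
    return q_table[(state, action)]


if __name__ == "__main__":
    # Hypothetical actions: two dispatching rules the agent can choose between.
    rules = ["SPT", "EDD"]
    table: QTable = defaultdict(float)
    table[("queue_long", "SPT")] = 2.0
    print(q_learning_update(table, "queue_short", "SPT", 1.0, "queue_long", rules,
                            alpha=0.5, gamma=0.5))
```

A dictionary keyed by (state, action) pairs keeps the sketch small; unseen pairs default to 0.0, which is one common but not universal initialization.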

    Demystifying reinforcement learning approaches for production scheduling

    Recent years have seen a sharp rise in interest in Reinforcement Learning (RL) approaches for production scheduling. This is because RL is seen as an advantageous compromise between the two most typical scheduling solution approaches, namely priority rules and exact approaches. However, there are many variations of both production scheduling problems and RL solutions. Additionally, the RL production scheduling literature is characterized by a lack of standardization, which leads to the field being shrouded in mysticism. The burden of showcasing the exact situations where RL outshines other approaches still lies with the research community. To pave the way towards this goal, we make the following four contributions to the scientific community, aiding in the process of RL demystification. First, we develop a standardization framework for RL scheduling approaches using a comprehensive literature review as a conduit. Second, we design and implement FabricatioRL, an open-source benchmarking simulation framework for production scheduling covering a vast array of scheduling problems and ensuring experiment reproducibility. Third, we create a set of baseline scheduling algorithms sharing some of the RL advantages. The set of RL-competitive algorithms consists of a Constraint Programming (CP) meta-heuristic developed by us, CP3, and two simulation-based approaches, namely a novel approach we call Simulation Search and Monte Carlo Tree Search. Fourth and finally, we use FabricatioRL to build two benchmarking instances for two popular stochastic production scheduling problems and run fully reproducible experiments on them, pitting Double Deep Q Networks (DDQN) and AlphaGo Zero (AZ) against the chosen baselines and priority rules. Our results show that AZ manages to marginally outperform priority rules and DDQN but fails to outperform our competitive baselines.
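
The abstract above contrasts RL with priority rules but does not say which rules were used. As a simple point of reference, the sketch below shows two classic dispatching rules, shortest processing time (SPT) and earliest due date (EDD), applied to a queue of waiting jobs. The `Job` fields and the sample queue are hypothetical, and nothing here reflects the FabricatioRL API.

```python
from typing import List, NamedTuple


class Job(NamedTuple):
    name: str
    processing_time: float
    due_date: float


def shortest_processing_time(queue: List[Job]) -> Job:
    """SPT rule: pick the waiting job that occupies the machine for the least time."""
    return min(queue, key=lambda job: job.processing_time)


def earliest_due_date(queue: List[Job]) -> Job:
    """EDD rule: pick the waiting job whose due date comes first."""
    return min(queue, key=lambda job: job.due_date)


if __name__ == "__main__":
    # Hypothetical queue in front of one machine.
    waiting = [Job("A", 5.0, 10.0), Job("B", 3.0, 20.0), Job("C", 4.0, 8.0)]
    print(shortest_processing_time(waiting).name)
    print(earliest_due_date(waiting).name)
```

Rules like these decide in constant or linear time from local information only, which is one reason they serve as a fast baseline against which learned schedulers are compared.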