
    Intelligent Data Acquisition for Predictive Modeling in Manufacturing Systems

    Doctoral dissertation, Seoul National University Graduate School, College of Engineering, Department of Industrial Engineering, February 2021. 조성준.

    Predictive modeling is a type of supervised learning that finds the functional relationship between input variables and an output variable. It is used throughout manufacturing systems, for example to automate visual inspection, predict faulty products, and estimate the results of expensive inspections. Building a high-performance predictive model requires high-quality data, but in manufacturing systems it is practically impossible to acquire enough data of every kind that predictive modeling needs. Data acquisition in manufacturing systems faces three main difficulties. First, labeled data always comes at a cost: in many problems, labeling must be done by experienced engineers. Second, because of inspection cost, not every inspection can be performed on every product; time and monetary constraints make it impossible to obtain all the desired inspection results. Third, changes in the manufacturing environment shift the distribution of the generated data, so enough consistent data cannot be obtained and the model has to be trained with a small amount of data.

    In this dissertation, we overcome these difficulties in data acquisition through active learning, active feature-value acquisition, and domain adaptation. First, we propose an active learning framework to address the high labeling cost of wafer map pattern classification, which makes it possible to achieve higher performance at a lower labeling cost. Cost efficiency is further improved by incorporating cluster-level annotation into active learning. For the inspection cost in fault prediction, we propose an active inspection framework: by selecting the products that undergo high-cost inspection with a novel uncertainty estimation method, high performance can be obtained at low inspection cost. To address the recipe transition problem that frequently occurs in faulty wafer prediction in semiconductor manufacturing, domain adaptation methods are used: sequential application of unsupervised and semi-supervised domain adaptation minimizes the performance degradation caused by recipe transition. Experiments on real-world data demonstrate that the proposed methodologies can overcome the data acquisition problems in manufacturing systems and improve the performance of predictive models.

    Contents:
    1. Introduction
    2. Literature Review
       2.1 Review of Related Methodologies: 2.1.1 Active Learning; 2.1.2 Active Feature-value Acquisition; 2.1.3 Domain Adaptation
       2.2 Review of Predictive Modelings in Manufacturing: 2.2.1 Wafer Map Pattern Classification; 2.2.2 Fault Detection and Classification
    3. Active Learning for Wafer Map Pattern Classification
       3.1 Problem Description
       3.2 Proposed Method: 3.2.1 System overview; 3.2.2 Prediction model; 3.2.3 Uncertainty estimation; 3.2.4 Query wafer selection; 3.2.5 Query wafer labeling; 3.2.6 Model update
       3.3 Experiments: 3.3.1 Data description; 3.3.2 Experimental design; 3.3.3 Results and discussion
    4. Active Cluster Annotation for Wafer Map Pattern Classification
       4.1 Problem Description
       4.2 Proposed Method: 4.2.1 Clustering of unlabeled data; 4.2.2 CNN training with labeled data; 4.2.3 Cluster-level uncertainty estimation; 4.2.4 Query cluster selection; 4.2.5 Cluster-level annotation
       4.3 Experiments: 4.3.1 Data description; 4.3.2 Experimental setting; 4.3.3 Clustering results; 4.3.4 Classification performance; 4.3.5 Analysis for label noise
    5. Active Inspection for Fault Prediction
       5.1 Problem Description
       5.2 Proposed Method: 5.2.1 Active inspection framework; 5.2.2 Acquisition based on Expected Prediction Change
       5.3 Experiments: 5.3.1 Data description; 5.3.2 Fault prediction models; 5.3.3 Experimental design; 5.3.4 Results and discussion
    6. Adaptive Fault Detection for Recipe Transition
       6.1 Problem Description
       6.2 Proposed Method: 6.2.1 Overview; 6.2.2 Unsupervised adaptation phase; 6.2.3 Semi-supervised adaptation phase
       6.3 Experiments: 6.3.1 Data description; 6.3.2 Experimental setting; 6.3.3 Performance degradation caused by recipe transition; 6.3.4 Effect of unsupervised adaptation; 6.3.5 Effect of semi-supervised adaptation
    7. Conclusion
       7.1 Contributions
       7.2 Future work
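
The abstract above describes choosing which wafers or products to label or inspect based on model uncertainty. The sketch below illustrates that general idea with plain entropy-based uncertainty sampling. It is not the dissertation's own uncertainty estimation method, and the function names and the example probabilities are hypothetical.

```python
import numpy as np

def predictive_entropy(probs):
    """Shannon entropy of each row of class probabilities (higher = less certain)."""
    p = np.clip(probs, 1e-12, 1.0)  # avoid log(0)
    return -np.sum(p * np.log(p), axis=1)

def select_queries(probs, k):
    """Return indices of the k most uncertain unlabeled samples, most uncertain first."""
    scores = predictive_entropy(probs)
    return np.argsort(-scores, kind="stable")[:k]

# Hypothetical predicted class probabilities for four unlabeled wafer maps.
pool_probs = np.array([
    [0.98, 0.01, 0.01],  # confident prediction
    [0.34, 0.33, 0.33],  # nearly uniform: highest entropy
    [0.70, 0.20, 0.10],
    [0.50, 0.49, 0.01],
])

# Ask an engineer to label only the two most uncertain samples.
query_idx = select_queries(pool_probs, k=2)
print(query_idx)
```

In a full loop, the queried samples would be labeled, moved into the training set, and the model retrained before the next round of selection.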

    A Machine Learning Based Analytical Framework for Semantic Annotation Requirements

    The Semantic Web is an extension of the current web in which information is given well-defined meaning. Its aim is to improve the quality and intelligence of the current web by converting its contents into machine-understandable form, so semantic-level information is one of its cornerstones. The process of adding semantic metadata to web resources is called Semantic Annotation. Semantic Annotation faces many obstacles, such as multilinguality, scalability, and the diversity and inconsistency of content across web pages. Because Semantic Annotation systems must operate over a wide range of domains and in dynamic environments, automating the annotation process is one of the significant challenges in this field. To address it, different machine learning approaches have been used, including supervised and unsupervised learning and, more recently, semi-supervised learning and active learning. In this paper we present an inclusive layered classification of Semantic Annotation challenges and discuss the most important issues in this field. We also review and analyze machine learning applications for solving semantic annotation problems. To this end, the article studies and categorizes related research in order to arrive at a framework that maps machine learning techniques onto Semantic Annotation challenges and requirements.

    Challenges and solutions for Latin named entity recognition

    Although spanning thousands of years and genres as diverse as liturgy, historiography, lyric and other forms of prose and poetry, the body of Latin texts is still relatively sparse compared to English. Data sparsity in Latin presents a number of challenges for traditional Named Entity Recognition techniques. Solving such challenges and enabling reliable Named Entity Recognition in Latin texts can facilitate many downstream applications, from machine translation to digital historiography, enabling Classicists, historians, and archaeologists, for instance, to track the relationships of historical persons, places, and groups on a large scale. This paper presents the first annotated corpus for evaluating Named Entity Recognition in Latin, as well as a fully supervised model that achieves over 90% F-score on a held-out test set, significantly outperforming a competitive baseline. We also present a novel active learning strategy that predicts how many and which sentences need to be annotated for named entities in order to attain a specified degree of accuracy when recognizing named entities automatically in a given text. This maximizes the productivity of annotators while simultaneously controlling quality.

    Global stellar variability study in the field-of-view of the Kepler satellite

    We present the results of an automated variability analysis of the Kepler public data measured in the first quarter (Q1) of the mission. In total, about 150 000 light curves have been analysed to detect stellar variability, and to identify new members of known variability classes. We also focus on the detection of variables present in eclipsing binary systems, given the important constraints on stellar fundamental parameters they can provide. The methodology we use here is based on the automated variability classification pipeline which was previously developed for and applied successfully to the CoRoT exofield database and to the limited subset of a few thousand Kepler asteroseismology light curves. We use a Fourier decomposition of the light curves to describe their variability behaviour and use the resulting parameters to perform a supervised classification. Several improvements have been made, including a separate extractor method to detect the presence of eclipses when other variability is present in the light curves. We also included two new variability classes compared to previous work: variables showing signs of rotational modulation and of activity. Statistics are given on the number of variables and the number of good candidates per class. A comparison is made with results obtained for the CoRoT exoplanet data. We present some special discoveries, including variable stars in eclipsing binary systems. Many new candidate non-radial pulsators are found, mainly Delta Sct and Gamma Dor stars. We have studied those samples in more detail by using 2MASS colours. The full classification results are made available as an online catalogue.
    Comment: 15 pages, 5 figures, accepted for publication in Astronomy and Astrophysics on 09/02/201
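
The abstract above mentions describing light curves by a Fourier decomposition and using the resulting parameters as classification features. A minimal sketch of that kind of feature extraction follows. It assumes the dominant frequency is already known and fits a fixed number of harmonics by least squares. The function name, the synthetic light curve, and the choice of two harmonics are illustrative assumptions, not the pipeline used in the paper.

```python
import numpy as np

def fit_harmonics(t, flux, freq, n_harmonics=2):
    """Least-squares fit of a constant plus sine/cosine pairs at multiples of freq.

    Returns (mean_level, amplitudes), with one amplitude per harmonic.
    """
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        cols.append(np.sin(2 * np.pi * k * freq * t))
        cols.append(np.cos(2 * np.pi * k * freq * t))
    design = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(design, flux, rcond=None)
    # Combine each sine/cosine coefficient pair into a single amplitude.
    amps = np.hypot(coef[1::2], coef[2::2])
    return coef[0], amps

# Synthetic, noise-free light curve: fundamental at 0.5 cycles per unit time
# plus a weaker first overtone.
t = np.linspace(0.0, 10.0, 500)
flux = 1.0 + 0.3 * np.sin(2 * np.pi * 0.5 * t) + 0.1 * np.cos(2 * np.pi * 1.0 * t)

mean_level, amps = fit_harmonics(t, flux, freq=0.5)
print(mean_level, amps)
```

The fitted amplitudes (and, in a fuller version, the phases and frequencies) would form the feature vector passed to a supervised classifier.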