
    Dimension reduction for Gaussian process emulation: an application to the influence of bathymetry on tsunami heights

    High-accuracy, complex computer models, or simulators, require large resources in time and memory to produce realistic results. Statistical emulators are computationally cheap approximations of such simulators. They can be built to replace simulators for various purposes, such as the propagation of uncertainties from inputs to outputs or the calibration of internal parameters against observations. However, when the input space is of high dimension, the construction of an emulator can become prohibitively expensive. In this paper, we introduce a joint framework merging emulation with dimension reduction to overcome this hurdle. The gradient-based kernel dimension reduction technique is chosen for its ability to drastically decrease dimensionality with little loss of information, and it is combined with Gaussian process emulation. Our proposed approach addresses the dimension reduction issue in emulation for a wide range of simulation problems that cannot be tackled with existing methods. The efficiency and accuracy of the proposed framework are demonstrated theoretically and compared with other methods on an elliptic partial differential equation (PDE) problem. We finally present a realistic application to tsunami modeling. The uncertainties in the bathymetry (seafloor elevation) are modeled as high-dimensional realizations of a spatial process using a geostatistical approach. Our dimension-reduced emulation enables us to compute the impact of these uncertainties on the resulting possible tsunami wave heights near-shore and on-shore. We observe a significant increase in the spread of uncertainties in the tsunami heights due to the contribution of the bathymetry uncertainties. These results highlight the need to include the effect of bathymetry uncertainties in tsunami early warnings and risk assessments. Comment: 26 pages, 8 figures, 2 tables.
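As a rough illustration of the emulation-with-dimension-reduction pipeline described above, the sketch below fits a Gaussian process emulator on projected inputs. It is only a schematic: PCA and a toy simulator() function stand in for the gradient-based kernel dimension reduction and the tsunami code used in the paper, and scikit-learn's GaussianProcessRegressor stands in for the authors' emulator; all names and settings here are illustrative assumptions.

```python
# Minimal sketch: emulate an "expensive" simulator on a reduced input space.
# PCA is a placeholder for gKDR; the simulator is a toy stand-in as well.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(0)

def simulator(x):
    # Toy simulator: the output depends mainly on a few input directions.
    return np.sin(x[:, :10].sum(axis=1)) + 0.1 * x[:, 10:20].sum(axis=1)

d_full, d_red, n_train = 100, 5, 200
X_train = rng.normal(size=(n_train, d_full))
y_train = simulator(X_train)

# 1) Reduce the high-dimensional inputs (placeholder for gKDR).
reducer = PCA(n_components=d_red).fit(X_train)
Z_train = reducer.transform(X_train)

# 2) Fit a Gaussian process emulator on the reduced coordinates.
kernel = ConstantKernel(1.0) * RBF(length_scale=np.ones(d_red))
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(Z_train, y_train)

# 3) Cheap predictions with uncertainty replace further simulator runs
#    (accuracy depends on how well the reduction captures the active directions).
X_new = rng.normal(size=(1000, d_full))
mean, std = gp.predict(reducer.transform(X_new), return_std=True)
print(mean[:3], std[:3])
```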

    Regularized System Identification

    This open access book provides a comprehensive treatment of recent developments in kernel-based identification that are of interest to anyone engaged in learning dynamic systems from data. The reader is led step by step into understanding a novel paradigm that leverages the power of machine learning without losing sight of the system-theoretical principles of black-box identification. The authors' reformulation of the identification problem in the light of regularization theory not only offers new insight on classical questions but also paves the way to new and powerful algorithms for a variety of linear and nonlinear problems. Regression methods such as regularization networks and support vector machines are the basis of techniques that extend the function-estimation problem to the estimation of dynamic models. Many examples, including real-world applications, illustrate the comparative advantages of the new nonparametric approach with respect to classic parametric prediction error methods. The challenges it addresses lie at the intersection of several disciplines, so Regularized System Identification will be of interest to a variety of researchers and practitioners in the areas of control systems, machine learning, statistics, and data science.
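As a hedged illustration of the kernel-based identification paradigm the book covers, the sketch below estimates an FIR impulse response by regularized least squares with a TC-type ("tuned/correlated") kernel prior. The system, the data, and the fixed hyperparameters c, lam, and sigma2 are invented for this example; in practice such hyperparameters would be tuned, for instance by marginal-likelihood maximization, and this is not the book's reference implementation.

```python
# Minimal sketch of kernel-based (regularized) FIR identification.
import numpy as np

rng = np.random.default_rng(1)

# True system: a decaying impulse response of length 50.
n = 50
g_true = 0.8 ** np.arange(n) * np.sin(0.3 * np.arange(n))

# Simulate input/output data.
N = 300
u = rng.normal(size=N)
y = np.convolve(u, g_true)[:N] + 0.05 * rng.normal(size=N)

# Regression matrix Phi with rows [u[t], u[t-1], ..., u[t-n+1]] (zero initial conditions).
Phi = np.column_stack([np.concatenate([np.zeros(k), u[: N - k]]) for k in range(n)])

# TC kernel prior: K[i, j] = c * lam ** max(i, j), encoding smooth, decaying responses.
c, lam, sigma2 = 1.0, 0.9, 0.05 ** 2
i = np.arange(n)
K = c * lam ** np.maximum.outer(i, i)

# Regularized (Bayesian) estimate: g_hat = K Phi' (Phi K Phi' + sigma2 I)^{-1} y.
g_hat = K @ Phi.T @ np.linalg.solve(Phi @ K @ Phi.T + sigma2 * np.eye(N), y)

print("relative fit error:", np.linalg.norm(g_hat - g_true) / np.linalg.norm(g_true))
```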

    Penalized Likelihood and Bayesian Function Selection in Regression Models

    Challenging research in various fields has driven a wide range of methodological advances in variable selection for regression models with high-dimensional predictors. In comparison, selection of nonlinear functions in models with additive predictors has been considered only more recently. Several competing suggestions have been developed at about the same time and often do not refer to each other. This article provides a state-of-the-art review of function selection, focusing on penalized likelihood and Bayesian concepts and relating the various approaches to each other in a unified framework. In an empirical comparison that also includes boosting, we evaluate several methods through applications to simulated and real data, thereby providing some guidance on their performance in practice.
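To make the idea of function selection concrete, the sketch below applies a group-lasso penalty to per-covariate B-spline basis blocks, so that whole nonlinear functions are either kept or dropped. It is an illustrative stand-in on simulated data, not one of the specific penalized-likelihood or Bayesian estimators compared in the article; SplineTransformer, the penalty level lam, and the proximal-gradient loop are choices made only for this example.

```python
# Minimal sketch of function selection in an additive model via group lasso.
import numpy as np
from sklearn.preprocessing import SplineTransformer

rng = np.random.default_rng(2)

# Data: only x0 and x2 out of five covariates actually enter the model.
n, p = 400, 5
X = rng.uniform(-1, 1, size=(n, p))
y = np.sin(np.pi * X[:, 0]) + X[:, 2] ** 2 + 0.1 * rng.normal(size=n)
y = y - y.mean()

# One centered B-spline basis block per covariate.
bases = [SplineTransformer(n_knots=8, degree=3).fit_transform(X[:, [j]]) for j in range(p)]
bases = [B - B.mean(axis=0) for B in bases]
B = np.hstack(bases)
groups = np.repeat(np.arange(p), [b.shape[1] for b in bases])

# Proximal gradient for 0.5*||y - B beta||^2 / n + lam * sum_j ||beta_j||_2.
lam, step = 0.05, 1.0 / (np.linalg.norm(B, 2) ** 2 / n)
beta = np.zeros(B.shape[1])
for _ in range(2000):
    grad = -B.T @ (y - B @ beta) / n
    z = beta - step * grad
    for j in range(p):                     # group soft-thresholding
        idx = groups == j
        nrm = np.linalg.norm(z[idx])
        z[idx] = 0.0 if nrm == 0 else max(0.0, 1 - step * lam / nrm) * z[idx]
    beta = z

selected = [j for j in range(p) if np.linalg.norm(beta[groups == j]) > 1e-8]
print("selected functions:", selected)     # typically [0, 2] for this setup
```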

    Learning from data: Plant breeding applications of machine learning

    Increasingly, new sources of data are being incorporated into plant breeding pipelines. Enormous amounts of data from field phenomics and genotyping technologies place data mining and analysis on a completely different level, one that is challenging from both practical and theoretical standpoints. Intelligent decision-making relies on our capability to extract useful information from data that may help us achieve our goals more efficiently. Many plant breeders, agronomists, and geneticists perform analyses without knowing the relevant underlying assumptions, strengths, or pitfalls of the employed methods. This study endeavors to assess the statistical learning properties and plant breeding applications of supervised and unsupervised machine learning techniques. A soybean nested association panel (a.k.a. SoyNAM) was the base population for experiments designed in situ and in silico. We used mixed models and Markov random fields to evaluate phenotypic-genotypic-environmental associations among traits and the learning properties of genome-wide prediction methods. Alternative methods of analysis were proposed.
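As a minimal, hypothetical illustration of the kind of genome-wide prediction assessed in the study, the sketch below fits a ridge-regression (rrBLUP-style) model to simulated marker and phenotype data and reports cross-validated accuracy. The simulated matrices are stand-ins, not the SoyNAM data, and ridge regression is only one of many genome-wide prediction methods.

```python
# Minimal sketch of genome-wide prediction with ridge regression.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)

n_lines, n_markers = 500, 2000
M = rng.integers(0, 3, size=(n_lines, n_markers)).astype(float)   # 0/1/2 genotype calls
effects = np.zeros(n_markers)
effects[rng.choice(n_markers, 50, replace=False)] = rng.normal(0, 0.3, 50)
y = M @ effects + rng.normal(0, 1.0, n_lines)                     # simulated phenotype

model = RidgeCV(alphas=np.logspace(0, 4, 20))
acc = cross_val_score(model, M, y, cv=5, scoring="r2")
print("mean cross-validated R^2:", acc.mean())
```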

    Process fault detection and diagnosis methodology using Markov random field learning and inference and the graphical lasso

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, School of Chemical and Biological Engineering, February 2019. Advisor: Won Bo Lee. Fault detection and diagnosis (FDD) is an essential part of safe plant operation. Fault detection refers to the process of detecting the occurrence of a fault quickly and accurately; representative methods include principal component analysis (PCA) and autoencoders (AE). Fault diagnosis is the process of isolating the root-cause node of the fault and then determining the fault propagation path to identify the characteristics of the fault. Among the various approaches, data-driven methods are the most widely used, owing to their applicability and good performance compared with analytical and knowledge-based methods. Although many studies have been conducted on FDD, no methodology exists that carries out every step of FDD such that faults are effectively detected and diagnosed; moreover, existing methods have limited applicability and performance. Previous fault detection methods lose variable characteristics through dimensionality reduction and incur large computational loads, leading to poor performance on complex faults. Likewise, preceding fault diagnosis methods yield inaccurate fault isolation and biased fault propagation path analysis, a consequence of relying on prior knowledge to construct digraphs of process variable relationships. Thus, a comprehensive FDD methodology that performs well for complex faults and variable relationships is required. In this study, an efficient and effective comprehensive FDD methodology based on Markov random field (MRF) modelling is proposed. MRFs provide an effective means of modelling complex variable relationships and allow efficient computation of the marginal probabilities of process variables, leading to good FDD performance. First, a fault detection framework for process variables is proposed that integrates MRF modelling and structure learning with an iterative graphical lasso. The graphical lasso is an algorithm for learning the structure of MRFs and is applicable to large variable sets, since it approximates the MRF structure by assuming Gaussian relationships between variables. By iteratively applying the graphical lasso to the monitored variables, the variable set is subdivided into smaller groups, so the computational cost of MRF inference is mitigated and efficient fault detection becomes possible. After variable groups are obtained through the iterative graphical lasso, they are subjected to the MRF monitoring framework proposed in this work: the monitoring statistics are obtained by calculating the probability density of each variable group through kernel density estimation, and the monitoring limits are set separately for each group using a false alarm rate of 5%. Second, a fault isolation and propagation path analysis methodology is proposed, in which the conditional marginal probability of each variable is computed via inference and then used to calculate the conditional contribution of individual variables during the occurrence of a fault. Using the kernel belief propagation (KBP) algorithm, which learns and performs inference in MRFs of continuous variables, the MRF parameters are trained on normal process data, and the conditional contribution of each variable is then calculated for every sample of the faulty process data.
By analyzing the magnitude and reaction speed of the conditional contributions of individual variables, the root-cause fault node can be isolated and the fault propagation path determined effectively. Finally, the proposed methodology is verified by applying it to the well-known Tennessee Eastman process (TEP) model. Since the TEP has been used for years as a benchmark process for verifying various FDD methods, it serves the purpose of performance comparison; since it consists of multiple units and contains complex variable relationships such as recycle loops, it is also suitable for testing the performance of the proposed methodology. The application results show that the proposed methodology outperforms state-of-the-art FDD algorithms in terms of both fault detection and diagnosis. All 28 faults designed into the TEP model were detected with a fault detection accuracy of over 95%, higher than that of any previously proposed fault detection method. The method also showed good fault isolation and propagation path analysis results: the root-cause node of every fault was identified correctly, and the characteristics of the initiated faults were identified through fault propagation path analysis.
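A minimal sketch of the detection idea summarized above follows, assuming simulated process data: a sparse variable graph is learned with scikit-learn's GraphicalLasso, variables are grouped (connected components here, in place of the thesis's iterative scheme), and each group is monitored with a kernel density estimate whose control limit targets a 5% false alarm rate. This is not the thesis's Glasso-MRF implementation, and the data, bandwidth, and alpha are assumptions made for the example.

```python
# Minimal sketch: graphical-lasso grouping + per-group KDE monitoring.
import numpy as np
from scipy.sparse.csgraph import connected_components
from sklearn.covariance import GraphicalLasso
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(4)

# Simulated "normal operation" data: two independent correlated blocks of variables.
n, d = 1000, 8
mix = np.zeros((d, d))
mix[:4, :4] = rng.normal(size=(4, 4))
mix[4:, 4:] = rng.normal(size=(4, 4))
X_normal = rng.normal(size=(n, d)) @ mix.T

# 1) Sparse structure learning (stand-in for the iterative graphical lasso).
prec = GraphicalLasso(alpha=0.1).fit(X_normal).precision_
adj = (np.abs(prec) > 1e-4).astype(int)
np.fill_diagonal(adj, 0)
n_groups, labels = connected_components(adj, directed=False)

# 2) One KDE monitor per variable group; the limit is the 5th percentile of the
#    normal-data log-density (5% false alarm rate).
monitors = []
for g in range(n_groups):
    cols = np.where(labels == g)[0]
    kde = KernelDensity(bandwidth=0.5).fit(X_normal[:, cols])
    limit = np.percentile(kde.score_samples(X_normal[:, cols]), 5)
    monitors.append((cols, kde, limit))

# 3) A faulty sample (bias on variable 0) should fall below its group's limit.
x_fault = X_normal[-1].copy()
x_fault[0] += 5.0
alarms = [kde.score_samples(x_fault[None, cols])[0] < limit
          for cols, kde, limit in monitors]
print("group alarms:", alarms)
```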
Contents:
Front matter: Abstract; Contents; List of Tables; List of Figures
1 Introduction: 1.1 Research Motivation; 1.2 Research Objectives; 1.3 Outline of the Thesis
2 Markov Random Fields Modelling, Graphical Lasso, and Optimal Structure Learning: 2.1 Introduction; 2.2 Markov Random Fields; 2.3 Graphical Lasso; 2.4 MRF Modelling & Structure Learning (2.4.1 MRF modelling in process systems; 2.4.2 Structure learning using iterative graphical lasso); 2.5 Application of Iterative Graphical Lasso on the TEP
3 Efficient Process Monitoring via the Integrated Use of Markov Random Fields Learning and the Graphical Lasso: 3.1 Introduction; 3.2 MRF Monitoring Integrated with Graphical Lasso (3.2.1 Step 1: Iterative graphical lasso; 3.2.2 Step 2: MRF monitoring); 3.3 Implementation of Glasso-MRF monitoring to the Tennessee Eastman process (3.3.1 Tennessee Eastman process; 3.3.2 Glasso-MRF monitoring on TEP; 3.3.3 Fault detection accuracy comparison with other monitoring techniques; 3.3.4 Fault detection speed & fault propagation)
4 Process Fault Diagnosis via Markov Random Fields Learning and Inference: 4.1 Introduction; 4.2 Preliminaries (4.2.1 Probabilistic graphical models & Markov random fields; 4.2.2 Kernel belief propagation); 4.3 Fault Diagnosis via MRF Modeling (4.3.1 MRF structure learning via graphical lasso; 4.3.2 Kernel belief propagation - bandwidth selection; 4.3.3 Conditional contribution evaluation); 4.4 Application Results & Discussion (4.4.1 Two tank process; 4.4.2 Tennessee Eastman process)
5 Concluding Remarks
Bibliography; Nomenclature; Abstract (In Korean)