2,327 research outputs found

    Graph Laplacian for Image Anomaly Detection

    The Reed-Xiaoli detector (RXD) is recognized as the benchmark algorithm for image anomaly detection; however, it has known limitations, namely its dependence on the image following a multivariate Gaussian model, the estimation and inversion of a high-dimensional covariance matrix, and its inability to effectively include spatial awareness in its evaluation. In this work, a novel graph-based solution to the image anomaly detection problem is proposed; leveraging the graph Fourier transform, we are able to overcome some of RXD's limitations while reducing computational cost at the same time. Tests on both hyperspectral and medical images, using both synthetic and real anomalies, prove that the proposed technique obtains significant performance gains over other state-of-the-art algorithms. Comment: Published in Machine Vision and Applications (Springer)
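For orientation, the limitations listed above follow from the usual form of the RXD score, which is a Mahalanobis distance from the background statistics. The block below is a sketch of that standard form, not the graph-based method proposed in the paper; the symbols (pixel spectrum x, background mean mu, background covariance Sigma, threshold eta) are notation assumed here and do not appear in the abstract.

```latex
\delta_{\mathrm{RXD}}(\mathbf{x})
  = (\mathbf{x}-\boldsymbol{\mu})^{\top}\,
    \boldsymbol{\Sigma}^{-1}\,
    (\mathbf{x}-\boldsymbol{\mu}),
\qquad
\text{declare } \mathbf{x} \text{ anomalous if } \delta_{\mathrm{RXD}}(\mathbf{x}) > \eta
```

The inverse covariance term is where both the Gaussian assumption and the cost of estimating and inverting a high-dimensional matrix enter; the score also treats each pixel independently of its neighbours, which is the missing spatial awareness the abstract mentions.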

    Features extraction using random matrix theory.

    Representing complex data in a concise and accurate way is a special stage in data mining methodology. Redundant and noisy data affect the generalization power of any classification algorithm, undermine the results of any clustering algorithm and encumber the monitoring of large dynamic systems. This work provides several efficient approaches to all of these aspects of the analysis. We established that a notable difference can be made if results from the theory of ensembles of random matrices are employed. A particularly important result of our study is a family of methods based on projecting the data set onto different subsets of the correlation spectrum. Generally, we start with the traditional correlation matrix of a given data set. We perform singular value decomposition and establish boundaries between essential and unimportant eigen-components of the spectrum. Then, depending on the nature of the problem at hand, we use either the former or the latter part for the projection. Projecting the spectrum of interest is a common technique in linear and non-linear spectral methods such as Principal Component Analysis, Independent Component Analysis and Kernel Principal Component Analysis. Usually the part of the spectrum to project is defined by the amount of variance of the overall data, or of the feature space in the non-linear case. The applicability of these spectral methods is limited by the assumption that larger variance carries the important dynamics, i.e. that the data has a high signal-to-noise ratio. If this holds, projection onto principal components addresses two problems in data mining: reducing the number of features and selecting the more important features. Our methodology does not assume a high signal-to-noise ratio; instead, using the rigorous instruments of Random Matrix Theory (RMT), it identifies the presence of noise and establishes its boundaries.
Knowledge of the structure of the spectrum makes more insightful projections possible. For instance, in the application to router network traffic, the reconstruction-error procedure for anomaly detection is based on the projection of the noisy part of the spectrum, whereas in the bioinformatics application of clustering different types of leukemia, implicit denoising of the correlation matrix is achieved by decomposing the spectrum into random and non-random parts. For temporal high-dimensional data, the spectrum and eigenvectors of the correlation matrix are another representation of the data. Thus eigenvalues, components of the eigenvectors, the inverse participation ratio of eigenvector components and other operators of eigen-analysis are spectral features of the dynamic system. In our work we proposed extracting spectral features using RMT. We demonstrated that the extracted spectral features allow us to monitor the changing dynamics of network traffic. Experimenting with the delayed correlation matrices of network traffic and extracting their spectral features, we visualized the delayed processes in the system. We demonstrated that a broad range of applications in feature extraction can benefit from the novel RMT-based approach to the spectral representation of the data.
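The idea of a noise boundary in the correlation spectrum can be illustrated with a minimal sketch. It assumes the boundary is given by the Marchenko-Pastur law for unit-variance, uncorrelated noise, which is a common choice in RMT-based filtering but is not named in the abstract; the function names are hypothetical. The inverse participation ratio mentioned above is included using its usual definition.

```python
import math


def marchenko_pastur_bounds(n_features, n_samples):
    """Spectrum edges for a pure-noise correlation matrix.

    Assumes standardized (unit-variance) variables and, for the lower edge
    to be meaningful, n_samples >= n_features.
    """
    if n_features <= 0 or n_samples <= 0:
        raise ValueError("n_features and n_samples must be positive")
    root = math.sqrt(n_features / n_samples)
    return (1.0 - root) ** 2, (1.0 + root) ** 2


def split_spectrum(eigenvalues, n_features, n_samples):
    """Split eigenvalues into (signal, noise) using the upper noise edge."""
    _, upper = marchenko_pastur_bounds(n_features, n_samples)
    signal = [ev for ev in eigenvalues if ev > upper]
    noise = [ev for ev in eigenvalues if ev <= upper]
    return signal, noise


def inverse_participation_ratio(vector):
    """IPR of an eigenvector: about 1/N if delocalized, 1 if on one component."""
    norm_sq = sum(c * c for c in vector)
    if norm_sq == 0:
        raise ValueError("vector must be non-zero")
    # Dividing by norm_sq**2 makes the result independent of vector scaling.
    return sum(c ** 4 for c in vector) / norm_sq ** 2


if __name__ == "__main__":
    print(marchenko_pastur_bounds(100, 400))
    print(split_spectrum([5.0, 2.0, 0.5], 100, 400))
    print(inverse_participation_ratio([0.5, 0.5, 0.5, 0.5]))
```

Under these assumptions, eigenvalues above the upper edge would be treated as the non-random part of the spectrum, and either that part or the remainder would be used for projection depending on the task.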

    Flexible Time Series Matching for Clinical and Behavioral Data

    Time series data have become widely used by the research community in recent decades, following a massive increase in their availability. This rise, however, called for improvements in existing analysis techniques which, in the medical domain, would help specialists evaluate their patients' condition. One of the key tasks in time series analysis is pattern recognition (segmentation and classification). Traditional methods typically perform subsequence matching, using a pattern template and a similarity metric to search for similar sequences throughout a time series. However, real-world data is noisy and variable (morphological distortions), which makes exact template-based matching a rudimentary approach. To increase flexibility and generalize pattern searching tasks across domains, this dissertation proposes two Deep Learning-based frameworks to solve pattern segmentation and anomaly detection problems. Regarding pattern segmentation, a Convolution/Deconvolution Neural Network is proposed that learns to distinguish, point by point, desired sub-patterns from background content within a time series. The proposed framework was validated in two use cases: electrocardiogram (ECG) and inertial sensor-based human activity (IMU) signals. It outperformed two conventional matching techniques, notably detecting the targeted cycles even in noise-corrupted or extremely distorted signals, without using any reference template or hand-coded similarity scores. Concerning anomaly detection, the proposed unsupervised framework uses the reconstruction ability of Variational Autoencoders and a local similarity score to identify unlabeled abnormalities. The proposal was validated on two public ECG datasets (MIT-BIH Arrhythmia and ECG5000), performing cardiac arrhythmia identification.
Results indicated competitiveness relative to recent techniques, achieving detection AUC scores of 98.84% (ECG5000) and 93.32% (MIT-BIH Arrhythmia).
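The traditional template-based subsequence matching that this abstract contrasts with its Deep Learning frameworks can be sketched as follows. This is a minimal illustration of the baseline idea only, assuming Euclidean distance as the similarity metric; the function names are hypothetical and the dissertation's own baselines may differ.

```python
import math


def sliding_distances(series, template):
    """Euclidean distance between the template and every window of the series."""
    m = len(template)
    if m == 0 or m > len(series):
        raise ValueError("template must be non-empty and no longer than series")
    return [
        math.sqrt(sum((series[i + j] - template[j]) ** 2 for j in range(m)))
        for i in range(len(series) - m + 1)
    ]


def best_match(series, template):
    """Return (offset, distance) of the window closest to the template."""
    distances = sliding_distances(series, template)
    offset = min(range(len(distances)), key=distances.__getitem__)
    return offset, distances[offset]


if __name__ == "__main__":
    signal = [0, 0, 1, 2, 1, 0, 0]
    print(best_match(signal, [1, 2, 1]))
```

Because the score depends on point-wise agreement with a fixed template, a stretched or noisy occurrence of the same pattern yields a large distance, which is the rigidity the abstract describes.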

    Hyperspectral Imagery Target Detection Using Improved Anomaly Detection and Signature Matching Methods

    This research extends the field of hyperspectral target detection by developing autonomous anomaly detection and signature matching methodologies that reduce false alarms relative to existing benchmark detectors and are practical for use in an operational environment. The proposed anomaly detection methodology adapts multivariate outlier detection algorithms for use with hyperspectral datasets containing tens of thousands of non-homogeneous, high-dimensional spectral signatures. In so doing, the limitations of existing, non-robust anomaly detectors are identified, an autonomous clustering methodology is developed to divide an image into homogeneous background materials, and competing multivariate outlier detection methods are evaluated for their ability to uncover hyperspectral anomalies. To arrive at a final detection algorithm, robust parameter design methods are employed to determine parameter settings that achieve good detection performance over a range of hyperspectral images and targets, thereby removing the burden of these decisions from the user. The final anomaly detection algorithm is tested against existing local and global anomaly detectors, and is shown to achieve superior detection accuracy when applied to a diverse set of hyperspectral images. The proposed signature matching methodology employs image-based atmospheric correction techniques in an automated process to transform a target reflectance signature library into a set of image signatures. This set of signatures is combined with an existing linear filter to form a target detector that is shown to perform as well as or better than detectors that rely on complicated, information-intensive atmospheric correction schemes. The performance of the proposed methodology is assessed using a range of target materials in both woodland and desert hyperspectral scenes.

    Computational Intelligence Techniques for OES Data Analysis

    Semiconductor manufacturers are forced by market demand to continually deliver lower cost and faster devices. This results in complex industrial processes that, with continuous evolution, aim to improve quality and reduce costs. Plasma etching processes have been identified as a critical part of the production of semiconductor devices. It is therefore important to have good control over plasma etching but this is a challenging task due to the complex physics involved. Optical Emission Spectroscopy (OES) measurements can be collected non-intrusively during wafer processing and are being used more and more in semiconductor manufacturing as they provide real time plasma chemical information. However, the use of OES measurements is challenging due to its complexity, high dimension and the presence of many redundant variables. The development of advanced analysis algorithms for virtual metrology, anomaly detection and variables selection is fundamental in order to effectively use OES measurements in a production process. This thesis focuses on computational intelligence techniques for OES data analysis in semiconductor manufacturing presenting both theoretical results and industrial application studies. To begin with, a spectrum alignment algorithm is developed to align OES measurements from different sensors. Then supervised variables selection algorithms are developed. These are defined as improved versions of the LASSO estimator with the view to selecting a more stable set of variables and better prediction performance in virtual metrology applications. After this, the focus of the thesis moves to the unsupervised variables selection problem. The Forward Selection Component Analysis (FSCA) algorithm is improved with the introduction of computationally efficient implementations and different refinement procedures. Nonlinear extensions of FSCA are also proposed. 
Finally, the fundamental topic of anomaly detection is investigated and an unsupervised variables selection algorithm tailored to anomaly detection is developed. In addition, it is shown how OES data can be effectively used for semi-supervised anomaly detection in a semiconductor manufacturing process. The developed algorithms open up opportunities for the effective use of OES data for advanced process control. All the developed methodologies require minimal user intervention and provide easy to interpret models. This makes them practical for engineers to use during production for process monitoring and for in-line detection and diagnosis of process issues, thereby resulting in an overall improvement in production performance

    Artificial intelligence for digital twins in energy systems and turbomachinery: development of machine learning frameworks for design, optimization and maintenance

    The expression Industry 4.0 identifies a new industrial paradigm that includes the development of Cyber-Physical Systems (CPS) and Digital Twins, promoting the use of Big-Data, Internet of Things (IoT) and Artificial Intelligence (AI) tools. Digital Twins aim to build a dynamic environment in which, with the help of vertical, horizontal and end-to-end integration among industrial processes, smart technologies can communicate and exchange data to analyze and solve production problems, increase productivity and provide cost, time and energy savings. Specifically in the energy systems field, the introduction of AI technologies can lead to significant improvements in machine design and optimization as well as in maintenance procedures. Over the past decade, data from engineering processes have grown in scale. In fact, the use of more technologically sophisticated sensors and the increase in available computing power have enabled both experimental measurements and high-resolution numerical simulations, making available an enormous amount of data on the performance of energy systems. Therefore, to build a Digital Twin model capable of exploring these unorganized data pools collected from massive and heterogeneous resources, new Artificial Intelligence and Machine Learning strategies need to be developed. In light of the exponential growth in the use of smart technologies in manufacturing processes, this thesis aims at enhancing traditional approaches to the design, analysis, and optimization phases of turbomachinery and energy systems, which today are still predominantly based on empirical procedures or computationally intensive CFD-based optimizations. This improvement is made possible by the implementation of Digital Twin models, which, being based primarily on Machine Learning that exploits performance Big-Data collected from energy systems, are acknowledged as crucial technologies for remaining competitive in the dynamic energy production landscape.
The introduction of Digital Twin models changes the overall structure of design and maintenance approaches and results in modern support tools that facilitate real-time informed decision making. In addition, the introduction of supervised learning algorithms facilitates the exploration of the design space by providing easy-to-run analytical models, which can also be used as cost functions in multi-objective optimization problems, avoiding the need for time-consuming numerical simulations or experimental campaigns. Unsupervised learning methods can be applied, for example, to extract new insights from turbomachinery performance data and improve designers’ understanding of blade-flow interaction. Alternatively, Artificial Intelligence frameworks can be developed for Condition-Based Maintenance, allowing the transition from preventive to predictive maintenance. This thesis can be conceptually divided into two parts. The first reviews the state of the art of Cyber-Physical Systems and Digital Twins, highlighting the crucial role of Artificial Intelligence in supporting informed decision making during the design, optimization, and maintenance phases of energy systems. The second part covers the development of Machine Learning strategies to improve the classical approach to turbomachinery design and maintenance strategies for energy systems by exploiting data from numerical simulations, experimental campaigns, and sensor datasets (SCADA). The different Machine Learning approaches adopted include clustering algorithms, regression algorithms and dimensionality reduction techniques: Autoencoder and Principal Component Analysis. A first work shows the potential of unsupervised learning approaches (clustering algorithms) in exploring a Design of Experiment of 76 numerical simulations for turbomachinery design purposes. 
The second work takes advantage of a nonsequential experimental dataset, measured on a rotating turbine rig characterized by 48 blades divided into 7 sectors that share the same baseline rotor geometry but have different tip designs, to infer and dissect the causal relationship among different tip geometries and unsteady aero-thermodynamic performance via a novel Machine-Learning procedure based on dimensionality reduction techniques. The last application proposes a new anomaly detection framework for gensets in DH networks, based on SCADA data that exploits and compares the performance of regression algorithms such as XGBoost and Multi-layer Perceptron

    Spectral Target Detecting Using Schroedinger Eigenmaps

    Applications of optical remote sensing processes include environmental monitoring, military monitoring, meteorology, mapping, surveillance, etc. Many of these tasks include the detection of specific objects or materials, usually few or small, which are surrounded by other materials that clutter the scene and hide the relevant information. This target detection process has been boosted lately by the use of hyperspectral imagery (HSI), since its high spectral dimension provides more detailed spectral information that is desirable in data exploitation. Typical spectral target detectors rely on statistical or geometric models to characterize the spectral variability of the data. However, in many cases these parametric models do not fit HSI data well, which degrades detection performance. On the other hand, non-linear transformation methods, mainly based on manifold learning algorithms, have shown potential for HSI transformation, dimensionality reduction and classification. In target detection, non-linear transformation algorithms are used as preprocessing techniques that transform the data to a more suitable lower-dimensional space, where the statistical or geometric detectors are applied. One of these non-linear manifold methods is the Schroedinger Eigenmaps (SE) algorithm, which was introduced as a technique for semi-supervised classification. The core tool of the SE algorithm is the Schroedinger operator, which includes a potential term that encodes prior information about the materials present in a scene and enables the embedding to be steered in convenient directions in order to cluster similar pixels together. A novel target detection methodology based on the SE algorithm is proposed for the first time in this thesis. The proposed methodology does not just include the transformation of the data to a lower-dimensional space but also includes the definition of a detector that capitalizes on the theory behind SE.
The fact that target pixels and pixels similar to them are clustered in a predictable region of the low-dimensional representation is used to define a decision rule that identifies target pixels among the remaining pixels of a given image. In addition, a knowledge propagation scheme is used to combine spectral and spatial information as a means to propagate the potential constraints to nearby points. The propagation scheme is introduced to reinforce weak connections and improve the separability between most of the target pixels and the background. Experiments using different HSI data sets are carried out in order to test the proposed methodology. The assessment is performed from both a quantitative and a qualitative point of view, by comparing the SE-based methodology against two other detection methodologies that use linear/non-linear algorithms as transformations together with the well-known Adaptive Coherence/Cosine Estimator (ACE) detector. Overall results show that the SE-based detector outperforms the other two detection methodologies, which indicates the usefulness of the SE transformation in spectral target detection problems.
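As a reading aid, the two objects named in this abstract can be written in their commonly used forms. The block is a sketch under assumed notation, not the thesis's own derivation: L is the graph Laplacian of the pixel graph, D its degree matrix, V a non-negative diagonal potential encoding the prior information, alpha a non-negative weight, s the target signature, x a pixel spectrum (background mean removed) and Sigma the background covariance. None of these symbols appear in the abstract.

```latex
% Schroedinger Eigenmaps: the embedding uses the eigenvectors with the
% smallest eigenvalues of the generalized problem
(L + \alpha V)\,\mathbf{f} = \lambda\, D\,\mathbf{f}

% Adaptive Coherence/Cosine Estimator (ACE) score for a pixel x
D_{\mathrm{ACE}}(\mathbf{x}) =
  \frac{\left(\mathbf{s}^{\top}\boldsymbol{\Sigma}^{-1}\mathbf{x}\right)^{2}}
       {\left(\mathbf{s}^{\top}\boldsymbol{\Sigma}^{-1}\mathbf{s}\right)
        \left(\mathbf{x}^{\top}\boldsymbol{\Sigma}^{-1}\mathbf{x}\right)}
```

With alpha set to zero the first expression reduces to the Laplacian Eigenmaps problem, so the potential term is what steers the embedding toward the labelled target pixels.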