A Survey on Unsupervised Anomaly Detection Algorithms for Industrial Images
In line with the development of Industry 4.0, surface defect
detection/anomaly detection has become a topical subject in industry.
Improving efficiency and saving labor costs have steadily become matters
of great concern in practice, where deep learning-based algorithms have
outperformed traditional vision inspection methods in recent years. However,
existing deep learning-based algorithms are biased towards supervised learning,
which not only necessitates a huge amount of labeled data and human labor, but
also brings about inefficiency and limitations. In contrast, recent research
shows that unsupervised learning has great potential in tackling the above
disadvantages for visual industrial anomaly detection. In this survey, we
summarize current challenges and provide a thorough overview of recently
proposed unsupervised algorithms for visual industrial anomaly detection
covering five categories, whose innovation points and frameworks are described
in detail. Meanwhile, publicly available datasets for industrial anomaly
detection are introduced. By comparing different classes of methods, the
advantages and disadvantages of anomaly detection algorithms are summarized.
Based on the current research framework, we point out the core issue that
remains to be resolved and provide further improvement directions. Meanwhile,
based on the latest technological trends, we offer insights into future
research directions. This survey is expected to assist both the research
community and industry in developing a broader, cross-domain perspective.
One-class Classification: An Approach to Handle Class Imbalance in Multimodal Biometric Authentication
Biometric verification is the process of authenticating a person's identity using his/her physiological and behavioural characteristics. It is well-known that multimodal biometric systems can further improve the authentication accuracy by combining information from multiple biometric traits at various levels, namely sensor, feature, match score and decision levels. Fusion at match score level is generally preferred due to the trade-off between information availability and fusion complexity. However, combining match scores poses a number of challenges, when treated as a two-class classification problem due to the highly imbalanced class distributions. Most conventional classifiers assume equally balanced classes. They do not work well when samples of one class vastly outnumber the samples of the other class. These challenges become even more significant, when the fusion is based on user-specific processing due to the limited availability of the genuine samples per user. This thesis aims at exploring the paradigm of one-class classification to advance the classification performance of imbalanced biometric data sets. The contributions of the research can be enumerated as follows.
Firstly, a thorough investigation of the various one-class classifiers, including Gaussian Mixture Model, k-Nearest Neighbour, K-means clustering and Support Vector Data Description, has been provided. These classifiers are applied in learning the user-specific and user-independent descriptions for the biometric decision inference. It is demonstrated that the one-class classifiers are particularly useful in handling the imbalanced learning problem in multimodal biometric authentication. User-specific approach is a better alternative with respect to user-independent counterpart because it is able to overcome the so-called within-class sub-concepts problem, which arises very often in multimodal biometric systems due to the existence of user variation.
Secondly, a novel adapted score fusion scheme that consists of one-class classifiers and is trained using both the genuine user and impostor samples has been proposed. This method also replaces the user-independent description with a user-specific one to learn the characteristics of the impostor class, thus reducing the degree of imbalance between the classes. Extensive experiments are conducted on the BioSecure DS2 and XM2VTS databases to illustrate the potential of the proposed adapted score fusion scheme, which provides a relative improvement in terms of Equal Error Rate of 32% and 20% as compared to the standard sum of scores and likelihood ratio based score fusion, respectively.
Thirdly, a hybrid boosting algorithm called r-ABOC has been developed, which is capable of exploiting the natural capabilities of both the well-known Real AdaBoost and one-class classification to further improve the system performance without causing overfitting. However, unlike the conventional Real AdaBoost, the individual classifiers in the proposed scheme are trained on the same data set, but with different parameter choices. This not only generates a high diversity, which is vital to the success of r-ABOC, but also reduces the number of user-specified parameters. A comprehensive empirical study using the BioSecure DS2 and XM2VTS databases demonstrates that r-ABOC may achieve a performance gain in terms of Half Total Error Rate of up to 28% with respect to other state-of-the-art biometric score fusion techniques.
Finally, a Robust Imputation based on Group Method of Data Handling (RIBG) has been proposed to handle the missing data problem in the BioSecure DS2 database. RIBG is able to provide accurate predictions of incomplete score vectors. It is observed to achieve a better performance with respect to the state-of-the-art imputation techniques, including mean, median and k-NN imputations. An important feature of RIBG is that it does not require any parameter fine-tuning, and hence, is amenable to immediate applications.
Data-Driven Fault Detection and Reasoning for Industrial Monitoring
This open access book assesses the potential of data-driven methods in industrial process monitoring engineering. The process modeling, fault detection, classification, isolation, and reasoning are studied in detail. These methods can be used to improve the safety and reliability of industrial processes. Fault diagnosis, including fault detection and reasoning, has attracted engineers and scientists from various fields such as control, machinery, mathematics, and automation engineering. Combining the diagnosis algorithms and application cases, this book establishes a basic framework for this topic and implements various statistical analysis methods for process monitoring. This book is intended for senior undergraduate and graduate students who are interested in fault diagnosis technology, researchers investigating automation and industrial security, professional practitioners and engineers working on engineering modeling and data processing applications. This is an open access book
Development of unsupervised learning methods with applications to life sciences data
Machine Learning makes computers capable of performing tasks typically requiring human intelligence. A domain where it is having a considerable impact is the life sciences, where it helps to devise new biological analysis protocols, develop patients’ treatments faster and more efficiently, and reduce healthcare costs. This thesis presents new Machine Learning methods and pipelines for the life sciences, focusing on the unsupervised field.
At a methodological level, two methods are presented. The first is an “Ab Initio Local Principal Path” and it is a revised and improved version of a pre-existing algorithm in the manifold learning realm. The second contribution is an improvement over the Import Vector Domain Description (one-class learning) through the Kullback-Leibler divergence. It hybridizes kernel methods to Deep Learning obtaining a scalable solution, an improved probabilistic model, and state-of-the-art performances. Both methods are tested through several experiments, with a central focus on their relevance in life sciences. Results show that they improve the performances achieved by their previous versions.
At the applicative level, two pipelines are presented. The first one is for the analysis of RNA-Seq datasets, both transcriptomic and single-cell data, and is aimed at identifying genes that may be involved in biological processes (e.g., the transition of tissues from normal to cancer). In this project, an R package is released on CRAN to make the pipeline accessible to the bioinformatic Community through high-level APIs. The second pipeline is in the drug discovery domain and is useful for identifying druggable pockets, namely regions of a protein with a high probability of accepting a small molecule (a drug). Both these pipelines achieve remarkable results.
Lastly, a detour application is developed to identify the strengths/limitations of the “Principal Path” algorithm by analyzing Convolutional Neural Networks induced vector spaces. This application is conducted in the music and visual arts domains
Cyber Security and Critical Infrastructures 2nd Volume
The second volume of the book contains the manuscripts that were accepted for publication in the MDPI Special Topic "Cyber Security and Critical Infrastructure" after a rigorous peer-review process. Authors from academia, government and industry contributed their innovative solutions, consistent with the interdisciplinary nature of cybersecurity. The book contains 16 articles, including an editorial that explains the current challenges, innovative solutions and real-world experiences that include critical infrastructure and 15 original papers that present state-of-the-art innovative solutions to attacks on critical systems
Credit Scoring Using Machine Learning
For financial institutions and the economy at large, the role of credit scoring in lending decisions cannot be overemphasised. An accurate and well-performing credit scorecard allows lenders to control their risk exposure through the selective allocation of credit based on the statistical analysis of historical customer data. This thesis identifies and investigates a number of specific challenges that occur during the development of credit scorecards. Four main contributions are made in this thesis. First, we examine the performance of a number of supervised classification techniques on a collection of imbalanced credit scoring datasets. Class imbalance occurs when there are significantly fewer examples in one or more classes in a dataset compared to the remaining classes. We demonstrate that oversampling the minority class leads to no overall improvement to the best performing classifiers. We find that, in contrast, adjusting the threshold on classifier output yields, in many cases, an improvement in classification performance. Our second contribution investigates a particularly severe form of class imbalance, which, in credit scoring, is referred to as the low-default portfolio problem. To address this issue, we compare the performance of a number of semi-supervised classification algorithms with that of logistic regression. Based on the detailed comparison of classifier performance, we conclude that both approaches merit consideration when dealing with low-default portfolios. Third, we quantify the differences in classifier performance arising from various implementations of a real-world behavioural scoring dataset. Due to commercial sensitivities surrounding the use of behavioural scoring data, very few empirical studies which directly address this topic are published.
This thesis describes the quantitative comparison of a range of dataset parameters impacting classification performance, including: (i) varying durations of historical customer behaviour for model training; (ii) different lengths of time from which a borrower’s class label is defined; and (iii) using alternative approaches to define a customer’s default status in behavioural scoring. Finally, this thesis demonstrates how artificial data may be used to overcome the difficulties associated with obtaining and using real-world data. The limitations of artificial data, in terms of its usefulness in evaluating classification performance, are also highlighted. In this work, we are interested in generating artificial data, for credit scoring, in the absence of any available real-world data
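The threshold adjustment mentioned in the abstract above can be illustrated with a minimal sketch. This is not the thesis's procedure: the choice of balanced accuracy as the selection metric, the helper names `balanced_accuracy` and `best_threshold`, and the toy scores are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def balanced_accuracy(labels: Sequence[int], preds: Sequence[int]) -> float:
    """Mean of the recall on class 1 and the recall on class 0."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    return 0.5 * (tp / pos + tn / neg)


def best_threshold(labels: Sequence[int],
                   scores: Sequence[float]) -> Tuple[float, float]:
    """Try each observed score as a cut-off; return (threshold, metric)."""
    best_t, best_ba = 0.5, -1.0
    for t in sorted(set(scores)):
        preds = [1 if s >= t else 0 for s in scores]
        ba = balanced_accuracy(labels, preds)
        if ba > best_ba:  # strict comparison keeps the lowest tied threshold
            best_t, best_ba = t, ba
    return best_t, best_ba


# Made-up data (assumption): 2 defaulters (label 1) among 10 borrowers,
# with classifier scores that rarely exceed the conventional 0.5 cut-off.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.12, 0.15, 0.20, 0.22, 0.25, 0.40, 0.30, 0.45]

default_preds = [1 if s >= 0.5 else 0 for s in scores]
default_ba = balanced_accuracy(labels, default_preds)
threshold, tuned_ba = best_threshold(labels, scores)
print(default_ba, threshold, tuned_ba)
```

On this toy data the default 0.5 cut-off predicts no defaults at all, while a lower cut-off recovers both minority-class cases. In practice the threshold would be chosen on held-out data rather than on the data used for evaluation.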
A Benchmarking Study of Unsupervised Anomaly Detection Algorithms
It is common practice in the unsupervised anomaly detection literature to create experimental benchmarks by sampling from existing supervised learning datasets. We seek to improve this practice by identifying four dimensions important to real-world anomaly detection applications (point difficulty, clusteredness of anomalies, relevance of features, and relative frequency of anomalies) and then proposing how to simulate and control these factors when sampling points during benchmark creation. We apply this methodology to produce a large corpus of unsupervised anomaly detection benchmarks and then evaluate several state-of-the-art anomaly detection algorithms against this corpus. Our final analysis not only compares the performance of these algorithms across a large variety of problems, but it also assesses the impact of our identified problem dimensions on experimental outcomes.
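As a rough illustration of one of the four dimensions, relative frequency of anomalies, the sketch below builds a benchmark from a labeled dataset by keeping every normal point and sampling just enough anomalies to reach a target rate. The function name `make_benchmark`, its parameters, and the toy data are hypothetical; the study's actual sampling procedure is not described in the abstract and may differ.

```python
import random
from typing import List, Sequence, Tuple


def make_benchmark(normals: Sequence[float],
                   anomalies: Sequence[float],
                   anomaly_rate: float,
                   seed: int = 0) -> List[Tuple[float, int]]:
    """Keep all normal points and sample anomalies so that they make up
    roughly `anomaly_rate` of the resulting benchmark."""
    if not 0.0 < anomaly_rate < 1.0:
        raise ValueError("anomaly_rate must be strictly between 0 and 1")
    rng = random.Random(seed)
    # Solve n_anom / (n_norm + n_anom) = rate for n_anom.
    n_anom = round(anomaly_rate * len(normals) / (1.0 - anomaly_rate))
    n_anom = min(n_anom, len(anomalies))  # cannot draw more than we have
    sampled = rng.sample(list(anomalies), n_anom)
    data = [(x, 0) for x in normals] + [(x, 1) for x in sampled]
    rng.shuffle(data)  # avoid leaking labels through ordering
    return data


# Made-up 1-D points (assumption): 95 normals and a pool of 20 anomalies.
normal_points = [float(i) for i in range(95)]
anomaly_pool = [float(i) for i in range(100, 120)]

bench = make_benchmark(normal_points, anomaly_pool, anomaly_rate=0.05)
print(len(bench), sum(label for _, label in bench))
```

Varying `anomaly_rate` while holding the source dataset fixed yields a family of benchmarks that differ only in how rare the anomalies are, which is the kind of controlled comparison the abstract describes.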