44 research outputs found

    K-Means and Alternative Clustering Methods in Modern Power Systems

    As power systems evolve by integrating renewable energy sources, distributed generation, and electric vehicles, managing them becomes increasingly complex. With greater data accessibility and advances in computational capabilities, clustering algorithms, including K-means, are becoming essential tools for researchers analyzing, optimizing, and modernizing power systems. This paper presents a comprehensive review of over 440 articles published through 2022, emphasizing the application of K-means clustering, a widely recognized and frequently used algorithm, along with alternative clustering methods within modern power systems. The main contributions of this study include a bibliometric analysis of the historical development and wide-ranging applications of K-means clustering in power systems. The research also thoroughly examines K-means, its variants, and its advantages and potential limitations. Furthermore, the study explores alternative clustering algorithms that can complement or substitute for K-means, including K-medoids, Time-series K-means, BIRCH, Bayesian clustering, HDBSCAN, CLIQUE, spectral clustering, SOMs, TICC, and swarm-based methods, broadening the understanding and application of clustering methodologies in modern power systems. The paper highlights the wide-ranging applications of these techniques, from load forecasting and fault detection to power quality analysis and system security assessment. The review observes that the number of publications employing clustering algorithms within modern power systems follows an exponential upward trend. This underlines the need for professionals to understand various clustering methods, including their benefits and potential challenges, so that they can incorporate the most suitable ones into their studies.
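    For readers new to the algorithm at the center of this review, the following is a minimal sketch of standard K-means (Lloyd's alternating assignment and update steps) in plain Python. It is not taken from the reviewed paper: the function names, the random initialization from data points, and the "no centroid moved" convergence test are illustrative assumptions (practical libraries typically add k-means++ seeding and several restarts).

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points: Sequence[Point], centroids: Sequence[Point]) -> List[int]:
    """Assignment step: index of the nearest centroid for each point."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
        for p in points
    ]


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100, seed: int = 0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    # Assumed initialization: k data points drawn at random (no k-means++).
    centroids: List[Point] = [tuple(p) for p in rng.sample(list(points), k)]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids: List[Point] = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # Update step: each centroid moves to the mean of its members.
                dim = len(members[0])
                new_centroids.append(
                    tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    # Final labels are computed against the returned centroids.
    return centroids, assign(points, centroids)


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    centroids, labels = kmeans(data, k=2)
    print(sorted(centroids))  # [(0.0, 0.5), (10.0, 10.5)]
```

    Centroid order depends on the random initialization, which is why the usage example sorts them; the objective being reduced at each step is the within-cluster sum of squared distances.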

    Pattern Recognition

    Pattern recognition is a very wide research field. It involves factors as diverse as sensors, feature extraction, pattern classification, decision fusion, applications, and others. The signals processed are commonly one-, two-, or three-dimensional; the processing is done in real time or takes hours and days; some systems look for one narrow object class, while others search huge databases for entries with at least a small amount of similarity. No single person can claim expertise across the whole field, which develops rapidly, updates its paradigms, and encompasses several philosophical approaches. This book reflects this diversity by presenting a selection of recent developments within the area of pattern recognition and related fields. It covers theoretical advances in classification and feature extraction as well as application-oriented works. The authors of these 25 works present and advocate recent achievements of their research related to the field of pattern recognition.

    Design of Machine Learning Algorithms with Applications to Breast Cancer Detection

    Machine learning is concerned with the design and development of algorithms and techniques that allow computers to 'learn' from experience with respect to some class of tasks and performance measure. One application of machine learning is to improve the accuracy and efficiency of computer-aided diagnosis systems that assist physicians, radiologists, cardiologists, neuroscientists, and health-care technologists. This thesis focuses on machine learning and its applications to breast cancer detection. Emphasis is placed on feature preprocessing, pattern classification, and model selection. Before the classification task, feature selection and feature transformation may be performed to reduce the dimensionality of the features and to improve classification performance. A genetic algorithm (GA) can be employed for feature selection based on different measures of data separability or on the estimated risk of a chosen classifier. A separate nonlinear transformation can be performed by applying kernel principal component analysis and kernel partial least squares. Several classifiers are proposed in this work. The SOM-RBF network combines self-organizing maps (SOMs) and radial basis function (RBF) networks, with the RBF centers set as the weight vectors of neurons from the competitive layer of a trained SOM. The pairwise Rayleigh quotient (PRQ) classifier seeks one discriminating boundary by maximizing an unconstrained optimization objective, named the PRQ criterion, formed with a set of pairwise constraints instead of individual training samples. The strict 2-surface proximal (S2SP) classifier seeks two proximal planes, not necessarily parallel, that fit the distribution of the samples in the original feature space or a kernel-defined feature space, by maximizing two strict optimization objectives with a 'square of sum' optimization factor.
Two variations of the support vector data description (SVDD) with negative samples (NSVDD) are proposed by involving different forms of slack vectors; they learn a closed, spherically shaped boundary, named the supervised compact hypersphere (SCH), around a set of samples in the target class. We extend the NSVDDs to solve multi-class classification problems based on distances between the samples and the centers of the learned SCHs in a kernel-defined feature space, using a combination of linear discriminant analysis and the nearest-neighbor rule. The problem of model selection, that is, picking the best hyperparameter values for a parametric classifier, is also studied. To choose the optimal kernel or regularization parameters of a classifier, we investigate different criteria, such as the validation error estimate and the leave-one-out bound, as well as different optimization methods, such as grid search, gradient descent, and GA. By viewing the tuning of the multiple parameters of a 2-norm support vector machine (SVM) as the identification of a nonlinear dynamic system, we design a tuning system that employs the extended Kalman filter based on cross-validation. Independent kernel optimization based on different measures of data separability is also investigated for different kernel-based classifiers. Numerous computer experiments using benchmark datasets verify the theoretical results and compare the techniques in terms of classification accuracy or area under the receiver operating characteristic curve. Computational requirements, such as computing time and the number of hyperparameters, are also discussed. All of the presented methods are applied to breast cancer detection from fine-needle aspiration and in mammograms, as well as to the screening of knee-joint vibroarthrographic signals and the automatic monitoring of roller bearings with vibration signals.
Experimental results demonstrate the effectiveness of these methods through improved classification performance. For breast cancer detection, instead of providing only a binary diagnostic decision of 'malignant' or 'benign', we propose methods that assign a measure of confidence of malignancy to an individual mass by calculating the probabilities of being benign and malignant with a single classifier or a set of classifiers.
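    The abstract mentions choosing kernel parameters by grid search over a validation or leave-one-out criterion. The sketch below illustrates only that general idea: it tunes the width `sigma` of a simple Gaussian Parzen-window classifier, a stand-in chosen for brevity and not one of the thesis's classifiers, by exhaustive grid search over the exact leave-one-out error (rather than the leave-one-out bound used for SVM-type classifiers). All function names and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def parzen_predict(x: Point, train: Sequence[Point], labels: Sequence[int], sigma: float) -> int:
    """Return the class with the largest sum of Gaussian kernels (Parzen-window rule)."""
    scores: Dict[int, float] = {}
    for xi, yi in zip(train, labels):
        d2 = sum((a - b) ** 2 for a, b in zip(x, xi))
        scores[yi] = scores.get(yi, 0.0) + math.exp(-d2 / (2.0 * sigma ** 2))
    return max(scores, key=scores.get)


def loo_error(X: List[Point], y: List[int], sigma: float) -> float:
    """Exact leave-one-out error: hold out each sample once and classify it."""
    errors = 0
    for i in range(len(X)):
        train, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        if parzen_predict(X[i], train, train_y, sigma) != y[i]:
            errors += 1
    return errors / len(X)


def grid_search(X: List[Point], y: List[int], sigmas: Sequence[float]) -> Tuple[float, float]:
    """Try every candidate width; keep the first one with the lowest LOO error."""
    best_sigma, best_err = None, math.inf
    for sigma in sigmas:
        err = loo_error(X, y, sigma)
        if err < best_err:  # strict '<' so ties keep the earlier candidate
            best_sigma, best_err = sigma, err
    return best_sigma, best_err
```

    On a toy set where one class has far fewer samples than the other, a very wide kernel makes every kernel value close to 1, so the majority class dominates and the held-out minority points are misclassified; the leave-one-out error exposes this and the grid search prefers a narrower width.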

    Development of a machine learning based methodology for bridge health monitoring

    Thesis by compendium of publications
    In recent years, the scientific community has been developing new structural health monitoring (SHM) techniques to identify damage in civil structures, especially bridges. Bridge health monitoring (BHM) systems serve to reduce overall life-cycle maintenance costs for bridges, as their main objective is to prevent catastrophic failures and damage. In BHM using dynamic data, there are several problems related to the post-processing of vibration signals: (i) modal-based dynamic features such as natural frequencies, mode shapes, and damping are limited for damage localization, since they are based on a global response of the structure; (ii) noise is present in the measured vibration responses; (iii) existing algorithms for damage feature extraction are used inadequately because they neglect the nonlinearity and non-stationarity of the recorded signals; (iv) environmental and operational conditions can generate false damage detections in bridges; and (v) traditional algorithms have drawbacks in processing the large amounts of data obtained from BHM. This thesis proposes new vibration-based parameters and methods focused on damage detection, localization, and quantification, considering a mixed robust methodology that includes signal processing and machine learning methods to solve the identified problems. The increasing volume of bridge monitoring data makes it worthwhile to study the ability of advanced tools and systems to extract useful information from dynamic and static variables. In the fields of Machine Learning (ML) and Artificial Intelligence (AI), powerful algorithms have been developed to handle problems involving very large amounts of data (big data). Here, the potential of ML techniques (unsupervised algorithms) was analyzed for bridges, taking into account both operational and environmental conditions.
A critical literature review was performed, together with an in-depth study of the accuracy and performance of a set of algorithms for detecting damage in three real bridges and one numerical model. The review of vibration-based damage detection covered several state-of-the-art methods that do not consider the nature of the data, the characteristics of the applied excitation (possible nonlinearity, non-stationarity, and the presence or absence of environmental and/or operational effects), or the noise level of the sensors. Besides, most research uses modal-based damage features, which have some limitations. Most methods perform poor data normalization and do not properly account for operational and environmental variability. Likewise, the huge amount of recorded data requires automatic procedures with a proven capacity to reduce the possibility of false alarms. In addition, many investigations are limited because they study only numerical or laboratory cases. Therefore, a methodology combining several algorithms is proposed to overcome these limitations. The conclusions show a robust ML-based methodology capable of detecting, localizing, and quantifying damage. It allows engineers to verify bridges and anticipate significant structural damage when it occurs. Moreover, the proposed non-modal parameters prove feasible as damage features using both ambient and forced vibrations. The Hilbert-Huang Transform (HHT), in conjunction with the Marginal Hilbert Spectrum and the Instantaneous Phase Difference, shows a strong capability to analyze nonlinear and non-stationary response signals for damage identification under operational conditions.
The proposed strategy combines algorithms for signal processing (ICEEMDAN and HHT) and ML (k-means) to conduct damage detection and localization in bridges by using the traffic-induced vibration data in real-time operation.
    Postprint (published version)
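    HHT-style analysis applies the Hilbert transform to each intrinsic mode function to obtain its instantaneous amplitude and phase. As a hedged illustration of the phase part only, the sketch below builds the analytic signal with a plain discrete Fourier transform (keeping DC, doubling positive frequencies, and zeroing negative ones, the same construction used by, for example, SciPy's `scipy.signal.hilbert`) and computes the wrapped instantaneous phase difference between two sensor channels. It assumes each input is already a narrow-band, mode-like component (the ICEEMDAN decomposition step is omitted), uses an O(n²) DFT for self-containment, and does not reproduce the thesis's exact damage feature definition.

```python
import cmath
import math
from typing import List, Sequence


def analytic_signal(x: Sequence[float]) -> List[complex]:
    """Analytic signal x + i*H{x} computed through the DFT."""
    n = len(x)
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    weights = [0.0] * n
    weights[0] = 1.0                    # DC bin kept once
    if n % 2 == 0:
        weights[n // 2] = 1.0           # Nyquist bin kept once
        positive = range(1, n // 2)
    else:
        positive = range(1, (n + 1) // 2)
    for k in positive:
        weights[k] = 2.0                # positive frequencies doubled
    # Negative-frequency bins keep weight 0.0, i.e. they are removed.
    filtered = [s * w for s, w in zip(spectrum, weights)]
    return [
        sum(filtered[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def instantaneous_phase_difference(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Wrapped phase(a) - phase(b) in [-pi, pi] at every sample."""
    za, zb = analytic_signal(a), analytic_signal(b)
    return [cmath.phase(p * q.conjugate()) for p, q in zip(za, zb)]
```

    Multiplying one analytic signal by the conjugate of the other before taking the angle gives the phase difference already wrapped, which avoids unwrapping each channel separately. Tracking how this difference between sensor pairs shifts relative to a baseline recording is one plausible way to use it as a damage-sensitive feature; that usage is an assumption here, not a claim about the thesis.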

    Effective Fault Diagnosis in Chemical Plants By Integrating Multiple Methodologies

    Ph.D. (Doctor of Philosophy)

    Data driven methods for updating fault detection and diagnosis system in chemical processes

    Modern industrial processes are becoming more complex, and consequently monitoring them has become a challenging task. Fault Detection and Diagnosis (FDD), as a key element of process monitoring, needs to be investigated because of its essential role in decision-making processes. Among available FDD methods, data-driven approaches are currently receiving increasing attention because of their relative simplicity of implementation. Regardless of the FDD type, one of the main traits of a reliable FDD system is its ability to be updated when new conditions that were not considered during its initial training appear in the process. These new conditions may emerge either gradually or abruptly, but both are equally important, since in both cases they lead to poor FDD performance. Some methods have been proposed for updating tasks, but mostly outside the chemical engineering research area. They can be categorized into those dedicated to managing Concept Drift (CD), which appears gradually, and those that deal with novel classes, which appear abruptly. As reported, the available methods not only lack clear updating strategies but also suffer from performance weaknesses and inefficiently long training times. Accordingly, this thesis is mainly dedicated to data-driven FDD updating in chemical processes. The proposed schemes for handling novel fault classes are based on unsupervised methods, while both supervised and unsupervised updating frameworks have been investigated for coping with CD. Furthermore, to enhance the functionality of FDD systems, some major data processing methods, including imputation of missing values, feature selection, and feature extension, have been investigated. The suggested algorithms and frameworks for FDD updating have been evaluated through different benchmarks and scenarios.
As part of the results, the suggested algorithms for supervised handling of CD surpass the performance of traditional incremental learning in terms of the MGM score (a dimensionless score defined from the weighted F1 score and training time), with improvements of up to 50%. This improvement is achieved by the proposed algorithms, which detect and forget redundant information and properly adjust the data window for timely updating and retraining of the fault detection system. Moreover, the proposed unsupervised FDD updating framework for dealing with novel faults in static and dynamic process conditions achieves up to 90% in terms of the NPP score (a dimensionless score defined from the number of correctly predicted sample classes). This result relies on an innovative framework that can assign samples either to new classes or to existing classes by exploiting one-class classification techniques and clustering approaches.
    Postprint (published version)
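    The last sentence describes assigning incoming samples either to existing classes or to new ones. A minimal sketch of that decision follows, under simplifying assumptions that are ours rather than the thesis's: each known class is summarized by a centroid, a fixed-radius hypersphere around it acts as a crude one-class acceptance test, and a sample rejected by every class seeds a new cluster (a leader-style online clustering). The class names, the `radius` value, and the `novel_<n>` naming scheme are hypothetical.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, ...]


def assign_or_create(
    sample: Point,
    centroids: Dict[str, Point],
    counts: Dict[str, int],
    radius: float,
) -> str:
    """Return the label for `sample`, updating `centroids` and `counts` in place."""
    nearest, nearest_dist = None, math.inf
    for label, center in centroids.items():
        dist = math.dist(sample, center)
        if dist < nearest_dist:
            nearest, nearest_dist = label, dist
    if nearest is not None and nearest_dist <= radius:
        # Accepted by the nearest class's hypersphere: update its running mean.
        n = counts[nearest] + 1
        centroids[nearest] = tuple(
            c + (s - c) / n for c, s in zip(centroids[nearest], sample)
        )
        counts[nearest] = n
        return nearest
    # Rejected by every known class: treat the sample as the seed of a novel class.
    new_label = f"novel_{len(centroids)}"
    centroids[new_label] = tuple(sample)
    counts[new_label] = 1
    return new_label
```

    A fixed radius is the simplest possible acceptance rule; a per-class boundary learned by a one-class classifier (for example, an SVDD-style hypersphere) would replace the single `radius` threshold in a more realistic setup.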

    FAULT DETECTION AND PREDICTION IN ELECTROMECHANICAL SYSTEMS VIA THE DISCRETIZED STATE VECTOR-BASED PATTERN ANALYSIS OF MULTI-SENSOR SIGNALS

    Department of System Design and Control Engineering
    In recent decades, operation and maintenance strategies for industrial applications have evolved from corrective and preventive maintenance to condition-based monitoring and, eventually, predictive maintenance. High-performance sensors and data logging technologies have enabled us to monitor the operational states of systems and predict fault occurrences. Several time series analysis methods have been proposed in the literature to classify system states via multi-sensor signals. Because sensor signal time series are often very short, intermittent, transient, highly nonlinear, and non-stationary random signals, time series analysis becomes more complex. Therefore, time series discretization has been widely applied to extract meaningful features from the original complex signals. Several important issues must be addressed in discretization for fault detection and prediction: (i) what fault pattern represents a system's faulty states, (ii) how fault patterns can be searched for effectively, (iii) what symptom pattern can predict fault occurrences, and (iv) what systematic procedure enables online fault detection and prediction. In this regard, this study proposes a fault detection and prediction framework that consists of (i) definition of the system's operational states, (ii) definitions of fault and symptom patterns, (iii) multivariate discretization, (iv) severity and criticality analyses, and (v) online detection and prediction procedures. Given the time markers of fault occurrences, we can divide a system's operational states into fault and no-fault states. We postulate that a symptom state precedes the occurrence of a fault within a certain time period; hence, a no-fault state consists of normal and symptom states.
Fault patterns are therefore found only in fault states, whereas symptom patterns are either found only in the system's symptom states (being absent from the normal states) or not found in the given time series but similar to fault patterns. To determine the length of a symptom state, we present a symptom pattern-based iterative search method. To identify the distinctive behaviors of multi-sensor signals, we propose a multivariate discretization approach that consists mainly of label definition, label specification, and event codification. Discretization parameters are carefully controlled by considering the key characteristics of multi-sensor signals. We discuss how to measure the severity degrees of fault and symptom patterns and how to assess the criticality of fault states. We apply the fault and symptom pattern extraction and severity assessment methods to online fault detection and prediction. Finally, we demonstrate the performance of the proposed framework through six case studies: abnormal cylinder temperature in a marine diesel engine, automotive gasoline engine knocking, laser weld defects, buzz, squeak, and rattle (BSR) noises from a car door trim (using a typical acoustic sensor array and using acoustic emission sensors, respectively), and visual stimuli cognition tests by the P300 experiment.
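    To make the label definition and event codification steps more concrete, the sketch below discretizes each sensor channel with fixed thresholds, combines the per-sensor labels at each time step into a multivariate event code, and treats as candidate fault patterns the length-`n` sequences of event codes that occur within fault segments but never within normal segments. The thresholds, the tuple encoding, and the n-gram search are simplifying assumptions for illustration; they are not the thesis's label specification, symptom-state search, or severity analysis.

```python
from typing import List, Sequence, Set, Tuple

Event = Tuple[int, ...]


def label_signal(signal: Sequence[float], thresholds: Sequence[float]) -> List[int]:
    """Label each sample by how many thresholds it exceeds (0, 1, ...)."""
    return [sum(v > t for t in thresholds) for v in signal]


def codify_events(signals: Sequence[Sequence[float]], thresholds: Sequence[Sequence[float]]) -> List[Event]:
    """Event code at time t = tuple of every sensor's label at t."""
    per_sensor = [label_signal(s, th) for s, th in zip(signals, thresholds)]
    return [tuple(labels) for labels in zip(*per_sensor)]


def segments(events: Sequence[Event], states: Sequence[str], target: str) -> List[List[Event]]:
    """Split the event sequence into contiguous runs whose state equals `target`."""
    runs: List[List[Event]] = []
    current: List[Event] = []
    for event, state in zip(events, states):
        if state == target:
            current.append(event)
        elif current:
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    return runs


def ngrams(seq: Sequence[Event], n: int) -> Set[Tuple[Event, ...]]:
    """All contiguous length-n subsequences of `seq`."""
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}


def fault_patterns(events: Sequence[Event], states: Sequence[str], n: int = 2) -> Set[Tuple[Event, ...]]:
    """n-grams of event codes seen in fault segments but never in normal ones."""
    fault = set().union(*(ngrams(r, n) for r in segments(events, states, "fault")))
    normal = set().union(*(ngrams(r, n) for r in segments(events, states, "normal")))
    return fault - normal
```

    Splitting the sequence into state segments before extracting n-grams prevents patterns that straddle a normal-to-fault transition from being counted in either set.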