
    Machine learning methods for the characterization and classification of complex data

    This thesis presents novel methods for the analysis and classification of medical images and, more generally, of complex data. First, an unsupervised machine learning method is proposed to order anterior chamber OCT (optical coherence tomography) images according to a patient's risk of developing angle-closure glaucoma. In a second study, two outlier detection techniques are proposed to improve the results of this algorithm; we also show that they are applicable to a wide variety of data, including fraud detection in credit card transactions. In a third study, the topology of the retinal vascular network is analyzed as a complex tree-like network, and we show that structural differences reveal the presence of glaucoma and diabetic retinopathy. In a fourth study, we use a model of a laser with optical injection, whose intensity time series presents extreme events, to evaluate machine learning methods for forecasting such events.
Postprint (published version)
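Forecasting extreme events first requires deciding which samples count as "extreme". A common way to do this is a threshold of the form mean + n·σ on the intensity series. The sketch below assumes that definition (the functions and the parameter `n_sigma` are hypothetical, not taken from the thesis) and shows how an intensity series could be turned into binary targets for a forecasting classifier.

```python
from statistics import mean, pstdev


def label_extreme_events(intensity, n_sigma):
    """Return (indices, threshold): samples whose intensity exceeds
    mean + n_sigma * standard deviation of the whole series.

    n_sigma is a free parameter (an assumption, not a value from the thesis).
    """
    mu = mean(intensity)
    sigma = pstdev(intensity)  # population standard deviation
    threshold = mu + n_sigma * sigma
    events = [i for i, x in enumerate(intensity) if x > threshold]
    return events, threshold


def make_forecast_targets(events, length, horizon):
    """Binary target per time step t: 1 if an extreme event occurs
    within the next `horizon` samples (at t + k with 1 <= k <= horizon)."""
    event_set = set(events)
    return [
        int(any(t + k in event_set for k in range(1, horizon + 1)))
        for t in range(length)
    ]
```

Any supervised model could then be trained on windows of past intensity to predict these targets; which models the thesis actually compares is not specified here.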

    Exploring Prognostic and Diagnostic Techniques for Jet Engine Health Monitoring: A Review of Degradation Mechanisms and Advanced Prediction Strategies

    Maintenance is crucial for aircraft engines because of the demanding conditions to which they are exposed during operation. A proper maintenance plan is essential for ensuring safe flights and prolonging engine life, and it also plays a major role in managing costs for aeronautical companies. Various forms of degradation can affect different engine components. To optimize cost management, modern maintenance plans use diagnostic and prognostic techniques such as Engine Health Monitoring (EHM), which assesses the health of the engine based on monitored parameters. In recent years, various computational EHM systems have been developed. These algorithms are often enhanced with data reduction and noise filtering tools, which reduce computational time and effort and improve performance by removing noise from sensor data. This paper discusses the various mechanisms that lead to the degradation of aircraft engine components and their impact on engine performance. It also provides an overview of the most commonly used data reduction, diagnostic, and prognostic techniques.
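To make "noise filtering" and "data reduction" concrete, here is a minimal sketch of two of the simplest such tools applied to a series of sensor readings: a moving-average filter and non-overlapping block averaging. The function names are hypothetical, and the review itself covers a broader and more sophisticated set of techniques.

```python
def moving_average(signal, window):
    """Noise filtering: trailing moving average over `window` samples.

    Returns len(signal) - window + 1 values (only full windows are kept).
    """
    if not 1 <= window <= len(signal):
        raise ValueError("window must be between 1 and len(signal)")
    running = sum(signal[:window])
    out = [running / window]
    for i in range(window, len(signal)):
        # Slide the window: add the newest sample, drop the oldest one.
        running += signal[i] - signal[i - window]
        out.append(running / window)
    return out


def block_average(signal, block):
    """Data reduction: replace each non-overlapping block of samples by its mean.

    The last block may be shorter if len(signal) is not a multiple of `block`.
    """
    if block < 1:
        raise ValueError("block must be >= 1")
    chunks = (signal[i:i + block] for i in range(0, len(signal), block))
    return [sum(c) / len(c) for c in chunks]
```

The running sum keeps the filter O(n) regardless of window size; block averaging shrinks the data by a factor of `block`, which is what reduces downstream computation time.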

    Condition Monitoring and Fault Diagnosis of Roller Element Bearing

    Rolling element bearings play a crucial role in determining the overall health of a rotating machine. An effective bearing condition-monitoring program can improve a machine's operating efficiency, reduce maintenance and replacement costs, and prolong the machine's useful life. This chapter presents a general overview of the condition-monitoring and fault diagnosis techniques for rolling element bearings in current practice and discusses the pros and cons of each. The techniques covered include data acquisition, the major parameters used for bearing condition monitoring, signal analysis, and bearing fault diagnosis using either statistical features or artificial intelligence tools. Several case studies are also presented to illustrate the application of these techniques to data analysis, bearing fault diagnosis, and pattern recognition.
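Statistical features for bearing diagnosis are typically computed from a window of vibration samples. The sketch below computes four widely used time-domain features (RMS, peak, kurtosis, crest factor); the function name and the choice of features are illustrative assumptions, not the chapter's exact feature set. Kurtosis here is the non-excess (Pearson) version, about 3 for Gaussian noise; impulsive fault signatures tend to push it higher.

```python
import math


def bearing_features(x):
    """Time-domain statistical features of one window of vibration samples."""
    n = len(x)
    mean = sum(x) / n
    rms = math.sqrt(sum(v * v for v in x) / n)   # overall vibration energy
    peak = max(abs(v) for v in x)                # largest absolute amplitude
    var = sum((v - mean) ** 2 for v in x) / n    # population variance
    # Non-excess kurtosis: fourth central moment divided by variance squared.
    kurtosis = (sum((v - mean) ** 4 for v in x) / n) / var ** 2 if var > 0 else float("nan")
    # Crest factor: how "spiky" the signal is relative to its RMS level.
    crest = peak / rms if rms > 0 else float("nan")
    return {"rms": rms, "peak": peak, "kurtosis": kurtosis, "crest_factor": crest}
```

Such a feature dictionary, computed per window, could then feed either a threshold rule or an AI classifier.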

    On the relevance of preprocessing in predictive maintenance for dynamic systems

    Real-time, data-driven monitoring of dynamic systems for predictive maintenance is usually highly complex. Every data-driven approach is, to a greater or lesser extent, sensitive to data preprocessing (understood as any treatment of the data prior to applying the monitoring model), and preprocessing is sometimes crucial to the final performance of the monitoring technique. The aim of this work is to quantify exhaustively the sensitivity of data-driven predictive maintenance models for dynamic systems to preprocessing. We consider two predictive maintenance scenarios, each defined by publicly available data. For each scenario, we consider its properties and apply several techniques at each of the successive preprocessing steps, e.g. data cleaning, missing-value treatment, outlier detection, feature selection, and imbalance compensation. The pretreatment configurations, i.e. sequential combinations of techniques from the different preprocessing steps, are evaluated together with different monitoring approaches in order to determine the relevance of data preprocessing for predictive maintenance in dynamic systems.
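A "pretreatment configuration" is one choice of technique per preprocessing step, applied in sequence, so the full set of configurations is the Cartesian product of the per-step options. The sketch below assumes three hypothetical steps on a single 1-D signal with `None` marking missing values; the step names and techniques are illustrative, not the paper's actual pipeline.

```python
from itertools import product
from statistics import mean, median, pstdev


def impute_mean(xs):
    fill = mean(v for v in xs if v is not None)
    return [fill if v is None else v for v in xs]


def impute_median(xs):
    fill = median(v for v in xs if v is not None)
    return [fill if v is None else v for v in xs]


def no_op(xs):
    return list(xs)


def clip_zscore(xs, k=3.0):
    # Outlier treatment: clip values to mean +/- k standard deviations.
    mu, sd = mean(xs), pstdev(xs)
    lo, hi = mu - k * sd, mu + k * sd
    return [min(max(v, lo), hi) for v in xs]


def min_max(xs):
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0 for _ in xs]
    return [(v - lo) / (hi - lo) for v in xs]


# Ordered steps (missing values must be handled before anything numeric).
STEPS = {
    "missing_values": {"mean": impute_mean, "median": impute_median},
    "outliers": {"none": no_op, "zscore_clip": clip_zscore},
    "scaling": {"none": no_op, "minmax": min_max},
}


def configurations(steps=STEPS):
    """Yield (config_name, functions) for every combination of step options."""
    names = list(steps)
    for choice in product(*(steps[n].items() for n in names)):
        config = tuple((n, label) for n, (label, _) in zip(names, choice))
        yield config, [fn for _, fn in choice]


def apply_pipeline(xs, fns):
    for fn in fns:
        xs = fn(xs)
    return xs
```

In a study like this one, each configuration's output would then be passed to every monitoring model and scored, so the number of experiments grows multiplicatively with the options per step.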

    Principal Component Analysis

    This book aims to raise awareness among researchers, scientists, and engineers of the benefits of Principal Component Analysis (PCA) in data analysis. The reader will find applications of PCA in fields such as image processing, biometrics, face recognition, and speech processing. The book also covers the core concepts and state-of-the-art methods in data analysis and feature extraction.
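As a compact reference for the core concept, PCA can be computed from the singular value decomposition of the mean-centred data matrix: the right singular vectors are the principal directions and the squared singular values give the variance along each. The sketch below uses NumPy; the function name and return layout are our own choices.

```python
import numpy as np


def pca(X, n_components):
    """PCA via SVD of the centred data matrix.

    Returns (scores, components, explained_variance_ratio, mean).
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean                                  # centre each feature
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]                 # principal directions (rows)
    explained_variance = S ** 2 / (X.shape[0] - 1)
    ratio = explained_variance / explained_variance.sum()
    scores = Xc @ components.T                     # data projected onto components
    return scores, components, ratio[:n_components], mean
```

`scores @ components + mean` reconstructs the data from the retained components, which is the basis of PCA-based compression and feature extraction.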

    Geometry- and Accuracy-Preserving Random Forest Proximities with Applications

    Many machine learning algorithms use distances or similarities between observations to make predictions, cluster similar data, visualize patterns, or otherwise explore the data. Most distance or similarity measures do not incorporate known data labels and are thus considered unsupervised. Supervised distance measures that incorporate data labels do exist, but they exaggerate the separation between points of different classes, which tends to distort the natural structure of the data. Instead of following such approaches, we leverage random forests, a popular algorithm for data-driven prediction, to incorporate data labels naturally into similarity measures known as random forest proximities. In this dissertation, we explore previously defined random forest proximities and demonstrate their weaknesses in popular proximity-based applications. We then develop a new proximity definition that can be used to recreate the random forest's predictions; we call these random forest geometry- and accuracy-preserving proximities, or RF-GAP. We show, by proof and by empirical demonstration, that RF-GAP proximities can be used to perfectly reconstruct the random forest's predictions, and we therefore argue that they provide a truer representation of the random forest's learning when used in proximity-based applications. We provide evidence that RF-GAP proximities improve applications including missing-data imputation, outlier detection, and data visualization. We also introduce a new random forest proximity-based technique for generating 2- or 3-dimensional data representations that can be used to explore the data visually. We show that this method portrays the relationship between the data variables and the data labels well, and we show quantitatively and qualitatively that it surpasses existing methods for this task.
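For context on the "previously defined" proximities, the classic definition (due to Breiman) is the fraction of trees in which two observations land in the same terminal node. The sketch below computes it from a matrix of leaf indices, such as the `(n_samples, n_estimators)` array returned by scikit-learn's `RandomForestClassifier.apply(X)`. Note that this is the original proximity, not RF-GAP, which additionally uses out-of-bag and in-bag information.

```python
def rf_proximities(leaves):
    """Breiman-style random forest proximities.

    leaves[i][t] is the index of the leaf reached by sample i in tree t.
    Returns an n x n matrix where entry (i, j) is the fraction of trees
    in which samples i and j share a leaf.
    """
    n = len(leaves)
    n_trees = len(leaves[0])
    prox = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            shared = sum(1 for t in range(n_trees) if leaves[i][t] == leaves[j][t])
            prox[i][j] = shared / n_trees
    return prox
```

A common derived distance is `1 - prox[i][j]`, which can be plugged into any distance-based method such as clustering, MDS-style visualization, or nearest-neighbour imputation.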

    FlaKat: A Machine Learning-Based Categorization Framework for Flaky Tests

    Flaky tests can pass or fail non-deterministically without any change to the software system. Developers encounter such tests frequently, and they undermine the credibility of test suites. As a result, flaky tests have attracted researchers' attention in recent years. Numerous approaches have been published for defining, locating, and categorizing flaky tests, along with auto-repair strategies for specific types of flakiness. Practitioners have developed several techniques to detect flaky tests automatically. The most traditional approaches rely on repeated execution of test suites, combined with techniques such as shuffling the execution order and randomly perturbing the environment. State-of-the-art research also incorporates machine learning into flaky test detection and achieves reasonably good accuracy. Moreover, repair strategies have been published for specific flaky test categories, and this process has been automated as well. However, there is a research gap between flaky test detection and category-specific flakiness repair. To address this gap, this thesis proposes a novel categorization framework, called FlaKat, which uses machine learning classifiers for fast and accurate categorization of a given flaky test case. FlaKat first parses raw flaky tests and converts them into vector embeddings. The dimensionality of the embeddings is then reduced, and the reduced embeddings are used to train machine learning classifiers. Sampling techniques are applied to address the imbalance between flaky test categories in the dataset. FlaKat was evaluated with different combinations of configurations on known flaky tests from 108 open-source Java projects.
Notably, Implementation-Dependent and Order-Dependent flaky tests, which represent almost 75% of the dataset, achieved F1 scores (the harmonic mean of precision and recall) of 0.94 and 0.90, respectively, while the overall macro average (all categories weighted equally) is 0.67. This work also proposes a new evaluation metric, called Flakiness Detection Capacity (FDC), which measures the accuracy of classifiers from an information-theoretic perspective, and provides a proof of its effectiveness. The FDC results also agree with the F1 score on which classifier yields the best flakiness classification.
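The gap between the per-category F1 scores (0.94, 0.90) and the macro average (0.67) follows from how macro averaging works: every category counts equally, so poorly classified minority categories pull the average down regardless of their size. A minimal sketch of per-class F1 and the unweighted macro average (the labels in any usage are hypothetical; FDC is not reproduced here):

```python
def f1_scores(y_true, y_pred):
    """Per-class F1 (harmonic mean of precision and recall) and macro-average F1."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[c] = (2 * precision * recall / (precision + recall)
                     if precision + recall else 0.0)
    # Macro average: plain mean over classes, ignoring class frequencies.
    macro = sum(scores.values()) / len(scores)
    return scores, macro
```

A frequency-weighted average would instead be dominated by the two large categories, which is why the macro figure is the stricter summary for an imbalanced dataset.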

    Contribution to supervised representation learning: algorithms and applications.

    278 p. In this thesis, we focus on supervised learning methods for pattern categorization. In this context, it remains a major challenge to establish efficient relationships between the discriminant properties of the extracted features and the inter-class sparsity structure.

Our first attempt to address this problem was to develop a method called "Robust Discriminant Analysis with Feature Selection and Inter-class Sparsity" (RDA_FSIS). This method performs feature selection and extraction simultaneously. The targeted projection transformation focuses on the most discriminative original features while guaranteeing that the extracted (or transformed) features belonging to the same class share a common sparse structure, which contributes to small intra-class distances.

In a further study of this approach, we introduced improvements to both the optimization criterion and the optimization process. Specifically, we proposed an improved version of the original RDA_FSIS called "Enhanced Discriminant Analysis with Class Sparsity using Gradient Method" (EDA_CS). The improvement is twofold. On the one hand, in the alternating optimization, we update the linear transformation and tune it with gradient descent, resulting in a more efficient and less complex solution than the closed form adopted in RDA_FSIS. On the other hand, the method can be used as a fine-tuning technique for many feature extraction methods. The main feature of this approach is that it applies a gradient descent-based refinement to a closed-form solution. This makes it suitable for combining several extraction methods, and it can thus improve the performance of the classification process.

Building on these methods, we proposed a hybrid linear feature extraction scheme called "feature extraction using gradient descent with hybrid initialization" (FE_GD_HI). This method, based on a unified criterion, takes advantage of several powerful linear discriminant methods. The linear transformation is computed using gradient descent. The strength of this approach is that it is generic, in the sense that it allows fine-tuning of the hybrid solution provided by different methods.

Finally, we proposed a new, efficient ensemble learning approach that aims to estimate an improved data representation. The proposed method is called "ICS Based Ensemble Learning for Image Classification" (EM_ICS). Instead of using multiple classifiers on the transformed features, we aim to estimate multiple extracted feature subsets, obtained from multiple learned linear embeddings. Multiple feature subsets were used to estimate the transformations, which were ranked using multiple feature selection techniques. The resulting extracted feature subsets were concatenated into a single data representation vector with strong discriminative properties.

Experiments conducted on various benchmark datasets, ranging from face images, handwritten digit images, and object images to text datasets, showed promising results that outperformed existing state-of-the-art and competing methods.
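The idea of "gradient descent-based refinement applied to a closed-form solution" can be illustrated with a stand-in criterion. The sketch below assumes a plain ridge-regression objective J(W) = ||XW - Y||² + λ||W||² mapping features to one-hot class indicators; this is not the thesis's RDA_FSIS/EDA_CS criterion (which adds sparsity terms), and all function names, including the averaging used as a "hybrid initialization", are hypothetical.

```python
import numpy as np


def ridge_closed_form(X, Y, lam):
    """Closed-form minimizer of ||XW - Y||^2 + lam * ||W||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)


def hybrid_init(*Ws):
    """Hypothetical hybrid initialization: average several candidate projections."""
    return sum(Ws) / len(Ws)


def refine_by_gradient_descent(X, Y, W0, lam, n_iter=1000):
    """Refine an initial projection W0 by gradient descent on the same objective."""
    # Step size 1/L, where L = 2 * (largest singular value of X)^2 + 2 * lam
    # is the Lipschitz constant of the gradient, guarantees convergence.
    L = 2.0 * (np.linalg.norm(X, 2) ** 2 + lam)
    lr = 1.0 / L
    W = np.array(W0, dtype=float)
    for _ in range(n_iter):
        grad = 2.0 * (X.T @ (X @ W - Y) + lam * W)
        W -= lr * grad
    return W
```

For this strictly convex stand-in, the refinement simply converges back to the closed-form minimizer; the refinement step becomes useful when extra terms without a closed-form solution (such as the sparsity constraints described above) are added to the gradient.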

    Approach for Improved Signal-Based Fault Diagnosis of Hot Rolling Mills

    The approach introduced here is able to detect two specific severe faults, identify them, distinguish between four different system states, and give a prognosis of the system behavior. This work investigates condition monitoring of the complex production process of a hot strip rolling mill and develops a signal-based fault diagnosis and fault prognosis approach for strip travel. A literature review gives an overview of previous research on related topics. It is shown that the large body of previous work does not address the problems treated here and that further investigation is necessary to provide a satisfactory solution. The design of a new signal processing chain is presented, and the signal processing steps are detailed. The classification task is divided into fault detection, fault identification, and fault prognosis. The proposed approach combines five feature extraction methods, namely the short-time Fourier transform, continuous wavelet transform, discrete wavelet transform, Wigner-Ville distribution, and empirical mode decomposition, with two classification algorithms, namely a support vector machine and a variation of cross-correlation, the latter developed in this work. Combinations of these feature extraction and classification methods are applied to rolling force data from a hot strip mill.
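As background for the cross-correlation classifier, a plain template-matching version works as follows: slide each class's reference pattern along the signal, compute the normalized (Pearson) cross-correlation at every offset, and assign the class whose template matches best. The sketch below is this standard baseline, not the variation developed in the work; the function names and templates are hypothetical.

```python
import math


def normalized_xcorr_max(x, template):
    """Maximum Pearson correlation between `template` and every
    equal-length window of `x` (values lie in [-1, 1])."""
    m = len(template)
    if m > len(x):
        raise ValueError("template longer than signal")
    t_mean = sum(template) / m
    t_dev = [v - t_mean for v in template]
    t_norm = math.sqrt(sum(v * v for v in t_dev))
    best = -1.0
    for start in range(len(x) - m + 1):
        w = x[start:start + m]
        w_mean = sum(w) / m
        w_dev = [v - w_mean for v in w]
        w_norm = math.sqrt(sum(v * v for v in w_dev))
        if w_norm == 0 or t_norm == 0:
            continue  # correlation undefined for constant windows/templates
        r = sum(a * b for a, b in zip(w_dev, t_dev)) / (w_norm * t_norm)
        best = max(best, r)
    return best


def classify_by_template(x, templates):
    """templates: dict mapping class label -> reference pattern."""
    return max(templates, key=lambda label: normalized_xcorr_max(x, templates[label]))
```

Normalizing by the window and template norms makes the score insensitive to offset and scale, so a fault signature is recognized even if the absolute rolling force level changes.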