178 research outputs found

    One-class classifiers based on entropic spanning graphs

    Get PDF
    One-class classifiers offer valuable tools to assess the presence of outliers in data. In this paper, we propose a design methodology for one-class classifiers based on entropic spanning graphs. Our approach takes into account the possibility to process also non-numeric data by means of an embedding procedure. The spanning graph is learned on the embedded input data and the outcoming partition of vertices defines the classifier. The final partition is derived by exploiting a criterion based on mutual information minimization. Here, we compute the mutual information by using a convenient formulation provided in terms of the α\alpha-Jensen difference. Once training is completed, in order to associate a confidence level with the classifier decision, a graph-based fuzzy model is constructed. The fuzzification process is based only on topological information of the vertices of the entropic spanning graph. As such, the proposed one-class classifier is suitable also for data characterized by complex geometric structures. We provide experiments on well-known benchmarks containing both feature vectors and labeled graphs. In addition, we apply the method to the protein solubility recognition problem by considering several representations for the input samples. Experimental results demonstrate the effectiveness and versatility of the proposed method with respect to other state-of-the-art approaches.Comment: Extended and revised version of the paper "One-Class Classification Through Mutual Information Minimization" presented at the 2016 IEEE IJCNN, Vancouver, Canad

    Age Sensitivity of Face Recognition Algorithms

    Get PDF
    This paper investigates the performance degradation of facial recognition systems due to the influence of age. A comparative analysis of verification performance is conducted for four subspace projection techniques combined with four different distance metrics. The experimental results based on a subset of the MORPH-II database show that the choice of subspace projection technique and associated distance metric can have a significant impact on the performance of the face recognition system for particular age groups

    Unsupervised multiple kernel learning approaches for integrating molecular cancer patient data

    Get PDF
    Cancer is the second leading cause of death worldwide. A characteristic of this disease is its complexity leading to a wide variety of genetic and molecular aberrations in the tumors. This heterogeneity necessitates personalized therapies for the patients. However, currently defined cancer subtypes used in clinical practice for treatment decision-making are based on relatively few selected markers and thus provide only a coarse classifcation of tumors. The increased availability in multi-omics data measured for cancer patients now offers the possibility of defining more informed cancer subtypes. Such a more fine-grained characterization of cancer subtypes harbors the potential of substantially expanding treatment options in personalized cancer therapy. In this thesis, we identify comprehensive cancer subtypes using multidimensional data. For this purpose, we apply and extend unsupervised multiple kernel learning methods. Three challenges of unsupervised multiple kernel learning are addressed: robustness, applicability, and interpretability. First, we show that regularization of the multiple kernel graph embedding framework, which enables the implementation of dimensionality reduction techniques, can increase the stability of the resulting patient subgroups. This improvement is especially beneficial for data sets with a small number of samples. Second, we adapt the objective function of kernel principal component analysis to enable the application of multiple kernel learning in combination with this widely used dimensionality reduction technique. Third, we improve the interpretability of kernel learning procedures by performing feature clustering prior to integrating the data via multiple kernel learning. On the basis of these clusters, we derive a score indicating the impact of a feature cluster on a patient cluster, thereby facilitating further analysis of the cluster-specific biological properties. All three procedures are successfully tested on real-world cancer data. Comparing our newly derived methodologies to established methods provides evidence that our work offers novel and beneficial ways of identifying patient subgroups and gaining insights into medically relevant characteristics of cancer subtypes.Krebs ist eine der häufigsten Todesursachen weltweit. Krebs ist gekennzeichnet durch seine Komplexität, die zu vielen verschiedenen genetischen und molekularen Aberrationen im Tumor führt. Die Unterschiede zwischen Tumoren erfordern personalisierte Therapien für die einzelnen Patienten. Die Krebssubtypen, die derzeit zur Behandlungsplanung in der klinischen Praxis verwendet werden, basieren auf relativ wenigen, genetischen oder molekularen Markern und können daher nur eine grobe Unterteilung der Tumoren liefern. Die zunehmende Verfügbarkeit von Multi-Omics-Daten für Krebspatienten ermöglicht die Neudefinition von fundierteren Krebssubtypen, die wiederum zu spezifischeren Behandlungen für Krebspatienten führen könnten. In dieser Dissertation identifizieren wir neue, potentielle Krebssubtypen basierend auf Multi-Omics-Daten. Hierfür verwenden wir unüberwachtes Multiple Kernel Learning, welches in der Lage ist mehrere Datentypen miteinander zu kombinieren. Drei Herausforderungen des unüberwachten Multiple Kernel Learnings werden adressiert: Robustheit, Anwendbarkeit und Interpretierbarkeit. Zunächst zeigen wir, dass die zusätzliche Regularisierung des Multiple Kernel Learning Frameworks zur Implementierung verschiedener Dimensionsreduktionstechniken die Stabilität der identifizierten Patientengruppen erhöht. Diese Robustheit ist besonders vorteilhaft für Datensätze mit einer geringen Anzahl von Proben. Zweitens passen wir die Zielfunktion der kernbasierten Hauptkomponentenanalyse an, um eine integrative Version dieser weit verbreiteten Dimensionsreduktionstechnik zu ermöglichen. Drittens verbessern wir die Interpretierbarkeit von kernbasierten Lernprozeduren, indem wir verwendete Merkmale in homogene Gruppen unterteilen bevor wir die Daten integrieren. Mit Hilfe dieser Gruppen definieren wir eine Bewertungsfunktion, die die weitere Auswertung der biologischen Eigenschaften von Patientengruppen erleichtert. Alle drei Verfahren werden an realen Krebsdaten getestet. Den Vergleich unserer Methodik mit etablierten Methoden weist nach, dass unsere Arbeit neue und nützliche Möglichkeiten bietet, um integrative Patientengruppen zu identifizieren und Einblicke in medizinisch relevante Eigenschaften von Krebssubtypen zu erhalten

    Face Recognition Methodologies Using Component Analysis: The Contemporary Affirmation of The Recent Literature

    Get PDF
    This paper explored the contemporary affirmation of the recent literature in the context of face recognition systems, a review motivated by contradictory claims in the literature. This paper shows how the relative performance of recent claims based on methodologies such as PCA and ICA, which are depend on the task statement. It then explores the space of each model acclaimed in recent literature. In the process, this paper verifies the results of many of the face recognition models in the literature, and relates them to each other and to this work

    Efficient Clustering on Riemannian Manifolds: A Kernelised Random Projection Approach

    Get PDF
    Reformulating computer vision problems over Riemannian manifolds has demonstrated superior performance in various computer vision applications. This is because visual data often forms a special structure lying on a lower dimensional space embedded in a higher dimensional space. However, since these manifolds belong to non-Euclidean topological spaces, exploiting their structures is computationally expensive, especially when one considers the clustering analysis of massive amounts of data. To this end, we propose an efficient framework to address the clustering problem on Riemannian manifolds. This framework implements random projections for manifold points via kernel space, which can preserve the geometric structure of the original space, but is computationally efficient. Here, we introduce three methods that follow our framework. We then validate our framework on several computer vision applications by comparing against popular clustering methods on Riemannian manifolds. Experimental results demonstrate that our framework maintains the performance of the clustering whilst massively reducing computational complexity by over two orders of magnitude in some cases

    K-means based clustering and context quantization

    Get PDF

    Development of a machine learning based methodology for bridge health monitoring

    Get PDF
    Tesi en modalitat de compendi de publicacionsIn recent years the scientific community has been developing new techniques in structural health monitoring (SHM) to identify the damages in civil structures specially in bridges. The bridge health monitoring (BHM) systems serve to reduce overall life-cycle maintenance costs for bridges, as their main objective is to prevent catastrophic failures and damages. In the BHM using dynamic data, there are several problems related to the post-processing of the vibration signals such as: (i) when the modal-based dynamic features like natural frequencies, modes shape and damping are used, they present a limitation in relation to damage location, since they are based on a global response of the structure; (ii) presence of noise in the measurement of vibration responses; (iii) inadequate use of existing algorithms for damage feature extraction because of neglecting the non-linearity and non-stationarity of the recorded signals; (iv) environmental and operational conditions can also generate false damage detections in bridges; (v) the drawbacks of traditional algorithms for processing large amounts of data obtained from the BHM. This thesis proposes new vibration-based parameters and methods with focus on damage detection, localization and quantification, considering a mixed robust methodology that includes signal processing and machine learning methods to solve the identified problems. The increasing volume of bridge monitoring data makes it interesting to study the ability of advanced tools and systems to extract useful information from dynamic and static variables. In the field of Machine Learning (ML) and Artificial Intelligence (AI), powerful algorithms have been developed to face problems where the amount of data is much larger (big data). The possibilities of ML techniques (unsupervised algorithms) were analyzed here in bridges taking into account both operational and environmental conditions. A critical literature review was performed and a deep study of the accuracy and performance of a set of algorithms for detecting damage in three real bridges and one numerical model. In the literature review inherent to the vibration-based damage detection, several state-of-the-art methods have been studied that do not consider the nature of the data and the characteristics of the applied excitation (possible non-linearity, non-stationarity, presence or absence of environmental and/or operational effects) and the noise level of the sensors. Besides, most research uses modal-based damage characteristics that have some limitations. A poor data normalization is performed by the majority of methods and both operational and environmental variability is not properly accounted for. Likewise, the huge amount of data recorded requires automatic procedures with proven capacity to reduce the possibility of false alarms. On the other hand, many investigations have limitations since only numerical or laboratory cases are studied. Therefore, a methodology is proposed by the combination of several algorithms to avoid them. The conclusions show a robust methodology based on ML algorithms capable to detect, localize and quantify damage. It allows the engineers to verify bridges and anticipate significant structural damage when occurs. Moreover, the proposed non-modal parameters show their feasibility as damage features using ambient and forced vibrations. Hilbert-Huang Transform (HHT) in conjunction with Marginal Hilbert Spectrum and Instantaneous Phase Difference shows a great capability to analyze the nonlinear and nonstationary response signals for damage identification under operational conditions. The proposed strategy combines algorithms for signal processing (ICEEMDAN and HHT) and ML (k-means) to conduct damage detection and localization in bridges by using the traffic-induced vibration data in real-time operation.En los últimos años la comunidad científica ha desarrollado nuevas técnicas en monitoreo de salud estructural (SHM) para identificar los daños en estructuras civiles especialmente en puentes. Los sistemas de monitoreo de puentes (BHM) sirven para reducir los costos generales de mantenimiento del ciclo de vida, ya que su principal objetivo es prevenir daños y fallas catastróficas. En el BHM que utiliza datos dinámicos, existen varios problemas relacionados con el procesamiento posterior de las señales de vibración, tales como: (i) cuando se utilizan características dinámicas modales como frecuencias naturales, formas de modos y amortiguamiento, presentan una limitación en relación con la localización del daño, ya que se basan en una respuesta global de la estructura; (ii) presencia de ruido en la medición de las respuestas de vibración; (iii) uso inadecuado de los algoritmos existentes para la extracción de características de daño debido a la no linealidad y la no estacionariedad de las señales registradas; (iv) las condiciones ambientales y operativas también pueden generar falsas detecciones de daños en los puentes; (v) los inconvenientes de los algoritmos tradicionales para procesar grandes cantidades de datos obtenidos del BHM. Esta tesis propone nuevos parámetros y métodos basados en vibraciones con enfoque en la detección, localización y cuantificación de daños, considerando una metodología robusta que incluye métodos de procesamiento de señales y aprendizaje automático. El creciente volumen de datos de monitoreo de puentes hace que sea interesante estudiar la capacidad de herramientas y sistemas avanzados para extraer información útil de variables dinámicas y estáticas. En el campo del Machine Learning (ML) y la Inteligencia Artificial (IA) se han desarrollado potentes algoritmos para afrontar problemas donde la cantidad de datos es mucho mayor (big data). Aquí se analizaron las posibilidades de las técnicas ML (algoritmos no supervisados) teniendo en cuenta tanto las condiciones operativas como ambientales. Se realizó una revisión crítica de la literatura y se llevó a cabo un estudio profundo de la precisión y el rendimiento de un conjunto de algoritmos para la detección de daños en tres puentes reales y un modelo numérico. En la revisión de literatura se han estudiado varios métodos que no consideran la naturaleza de los datos y las características de la excitación aplicada (posible no linealidad, no estacionariedad, presencia o ausencia de efectos ambientales y/u operativos) y el nivel de ruido de los sensores. Además, la mayoría de las investigaciones utilizan características de daño modales que tienen algunas limitaciones. Estos métodos realizan una normalización deficiente de los datos y no se tiene en cuenta la variabilidad operativa y ambiental. Asimismo, la gran cantidad de datos registrados requiere de procedimientos automáticos para reducir la posibilidad de falsas alarmas. Por otro lado, muchas investigaciones tienen limitaciones ya que solo se estudian casos numéricos o de laboratorio. Por ello, se propone una metodología mediante la combinación de varios algoritmos. Las conclusiones muestran una metodología robusta basada en algoritmos de ML capaces de detectar, localizar y cuantificar daños. Permite a los ingenieros verificar puentes y anticipar daños estructurales. Además, los parámetros no modales propuestos muestran su viabilidad como características de daño utilizando vibraciones ambientales y forzadas. La Transformada de Hilbert-Huang (HHT) junto con el Espectro Marginal de Hilbert y la Diferencia de Fase Instantánea muestran una gran capacidad para analizar las señales de respuesta no lineales y no estacionarias para la identificación de daños en condiciones operativas. La estrategia propuesta combina algoritmos para el procesamiento de señales (ICEEMDAN y HHT) y ML (k-means) para detectar y localizar daños en puentes mediante el uso de datos de vibraciones inducidas por el tráfico en tiempo real.Postprint (published version
    corecore