17 research outputs found

    Perplexity-free Parametric t-SNE

    The t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm is a ubiquitously employed dimensionality reduction (DR) method. Its non-parametric nature and impressive efficacy motivated its parametric extension. It is, however, bound to a user-defined perplexity parameter, which restricts its DR quality compared to recently developed multi-scale perplexity-free approaches. This paper hence proposes a multi-scale parametric t-SNE scheme, relieved from perplexity tuning, with a deep neural network implementing the mapping. It produces reliable embeddings with out-of-sample extensions, competitive with the best perplexity adjustments in terms of neighborhood preservation on multiple data sets. Comment: ESANN 2020 proceedings, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. Online event, 2-4 October 2020, i6doc.com publ., ISBN 978-2-87587-074-2. Available from http://www.i6doc.com/en

    Large-scale nonlinear dimensionality reduction for network intrusion detection

    Network intrusion detection (NID) is a complex classification problem. In this paper, we combine classification with recent and scalable nonlinear dimensionality reduction (NLDR) methods. Classification and DR are not necessarily adversarial, provided that adequate cluster magnification occurs, as in NLDR methods like t-SNE: DR mitigates the curse of dimensionality, while cluster magnification can maintain class separability. We demonstrate experimentally the effectiveness of the approach by analyzing and comparing results on the big KDD99 dataset, using both NLDR quality assessment and classification rate for SVMs and random forests. Since the data involve features of mixed types (numerical and categorical), the use of Gower's similarity coefficient as the metric further improves the results over the classical similarity metric.
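
    As an illustration of the mixed-type metric mentioned in this abstract, the following is a minimal sketch of Gower's similarity coefficient in its standard unweighted form. It is not taken from the paper: the function name, the toy records, and the feature ranges are all assumptions made for the example.

```python
def gower_similarity(x, y, is_categorical, ranges):
    """Unweighted Gower similarity between two mixed-type records.

    x, y           : sequences of feature values (numbers or category labels)
    is_categorical : sequence of bools, True where the feature is categorical
    ranges         : per-feature range (max - min over the data set) for
                     numerical features; ignored for categorical ones
    """
    scores = []
    for xi, yi, cat, r in zip(x, y, is_categorical, ranges):
        if cat:
            # Categorical feature: 1 on exact match, 0 otherwise.
            scores.append(1.0 if xi == yi else 0.0)
        elif r > 0:
            # Numerical feature: 1 minus the range-normalized absolute difference.
            scores.append(1.0 - abs(xi - yi) / r)
        else:
            # Constant numerical feature: every record agrees on it.
            scores.append(1.0)
    return sum(scores) / len(scores)


# Hypothetical records: (duration, protocol, bytes). Values are invented.
a = (0.0, "tcp", 200.0)
b = (10.0, "udp", 200.0)
is_cat = (False, True, False)
ranges = (100.0, None, 1000.0)

similarity = gower_similarity(a, b, is_cat, ranges)
```

    A dissimilarity usable by distance-based methods can then be taken as one minus this similarity.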

    Human-centered machine learning through interactive visualization

    The goal of visual analytics (VA) systems is to solve complex problems by integrating automated data analysis methods, such as machine learning (ML) algorithms, with interactive visualizations. We propose a conceptual framework that models human interactions with ML components in the VA process and makes the crucial interplay between automated algorithms and interactive visualizations more concrete. The framework is illustrated through several examples. We derive three open research challenges at the intersection of ML and visualization research that will lead to more effective data analysis.

    Intrinsic Universal Measurements of Non-linear Embeddings

    A basic problem in machine learning is to find a mapping f from a low dimensional latent space to a high dimensional observation space. Equipped with the representation power of non-linearity, a learner can easily find a mapping which perfectly fits all the observations. However, such a mapping is often not considered good, as it is not simple enough and over-fits. How to define simplicity? This paper attempts a formal definition of the amount of information imposed by a non-linear mapping. This definition is based on information geometry and is independent of both the observations and any specific parametrization. We prove these basic properties and discuss relationships with parametric and non-parametric embeddings. Comment: work in progress

    Estimation of relevant features for condition monitoring of internal combustion engines from vibration signals

    Condition monitoring of Internal Combustion Engines (ICE) supports cost-effective operations in the modern industrial sector. Because of this, vibration signals are commonly monitored as part of a non-invasive approach to ICE analysis. However, vibration-based ICE monitoring poses a challenge due to the properties of these signals: they are highly dynamic and non-stationary, not to mention the diverse sources involved in the combustion process. In this paper, we propose a feature relevance estimation strategy for vibration-based ICE analysis. Our approach is divided into three main stages: signal decomposition using an Ensemble Empirical Mode Decomposition algorithm, multi-domain parameter estimation from time and frequency representations, and supervised feature selection based on the Relief-F technique. Accordingly, we decomposed the vibration signals using self-adaptive analysis to represent nonlinear and non-stationary time series. Afterwards, time- and frequency-based parameters were calculated to encode complex and/or non-stationary dynamics. Subsequently, we computed a relevance vector index to measure the contribution of each multi-domain feature to the discrimination of different fuel blend estimation/diagnosis categories for ICE. In particular, we worked with an ICE dataset collected from fuel blends under normal and fault scenarios at different engine speeds to test our approach. Our classification results reached nearly 98% accuracy with a k-Nearest Neighbors classifier. They show how our approach identifies a relevant subset of features for ICE condition monitoring. One of the benefits is the reduced number of parameters.
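
    The relevance-estimation stage described in this abstract can be illustrated with a simplified sketch. Note that this is the basic Relief scheme (one nearest hit and one nearest miss per sample), swapped in for the Relief-F variant the paper names, and it is not the authors' implementation. The function name and the toy data are assumptions, and every class is assumed to contain at least two samples.

```python
def relief_weights(X, y):
    """Basic Relief relevance weights for numerical features.

    X : list of feature rows (lists of floats)
    y : list of class labels, one per row
    A larger weight suggests a feature that separates classes better.
    """
    n, d = len(X), len(X[0])
    lo = [min(row[j] for row in X) for j in range(d)]
    hi = [max(row[j] for row in X) for j in range(d)]
    # Per-feature range, with 1.0 as a fallback for constant features.
    span = [(hi[j] - lo[j]) or 1.0 for j in range(d)]

    def diff(a, b):
        # Range-normalized absolute difference, feature by feature.
        return [abs(a[j] - b[j]) / span[j] for j in range(d)]

    w = [0.0] * d
    for i in range(n):
        hit = miss = None  # (distance, index) of nearest same/other-class sample
        for k in range(n):
            if k == i:
                continue
            dist = sum(diff(X[i], X[k]))  # Manhattan distance on scaled features
            if y[k] == y[i]:
                if hit is None or dist < hit[0]:
                    hit = (dist, k)
            elif miss is None or dist < miss[0]:
                miss = (dist, k)
        d_hit = diff(X[i], X[hit[1]])
        d_miss = diff(X[i], X[miss[1]])
        for j in range(d):
            # Reward features that differ across classes, penalize within-class spread.
            w[j] += (d_miss[j] - d_hit[j]) / n
    return w


# Invented toy data: feature 0 separates the classes, feature 1 does not.
X = [[0.0, 0.3], [0.1, 0.9], [0.9, 0.2], [1.0, 0.8]]
y = [0, 0, 1, 1]
weights = relief_weights(X, y)
```

    Features would then be ranked by weight and the top-ranked subset passed to a classifier such as k-Nearest Neighbors.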