5 research outputs found

    Parsimonious Mahalanobis Kernel for the Classification of High Dimensional Data

    The classification of high dimensional data with kernel methods is considered in this article. Exploiting the emptiness property of high dimensional spaces, a kernel based on the Mahalanobis distance is proposed. The computation of the Mahalanobis distance requires the inversion of a covariance matrix. In high dimensional spaces, the estimated covariance matrix is ill-conditioned and its inversion is unstable or impossible. Using a parsimonious statistical model, namely the High Dimensional Discriminant Analysis model, the specific signal and noise subspaces are estimated for each considered class, making the inverse of the class-specific covariance matrix explicit and stable and leading to the definition of a parsimonious Mahalanobis kernel. An SVM-based framework is used for selecting the hyperparameters of the parsimonious Mahalanobis kernel by optimizing the so-called radius-margin bound. Experimental results on three high dimensional data sets show that the proposed kernel is suitable for classifying high dimensional data, providing better classification accuracies than the conventional Gaussian kernel.
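
    A minimal sketch of the idea in the abstract above, not the authors' exact kernel: if a class covariance is assumed to have a few signal directions with their own variances and one common noise variance everywhere else, its inverse can be applied without any numerical matrix inversion. The names `signal_dirs`, `signal_vars`, `noise_var` and the single scale `gamma` are illustrative assumptions; the signal directions are assumed orthonormal.

```python
import math


def parsimonious_mahalanobis_sq(x, y, signal_dirs, signal_vars, noise_var):
    """Squared Mahalanobis distance under a signal-plus-noise covariance model.

    Assumes the covariance has orthonormal signal directions `signal_dirs`
    with variances `signal_vars`, and variance `noise_var` in every
    direction orthogonal to them.
    """
    z = [a - b for a, b in zip(x, y)]
    norm_sq = sum(v * v for v in z)
    signal_term = 0.0
    signal_energy = 0.0
    for q, lam in zip(signal_dirs, signal_vars):
        proj = sum(qi * zi for qi, zi in zip(q, z))  # coordinate along q
        signal_term += proj * proj / lam
        signal_energy += proj * proj
    # Whatever is left of z after removing the signal part lies in the noise subspace.
    noise_term = (norm_sq - signal_energy) / noise_var
    return signal_term + noise_term


def mahalanobis_kernel(x, y, signal_dirs, signal_vars, noise_var, gamma=0.5):
    """Gaussian-type kernel built on the distance above (gamma is hypothetical)."""
    d2 = parsimonious_mahalanobis_sq(x, y, signal_dirs, signal_vars, noise_var)
    return math.exp(-gamma * d2)
```

    Because only projections and divisions by the stated variances are involved, the computation stays stable even when the dimension greatly exceeds the number of samples, provided `noise_var` is not close to zero.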

    CLASSICAL AND ROBUST MAHALANOBIS DISTANCE MEASURES FOR MULTIVARIATE OUTLIER DETECTION: AN APPLICATION WITH FINANCIAL DATA

    The presence of outliers in multivariate data sets makes the estimation of population parameters harder and, by increasing the error variance, reduces the power of the statistical test used. This leads to deviations from the assumptions that the variables have equal variance and follow a multivariate normal distribution. The Mahalanobis distance, one of the techniques used for multivariate outlier detection, is computed from the multivariate means and the covariance matrix, which are measures sensitive to outliers; it is also used with robust measures that guard against outlying observations going undetected or normal observations being flagged as outliers. This study aims to compare the outlier detections produced by the classical and robust Mahalanobis measures used in multivariate outlier detection. The application data consist of 1,239,507 stock buy and sell transactions carried out by investors on the New York and NASDAQ stock exchanges between January 2013 and December 2017. To detect outlying transactions, the quantity and volume variables were considered, distance scores based on the classical and robust measures were computed for each transaction, and the techniques were compared. The study concludes that masked outliers that could not be detected by the classical Mahalanobis measure or the Minimum Volume Ellipsoid were detected by the Fast Minimum Covariance Determinant method, and that this method is an effective data mining method for detecting outliers in multivariate data sets in financial applications.
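
    As a rough illustration of the classical distance score mentioned in this abstract, the sketch below computes squared Mahalanobis distances for two-variable observations and flags those above a chi-square cutoff. It covers only the classical estimator; a robust variant would swap in a robust mean and covariance (for example from a Minimum Covariance Determinant fit), which is not shown here. The function names and the 0.975 level are assumptions chosen for illustration, not taken from the study.

```python
import math


def mean_cov_2d(points):
    """Sample mean and unbiased covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_sq(point, mean, cov):
    """Squared Mahalanobis distance using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def flag_outliers(points, level=0.975):
    """Indices whose classical squared distance exceeds the chi-square cutoff.

    With two variables the chi-square quantile has the closed form
    -2 * ln(1 - level), so no statistics library is needed.
    """
    cutoff = -2.0 * math.log(1.0 - level)
    mean, cov = mean_cov_2d(points)
    return [i for i, p in enumerate(points) if mahalanobis_sq(p, mean, cov) > cutoff]
```

    Because the mean and covariance are themselves pulled toward extreme observations, several outliers clustered together can inflate the covariance enough to hide one another, which is the masking effect the abstract refers to.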

    Scalable and Compact 3D Action Recognition with Approximated RBF Kernel Machines

    Despite the recent deep learning (DL) revolution, kernel machines remain powerful methods for action recognition. DL has brought the use of large datasets, and this is typically a problem for kernel approaches, which do not scale up efficiently because of kernel Gram matrices. Nevertheless, kernel methods are still attractive and more generally applicable, since they can handle datasets of different sizes equally well, including cases where DL techniques show some limitations. This work investigates these issues by proposing an explicit approximated representation that, together with a linear model, is an equivalent, yet scalable, implementation of a kernel machine. Our approximation is directly inspired by the exact feature map that is induced by an RBF Gaussian kernel but, unlike the latter, it is finite dimensional and very compact. We justify the soundness of our idea with a theoretical analysis which proves the unbiasedness of the approximation and provides a vanishing bound for its variance, which is shown to decrease much more rapidly than in alternative methods in the literature. In a broad experimental validation, we assess the superiority of our approximation in terms of 1) ease and speed of training, 2) compactness of the model, and 3) improvements with respect to the state-of-the-art performance.
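
    To make the notion of an explicit, finite-dimensional approximation of an RBF kernel concrete, here is a sketch of random Fourier features, a standard technique from the literature and not the approximation proposed in this work. It assumes the kernel is written as exp(-gamma * ||x - y||^2); the helper names and the `seed` parameter are illustrative.

```python
import math
import random


def make_rff_map(dim, n_features, gamma, seed=0):
    """Return a feature map z such that z(x) . z(y) approximates the RBF kernel."""
    rng = random.Random(seed)
    # Frequencies are drawn from the kernel's spectral density: N(0, 2 * gamma * I).
    scale = math.sqrt(2.0 * gamma)
    weights = [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(n_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    norm = math.sqrt(2.0 / n_features)

    def feature_map(x):
        return [
            norm * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b)
            for w, b in zip(weights, offsets)
        ]

    return feature_map


def rbf_kernel(x, y, gamma):
    """Exact Gaussian RBF kernel, used here only as the reference value."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def dot(u, v):
    """Plain inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))
```

    A linear model trained on `feature_map(x)` then stands in for the kernel machine, and its cost grows with `n_features` rather than with the size of the Gram matrix.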

    Parsimonious Mahalanobis Kernel for the Classification of High Dimensional Data

    The classification of high dimensional data with kernel methods is considered in this paper. Exploiting the emptiness property of high dimensional spaces, a kernel based on the Mahalanobis distance is proposed. The computation of the Mahalanobis distance requires the inversion of a covariance matrix. In high dimensional spaces, the estimated covariance matrix is ill-conditioned and its inversion is unstable or impossible. Using a parsimonious statistical model, namely the High Dimensional Discriminant Analysis model, the specific signal and noise subspaces are estimated for each considered class, making the inverse of the class-specific covariance matrix explicit and stable and leading to the definition of a parsimonious Mahalanobis kernel. An SVM-based framework is used for selecting the hyperparameters of the parsimonious Mahalanobis kernel by optimizing the so-called radius-margin bound. Experimental results on three high dimensional data sets show that the proposed kernel is suitable for classifying high dimensional data, providing better classification accuracies than the conventional Gaussian kernel.

    Kernel-based techniques for texture analysis in biomedical imaging

    [Abstract] In real-world problems it is relevant to study the importance of all the variables obtained so that noise can be removed, and this is where variable selection techniques come in. These techniques aim to find the subset of variables that best describes the useful information contained in the data, allowing improved performance. In high-dimensional spaces, kernel-based techniques are of special interest, as they have demonstrated high efficiency due to their ability to generalize in these spaces. This work proposes a new approach for texture analysis in biomedical imaging that uses kernel-based techniques to integrate different types of texture data and select the most representative variables, with the aim of improving both the classification results and the interpretability of the selected variables. To validate this proposal, an experimental design was formalized with four distinct phases: 1) data extraction; 2) data pre-processing; 3) learning; and 4) selection of the best model, ensuring the reproducibility of the results while allowing a comparison under equal conditions. Two-dimensional electrophoresis gel images were used.