Nonlinear Supervised Dimensionality Reduction via Smooth Regular Embeddings
The recovery of the intrinsic geometric structure of data collections is an
important problem in data analysis. Supervised extensions of several manifold
learning approaches have been proposed in recent years. However, existing
methods primarily focus on the embedding of the training data, while the
generalization of the embedding to initially unseen test data is largely
ignored. In this work, we build on recent theoretical results on the
generalization performance of supervised manifold learning algorithms.
Motivated by these performance bounds, we propose a supervised manifold
learning method that computes a nonlinear embedding while constructing a smooth
and regular interpolation function that extends the embedding to the whole data
space in order to achieve satisfactory generalization. The embedding and the
interpolator are learnt jointly, imposing Lipschitz regularity on the
interpolator while ensuring separation between different classes. Experimental
results on several image data sets show that the proposed method outperforms
traditional classifiers and the compared supervised dimensionality reduction
algorithms in terms of classification accuracy in most settings.
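As an illustration of the kind of out-of-sample extension the abstract describes, the sketch below fits a smooth Gaussian-RBF interpolator from data space to a precomputed embedding. This is a generic construction for exposition only (all function and parameter names are ours); it is not the authors' jointly-learnt, Lipschitz-constrained map, though any such kernel interpolator is Lipschitz on a bounded domain.

```python
import numpy as np

def rbf_interpolator(X_train, Y_embed, sigma=1.0, reg=1e-6):
    """Fit a smooth Gaussian-RBF map from data space to embedding space.

    Generic out-of-sample extension for illustration; NOT the paper's
    jointly-optimised, class-separating construction.
    """
    # Pairwise squared distances between training points
    d2 = ((X_train[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * sigma ** 2))
    # Regularised linear system for the interpolation weights
    W = np.linalg.solve(K + reg * np.eye(len(X_train)), Y_embed)

    def f(X_new):
        # Evaluate the interpolant at previously unseen points
        d2n = ((X_new[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2n / (2 * sigma ** 2)) @ W

    return f

# Toy usage: three 2-D points embedded into 1-D
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
Y = np.array([[0.0], [1.0], [-1.0]])
f = rbf_interpolator(X, Y)
print(np.allclose(f(X), Y, atol=1e-3))  # → True: training embedding is reproduced
```

Unseen test points are then mapped by evaluating `f`, which is the role the interpolation function plays in the generalization argument above.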
Multi-view Subspace Learning for Large-Scale Multi-Modal Data Analysis
Dimensionality reduction methods play an important role in modern machine learning, and subspace learning is one of the common approaches to it. Although various methods have been proposed over the past years, many of them suffer from limitations related to unimodality assumptions on the data and from low speed in the case of high-dimensional data (in linear formulations) or large datasets (in kernel-based formulations). In this thesis, the problem of large-scale multi-modal data analysis for single- and multi-view data is discussed, and several extensions of Subclass Discriminant Analysis (SDA) that overcome these limitations are proposed. First, a Spectral Regression Subclass Discriminant Analysis method relying on the Graph Embedding-based formulation of SDA is proposed as a way to reduce the training time, and it is shown how the solution can be obtained efficiently, thereby reducing the computational requirements. Secondly, a novel multi-view formulation of Subclass Discriminant Analysis is proposed, extending it to data coming from multiple views. In addition, a speed-up approach that reduces the computational requirements of the multi-view formulation is proposed. Linear and nonlinear kernel-based formulations are given for all the extensions. Experiments are performed on nine single-view and nine multi-view datasets, evaluating the accuracy and speed of the proposed extensions. It is shown experimentally that the proposed approaches significantly reduce the training time while providing competitive performance compared to other subspace-learning-based methods.
Speed-up and multi-view extensions to subclass discriminant analysis
Highlights
• We present a speed-up extension to Subclass Discriminant Analysis.
• We propose an extension to SDA for multi-view problems and a fast solution to it.
• The proposed approaches result in lower training time and competitive performance.

In this paper, we propose a speed-up approach for subclass discriminant analysis and formulate a novel efficient multi-view solution to it. The speed-up approach is developed based on graph embedding and spectral regression approaches, which involve eigendecomposition of the corresponding Laplacian matrix and regression to its eigenvectors. We show that by exploiting the structure of the between-class Laplacian matrix, the eigendecomposition step can be substituted with a much faster process. Furthermore, we formulate a novel criterion for multi-view subclass discriminant analysis and show that an efficient solution to it can be obtained in a similar manner to the single-view case. We evaluate the proposed methods on nine single-view and nine multi-view datasets and compare them with related existing approaches. Experimental results show that the proposed solutions achieve competitive performance, often outperforming the existing methods, while significantly decreasing the training time.
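The two-step spectral-regression idea the abstract builds on (eigendecomposition of a graph affinity, then regression of the features onto its eigenvectors) can be sketched as follows. All names here are illustrative, and the paper's SDA-specific fast substitute for the eigendecomposition step is not reproduced; this is only the generic pipeline after Cai et al.'s spectral regression.

```python
import numpy as np

def spectral_regression_embedding(X, W_graph, dim=2, alpha=1e-2):
    """Generic two-step spectral-regression sketch (illustrative only).

    Step 1: eigendecompose the symmetric graph affinity to obtain
            target responses Y for the training samples.
    Step 2: ridge-regress the features X onto Y to obtain a linear
            projection, avoiding the dense generalized eigenproblem.
    """
    # Step 1: leading eigenvectors of the affinity matrix (n x n)
    vals, vecs = np.linalg.eigh(W_graph)
    Y = vecs[:, -dim:]                       # n x dim regression targets
    # Step 2: ridge regression X @ B ≈ Y, with X of shape n x d
    A = X.T @ X + alpha * np.eye(X.shape[1])
    B = np.linalg.solve(A, X.T @ Y)          # d x dim projection matrix
    return X @ B                             # embedded training data

# Toy usage: two subclasses with a block-diagonal affinity
X = np.array([[0.0, 1.0], [0.1, 1.1], [5.0, 5.0], [5.1, 4.9]])
W = np.array([[1, 1, 0, 0], [1, 1, 0, 0],
              [0, 0, 1, 1], [0, 0, 1, 1]], dtype=float)
Z = spectral_regression_embedding(X, W, dim=1)
print(Z.shape)  # (4, 1)
```

The paper's contribution is precisely that, for the between-class Laplacian arising in SDA, Step 1 admits a much cheaper closed-form replacement; the sketch above keeps the generic eigendecomposition for clarity.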
Unified functional network and nonlinear time series analysis for complex systems science: The pyunicorn package
We introduce the \texttt{pyunicorn} (Pythonic unified complex network and
recurrence analysis toolbox) open source software package for applying and
combining modern methods of data analysis and modeling from complex network
theory and nonlinear time series analysis. \texttt{pyunicorn} is a fully
object-oriented and easily parallelizable package written in the language
Python. It allows for the construction of functional networks, such as climate
networks in climatology or functional brain networks in neuroscience, that
represent the structure of statistical interrelationships in large data sets of
time series, and for the subsequent investigation of this structure using
advanced methods of complex network theory, such as measures and models for
spatial networks, networks of interacting networks, node-weighted statistics, or network
surrogates. Additionally, \texttt{pyunicorn} provides insights into the
nonlinear dynamics of complex systems as recorded in uni- and multivariate time
series from a non-traditional perspective by means of recurrence quantification
analysis (RQA), recurrence networks, visibility graphs and construction of
surrogate time series. The range of possible applications of the library is
outlined, drawing on several examples mainly from the field of climatology.

Comment: 28 pages, 17 figures
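The recurrence quantification analysis mentioned in the abstract can be illustrated in a few lines of plain NumPy. This toy scalar-series version only mirrors the simplest RQA quantity, the recurrence rate, which pyunicorn's recurrence classes compute with far more generality (embedding, different metrics, further measures).

```python
import numpy as np

def recurrence_matrix(x, eps):
    """Binary recurrence matrix R[i, j] = 1 iff |x_i - x_j| < eps.

    Minimal scalar-series illustration of a recurrence plot; no
    time-delay embedding, unlike pyunicorn's full implementation.
    """
    d = np.abs(x[:, None] - x[None, :])
    return (d < eps).astype(int)

def recurrence_rate(R):
    """Fraction of recurrent state pairs, excluding the trivial diagonal."""
    n = R.shape[0]
    return (R.sum() - n) / (n * (n - 1))

# Toy usage on a periodic signal
x = np.sin(np.linspace(0, 4 * np.pi, 200))
R = recurrence_matrix(x, eps=0.1)
print(round(recurrence_rate(R), 3))
```

Quantities such as determinism or laminarity are then derived from the diagonal and vertical line structures of R, which is where dedicated tooling like pyunicorn becomes useful.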
Applications of Optimal Transportation in the Natural Sciences (online meeting)
Concepts and methods from the mathematical theory of optimal transportation
have reached significant importance in various fields of the natural sciences.
The view on classical problems from a "transport perspective"
has led to the development of powerful problem-adapted mathematical tools,
and sometimes to a novel geometric understanding of the matter.
The natural sciences, in turn, are
the most important source of ideas for the further development of the optimal transport theory,
and are a driving force for the design of efficient and reliable numerical methods
to approximate Wasserstein distances and the like.
The presentations and discussions in this workshop have been centered around
recent analytical results and numerical methods in the field of optimal transportation
that have been motivated by specific applications in statistical physics, quantum mechanics, and chemistry.
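In one dimension the Wasserstein distance has a closed form that makes the "transport perspective" concrete: the optimal plan simply matches sorted samples. The following minimal sketch (ours, not tied to any particular talk in the workshop) computes the p-Wasserstein distance between two equal-size empirical samples.

```python
import numpy as np

def wasserstein_1d(a, b, p=1):
    """p-Wasserstein distance between two equal-size empirical samples.

    In 1-D the optimal transport plan matches sorted samples, which is
    the closed form behind e.g. scipy.stats.wasserstein_distance (p=1).
    """
    a, b = np.sort(a), np.sort(b)
    return (np.mean(np.abs(a - b) ** p)) ** (1.0 / p)

# Shifting a sample by c moves it a Wasserstein distance of exactly c,
# since sorting commutes with the monotone shift x -> x + c.
x = np.random.default_rng(0).normal(size=1000)
print(round(wasserstein_1d(x, x + 0.5), 3))  # → 0.5
```

Beyond one dimension no such sorting shortcut exists, which is exactly why the efficient numerical methods discussed at the workshop are needed.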
Interpreting Deep Learning for cell differentiation: supervised and unsupervised models viewed through the lens of information and perturbation theory
"Predicting the future isn't magic, it's artificial intelligence" Dave Waters.
In the last decades there has been an unprecedented growth in the field of machine learning, and particularly within deep learning models. The combination of big data and computational power has nurtured the evolution of a variety of new methods to predict and interpret future scenarios. These data centric models can achieve exceptional performances on specific tasks, with their prediction boundaries continuously expanding towards new and more complex challenges.
However, the model complexity often translates into a lack of interpretability from a scientific perspective: it is not trivial to identify the factors involved in final outcomes.
Explainability may not always be a requirement for some machine learning tasks, especially when it comes at the expense of predictive performance. But for some applications, such as biological discoveries or medical diagnostics, understanding the output and determining the factors that influence decisions is essential.
In this thesis we develop both a supervised and an unsupervised approach to map from genotype to phenotype. We emphasise the importance of interpretability and feature extraction from the models by identifying relevant genes for cell differentiation. We then continue to explore the rules and mechanisms behind the models from a theoretical perspective, using information theory to explain the learning process and applying perturbation theory to transform the results into a generalisable representation.
We start by building a supervised approach to mapping cell profiles from genotype to phenotype, using single cell RNA-Seq data. We leverage non-linearities among gene expressions to identify cellular levels of differentiation. The ambiguity and even absence of labels in most biological studies instigated the development of novel unsupervised techniques, leading to a new general and biologically interpretable framework based on Variational Autoencoders.
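For readers unfamiliar with Variational Autoencoders, the two ingredients the framework relies on, the reparameterisation trick and the closed-form Gaussian KL regulariser, can be sketched in NumPy. This is a generic illustration of the standard VAE machinery, not the thesis's specific architecture.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I): the
    reparameterisation trick, which makes the sampling step
    differentiable with respect to mu and log_var."""
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, diag(sigma^2)) || N(0, I)), summed over
    latent dimensions: the regulariser in the VAE objective."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)

# Toy usage with a 2-D latent space
rng = np.random.default_rng(0)
mu = np.array([0.0, 1.0])
log_var = np.array([0.0, 0.0])              # unit variance
z = reparameterize(mu, log_var, rng)
print(kl_to_standard_normal(mu, log_var))   # → 0.5
```

In a full VAE these two pieces sit between the encoder (which outputs mu and log_var) and the decoder (which reconstructs the gene-expression profile from z).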
The application and validation of the methods has proven successful, but questions regarding the learning process and the generative nature of the results remained unanswered. We use information theory to define a new approach to interpreting training and the converged solutions of our models.
The variational and generative nature of Autoencoders provides a platform to develop general models. Their results should extrapolate and allow generalisation beyond the boundaries of the observed data. To this end, we introduce a new interpretation of the embedded generative functions through perturbation theory. The embedding multiplicity is addressed by transforming the distributions into a new set of generalisable functions, while characterising their energy spectrum under a particular energy landscape.
We outline the combination of theoretical and machine learning based methods for moving towards interpretable and generalisable models. Developing a theoretical framework to map from genotype to phenotype, we provide both supervised and unsupervised tools to operate over single cell RNA-Seq data. We have generated a pipeline to identify relevant genes and cell types through Variational Autoencoders (VAEs), validating reconstructed gene expressions to prove the generative performance of the embeddings. The new interpretation of the information learned and extracted by the models defines a label-independent evaluation, particularly useful for unsupervised learning. Lastly, we introduce a novel transformation of the generative embeddings based on quantum and perturbation theory.
Our contributions can be, and have been, extended to new datasets, according to the nature of the tasks being explored. For instance, the combination of unsupervised learning and information theory can be applied to a variety of biological or medical data. We have trained several VAE models on additional cancer and metabolic data, showing that they extract meaningful representations of the data. The perturbation-theory transformation of the embedding can also lead to future research on the generative potential of Variational Autoencoders from a physics perspective, combining statistical and quantum mechanics.
We believe that machine learning will only continue its fast expansion and growth through the development of more generalisable and more interpretable models.
"Prediction is very difficult, especially if it's about the future" Niels Boh