387 research outputs found

    The Impact of Supervised Manifold Learning on Structure Preserving and Classification Error: A Theoretical Study

    In recent years, a variety of supervised manifold learning techniques have been proposed to outperform their unsupervised counterparts in terms of classification accuracy and the capture of data structure. These techniques use dissimilarity measures to guide the dimensionality reduction process. Their good performance has been demonstrated empirically; however, the corresponding theoretical analysis is still missing. This paper contributes a theoretical analysis of a) how dissimilarity measures affect the preservation of manifold neighbourhood structure, and b) how supervised manifold learning techniques could contribute to the reduction of classification error. It also provides a cross-comparison between supervised and unsupervised manifold learning approaches in terms of structure capturing, using Kendall’s Tau coefficients and Co-ranking matrices. Four different metrics (three dissimilarity measures and the Euclidean distance) are considered together with the manifold learning methods Isomap, t-Stochastic Neighbour Embedding (t-SNE), and Laplacian Eigenmaps (LE), on two datasets: Breast Cancer and Swiss-Roll. The paper concludes that although the dissimilarity measures used in manifold learning techniques can reduce classification error, they neither learn nor preserve the structure of the hidden manifold in the high-dimensional space; instead, they destroy the structure of the data. Based on these findings, it is advisable to use supervised manifold learning techniques as a pre-processing step in classification. It is not advisable, however, to apply supervised manifold learning for visualization purposes, since the two-dimensional representation it produces does not improve the preservation of data structure.
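
    The abstract above uses Kendall’s Tau to judge how well an embedding preserves structure. As a minimal sketch of that idea, and not the paper's actual procedure, the rank correlation can be taken between the pairwise distances in the original space and those in the embedding. The function names, the toy data, and the choice of the tau-a variant (ties counted as neither concordant nor discordant) are all assumptions made for illustration.

```python
import itertools
import math


def pairwise_distances(points):
    """Euclidean distances for all unordered point pairs, in a fixed order."""
    return [math.dist(p, q) for p, q in itertools.combinations(points, 2)]


def kendall_tau(a, b):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    Tied pairs count as neither concordant nor discordant (an assumption;
    other variants such as tau-b correct for ties).
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    score = 0
    for (a1, b1), (a2, b2) in itertools.combinations(zip(a, b), 2):
        sign = (a1 - a2) * (b1 - b2)
        if sign > 0:
            score += 1
        elif sign < 0:
            score -= 1
    n = len(a)
    return score / (n * (n - 1) / 2)


# Hypothetical toy data: points on a straight line in 3-D and a 1-D embedding
# that keeps their order and spacing, so every distance ranking is preserved.
high = [(t, 2.0 * t, 0.0) for t in (0.0, 1.0, 3.0, 7.0)]
low = [(t,) for t in (0.0, 1.0, 3.0, 7.0)]

tau = kendall_tau(pairwise_distances(high), pairwise_distances(low))
```

    A value of `tau` close to 1 indicates that the embedding orders pairwise distances much as the original space does; values near 0 or below indicate that the ordering has been lost or reversed.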

    Effective and Trustworthy Dimensionality Reduction Approaches for High Dimensional Data Understanding and Visualization

    In recent years, the rapid expansion of digital technologies has vastly increased the volume of data to be explored. Reducing the dimensionality of data is an essential step in data exploration and visualisation. The integrity of a dimensionality reduction technique is determined by how well it maintains the data structure: a low-dimensional visualisation that has not captured the structure of the data in the high-dimensional space is untrustworthy. How much of the data structure a method maintains depends on several factors, such as the type of data considered and the tuning parameters. The data may be linear or nonlinear, and the tuning parameters include the number of neighbours and the perplexity. In practice, most data under consideration are nonlinear, and tuning parameters can be costly since the process depends on the number of data samples considered. Existing dimensionality reduction approaches suffer from the following problems: 1) they only work well with linear data, 2) the amount of data structure maintained is related to the number of data samples considered, and/or 3) they exhibit the tear problem and the false neighbours problem. To deal with these problems, this research has developed the Same Degree Distribution (SDD), multi-SDD (MSDD), and parameter-free SDD approaches, which 1) save computational time because their tuning parameter does not [...], 2) produce more trustworthy visualisations by using a degree distribution that is smooth enough to capture local and global data structure, and 3) do not suffer from the tear and false neighbours problems because the same degree distribution is used in the high- and low-dimensional spaces to calculate the similarities between data samples. The developed dimensionality reduction methods are tested on several popular synthetic and real datasets. The amount of data structure maintained is evaluated using different quality metrics, i.e., Kendall’s Tau coefficient, Trustworthiness, Continuity, LCMC, and the Co-ranking matrix. In addition, the theoretical analysis of the impact of dissimilarity measures on structure capturing is supported by simulation results on two different datasets, evaluated with Kendall’s Tau and the Co-ranking matrix. The SDD, MSDD, and parameter-free SDD methods do not outperform other global methods such as Isomap on data with a large fraction of large pairwise distances; addressing this remains future work. Reducing the computational complexity is another objective for future work.
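
    Several of the quality metrics named above are built from neighbourhood ranks. The following is a minimal sketch, not the thesis's implementation: it computes a co-ranking matrix and the rank-based Trustworthiness and Continuity scores in their commonly used form. The function names and toy data are hypothetical, ties in distance are broken by index order (an assumption), and `k` must be smaller than half the number of points for the normalisation to hold.

```python
import math


def ranks(points):
    """ranks[i][j] = rank of j among the neighbours of i (1 = nearest)."""
    n = len(points)
    table = []
    for i in range(n):
        order = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        row = [0] * n  # rank 0 is reserved for the point itself
        for rank, j in enumerate(order, start=1):
            row[j] = rank
        table.append(row)
    return table


def coranking(high, low):
    """q[k][l] counts pairs with rank k+1 in `high` and rank l+1 in `low`."""
    n = len(high)
    rh, rl = ranks(high), ranks(low)
    q = [[0] * (n - 1) for _ in range(n - 1)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[rh[i][j] - 1][rl[i][j] - 1] += 1
    return q


def trustworthiness(high, low, k):
    """Penalise points that enter a k-neighbourhood only in the embedding."""
    n = len(high)
    if not 0 < k < n / 2:
        raise ValueError("k must satisfy 0 < k < n / 2")
    rh, rl = ranks(high), ranks(low)
    penalty = sum(
        rh[i][j] - k
        for i in range(n)
        for j in range(n)
        if i != j and rl[i][j] <= k and rh[i][j] > k
    )
    return 1 - 2 * penalty / (n * k * (2 * n - 3 * k - 1))


def continuity(high, low, k):
    """Penalise points that leave a k-neighbourhood in the embedding."""
    return trustworthiness(low, high, k)


# Hypothetical toy data: a faithful 1-D embedding and one with two points swapped.
high = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (7.0, 0.0), (15.0, 0.0)]
faithful = [(0.0,), (1.0,), (3.0,), (7.0,), (15.0,)]
swapped = [(0.0,), (15.0,), (3.0,), (7.0,), (1.0,)]

q = coranking(high, faithful)
t_faithful = trustworthiness(high, faithful, k=2)
t_swapped = trustworthiness(high, swapped, k=2)
```

    For a rank-preserving embedding the co-ranking matrix is diagonal and both scores equal 1. Mass above or below the diagonal corresponds to points whose rank improved or worsened in the embedding, which relates to the false neighbours and tear problems mentioned in the abstract.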

    Improved data visualisation through nonlinear dissimilarity modelling

    Inherent to state-of-the-art dimension reduction algorithms is the assumption that global distances between observations are Euclidean, despite the potential for altogether non-Euclidean data manifolds. We demonstrate that a non-Euclidean manifold chart can be approximated by implementing a universal approximator over a dictionary of dissimilarity measures, building on recent developments in the field. This approach is transferable across domains, so that observations can be, for instance, vectors, distributions, graphs, or time series. Our novel dissimilarity learning method is illustrated on four standard visualisation datasets, showing its benefits over the linear dissimilarity learning approach.

    Puolivalvottu WLAN-radiokarttojen oppiminen (Semi-supervised learning of WLAN radio maps)

    In this thesis, a manifold learning method is applied to the problem of WLAN positioning and automatic radio map creation. Due to the nature of WLAN signal strength measurements, a signal map created from raw measurements results in non-linear distance relations between measurement points. These signal strength vectors reside in a high-dimensional coordinate system. With the help of the Isomap algorithm, the dimensionality of this map can be reduced, making it easier to process. By embedding position-labeled key points, the mapping can be adjusted automatically to match the surveyed environment. The environment is thus learned in a semi-supervised way: gathering training points and embedding them in a two-dimensional manifold gives a rough mapping of the measured environment. After a calibration phase, in which the labeled key points in the training data are used to associate coordinates in the manifold representation with geographical locations, positioning can be performed using the adjusted map. This can be achieved through a traditional supervised learning process, which in this case is a simple nearest-neighbour matching of a sampled signal strength vector. The system was deployed in two locations on the Kumpula campus in Helsinki, Finland. The results indicate that positioning based on the learned radio map can achieve good accuracy, especially in hallways and other areas where the WLAN signal is constrained by obstacles such as walls.
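
    The final positioning step described above, nearest-neighbour matching of a sampled signal strength vector, can be sketched as follows. This is only an illustration under stated assumptions: the radio map entries, signal values, and positions are invented, the matching is done directly in signal space, and the Isomap embedding and calibration phases of the thesis are not reproduced.

```python
import math

# Hypothetical calibrated radio map: each entry pairs a signal strength
# vector (dBm, one value per access point) with an (x, y) position in metres.
RADIO_MAP = [
    ((-40.0, -70.0, -85.0), (0.0, 0.0)),
    ((-55.0, -60.0, -80.0), (5.0, 0.0)),
    ((-70.0, -45.0, -75.0), (10.0, 0.0)),
    ((-85.0, -50.0, -60.0), (10.0, 5.0)),
]


def locate(sample, radio_map):
    """Return the position of the map entry whose signal vector is closest."""
    if not radio_map:
        raise ValueError("radio map is empty")
    _, position = min(
        radio_map, key=lambda entry: math.dist(sample, entry[0])
    )
    return position


# A new measurement is assigned the position of its nearest fingerprint.
estimate = locate((-57.0, -62.0, -79.0), RADIO_MAP)
```

    Using a single nearest neighbour keeps the sketch short; averaging the positions of several nearest entries is a common variation, but the abstract does not say which was used.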