6 research outputs found

    Perplexity-free Parametric t-SNE

    The t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm is a ubiquitously employed dimensionality reduction (DR) method. Its non-parametric nature and impressive efficacy motivated its parametric extension. It is, however, bound to a user-defined perplexity parameter, restricting its DR quality compared to recently developed multi-scale perplexity-free approaches. This paper hence proposes a multi-scale parametric t-SNE scheme, relieved from perplexity tuning and with a deep neural network implementing the mapping. It produces reliable embeddings with out-of-sample extensions, competitive with the best perplexity adjustments in terms of neighborhood preservation on multiple data sets.
    Comment: ESANN 2020 proceedings, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. Online event, 2-4 October 2020, i6doc.com publ., ISBN 978-2-87587-074-2. Available from http://www.i6doc.com/en
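
    A minimal sketch of the approach described above, assuming PyTorch and toy data: Gaussian input affinities are averaged over a dyadic range of perplexities instead of a single user-set value, and a small feed-forward network is trained to minimise the KL divergence to Student-t output similarities, so out-of-sample points are embedded by a plain forward pass. The function names, network architecture and perplexity schedule are illustrative assumptions, not the authors' implementation.

        import numpy as np
        import torch
        import torch.nn as nn

        def hd_affinities(X, perplexity):
            """Gaussian input similarities for one perplexity (bisection on the precision)."""
            n = X.shape[0]
            D = np.square(X[:, None, :] - X[None, :, :]).sum(-1)
            P = np.zeros((n, n))
            target = np.log(perplexity)                      # desired entropy
            for i in range(n):
                lo, hi = 1e-10, 1e10
                for _ in range(50):
                    beta = 0.5 * (lo + hi)
                    p = np.exp(-D[i] * beta)
                    p[i] = 0.0
                    p /= max(p.sum(), 1e-12)
                    H = -np.sum(p[p > 0] * np.log(p[p > 0]))
                    lo, hi = (beta, hi) if H > target else (lo, beta)
                P[i] = p
            return (P + P.T) / (2 * n)

        def multiscale_affinities(X):
            """Average single-scale affinities over dyadic perplexities 2, 4, 8, ... (perplexity-free)."""
            perps = [2 ** k for k in range(1, int(np.log2(X.shape[0] / 2)) + 1)]
            return sum(hd_affinities(X, p) for p in perps) / len(perps)

        X = np.random.rand(200, 10).astype(np.float32)       # toy data set
        P = torch.tensor(multiscale_affinities(X), dtype=torch.float32)
        Xt = torch.tensor(X)

        encoder = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 2))
        opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)
        off_diag = 1.0 - torch.eye(len(Xt))

        for _ in range(500):
            Y = encoder(Xt)
            Q = 1.0 / (1.0 + torch.cdist(Y, Y) ** 2) * off_diag       # Student-t output kernel
            Q = Q / Q.sum()
            loss = (P * torch.log((P + 1e-12) / (Q + 1e-12))).sum()   # KL divergence
            opt.zero_grad(); loss.backward(); opt.step()

        Y_new = encoder(torch.randn(5, 10))                  # out-of-sample extension: a forward pass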

    Stochastic neighbor embedding (SNE) for dimension reduction and visualization using arbitrary divergences

    We present a systematic approach to the mathematical treatment of the t-distributed stochastic neighbor embedding (t-SNE) and the stochastic neighbor embedding (SNE) method. This allows an easy adaptation of the methods or an exchange of their respective modules. In particular, the divergence which measures the difference between the probability distributions in the original and the embedding space can be treated independently from other components, such as the similarity of data points or the data distribution. We focus on the extension to different divergences and propose a general framework based on the consideration of Fréchet derivatives. In this way, the general approach can be adapted to user-specific needs.
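
    A minimal sketch of this modularity, assuming PyTorch: the divergence comparing the input affinities P with the embedding affinities Q is passed in as a plain function, and automatic differentiation stands in for the Fréchet-derivative calculations developed in the paper. The function names, the Jensen-Shannon example and the toy affinities are illustrative assumptions, not the paper's formulation.

        import torch

        def kl_divergence(P, Q, eps=1e-12):
            return (P * torch.log((P + eps) / (Q + eps))).sum()

        def jensen_shannon(P, Q, eps=1e-12):
            M = 0.5 * (P + Q)
            return 0.5 * kl_divergence(P, M, eps) + 0.5 * kl_divergence(Q, M, eps)

        def embed(P, divergence, n_iter=1000, lr=0.1, seed=0):
            """t-SNE-style gradient descent in which the divergence is an exchangeable module."""
            torch.manual_seed(seed)
            n = P.shape[0]
            Y = (1e-2 * torch.randn(n, 2)).requires_grad_()  # low-dimensional coordinates
            opt = torch.optim.Adam([Y], lr=lr)
            off_diag = 1.0 - torch.eye(n)
            for _ in range(n_iter):
                Q = 1.0 / (1.0 + torch.cdist(Y, Y) ** 2) * off_diag   # Student-t kernel
                Q = Q / Q.sum()
                loss = divergence(P, Q)
                opt.zero_grad(); loss.backward(); opt.step()
            return Y.detach()

        # Toy input affinities from random data; the same P feeds two different divergences.
        X = torch.rand(100, 5)
        P = 1.0 / (1.0 + torch.cdist(X, X) ** 2) * (1.0 - torch.eye(100))
        P = P / P.sum()
        Y_kl = embed(P, kl_divergence)
        Y_js = embed(P, jensen_shannon)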

    Hyperparameter optimized classification pipeline for handling unbalanced urban and rural energy consumption patterns

    This is the author accepted manuscript. The final version is available from Elsevier via the DOI in this record. Data availability: data will be made available on request.
    Energy consumer locations are required for framing effective energy policies. However, due to privacy concerns, it is becoming increasingly difficult to obtain the locational data of consumers. Machine learning (ML) based classification strategies can be used to infer the locational information of consumers from their historical energy consumption patterns. The ML methods in this paper are applied to the Residential Energy Consumption Survey 2009 dataset. In this dataset, the number of consumers in the urban area is higher than in the rural area, which makes the classification problem unbalanced. The unbalanced classification problem is solved in the original and in a transformed or reduced feature space using Monte Carlo based under-sampling of the majority-class data points. The hyperparameters of each classification algorithm family are represented as an optimized pipeline, obtained using a genetic programming (GP) optimizer. The classification performance metrics are then obtained for the different algorithm families on the original and transformed feature spaces. Performance comparisons are reported using univariate and bivariate distributions of the classification metrics, viz. accuracy, geometric mean score (GMS), F1 score, precision, and area under the curve (AUC) of the receiver operating characteristic (ROC). The energy policy aspects for urban and rural residential consumers, based on the classification results, are also discussed.
    European Regional Development Fund (ERDF).
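
    A minimal sketch of the under-sampling and evaluation step, assuming scikit-learn and randomly generated stand-in data (the RECS 2009 features and the genetic programming pipeline search are not reproduced): the majority (urban) class is repeatedly under-sampled to the minority (rural) class size, a classifier is fitted on each balanced draw, and the reported metrics (accuracy, F1 score, ROC AUC and the geometric mean score) are collected over the Monte Carlo draws.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.metrics import accuracy_score, f1_score, recall_score, roc_auc_score
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(0)
        X = rng.normal(size=(3000, 12))                    # stand-in consumption features
        y = (rng.random(3000) < 0.2).astype(int)           # 1 = rural (minority class)

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

        scores = []
        for draw in range(20):                             # Monte Carlo under-sampling draws
            majority = np.flatnonzero(y_tr == 0)
            minority = np.flatnonzero(y_tr == 1)
            keep = rng.choice(majority, size=len(minority), replace=False)   # balance the classes
            idx = np.concatenate([keep, minority])
            clf = RandomForestClassifier(random_state=draw).fit(X_tr[idx], y_tr[idx])
            prob = clf.predict_proba(X_te)[:, 1]
            pred = (prob >= 0.5).astype(int)
            gms = np.sqrt(recall_score(y_te, pred, pos_label=1) *
                          recall_score(y_te, pred, pos_label=0))             # geometric mean of recalls
            scores.append((accuracy_score(y_te, pred), f1_score(y_te, pred),
                           roc_auc_score(y_te, prob), gms))

        print(np.mean(scores, axis=0))                     # metric averages over the draws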

    Faithful visualization and dimensionality reduction on graphics processing unit

    Information visualization is the process of transforming data, information and knowledge into a geometric representation in order to see unseen information. Dimensionality reduction (DR) is one of the strategies used to visualize high-dimensional data sets by projecting them onto a low-dimensional space where they can be visualized directly. The problem with DR is that the straightforward relationship between the original high-dimensional data sets and the low-dimensional space is lost, which causes the colours of the visualization to have no meaning. A new nonlinear DR method, called faithful stochastic proximity embedding (FSPE), is proposed in this thesis to visualize more complex data sets. The proposed method depends on the low-dimensional space rather than on the high-dimensional data sets to overcome the main shortcomings of DR: it suppresses false neighbour points and preserves the neighbourhood relations to the true neighbours. The visualization produced by the proposed method displays faithful, useful and meaningful colours, in which the objects of the image can be easily distinguished. The experiments conducted indicated that FSPE is more accurate than many dimensionality reduction methods because it prevents, as far as possible, false neighbourhood errors from occurring in the results. In addition, we have demonstrated that FSPE plays an important role in enhancing low-dimensional spaces produced by other DR methods. Choosing the least efficient points to update the rest of the points has helped to improve the visualization information. The results showed that the proposed method increases the trustworthiness of the visualization by retrieving most of the local neighbourhood points that were missed during the projection process.

    The sequential dimensionality reduction (SDR) method is the second method proposed in this thesis. It redefines the problem of DR as a sequence of multiple DR problems, each of which reduces the dimensionality by a small amount, while maintaining and preserving the relations among neighbour points in the low-dimensional space. The results showed the accuracy of the proposed SDR, which leads to a better visualization with fewer false colours than the direct projection of a DR method; these results are confirmed by comparing our method with 21 other methods.

    Although there are many measurement metrics, the proposed point-wise correlation metric is preferable: it evaluates the efficiency of each point in the visualization to generate a grey-scale efficiency image. This type of image gives more detail than a single summary value, and the user can recognize the locations of both the false and the true points.

    We compared the results of our proposed methods (FSPE and SDR) with many other dimensionality reduction methods when applied to four scenarios: (1) unfolding a curved cylinder data set; (2) projecting a human face data set into two dimensions; (3) classifying connected networks; and (4) visualizing remote sensing imagery data sets. The results showed that our methods are able to produce good visualizations by preserving the corresponding colour distances between the visualization and the original data sets.

    The proposed methods are implemented on the graphics processing unit (GPU) to visualize different data sets. The benefit of a parallel implementation is to obtain the results in as short a time as possible. The results showed that the compute unified device architecture (CUDA) implementations of FSPE and SDR are faster than their sequential counterparts on the central processing unit (CPU) in calculating floating-point operations, especially for large data sets. The GPU is also better suited to the implementation of the metric measurement methods because they involve a large amount of computation. We illustrated that this massive speed-up requires a parallel structure suitable for running on a GPU.
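
    A minimal CPU-side sketch, assuming NumPy, of a point-wise quality measure in the spirit of the point-wise correlation metric described above: for each point, its distances to all other points in the original space are correlated with its distances in the embedding, giving one value per point that could be rendered as a grey-scale efficiency image. The exact definition used in the thesis and its CUDA implementation are not reproduced here.

        import numpy as np

        def pointwise_correlation(X_high, X_low):
            """Per-point Pearson correlation between high- and low-dimensional distances."""
            Dh = np.linalg.norm(X_high[:, None, :] - X_high[None, :, :], axis=-1)
            Dl = np.linalg.norm(X_low[:, None, :] - X_low[None, :, :], axis=-1)
            n = len(X_high)
            scores = np.empty(n)
            for i in range(n):
                mask = np.arange(n) != i                   # drop the zero self-distance
                scores[i] = np.corrcoef(Dh[i, mask], Dl[i, mask])[0, 1]
            return scores                                  # one value in [-1, 1] per point

        # Toy check: a random linear projection preserves distances only partially.
        X = np.random.rand(300, 8)
        Y = X @ np.random.rand(8, 2)
        q = pointwise_correlation(X, Y)
        print(q.mean(), q.min())                           # average quality and the worst-preserved point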