
    Clustering Methods for Electricity Consumers: An Empirical Study in Hvaler-Norway

    The development of the Smart Grid in Norway in particular, and in Europe and the US in general, will shortly lead to the availability of massive amounts of fine-grained spatio-temporal consumption data from domestic households. This enables the application of data mining techniques to traditional problems in power systems. Clustering customers into appropriate groups is extremely useful for operators or retailers, who can then address each group differently through dedicated tariffs or customer-tailored services. Currently, the task is done based on demographic data collected through questionnaires, which is error-prone. In this paper, we used three different clustering techniques (together with their variants) to automatically segment electricity consumers based on their consumption patterns. We also proposed a way to extract consumption patterns for each consumer. The grouping results were assessed using four common internal validity indexes. We found that the combination of Self Organizing Map (SOM) and k-means algorithms produces the most insightful and useful grouping. We also discovered that grouping quality cannot be measured effectively by automatic indicators, which goes against common suggestions in the literature. Comment: 12 pages, 3 figures
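
One common way to combine a SOM with k-means is a two-stage scheme: train the SOM on the consumption profiles, run k-means on the SOM prototypes, and let each consumer inherit the cluster of its best-matching unit. The sketch below illustrates that idea only; it is not taken from the paper, and the one-dimensional map, the learning-rate schedule, the toy load profiles and every function name are assumptions made for illustration.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vec, prototypes):
    """Index of the prototype closest to vec."""
    return min(range(len(prototypes)), key=lambda i: sq_dist(vec, prototypes[i]))


def train_som(data, n_units=6, epochs=30, seed=0):
    """Train a 1-D SOM (a chain of units); the schedules are assumed, not tuned."""
    rng = random.Random(seed)
    weights = [list(rng.choice(data)) for _ in range(n_units)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = 0.5 * (1.0 - frac)                         # decaying learning rate
        sigma = max(0.5, 0.5 * n_units * (1.0 - frac))  # shrinking neighbourhood
        for vec in rng.sample(data, len(data)):         # shuffled pass over the data
            bmu = nearest(vec, weights)                 # best-matching unit
            for j, w in enumerate(weights):
                h = math.exp(-((j - bmu) ** 2) / (2.0 * sigma ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (vec[d] - w[d])
    return weights


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd iterations; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        for c, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(g) for col in zip(*g)]
    return [nearest(p, centers) for p in points]


# Hypothetical toy data: 4-value daily load profiles
# (night, morning, afternoon, evening), two behaviour types.
noise = random.Random(42)


def noisy(profile):
    return [v + noise.uniform(-0.05, 0.05) for v in profile]


profiles = ([noisy([0.2, 1.0, 0.4, 0.3]) for _ in range(10)]
            + [noisy([0.2, 0.3, 0.4, 1.0]) for _ in range(10)])

prototypes = train_som(profiles)                 # stage 1: SOM
unit_labels = kmeans(prototypes, k=2)            # stage 2: k-means on the units
labels = [unit_labels[nearest(p, prototypes)] for p in profiles]
```

Clustering the prototypes rather than the raw profiles keeps the second stage cheap, since k-means then runs on a handful of units instead of every consumer.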

    SOM-VAE: Interpretable Discrete Representation Learning on Time Series

    High-dimensional time series are common in many domains. Since human cognition is not optimized to work well in high-dimensional spaces, these areas could benefit from interpretable low-dimensional representations. However, most representation learning algorithms for time series data are difficult to interpret. This is due to non-intuitive mappings from data features to salient properties of the representation and non-smoothness over time. To address this problem, we propose a new representation learning framework building on ideas from interpretable discrete dimensionality reduction and deep generative modeling. This framework allows us to learn discrete representations of time series, which give rise to smooth and interpretable embeddings with superior clustering performance. We introduce a new way to overcome the non-differentiability in discrete representation learning and present a gradient-based version of the traditional self-organizing map algorithm that is more performant than the original. Furthermore, to allow for a probabilistic interpretation of our method, we integrate a Markov model in the representation space. This model uncovers the temporal transition structure, improves clustering performance even further, and provides additional explanatory insights as well as a natural representation of uncertainty. We evaluate our model in terms of clustering performance and interpretability on static (Fashion-)MNIST data, a time series of linearly interpolated (Fashion-)MNIST images, a chaotic Lorenz attractor system with two macro states, as well as on a challenging real-world medical time series application on the eICU data set. Our learned representations compare favorably with competitor methods and facilitate downstream tasks on the real-world data. Comment: Accepted for publication at the Seventh International Conference on Learning Representations (ICLR 2019)

    Somoclu: An Efficient Parallel Library for Self-Organizing Maps

    Somoclu is a massively parallel tool, written in C++, for training self-organizing maps on large data sets. It builds on OpenMP for multicore execution, and on MPI for distributing the workload across the nodes in a cluster. It is also able to boost training by using CUDA if graphics processing units are available. A sparse kernel is included, which is useful for high-dimensional but sparse data, such as the vector spaces common in text mining workflows. Python, R and MATLAB interfaces facilitate interactive use. Apart from fast execution, memory use is highly optimized, enabling training large emergent maps even on a single computer. Comment: 26 pages, 9 figures. The code is available at https://peterwittek.github.io/somoclu

    The self-organizing map as a visual neighbor retrieval method

    We have recently introduced rigorous goodness criteria for information visualization by posing it as a visual neighbor retrieval problem, where the task is to find proximate high-dimensional data based only on a low-dimensional display. Standard information retrieval criteria such as precision and recall can then be used for information visualization. We introduced an algorithm, Neighbor Retrieval Visualizer (NeRV), to optimize the total cost of retrieval errors. NeRV was shown to outperform alternative methods, but the SOM was not included in the comparison. In the empirical experiments of this paper, the SOM turns out to be comparable to the best methods in terms of (smoothed) precision but not in terms of recall. On a related measure called trustworthiness, the SOM outperforms all others. Finally, we suggest that for information visualization tasks the free parameters of the SOM could be optimized with cross-validation.
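
The neighbor retrieval view can be made concrete with a small calculation. The sketch below computes plain, unsmoothed precision and recall, treating each point's k nearest neighbors in the original space as the relevant set and its nearest neighbors on the display as the retrieved set. The smoothed measures mentioned in the abstract are not reproduced here, and the function names and the tie-breaking rule (lower index first) are assumptions made for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn(points, i, k):
    """Indices of the k nearest neighbors of point i (ties broken by index)."""
    others = (j for j in range(len(points)) if j != i)
    order = sorted(others, key=lambda j: (sq_dist(points[i], points[j]), j))
    return set(order[:k])


def retrieval_quality(high, low, k_relevant, k_retrieved):
    """Mean precision and recall of display neighborhoods.

    high: coordinates in the original space
    low:  coordinates of the same points on the low-dimensional display
    """
    n = len(high)
    precision = recall = 0.0
    for i in range(n):
        relevant = knn(high, i, k_relevant)    # true neighbors
        retrieved = knn(low, i, k_retrieved)   # neighbors seen on the display
        hits = len(relevant & retrieved)
        precision += hits / len(retrieved)
        recall += hits / len(relevant)
    return precision / n, recall / n


# Four points on a line, and a display that scrambles their order.
high = [[0.0], [1.0], [3.0], [7.0]]
low = [[0.0], [7.0], [1.0], [3.0]]

perfect = retrieval_quality(high, high, 1, 1)
scrambled = retrieval_quality(high, low, 1, 1)
```

A display that preserves every neighborhood scores 1.0 on both measures. When the retrieved set is smaller than the relevant set, precision and recall diverge, which is why the two are reported separately.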

    Projection-Based Clustering through Self-Organization and Swarm Intelligence

    This work covers aspects of unsupervised machine learning used for knowledge discovery in data science and introduces a data-driven approach to cluster analysis, the Databionic swarm (DBS). DBS consists of the 3D landscape visualization and clustering of data. The 3D landscape enables 3D printing of high-dimensional data structures. The clustering and the number of clusters, or an absence of cluster structure, can be verified at a glance using the 3D landscape. DBS is the first swarm-based technique that shows emergent properties while exploiting concepts of swarm intelligence, self-organization and the Nash equilibrium concept from game theory. As a result, neither a global objective function nor the setting of parameters is required. By downloading the R package, DBS can be applied to data drawn from diverse research fields and used even by non-professionals in the field of data mining.

    Clustering of Global Magnetospheric Observations

    The use of supervised methods in space science has demonstrated powerful capability in classification tasks, but unsupervised methods have been less utilized for the clustering of spacecraft observations. We use a combination of unsupervised methods, namely principal component analysis, self-organizing maps, and hierarchical agglomerative clustering, to predict whether THEMIS and MMS observations occurred in the magnetosphere, the magnetosheath, or the solar wind. The resulting predictions are validated visually by analyzing the distribution of predictions and studying individual time series. Particular nodes in the self-organizing map are studied to see what data they represent. The capability of deeper hierarchical analysis using this model is briefly explored. Finally, the changes in region prediction can be used to infer magnetopause and bow shock crossings, which can act as an additional method of validation, and are saved for their utility in solar wind validation, understanding magnetopause processes, and the potential to develop a bow shock model. Comment: 36 pages, 22 figures

    Benefits and limits of machine learning for the implicit coordination on SON functions

    Due to the introduction of new network functionalities in next-generation mobile networks, e.g., slicing or multi-antenna systems, as well as the coexistence of multiple radio access technologies, the optimization tasks become extremely complex, increasing the OPEX (OPerational EXpenditures). In order to provide services to the users with competitive Quality of Service (QoS) while keeping operational costs low, the Self-Organizing Network (SON) concept was introduced by the standardization bodies to add an automation layer to the network management. Thus, multiple SON functions (SFs) were proposed, each optimizing a specific network domain, like coverage or capacity. The conventional design of SFs conceived each function as a closed-loop controller optimizing a local objective by tuning specific network parameters. However, the relationship among multiple SFs was neglected to some extent. Therefore, many conflicting scenarios appear when multiple SFs are instantiated in a mobile network. Having conflicting functions in the network deteriorates the users' QoS and affects the signaling resources in the network. Thus, a coordination layer (which could also be an entity in the network) is expected to reconcile the conflicts between SFs. Nevertheless, because these functions are tightly interlinked, it is complex to model their interactions and dependencies in a closed form. Thus, machine learning is proposed to drive a joint optimization of a global Key Performance Indicator (KPI), hiding the intricate relationships between functions. We call this approach implicit coordination. In the first part of this thesis, we propose a centralized, fully implicit coordination approach based on machine learning (ML), and apply it to the coordination of two well-established SFs: Mobility Robustness Optimization (MRO) and Mobility Load Balancing (MLB). We find that this approach can be applied as long as the coordination problem is decomposed into three functional planes: controllable, environmental, and utility planes. However, the fully implicit coordination comes at a high cost: it requires a large amount of data to train the ML models. To improve the data efficiency of our approach (i.e., achieving good model performance with less training data), we propose a hybrid approach, which mixes ML with closed-form models. With the hybrid approach, we study the conflict between MLB and Coverage and Capacity Optimization (CCO) functions. Then, we apply it to the coordination among MLB, Inter-Cell Interference Coordination (ICIC), and Energy Savings (ES) functions. With the hybrid approach, we find part of the optimal parameter set in one shot, which makes it suitable for dynamic scenarios in which a fast response is expected from a centralized coordinator. Finally, we present a way to formally include MRO in the hybrid approach and show how the framework can be extended to cover challenging network scenarios like Ultra-Reliable Low Latency Communications (URLLC).